public class TextExtractor extends DefaultCallback
Modifier and Type | Field and Description |
---|---|
MutableString | text: The text resulting from the parsing process. |
MutableString | title: The title resulting from the parsing process. |
Fields inherited from interface Callback: EMPTY_CALLBACK_ARRAY
Constructor and Description |
---|
TextExtractor() |
Modifier and Type | Method and Description |
---|---|
boolean | characters(char[] characters, int offset, int length, boolean flowBroken): Receive notification of character data inside an element. |
void | configure(BulletParser parser): Configure the parser to parse text. |
boolean | endElement(Element element): Receive notification of the end of an element. |
void | startDocument(): Receive notification of the beginning of the document. |
boolean | startElement(Element element, Map<Attribute,MutableString> attrMapUnused): Receive notification of the start of an element. |
Methods inherited from class DefaultCallback: cdata, endDocument, getInstance
public final MutableString text
public final MutableString title
public void configure(BulletParser parser)
Specified by: configure in interface Callback
Overrides: configure in class DefaultCallback
public void startDocument()
Description copied from interface: Callback
The callback must use this method to reset its internal state so that it can be reused. It must be safe to invoke this method several times.
Specified by: startDocument in interface Callback
Overrides: startDocument in class DefaultCallback
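The reset contract described for startDocument() can be illustrated with a minimal sketch. ResettableCollector and its append method are hypothetical stand-ins, not the library's Callback or TextExtractor, and StringBuilder is used in place of MutableString so the example has no dependencies.

```java
// Hypothetical stand-in for a reusable parser callback (not the library class).
class ResettableCollector {
    // Buffers are allocated once and reused for every document.
    final StringBuilder text = new StringBuilder();
    final StringBuilder title = new StringBuilder();

    // Clears all per-document state. Calling it on an already empty
    // collector changes nothing, so repeated invocations are safe.
    void startDocument() {
        text.setLength(0);
        title.setLength(0);
    }

    // Hypothetical helper that simulates text arriving during a parse.
    void append(String s) {
        text.append(s);
    }
}
```

Because the buffers are emptied rather than replaced, any code holding a reference to text or title keeps seeing the same objects across documents.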
public boolean characters(char[] characters, int offset, int length, boolean flowBroken)
Description copied from interface: Callback
You must not write into text (the character array, named characters in this signature), as it could be passed around to many callbacks. flowBroken will be true iff the flow was broken before text. This feature makes it possible to quickly extract the text in a document without looking at the elements.
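As a rough sketch of how a callback might honor both rules (read-only input, and flowBroken as a separator hint), consider the following. FlowAwareCollector is a hypothetical class, not the library's implementation; the choice of a single space as separator and the meaning of the return value ("keep parsing") are assumptions made for illustration.

```java
// Hypothetical callback-like class; only the method shape mirrors characters(...).
class FlowAwareCollector {
    final StringBuilder text = new StringBuilder();

    boolean characters(char[] characters, int offset, int length, boolean flowBroken) {
        // When the flow was broken before this run, separate it from the
        // previous run with one space (assumed separator), avoiding doubles.
        if (flowBroken && text.length() > 0 && text.charAt(text.length() - 1) != ' ') {
            text.append(' ');
        }
        // Copy out of the shared array; the array itself is never modified.
        text.append(characters, offset, length);
        // Assumption: true asks the parser to continue.
        return true;
    }
}
```

With this approach the collector never needs to inspect elements: word boundaries introduced by markup are recovered from flowBroken alone.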
Specified by: characters in interface Callback
Overrides: characters in class DefaultCallback
Parameters:
characters - an array containing the character data.
offset - the start position in the array.
length - the number of characters to read from the array.
flowBroken - whether the flow is broken at the start of text.
public boolean endElement(Element element)
Description copied from interface: Callback
This method will never be called for elements without closing tags, even if such a tag is found.
Specified by: endElement in interface Callback
Overrides: endElement in class DefaultCallback
Parameters:
element - the element whose closing tag was found.
public boolean startElement(Element element, Map<Attribute,MutableString> attrMapUnused)
Description copied from interface: Callback
For simple elements, this is the only notification that the callback will ever receive.
Specified by: startElement in interface Callback
Overrides: startElement in class DefaultCallback
Parameters:
element - the element whose opening tag was found.
attrMapUnused - a map from Attributes to MutableStrings.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.