public class LineWordReader extends Object implements WordReader, Serializable
A WordReader that considers each line of a document a single word.
This class is intended for indexing material such as lists of document identifiers: if the identifiers contain nonalphabetical characters, the default FastBufferedReader might tokenize them poorly, for instance by breaking a single identifier into several words.
Note that the non-word returned by next(MutableString, MutableString) is always empty.
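To illustrate the difference, the following sketch uses only the JDK and none of this library's classes. It compares a tokenizer that breaks words at every character that is not a letter or digit (an assumption standing in for a default word reader) with line-per-word splitting. The class LineVersusWordSplit and the helpers alphanumericWords and lineWords are hypothetical names introduced for illustration, and the sketch does not model how blank lines are handled.

```java
import java.util.ArrayList;
import java.util.List;

public class LineVersusWordSplit {

    /** Assumed stand-in for a default tokenizer: a word is a maximal run of letters or digits. */
    static List<String> alphanumericWords(String text) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                // Any other character ends the current word.
                words.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) words.add(current.toString());
        return words;
    }

    /** Line-per-word splitting: every line becomes exactly one word. */
    static List<String> lineWords(String text) {
        List<String> words = new ArrayList<>();
        for (String line : text.split("\n")) words.add(line);
        return words;
    }

    public static void main(String[] args) {
        String identifiers = "doc-001/a\ndoc-002/b\n";
        System.out.println(alphanumericWords(identifiers)); // prints [doc, 001, a, doc, 002, b]
        System.out.println(lineWords(identifiers));         // prints [doc-001/a, doc-002/b]
    }
}
```

With the alphanumeric rule each identifier is broken into three fragments, whereas the line-based rule keeps it intact as a single indexable word.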
Constructor and Description |
---|
LineWordReader() |
Modifier and Type | Method and Description |
---|---|
LineWordReader | copy() Returns a copy of this word reader. |
boolean | next(MutableString word, MutableString nonWord) Extracts the next word and non-word. |
LineWordReader | setReader(Reader reader) Resets the internal state of this word reader, which will start again reading from the given reader. |
public boolean next(MutableString word, MutableString nonWord) throws IOException
Description copied from interface: WordReader
If this method returns true, a new non-empty word, and possibly a new non-word, have been extracted. It is acceptable that the first call to this method after creation or after a call to WordReader.setReader(Reader) returns an empty word. In other words, both word and nonWord are maximal.
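Specified by:
next in interface WordReader
Parameters:
word - the next word returned by the underlying reader.
nonWord - the non-word following the next word returned by the underlying reader.
Returns:
true if a new word was extracted; false otherwise (in which case word and nonWord are unchanged).
Throws:
IOException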
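The calling pattern implied by this contract can be sketched as a loop that runs until next returns false. The example below is a JDK-only illustration, not this library's implementation: LineReaderSketch is a hypothetical stand-in for a line-based word reader, StringBuilder takes the place of MutableString, and readAllWords is a helper invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NextLoopSketch {

    /** Hypothetical stand-in for a line-based word reader (not the library class). */
    static final class LineReaderSketch {
        private BufferedReader in;

        LineReaderSketch setReader(Reader reader) {
            in = new BufferedReader(reader);
            return this;
        }

        boolean next(StringBuilder word, StringBuilder nonWord) throws IOException {
            String line = in.readLine();
            if (line == null) return false; // end of input: both buffers are left unchanged
            word.setLength(0);
            word.append(line);              // the whole line is the word
            nonWord.setLength(0);           // the non-word is always empty
            return true;
        }
    }

    /** Collects every word by calling next until it returns false. */
    static List<String> readAllWords(String text) {
        LineReaderSketch reader = new LineReaderSketch().setReader(new StringReader(text));
        StringBuilder word = new StringBuilder();
        StringBuilder nonWord = new StringBuilder();
        List<String> words = new ArrayList<>();
        try {
            while (reader.next(word, nonWord)) {
                words.add(word.toString()); // copy: the buffer is overwritten by the next call
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(readAllWords("id:1\nid:2\n")); // prints [id:1, id:2]
    }
}
```

Because the same two buffers are reused on every call, the loop copies the word out before asking for the next one.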
public LineWordReader setReader(Reader reader)
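Description copied from interface: WordReader
Resets the internal state of this word reader, which will start again reading from the given reader.
Specified by:
setReader in interface WordReader
Parameters:
reader - the new reader providing characters.
public LineWordReader copy()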
Description copied from interface: WordReader
Returns a copy of this word reader. This method must return a word reader whose behaviour exactly matches that of this word reader.
Specified by:
copy in interface WordReader
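As a rough illustration of how setReader and copy might be used together, the JDK-only sketch below reuses one reader on a second input and gives a copy its own input. LineReaderSketch, nextWord, firstWordAfterReset and originalAndCopy are hypothetical names; the assumption that a copy shares no reader state with the original is made for this example and is not stated by the documentation above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ResetAndCopySketch {

    /** Hypothetical stand-in for a line-based word reader (not the library class). */
    static final class LineReaderSketch {
        private BufferedReader in;

        LineReaderSketch setReader(Reader reader) {
            in = new BufferedReader(reader); // drop any state tied to the previous reader
            return this;
        }

        LineReaderSketch copy() {
            return new LineReaderSketch();   // same behaviour, no shared reader (assumption)
        }

        /** Returns the next line as a word, or null at end of input. */
        String nextWord() {
            try {
                return in.readLine();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Reads one word from the first input, resets onto the second, and reads again. */
    static String firstWordAfterReset(String first, String second) {
        LineReaderSketch reader = new LineReaderSketch().setReader(new StringReader(first));
        reader.nextWord();
        reader.setReader(new StringReader(second));
        return reader.nextWord();
    }

    /** Gives the original and its copy separate inputs and reads one word from each. */
    static String[] originalAndCopy(String a, String b) {
        LineReaderSketch original = new LineReaderSketch().setReader(new StringReader(a));
        LineReaderSketch copy = original.copy().setReader(new StringReader(b));
        return new String[] { original.nextWord(), copy.nextWord() };
    }

    public static void main(String[] args) {
        System.out.println(firstWordAfterReset("a\nb\n", "x\ny\n")); // prints x
        String[] words = originalAndCopy("a\n", "b\n");
        System.out.println(words[0] + " " + words[1]);               // prints a b
    }
}
```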
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.