V - The generic type of the document identifier.

public class TokenBuffer<V extends Comparable<V>> extends Object

A buffer of TermFrequencyData entries, each extracted from a field of some document. When the buffer overflows it is flush()ed, writing on the indices.

Constructor and Description |
---|
TokenBuffer(int capacity, FullTextIndex<V> textIndexer) - Constructor. |

Modifier and Type | Method and Description |
---|---|
void | add(V docId, int fieldId, String token) - Adds another token to the current field of the current document. |
protected long | deleteFromIndex(int n, byte[][] keys, byte[][] vals) - Writes on the index. |
void | flush() - Writes any buffered data on the indices. |
TermFrequencyData<V> | get(int index) - Returns the TermFrequencyData for the specified index. |
void | reset() - Discards all data in the buffer and resets it to a clean state. |
int | size() - The number of entries in the buffer. |
protected long | writeOnIndex(int n, byte[][] keys, byte[][] vals) - Writes on the index. |

public TokenBuffer(int capacity, FullTextIndex<V> textIndexer)
capacity - The number of distinct {document,field} tuples that can be held in the buffer before it will overflow. The buffer will NOT overflow until you exceed this capacity.
textIndexer - The object on which the buffer will write when it overflows or is flush()ed.

public void reset()
Discards all data in the buffer and resets it to a clean state.

public int size()
The number of entries in the buffer.

public TermFrequencyData<V> get(int index)
Returns the TermFrequencyData for the specified index.
index - The index in [0:count).
Returns: The TermFrequencyData at that index.
Throws: IndexOutOfBoundsException
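
The capacity, size() and get(int) contract described above can be sketched with a small self-contained model. This is not the Blazegraph implementation: TokenBufferSketch, FieldTerms and the written list are hypothetical stand-ins for TokenBuffer, TermFrequencyData and the indices, and the sketch assumes that a new entry begins whenever the document identifier or field identifier changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for TokenBuffer; NOT the Blazegraph implementation. */
class TokenBufferSketch<V extends Comparable<V>> {

    /** Hypothetical stand-in for TermFrequencyData: term counts for one {document, field}. */
    static final class FieldTerms<V> {
        final V docId;
        final int fieldId;
        final Map<String, Integer> counts = new LinkedHashMap<>();

        FieldTerms(V docId, int fieldId) {
            this.docId = docId;
            this.fieldId = fieldId;
        }
    }

    private final int capacity;
    private final List<FieldTerms<V>> entries = new ArrayList<>();

    /** Stand-in for the indices: one line per flushed {term, docId, fieldId}. */
    final List<String> written = new ArrayList<>();

    TokenBufferSketch(int capacity) {
        if (capacity <= 0) throw new IllegalArgumentException("capacity must be positive");
        this.capacity = capacity;
    }

    void add(V docId, int fieldId, String token) {
        FieldTerms<V> current = entries.isEmpty() ? null : entries.get(entries.size() - 1);
        // Assumption: a change of document or field starts a new buffer entry.
        boolean newField = current == null
                || current.docId.compareTo(docId) != 0
                || current.fieldId != fieldId;
        if (newField) {
            // Overflow only when one more entry would exceed the capacity.
            if (entries.size() == capacity) flush();
            current = new FieldTerms<>(docId, fieldId);
            entries.add(current);
        }
        current.counts.merge(token, 1, Integer::sum);
    }

    void flush() {
        for (FieldTerms<V> e : entries) {
            for (Map.Entry<String, Integer> t : e.counts.entrySet()) {
                written.add(t.getKey() + "," + e.docId + "," + e.fieldId + "=" + t.getValue());
            }
        }
        entries.clear();
    }

    void reset() { entries.clear(); }

    int size() { return entries.size(); }

    /** Valid indices are [0:size()); ArrayList.get throws IndexOutOfBoundsException otherwise. */
    FieldTerms<V> get(int index) { return entries.get(index); }

    public static void main(String[] args) {
        TokenBufferSketch<String> buf = new TokenBufferSketch<>(2);
        buf.add("doc1", 0, "quick");
        buf.add("doc1", 0, "quick");
        buf.add("doc1", 1, "lazy");
        buf.add("doc2", 0, "dog"); // third distinct {document, field}: triggers a flush first
        System.out.println("buffered entries: " + buf.size());
        System.out.println("written so far: " + buf.written);
    }
}
```

With a capacity of 2, the sketch holds two {document, field} entries without writing anything. Only the third distinct tuple forces a flush, which matches the statement that the buffer does not overflow until the capacity is exceeded.
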

public void add(V docId, int fieldId, String token)
Adds another token to the current field of the current document. If the buffer is full, it is flush()ed before beginning a new field.

Note: This method is NOT thread-safe.

Note: There is an assumption that the caller will process all tokens for a given field in the same document at once. Failure to do this will lead to only part of the term-frequency distribution for the field being captured by the indices.

docId - The document identifier.
fieldId - The field identifier.
token - The token.

public void flush()
Writes any buffered data on the indices.

Note: The writes on the terms index are scattered, since the key for the index is {term, docId, fieldId}. This method batches up and then applies a set of updates, but the total operation is not atomic. Search results that are concurrent with indexing may therefore not have access to the full data for concurrently indexed documents. This issue may be resolved by allowing the indexer to write ahead and using a historical commit time for the search.

Note: If a document is pre-existing, then the existing data for that document MUST be removed unless you know that the set of fields in the document will not have changed (the fields may have different contents, but the same fields exist in the old and new versions of the document).
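
The two notes above can be illustrated with a toy model of a terms index. This is a sketch under stated assumptions, not the Blazegraph index API: TermsIndexSketch, its "term|docId|fieldId" string key encoding and the removeDocument helper are all hypothetical. A sorted map stands in for the index so that the effect of leading the key with the term is visible.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of a terms index; NOT the Blazegraph index API. */
class TermsIndexSketch {

    /** Sorted map standing in for the index. Keys use an assumed "term|docId|fieldId" encoding. */
    final TreeMap<String, Integer> index = new TreeMap<>();

    static String key(String term, String docId, int fieldId) {
        // Assumption for the sketch: terms and document identifiers never contain '|'.
        return term + "|" + docId + "|" + fieldId;
    }

    /** Counts the tokens of one {document, field} and applies them as a series of separate puts. */
    void write(String docId, int fieldId, String... tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        // Each put is an independent update, so a reader could observe a partial batch.
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            index.put(key(e.getKey(), docId, fieldId), e.getValue());
        }
    }

    /** Removes every entry for a document and returns how many were removed. */
    int removeDocument(String docId) {
        int before = index.size();
        // The keys lead with the term, so the document's entries are found by scanning.
        index.keySet().removeIf(k -> k.split("\\|")[1].equals(docId));
        return before - index.size();
    }

    /** Positions, in key order, of the entries that belong to a document. */
    List<Integer> positionsOf(String docId) {
        List<Integer> positions = new ArrayList<>();
        int i = 0;
        for (String k : index.keySet()) {
            if (k.split("\\|")[1].equals(docId)) positions.add(i);
            i++;
        }
        return positions;
    }

    public static void main(String[] args) {
        TermsIndexSketch idx = new TermsIndexSketch();
        idx.write("doc1", 0, "apple", "zebra", "zebra");
        idx.write("doc2", 0, "mango");
        idx.write("doc1", 1, "kiwi");
        System.out.println("positions of doc1 entries: " + idx.positionsOf("doc1"));

        // The new version of doc1 no longer has field 1, so remove the old data first.
        idx.removeDocument("doc1");
        idx.write("doc1", 0, "apple", "yak");
        System.out.println("keys after re-indexing: " + idx.index.keySet());
    }
}
```

In this model the entries for doc1 land at positions 0, 1 and 3 of the key order, with an entry for doc2 between them, which is what "scattered" means here. When doc1 is re-indexed without field 1, skipping removeDocument would leave the stale kiwi entry for that field in the map.
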

protected long writeOnIndex(int n, byte[][] keys, byte[][] vals)
Writes on the index.
n -
keys -
vals -

protected long deleteFromIndex(int n, byte[][] keys, byte[][] vals)
Writes on the index.
n -
keys -
vals -

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.