public class PrefixFilter<E> extends FilterBase implements ITupleFilter<E>
Filter visits all ITuples whose keys begin with any of the specified
prefix(es). The filter accepts a key or an array of keys that define the key
prefix(es) whose completions will be visited. It efficiently forms the
successor of each key prefix, performs a key-range scan of the key prefix,
and (if more than one key prefix is given) seeks to the start of the next
key-range scan.
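The scan described above can be sketched with plain JDK collections. This is a minimal illustration under stated assumptions, not the Blazegraph implementation: a `TreeMap` ordered by unsigned byte comparison stands in for the index, `subMap` stands in for seeking to the start of each key range, and the names `PrefixScanSketch`, `UNSIGNED`, `successor`, `scan` and `demo` are hypothetical. The `successor` helper drops trailing 0xff bytes before incrementing, which is one possible way to handle the carry.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

final class PrefixScanSketch {

    /** Unsigned, left-to-right byte[] order; a proper prefix sorts before its extensions. */
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    /** Smallest key greater than every key that starts with prefix, or null if there is none. */
    static byte[] successor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && (prefix[i] & 0xff) == 0xff) i--; // 0xff cannot be incremented
        if (i < 0) return null;                            // empty or all-0xff prefix: no upper bound
        byte[] s = Arrays.copyOf(prefix, i + 1);
        s[i]++;                                            // add one to the last remaining byte
        return s;
    }

    /** Visits the values of all keys that begin with any of the (sorted) prefixes. */
    static List<String> scan(TreeMap<byte[], String> index, byte[][] prefixes) {
        List<String> visited = new ArrayList<>();
        for (byte[] prefix : prefixes) {
            byte[] to = successor(prefix);
            // Half-open key range [prefix, successor(prefix)).
            SortedMap<byte[], String> range =
                    (to == null) ? index.tailMap(prefix) : index.subMap(prefix, to);
            visited.addAll(range.values());
        }
        return visited;
    }

    /** Builds a small demo index and scans it for the prefixes "ab" and "b". */
    static List<String> demo() {
        TreeMap<byte[], String> index = new TreeMap<>(UNSIGNED);
        for (String s : new String[] {"a", "ab", "abc", "ac", "b", "bz", "c"}) {
            index.put(s.getBytes(StandardCharsets.US_ASCII), s);
        }
        byte[][] prefixes = {
            "ab".getBytes(StandardCharsets.US_ASCII),
            "b".getBytes(StandardCharsets.US_ASCII)
        };
        return scan(index, prefixes);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ab, abc, b, bz]
    }
}
```

Keys such as "a", "ac" and "c" are never visited, because they fall outside both half-open ranges.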
The prefix keys MUST be formed with StrengthEnum.Primary.
This is necessary in order to match all keys in the index since it causes the
secondary characteristics to NOT be included in the prefix key even if they
are present in the keys in the index. Using other StrengthEnums will result
in secondary characteristics being encoded by additional bytes appended to
the key. This will result in the scan matching ONLY the given prefix key(s)
and matching nothing if those prefix keys are not actually present in the
index.
For example, the Unicode text "Bryan" is encoded as the unsigned byte[]
[43, 75, 89, 41, 67]
at PRIMARY strength but as the unsigned byte[]
[43, 75, 89, 41, 67, 1, 9, 1, 143, 8]
at IDENTICAL strength. The additional bytes for the IDENTICAL strength reflect the Locale-specific Unicode sort key encoding of secondary characteristics such as case. The successor of the PRIMARY strength byte[] is
[43, 75, 89, 41, 68]
(one was added to the last byte), and the key range from the prefix up to that successor spans all keys of interest. However, the successor of the IDENTICAL strength byte[] would be
[43, 75, 89, 41, 67, 1, 9, 1, 143, 9]
and the resulting key range would ONLY span the single tuple whose key was "Bryan".
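The range arithmetic for the "Bryan" byte arrays can be checked with a small sketch. It assumes that a scan covers the half-open range from the prefix key up to its successor and that keys compare as unsigned bytes. The class and method names are hypothetical, and `LONGER_KEY` is an invented key that merely extends the PRIMARY prefix; it is not the real encoding of any string.

```java
import java.util.Arrays;

final class SuccessorSketch {

    static final byte[] PRIMARY = {43, 75, 89, 41, 67};
    static final byte[] IDENTICAL = {43, 75, 89, 41, 67, 1, 9, 1, (byte) 143, 8};
    /** Hypothetical longer key that extends the PRIMARY prefix by one byte. */
    static final byte[] LONGER_KEY = {43, 75, 89, 41, 67, 90};

    /** Compares byte[]s as unsigned values; a proper prefix sorts before its extensions. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    /** Adds one to the last byte that is not 0xff, dropping trailing 0xff bytes; null if none. */
    static byte[] successor(byte[] prefix) {
        int i = prefix.length - 1;
        while (i >= 0 && (prefix[i] & 0xff) == 0xff) i--;
        if (i < 0) return null;
        byte[] s = Arrays.copyOf(prefix, i + 1);
        s[i]++;
        return s;
    }

    /** True if key lies in the half-open range [prefix, successor(prefix)). */
    static boolean inRange(byte[] prefix, byte[] key) {
        byte[] to = successor(prefix);
        return compareUnsigned(key, prefix) >= 0
                && (to == null || compareUnsigned(key, to) < 0);
    }

    public static void main(String[] args) {
        System.out.println(inRange(PRIMARY, IDENTICAL));    // prints true
        System.out.println(inRange(PRIMARY, LONGER_KEY));   // prints true
        System.out.println(inRange(IDENTICAL, LONGER_KEY)); // prints false
    }
}
```

The range built from the PRIMARY prefix contains both the IDENTICAL strength key and the longer key, while the range built from the IDENTICAL strength key excludes the longer key.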
You can form an appropriate IKeyBuilder for the prefix keys using
Properties properties = new Properties();
properties.setProperty(KeyBuilder.Options.STRENGTH, StrengthEnum.Primary.toString());
prefixKeyBuilder = KeyBuilder.newUnicodeInstance(properties);
Note: It is NOT trivial to define a filter that may be used to accept only keys that extend the prefix on a caller-defined boundary (e.g., corresponding to the encoding of a whitespace or word break). There are two issues: (1) the keys are encoded, so the filter needs to recognize the byte(s) in the Unicode sort key that correspond to, e.g., the word boundary; (2) the keys may have been encoded with secondary characteristics, in which case the boundary will not begin immediately after the prefix.
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | log |
Constructor and Description |
---|
PrefixFilter(byte[] keyPrefix): Completion scan with a single prefix. |
PrefixFilter(byte[][] keyPrefix): Completion scan with an array of key prefixes. |
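The byte[][] constructor expects its prefixes in sorted order with no null elements. Whether PrefixFilter checks this itself is not stated here, so a caller may want to validate or sort the array first. The following is a minimal sketch, assuming that "sorted" means unsigned byte order (Java's `byte` is signed, so a plain signed comparison would misorder values above 127) and that equal neighbours are acceptable. `PrefixArrayCheck` and its methods are hypothetical helpers, not part of the library.

```java
import java.util.Arrays;
import java.util.Comparator;

final class PrefixArrayCheck {

    /** Unsigned, left-to-right byte[] order; a proper prefix sorts before its extensions. */
    static final Comparator<byte[]> UNSIGNED = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    };

    /** Throws if the array or any element is null, or if the elements are out of order. */
    static void requireSortedNoNulls(byte[][] keyPrefixes) {
        if (keyPrefixes == null) {
            throw new IllegalArgumentException("keyPrefixes is null");
        }
        for (int i = 0; i < keyPrefixes.length; i++) {
            if (keyPrefixes[i] == null) {
                throw new IllegalArgumentException("null prefix at index " + i);
            }
            if (i > 0 && UNSIGNED.compare(keyPrefixes[i - 1], keyPrefixes[i]) > 0) {
                throw new IllegalArgumentException("prefixes out of order at index " + i);
            }
        }
    }

    /** Returns a copy sorted in unsigned order; the elements must already be non-null. */
    static byte[][] sortedCopy(byte[][] keyPrefixes) {
        byte[][] copy = keyPrefixes.clone();
        Arrays.sort(copy, UNSIGNED);
        return copy;
    }

    public static void main(String[] args) {
        byte[][] unsorted = { {(byte) 0x80}, {0x7f}, {0x01, 0x02} };
        byte[][] sorted = sortedCopy(unsorted);
        requireSortedNoNulls(sorted); // passes: {1, 2} < {0x7f} < {0x80} in unsigned order
        System.out.println("prefixes are sorted and non-null");
    }
}
```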
Modifier and Type | Method and Description |
---|---|
ITupleIterator<E> | filterOnce(Iterator src, Object context): Wrap the source iterator with this filter. |
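Methods inherited from class FilterBase: addFilter, filter, getProperty, getRequiredProperty, setProperty, toString
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from a superinterface: getProperty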
public PrefixFilter(byte[] keyPrefix)
keyPrefix - An unsigned byte[] containing a key prefix.

public PrefixFilter(byte[][] keyPrefix)
keyPrefix - An array of unsigned byte[] prefixes (the elements of the array MUST be presented in sorted order and nulls are not permitted).

public ITupleIterator<E> filterOnce(Iterator src, Object context)
Description copied from class: FilterBase
Wrap the source iterator with this filter.
filterOnce in interface ITupleFilter<E>
filterOnce in class FilterBase
src - The source iterator.
context - The iterator evaluation context.

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.