public class BlobsIndexHelper extends Object
Modifier and Type | Class and Description |
---|---|
static class | BlobsIndexHelper.CollisionBucketSizeException: Exception thrown if the maximum size of the collision bucket would be exceeded for some BigdataValue. |
Modifier and Type | Field and Description |
---|---|
static int | LOG_WARN_COUNTER_THRESHOLD: Arbitrary threshold for the collision counter for a given hash code at which we will log @ WARN. |
static int | MAX_COUNTER: The maximum value of the hash collision counter (unsigned short). |
static int | NOT_FOUND: Used to signal that the Value was not found on a read-only request. |
static int | OFFSET_COUNTER: The offset at which the counter occurs in the key. |
static int | SIZEOF_COUNTER: The size of the hash collision counter. |
static int | SIZEOF_HASH |
static int | SIZEOF_PREFIX_KEY: The size of a prefix key (a key without a hash collision counter). |
static int | TERMS_INDEX_KEY_SIZE: The size of a key in the TERMS index. |
Constructor and Description |
---|
BlobsIndexHelper() |
Modifier and Type | Method and Description |
---|---|
int | addBNode(IIndex ndx, IKeyBuilder keyBuilder, byte[] baseKey, byte[] val, byte[] tmp): Add an entry for a BNode to the TERMS index (do NOT use when told blank node semantics apply). |
KVO<BigdataValue>[] | generateKVOs(BigdataValueSerializer<BigdataValue> valSer, BigdataValue[] terms, int numTerms): Generate the sort keys for BigdataValues to be represented as BlobIVs. |
byte[] | lookup(IIndex ndx, BlobIV<?> iv, IKeyBuilder keyBuilder): Return the value associated with the BlobIV in the TERMS index. |
byte[] | makeKey(IKeyBuilder keyBuilder, byte[] baseKey, int counter): Create a fully formed key for the TERMS index from a baseKey and a hash collision counter. |
byte[] | makeKey(IKeyBuilder keyBuilder, VTE vte, int hashCode, int counter): Create a fully formed key for the TERMS index from the VTE, the hashCode of the BigdataValue, and the hash collision counter. |
byte[] | makePrefixKey(IKeyBuilder keyBuilder, BigdataValue value): Create a prefix key for the TERMS index from the BigdataValue. |
byte[] | makePrefixKey(IKeyBuilder keyBuilder, VTE vte, int hashCode): Create a prefix key for the TERMS index from the VTE and hashCode of the BigdataValue. |
IKeyBuilder | newKeyBuilder(): Return a new IKeyBuilder suitable for formatting keys for the TERMS index. |
int | resolveOrAddValue(IIndex termsIndex, boolean readOnly, IKeyBuilder keyBuilder, byte[] baseKey, byte[] val, byte[] tmp, AtomicInteger bucketSize): Resolve an existing record in the TERMS index and insert the record if none is found. |
public static final transient int SIZEOF_HASH
public static final transient int SIZEOF_COUNTER
public static final transient int MAX_COUNTER
public static final transient int OFFSET_COUNTER
public static final transient int SIZEOF_PREFIX_KEY
public static final transient int TERMS_INDEX_KEY_SIZE
Note: The key size is ONE (1) byte for the [flags] byte, ONE (1) byte for the extension byte (which describes what kind of non-inline IV this is), FOUR (4) bytes for the hash code, plus a TWO (2) byte counter (to break ties within a collision bucket).
Note: The counter size was increased when the design purpose of this index was changed to handling large RDF Values only. In practice, the hash codes of RDF Values are well distributed and collisions within a hash bucket (same hash code) are rare. A ONE (1) byte counter would probably provide all the distinctions that we could require, since it permits up to 256 hash collisions. However, when operating in scale-out, the TWO (2) byte (aka short) counter provides additional confidence that hash collisions will not result in a hash bucket overflow. Given that the terms index will be used only with larger RDF Values, and given the necessity for the "extension" byte, the TWO (2) byte counter is a small added cost that provides additional peace of mind. Note, however, that scanning large collision buckets is expensive. By allowing for large collision buckets, we pay that cost only when the hash codes have an unusual distribution for some specific value.
The total key size is only 8 bytes. Since only large values are stored under the TERMS index, they will always be written as raw records on the backing store. This means that we have an 8 byte key paired with an 8 byte address. That allows for practical branching factors of between 512 and 1024 to obtain an expected average page size of ~8k (after prefix compression, etc.).
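The size arithmetic in the two notes above can be checked with a small standalone sketch. It does not use the library: the constants are redeclared locally with values derived from the notes, and `SIZEOF_FLAGS`, `SIZEOF_EXTENSION`, `SIZEOF_ADDRESS` and `uncompressedLeafBytes` are hypothetical names introduced only for illustration. The assumption that a prefix key is the flags byte, the extension byte and the hash code follows from the description of SIZEOF_PREFIX_KEY as "a key without a hash collision counter".

```java
final class TermsKeyLayout {
    // Component sizes as stated in the notes above. SIZEOF_FLAGS and
    // SIZEOF_EXTENSION are hypothetical names; the remaining names mirror
    // fields of BlobsIndexHelper, but their values are derived here from
    // the notes rather than read from the library.
    static final int SIZEOF_FLAGS = 1;
    static final int SIZEOF_EXTENSION = 1;
    static final int SIZEOF_HASH = 4;
    static final int SIZEOF_COUNTER = 2;

    // A prefix key is everything except the collision counter (assumption).
    static final int SIZEOF_PREFIX_KEY = SIZEOF_FLAGS + SIZEOF_EXTENSION + SIZEOF_HASH;
    // The counter directly follows the prefix key (assumption).
    static final int OFFSET_COUNTER = SIZEOF_PREFIX_KEY;
    static final int TERMS_INDEX_KEY_SIZE = SIZEOF_PREFIX_KEY + SIZEOF_COUNTER;
    // Largest value of an unsigned counter of SIZEOF_COUNTER bytes.
    static final int MAX_COUNTER = (1 << (8 * SIZEOF_COUNTER)) - 1;

    // Hypothetical name: the 8 byte raw record address paired with each key.
    static final int SIZEOF_ADDRESS = 8;

    // Leaf size before prefix compression: one key plus one address per tuple.
    static int uncompressedLeafBytes(int branchingFactor) {
        return branchingFactor * (TERMS_INDEX_KEY_SIZE + SIZEOF_ADDRESS);
    }

    public static void main(String[] args) {
        System.out.println("prefix key bytes: " + SIZEOF_PREFIX_KEY);
        System.out.println("full key bytes:   " + TERMS_INDEX_KEY_SIZE);
        System.out.println("max counter:      " + MAX_COUNTER);
        System.out.println("leaf @ 512:       " + uncompressedLeafBytes(512));
        System.out.println("leaf @ 1024:      " + uncompressedLeafBytes(1024));
    }
}
```

At a branching factor of 512 the uncompressed leaf is 512 × 16 = 8192 bytes, and at 1024 it is 16384 bytes, which is consistent with the ~8k figure once prefix compression is taken into account.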
public static final transient int LOG_WARN_COUNTER_THRESHOLD
public static final transient int NOT_FOUND
Used to signal that the Value was not found on a read-only request.
public KVO<BigdataValue>[] generateKVOs(BigdataValueSerializer<BigdataValue> valSer, BigdataValue[] terms, int numTerms)
Generate the sort keys for BigdataValues to be represented as BlobIVs. The sort key is formed from the VTE of the BigdataValue followed by the hashCode of the BigdataValue. Note that the sort key formed in this manner is only a prefix key for the TERMS index. The fully formed key also includes a counter to break ties when the sort key formed from VTE and hashCode results in a collision (different BigdataValues having the same prefix key).
Parameters:
valSer - The object used to generate the values to be written onto the index.
terms - The terms whose sort keys will be generated.
numTerms - The number of terms in that array.
public int resolveOrAddValue(IIndex termsIndex, boolean readOnly, IKeyBuilder keyBuilder, byte[] baseKey, byte[] val, byte[] tmp, AtomicInteger bucketSize)
Resolve an existing record in the TERMS index and insert the record if none is found.
Parameters:
termsIndex - The TERMS index.
readOnly - true iff the operation is read only.
keyBuilder - The buffer will be reset as necessary.
baseKey - The base key for the hash code (without the counter suffix).
val - The (serialized and compressed) RDF Value.
tmp - The buffer used to format the toKey (optional). A new byte[] will be allocated if this is null, but the same byte[] can be reused for multiple invocations. The buffer MUST be dimensioned to SIZEOF_PREFIX_KEY.
bucketSize - The size of the collision bucket is reported as a side-effect (optional).
Returns:
The collision counter under which the Value was found (if pre-existing), the collision counter assigned to the Value iff the value was not found and the operation permitted writes -or- Integer.MIN_VALUE iff the Value is not in the index and the operation is read-only.
Throws:
BlobsIndexHelper.CollisionBucketSizeException - if an attempt is made to insert a Value into a collision bucket which is full.
public int addBNode(IIndex ndx, IKeyBuilder keyBuilder, byte[] baseKey, byte[] val, byte[] tmp)
Add an entry for a BNode to the TERMS index (do NOT use when told blank node semantics apply).
All BNodes entered by this method are distinct regardless of their BNode.getID(). Since blank nodes cannot be unified with the TERMS index (unless we are using told blank node semantics), we simply add another entry for the caller's BNode and return the key for that entry, which will be wrapped as an IV. That entry will be made distinct from all other entries for the same VTE and hashCode by appending the current collision counter (which is just the range count).
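The difference between always appending (addBNode) and resolving before inserting (resolveOrAddValue) can be illustrated with a self-contained sketch. This is not the library's implementation: a `TreeMap` ordered by unsigned byte comparison stands in for the TERMS index, the class and method names are hypothetical, the bucket-full check throws a plain `IllegalStateException` rather than CollisionBucketSizeException, and `NOT_FOUND` is assumed to equal `Integer.MIN_VALUE`. The sketch also assumes entries are never deleted, so the position of an entry within its bucket equals its counter.

```java
import java.util.Arrays;
import java.util.NavigableMap;
import java.util.TreeMap;

final class CollisionBucketSketch {
    static final int NOT_FOUND = Integer.MIN_VALUE; // assumed value
    static final int MAX_COUNTER = 0xFFFF;          // unsigned short

    // Stand-in for the TERMS index: keys ordered as unsigned byte[]s.
    final TreeMap<byte[], byte[]> index =
            new TreeMap<byte[], byte[]>((a, b) -> Arrays.compareUnsigned(a, b));

    // Append a two byte, big-endian counter to the base (prefix) key.
    static byte[] makeKey(byte[] baseKey, int counter) {
        byte[] key = Arrays.copyOf(baseKey, baseKey.length + 2);
        key[baseKey.length] = (byte) (counter >>> 8);
        key[baseKey.length + 1] = (byte) counter;
        return key;
    }

    // All entries sharing the base key, i.e. the collision bucket.
    NavigableMap<byte[], byte[]> bucket(byte[] baseKey) {
        return index.subMap(makeKey(baseKey, 0), true,
                            makeKey(baseKey, MAX_COUNTER), true);
    }

    // Scan the bucket for an equal value; insert at the next counter if absent.
    int resolveOrAdd(byte[] baseKey, byte[] val, boolean readOnly) {
        int counter = 0;
        for (byte[] existing : bucket(baseKey).values()) {
            if (Arrays.equals(existing, val)) return counter;
            counter++;
        }
        if (readOnly) return NOT_FOUND;
        return insert(baseKey, val, counter);
    }

    // Blank nodes are never unified: always append at counter == range count.
    int addBNode(byte[] baseKey, byte[] val) {
        return insert(baseKey, val, bucket(baseKey).size());
    }

    private int insert(byte[] baseKey, byte[] val, int counter) {
        if (counter > MAX_COUNTER) {
            throw new IllegalStateException("collision bucket is full");
        }
        index.put(makeKey(baseKey, counter), val);
        return counter;
    }
}
```

In this sketch, two calls to `resolveOrAdd` with the same value return the same counter, while two calls to `addBNode` with the same value return two different counters, which is the distinctness property described above.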
Parameters:
ndx - The TERMS index.
keyBuilder - The buffer will be reset as necessary.
baseKey - The base key for the hash code (without the counter suffix).
val - The (serialized and compressed) RDF BNode.
tmp - The buffer used to format the toKey (optional). A new byte[] will be allocated if this is null, but the same byte[] can be reused for multiple invocations. The buffer MUST be dimensioned to SIZEOF_PREFIX_KEY.
Throws:
BlobsIndexHelper.CollisionBucketSizeException - if an attempt is made to insert a Value into a collision bucket which is full.
public byte[] lookup(IIndex ndx, BlobIV<?> iv, IKeyBuilder keyBuilder)
Return the value associated with the BlobIV in the TERMS index.
Note: The returned byte[] may be decoded using the BigdataValueSerializer associated with the BigdataValueFactory for the namespace of the owning AbstractTripleStore.
public byte[] makeKey(IKeyBuilder keyBuilder, byte[] baseKey, int counter)
Create a fully formed key for the TERMS index from a baseKey and a hash collision counter.
Parameters:
keyBuilder - The caller is responsible for resetting the buffer as required.
baseKey - The base key (including the flags byte and the hashCode).
counter - The counter value.
public byte[] makeKey(IKeyBuilder keyBuilder, VTE vte, int hashCode, int counter)
Create a fully formed key for the TERMS index from the VTE, the hashCode of the BigdataValue, and the hash collision counter.
Parameters:
keyBuilder - The caller is responsible for resetting the buffer as required.
vte - The VTE.
hashCode - The hash code of the BigdataValue.
counter - The hash collision counter.
public byte[] makePrefixKey(IKeyBuilder keyBuilder, VTE vte, int hashCode)
Create a prefix key for the TERMS index from the VTE and hashCode of the BigdataValue.
Parameters:
keyBuilder - The caller is responsible for resetting the buffer as required.
vte - The VTE.
hashCode - The hash code of the BigdataValue.
public byte[] makePrefixKey(IKeyBuilder keyBuilder, BigdataValue value)
Create a prefix key for the TERMS index from the BigdataValue.
Parameters:
keyBuilder - The caller is responsible for resetting the buffer as required.
value - The BigdataValue.
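How a prefix key relates to a fully formed key can be sketched with `java.nio.ByteBuffer` instead of an IKeyBuilder. This is an illustration under stated assumptions only: the class and its methods are hypothetical, the flags and extension bytes are passed in as opaque values (how they are derived from the VTE is not modeled), and the byte encoding that the real IKeyBuilder applies to the hash code and counter may differ from the plain big-endian encoding used here.

```java
import java.nio.ByteBuffer;

final class TermsKeySketch {
    static final int SIZEOF_PREFIX_KEY = 6; // flags + extension + 4 byte hash
    static final int SIZEOF_COUNTER = 2;

    // Prefix key: [flags][extension][hashCode], big-endian (assumed encoding).
    static byte[] makePrefixKey(byte flags, byte extension, int hashCode) {
        return ByteBuffer.allocate(SIZEOF_PREFIX_KEY)
                .put(flags)
                .put(extension)
                .putInt(hashCode)
                .array();
    }

    // Full key: the prefix key followed by the two byte collision counter.
    static byte[] makeKey(byte[] baseKey, int counter) {
        return ByteBuffer.allocate(baseKey.length + SIZEOF_COUNTER)
                .put(baseKey)
                .putShort((short) counter)
                .array();
    }

    // Read the counter back as an unsigned short (0..65535).
    static int counterOf(byte[] key) {
        return ByteBuffer.wrap(key).getShort(SIZEOF_PREFIX_KEY) & 0xFFFF;
    }

    public static void main(String[] args) {
        byte[] prefix = makePrefixKey((byte) 0x01, (byte) 0x02, "example".hashCode());
        byte[] key = makeKey(prefix, 3);
        System.out.println("prefix bytes: " + prefix.length);
        System.out.println("key bytes:    " + key.length);
        System.out.println("counter:      " + counterOf(key));
    }
}
```

Masking with `0xFFFF` when reading the counter matters because Java's `short` is signed, while the counter is documented as an unsigned short.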
public IKeyBuilder newKeyBuilder()
Return a new IKeyBuilder suitable for formatting keys for the TERMS index.
Returns:
The IKeyBuilder.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.