public class TermIdEncoder extends Object
The purpose of this encoding is to cause the N high bits to vary rapidly as the local counter is driven up by writes on the index partition. This has the effect of scattering writes on dependent indices (those using the resulting long value as the sole or leading component of their key).
Given a source RDF/XML document with M "terms" distributed uniformly over K TERM2ID index partitions, each term has a uniform likelihood of setting any of the low bits of the local counter. After encoding, this means that the N high-bits of encoded term identifier are uniformly distributed. Assuming that the separator keys for the ID2TERM index divide the key space into equally sized key-ranges, then the reads and writes on the ID2TERM index partitions will be uniformly distributed as well.
The next bits in the encoded values are derived from the partition identifier followed by the term identifier and therefore have a strong bias for the index partition and the sequential assignment of local counter values within an index partition respectively. This means that read / write access within an index partition tends to have some locality, which improves B+Tree performance through several mechanisms (mainly improved cache effects, reduced copy-on-write for dirty leaves and nodes, and less IO costs).
When the #of ID2TERM index partitions GT 2^N
, only a subset of
those index partitions can be directly selected by the N high bits with their
uniform distribution. The next bias is the partition identifier, which begins
at ZERO (0), is inflated to (0, [1:P]), where P is the #of index partitions
generated by a scatter split, and grows relatively slowly thereafter as index
partitions are fill up and are split or are moved to redistribute the load on
the cluster.
Constructor and Description |
---|
TermIdEncoder(int N) |
Modifier and Type | Method and Description |
---|---|
static long |
combine(int pid,
int ctr)
Combines the partition identifier and the local counter using the same
logic as the
BTree.PartitionedCounter . |
long |
decode(long u)
Reverses the effect of
encode(long) . |
long |
encode(long v)
Encode a term identifier using the configured value of
NBits . |
static int |
getLocalCounter(long v)
Return the local counter from the low word of a partitioned counter.
|
int |
getNBits()
Return the #of low bits from the local counter that will be reversed and
written into the high-bits of the encoded long value.
|
static int |
getPartitionId(long v)
Return the partition identifier from the high word of a partitioned
counter.
|
String |
toString() |
public TermIdEncoder(int N)
N
- The #of low bits from the local counter that will be reversed
and written into the high-bits of the encoded long value.public int getNBits()
public long encode(long v)
NBits
.v
- A 64-bit long counter value as generated by an
BTree.PartitionedCounter
.public long decode(long u)
encode(long)
.u
- An encoded long value.public static int getPartitionId(long v)
v
- The partitioned counter.public static int getLocalCounter(long v)
v
- The partitioned counter.public static long combine(int pid, int ctr)
BTree.PartitionedCounter
.pid
- The partition identifier.ctr
- The local counter.BTree.PartitionedCounter, which performs the same operation and MUST
be consistent with this method.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.