public static interface IndexMetadata.Options

Options for the com.bigdata.btree package and the BTree and IndexSegment classes. Options that apply equally to views and AbstractBTrees are in the package namespace, such as whether or not a bloom filter is enabled. Options that apply to all AbstractBTrees are specified within that namespace, while those that are specific to BTree or IndexSegment are located within their respective class namespaces. Some properties, such as the branchingFactor, are defined for both the BTree and the IndexSegment because their defaults tend to be different when an IndexSegment is generated from a BTree.

Modifier and Type | Field and Description |
---|---|
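The namespace convention described above can be sketched with plain java.util.Properties. The option name strings below are illustrative stand-ins, not the actual constant values defined by IndexMetadata.Options:

```java
import java.util.Properties;

public class OptionsNamespaceDemo {
    // Hypothetical option names following the documented convention:
    // package-level options apply to views, class-level options apply to
    // the concrete BTree and IndexSegment classes.
    static final String PKG = "com.bigdata.btree";
    static final String BLOOM_FILTER = PKG + ".bloomFilter"; // package namespace
    static final String BTREE_BRANCHING_FACTOR =
            PKG + ".BTree.branchingFactor"; // BTree class namespace
    static final String INDEX_SEGMENT_BRANCHING_FACTOR =
            PKG + ".IndexSegment.branchingFactor"; // IndexSegment class namespace

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty(BLOOM_FILTER, "true");
        // The same logical property is configured separately per class
        // namespace because the defaults differ for BTree and IndexSegment.
        p.setProperty(BTREE_BRANCHING_FACTOR, "32");
        p.setProperty(INDEX_SEGMENT_BRANCHING_FACTOR, "512");
        System.out.println(p.getProperty(BTREE_BRANCHING_FACTOR));
        System.out.println(p.getProperty(INDEX_SEGMENT_BRANCHING_FACTOR));
    }
}
```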
static String | BLOOM_FILTER: Optional property controls whether or not a bloom filter is maintained (default "false"). |
static String | BTREE_BRANCHING_FACTOR: The name of an optional property whose value specifies the branching factor for a mutable BTree. |
static String | BTREE_CLASS_NAME: The name of a class derived from BTree that will be used to re-load the index. |
static String | BTREE_RECORD_COMPRESSOR_FACTORY: An optional factory providing record-level compression for the nodes and leaves of a BTree (default DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY). |
static String | DEFAULT_BLOOM_FILTER |
static String | DEFAULT_BTREE_BRANCHING_FACTOR: The default branching factor for a mutable BTree. |
static String | DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY |
static String | DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR: The default branching factor for an IndexSegment. |
static String | DEFAULT_INDEX_SEGMENT_BUFFER_NODES |
static String | DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY |
static String | DEFAULT_MASTER_CHUNK_SIZE |
static String | DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS |
static String | DEFAULT_MASTER_QUEUE_CAPACITY |
static String | DEFAULT_MAX_PARALLEL_EVICT_THREADS |
static String | DEFAULT_MAX_REC_LEN |
static String | DEFAULT_MIN_DIRTY_LIST_SIZE_FOR_PARALLEL_EVICT |
static String | DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT |
static String | DEFAULT_SCATTER_SPLIT_ENABLED |
static String | DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT |
static String | DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD |
static String | DEFAULT_SINK_CHUNK_SIZE |
static String | DEFAULT_SINK_CHUNK_TIMEOUT_NANOS |
static String | DEFAULT_SINK_IDLE_TIMEOUT_NANOS |
static String | DEFAULT_SINK_POLL_TIMEOUT_NANOS |
static String | DEFAULT_SINK_QUEUE_CAPACITY |
static String | DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY |
static String | DEFAULT_WRITE_RETENTION_QUEUE_SCAN |
static String | INDEX_SEGMENT_BRANCHING_FACTOR: The name of the property whose value specifies the branching factor for an immutable IndexSegment. |
static String | INDEX_SEGMENT_BUFFER_NODES: When true an attempt will be made to fully buffer the nodes (but not the leaves) of the IndexSegment (default "false"). |
static String | INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY: An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY). |
static String | INITIAL_DATA_SERVICE: The name of an optional property whose value identifies the data service on which the initial index partition of a scale-out index will be created. |
static String | KEY_BUILDER_FACTORY: Override the IKeyBuilderFactory used by the DefaultTupleSerializer (the default is a DefaultKeyBuilderFactory initialized with an empty Properties object). |
static String | LEAF_KEYS_CODER: Override the IRabaCoder used for the keys of leaves in B+Trees (the default is a FrontCodedRabaCoder instance). |
static String | LEAF_VALUES_CODER: Override the IRabaCoder used for the values of leaves in B+Trees (the default is a CanonicalHuffmanRabaCoder). |
static String | MASTER_CHUNK_SIZE: The desired size of the chunks that the master will draw from its queue. |
static String | MASTER_CHUNK_TIMEOUT_NANOS: The time in nanoseconds that the master will combine smaller chunks so that it can satisfy the desired masterChunkSize. |
static String | MASTER_QUEUE_CAPACITY: The capacity of the queue on which the application writes. |
static int | MAX_BTREE_BRANCHING_FACTOR: A reasonable maximum branching factor for a BTree. |
static int | MAX_INDEX_SEGMENT_BRANCHING_FACTOR: A reasonable maximum branching factor for an IndexSegment. |
static String | MAX_PARALLEL_EVICT_THREADS: The maximum number of threads that will be used to evict dirty nodes or leaves in a given level of a persistence-capable index (BTree or HTree). |
static String | MAX_REC_LEN: When raw record support is enabled for the index, this is the maximum length of an index value which will be stored within a leaf before it is automatically promoted to a raw record reference on the backing store (default "256"). |
static int | MAX_WRITE_RETENTION_QUEUE_CAPACITY: A large maximum write retention queue capacity. |
static int | MIN_BRANCHING_FACTOR: The minimum allowed branching factor (3). |
static String | MIN_DIRTY_LIST_SIZE_FOR_PARALLEL_EVICT: The minimum number of dirty nodes or leaves in a given level of the index (BTree or HTree) before parallel eviction will be used. |
static int | MIN_WRITE_RETENTION_QUEUE_CAPACITY: The minimum write retention queue capacity is two (2) in order to avoid cache evictions of the leaves participating in a split. |
static String | NODE_KEYS_CODER: Override the IRabaCoder used for the keys in the nodes of a B+Tree (the default is a FrontCodedRabaCoder instance). |
static String | SCATTER_SPLIT_DATA_SERVICE_COUNT: The #of data services on which the index will be scattered, or ZERO (0) to use all discovered data services (default "0"). |
static String | SCATTER_SPLIT_ENABLED: Boolean option indicates whether or not scatter splits are performed (default DEFAULT_SCATTER_SPLIT_ENABLED). |
static String | SCATTER_SPLIT_INDEX_PARTITION_COUNT: The #of index partitions to generate when an index is scatter split. |
static String | SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD: The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD). |
static String | SINK_CHUNK_SIZE: The desired size of the chunks that will be written by the sink. |
static String | SINK_CHUNK_TIMEOUT_NANOS: The maximum amount of time in nanoseconds that a sink will combine smaller chunks so that it can satisfy the desired sinkChunkSize (default "9223372036854775807"). |
static String | SINK_IDLE_TIMEOUT_NANOS: The time in nanoseconds after which an idle sink will be closed (default "9223372036854775807"). |
static String | SINK_POLL_TIMEOUT_NANOS: The time in nanoseconds that the sink will wait inside of the IAsynchronousIterator when it polls the iterator for a chunk. |
static String | SINK_QUEUE_CAPACITY: The capacity of the internal queue for the per-sink output buffer. |
static String | WRITE_RETENTION_QUEUE_CAPACITY: The capacity of the hard reference queue used to retain recently touched nodes (nodes or leaves) and to defer the eviction of dirty nodes (nodes or leaves). |
static String | WRITE_RETENTION_QUEUE_SCAN: The #of entries on the write retention queue that will be scanned for a match before a new reference is appended to the queue. |
static final int MIN_BRANCHING_FACTOR

The minimum allowed branching factor (3).

static final int MAX_BTREE_BRANCHING_FACTOR

A reasonable maximum branching factor for a BTree.

static final int MAX_INDEX_SEGMENT_BRANCHING_FACTOR

A reasonable maximum branching factor for an IndexSegment.

static final int MIN_WRITE_RETENTION_QUEUE_CAPACITY

The minimum write retention queue capacity is two (2) in order to avoid cache evictions of the leaves participating in a split.

static final int MAX_WRITE_RETENTION_QUEUE_CAPACITY

A large maximum write retention queue capacity.

static final String BLOOM_FILTER

Optional property controls whether or not a bloom filter is maintained (default "false"). While the mutable BTrees might occasionally grow too large to support a bloom filter, data is periodically migrated onto immutable IndexSegments, which have perfect-fit bloom filters. This means that the bloom filter scales out, but not up.

static final String DEFAULT_BLOOM_FILTER
static final String MAX_REC_LEN

When raw record support is enabled for the index, this is the maximum length of an index value which will be stored within a leaf before it is automatically promoted to a raw record reference on the backing store (default "256").

static final String DEFAULT_MAX_REC_LEN

static final String INITIAL_DATA_SERVICE

The name of an optional property whose value identifies the data service on which the initial index partition of a scale-out index will be created. The value may be either the UUID of that data service (this is unambiguous) or the name associated with the data service (it is up to the administrator not to assign the same name to different data service instances, and an arbitrary instance having the desired name will be used if more than one instance is assigned the same name). The default behavior is to select a data service using the load balancer, which is done automatically by IBigdataFederation.registerIndex(IndexMetadata, UUID) if IndexMetadata.getInitialDataServiceUUID() returns null.

static final String WRITE_RETENTION_QUEUE_CAPACITY
The purpose of this queue is to retain recently touched nodes and leaves and to defer eviction of dirty nodes and leaves in case they will be modified again soon. Once a node falls off the write retention queue it is checked to see if it is dirty. If it is dirty, then it is serialized and persisted on the backing store. If the write retention queue capacity is set to a large value (say, GTE 1000), then that will increase the commit latency and have a negative effect on overall performance. Too small a value will mean that nodes undergoing mutation will be serialized and persisted prematurely, leading to excessive writes on the backing store. For append-only stores, this directly contributes to what are effectively redundant and thereafter unreachable copies of the intermediate state of nodes, as only nodes that can be reached by navigation from a Checkpoint will ever be read again. The value 500 appears to be a good default. While it is possible that some workloads could benefit from a larger value, this leads to higher commit latency and can therefore have a broad impact on performance.

Note: The write retention queue is used for both BTree and IndexSegment. Any touched node or leaf is placed onto this queue. As nodes and leaves are evicted from this queue, they are then placed onto the optional read-retention queue.

The default value is a function of the JVM heap size. For small heaps it is "500"; for larger heaps the value may be 8000 (1G) or 20000 (10G). These larger defaults are heuristics. Values larger than 8000 mainly benefit the on-disk size of the journal, while values up to 8000 can also improve throughput dramatically. Larger values are ONLY useful if the application is performing sustained writes on the index (hundreds of thousands to millions of records).
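The heap-dependent defaults described above can be sketched as a simple heuristic. The thresholds below are inferred from the quoted values (500 for small heaps, 8000 at roughly 1G, 20000 at roughly 10G) and are assumptions, not the library's exact cut-offs:

```java
public class WriteRetentionHeuristic {
    /** Pick a write retention queue capacity from the maximum heap size,
     *  following the heuristic defaults quoted above (assumed thresholds). */
    static int defaultCapacity(long maxHeapBytes) {
        final long GB = 1L << 30;
        if (maxHeapBytes >= 10 * GB) return 20000; // large heap
        if (maxHeapBytes >= GB) return 8000;       // medium heap
        return 500;                                // small heap
    }

    public static void main(String[] args) {
        // Use the JVM's own configured maximum heap to pick a capacity.
        System.out.println(defaultCapacity(Runtime.getRuntime().maxMemory()));
    }
}
```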
static final String WRITE_RETENTION_QUEUE_SCAN

The #of entries on the write retention queue that will be scanned for a match before a new reference is appended to the queue. A node or leaf evicted from the queue is serialized and persisted iff its AbstractNode.referenceCount is zero and the node or leaf is dirty.

static final String DEFAULT_WRITE_RETENTION_QUEUE_CAPACITY

static final String DEFAULT_WRITE_RETENTION_QUEUE_SCAN
static final String KEY_BUILDER_FACTORY

Override the IKeyBuilderFactory used by the DefaultTupleSerializer (the default is a DefaultKeyBuilderFactory initialized with an empty Properties object).

FIXME KeyBuilder configuration support is not finished.

static final String NODE_KEYS_CODER
Override the IRabaCoder used for the keys in the nodes of a B+Tree (the default is a FrontCodedRabaCoder instance).

static final String LEAF_KEYS_CODER

Override the IRabaCoder used for the keys of leaves in B+Trees (the default is a FrontCodedRabaCoder instance).

static final String LEAF_VALUES_CODER

Override the IRabaCoder used for the values of leaves in B+Trees (the default is a CanonicalHuffmanRabaCoder).

static final String BTREE_CLASS_NAME
The name of a class derived from BTree that will be used to re-load the index.

static final String BTREE_BRANCHING_FACTOR

The name of an optional property whose value specifies the branching factor for a mutable BTree.

static final String DEFAULT_BTREE_BRANCHING_FACTOR

The default branching factor for a mutable BTree.

Note: On 9/11/2009 I changed the default B+Tree branching factor and write retention queue capacity to 64 (was 32) and 8000 (was 500) respectively. This change in the B+Tree branching factor reduces the height of B+Trees on the Journal, increases the size of the individual records on the disk, and aids performance substantially. The larger write retention queue capacity helps to prevent B+Tree nodes and leaves from being coded and flushed to disk too soon, which decreases disk IO and keeps things in their mutable form in memory longer, which improves search performance and keeps down the costs of mutation operations. [Dropped back to 32/500 on 9/15/09 since this does not do so well at scale on machines with less RAM.]
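The note's claim that a larger branching factor reduces B+Tree height can be checked with a little arithmetic: the height is roughly the log base m of the entry count. This is a standalone sketch; the helper is hypothetical, not the library's API:

```java
public class BranchingFactorDemo {
    /** Approximate height of a B+Tree holding n entries with branching
     *  factor m: the number of times capacity must be multiplied by m
     *  (beyond a single level) before it covers n entries. */
    static int approxHeight(long n, int m) {
        int h = 0;
        long capacity = m; // a single level holds m entries
        while (capacity < n) {
            capacity *= m;
            h++;
        }
        return h;
    }

    public static void main(String[] args) {
        // 100M entries: branching factor 64 yields a shallower tree than 32.
        System.out.println(approxHeight(100_000_000L, 32)); // 5
        System.out.println(approxHeight(100_000_000L, 64)); // 4
    }
}
```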
static final String BTREE_RECORD_COMPRESSOR_FACTORY

An optional factory providing record-level compression for the nodes and leaves of a BTree (default DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY).

static final String DEFAULT_BTREE_RECORD_COMPRESSOR_FACTORY

See also: BTREE_RECORD_COMPRESSOR_FACTORY.
static final String MAX_PARALLEL_EVICT_THREADS

The maximum number of threads that will be used to evict dirty nodes or leaves in a given level of a persistence-capable index (BTree or HTree).

Note: This is currently a System property (set with -D).

See also: Reduce commit latency by parallel checkpoint by level of dirty pages in an index.

static final String DEFAULT_MAX_PARALLEL_EVICT_THREADS
static final String MIN_DIRTY_LIST_SIZE_FOR_PARALLEL_EVICT

The minimum number of dirty nodes or leaves in a given level of the index (BTree or HTree) before parallel eviction will be used. May be set to Integer.MAX_VALUE to disable parallel level set eviction.

Note: This is currently a System property (set with -D).

See also: MAX_PARALLEL_EVICT_THREADS; Reduce commit latency by parallel checkpoint by level of dirty pages in an index.

static final String DEFAULT_MIN_DIRTY_LIST_SIZE_FOR_PARALLEL_EVICT
static final String INDEX_SEGMENT_BRANCHING_FACTOR

The name of the property whose value specifies the branching factor for an immutable IndexSegment.

static final String DEFAULT_INDEX_SEGMENT_BRANCHING_FACTOR

The default branching factor for an IndexSegment.

static final String INDEX_SEGMENT_BUFFER_NODES

When true an attempt will be made to fully buffer the nodes (but not the leaves) of the IndexSegment (default "false"). The nodes in the IndexSegment are serialized in a contiguous region by the IndexSegmentBuilder. That region may be fully buffered when the IndexSegment is opened, in which case queries against the IndexSegment will incur NO disk hits for the nodes and only one disk hit per visited leaf.

Note: The nodes are read into a buffer allocated from the DirectBufferPool. If the size of the nodes region in the IndexSegmentStore file exceeds the capacity of the buffers managed by the DirectBufferPool, then the nodes WILL NOT be buffered. The DirectBufferPool is used both for efficiency and because a bug dealing with temporary direct buffers would otherwise cause the C heap to be exhausted!

See also: DEFAULT_INDEX_SEGMENT_BUFFER_NODES.

static final String DEFAULT_INDEX_SEGMENT_BUFFER_NODES
static final String INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY

An optional factory providing record-level compression for the nodes and leaves of an IndexSegment (default DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY).

static final String DEFAULT_INDEX_SEGMENT_RECORD_COMPRESSOR_FACTORY
static final String MASTER_QUEUE_CAPACITY

The capacity of the queue on which the application writes. Chunks are drained from this queue by the AbstractTaskMaster, broken into splits, and each split is written onto the AbstractSubtask sink handling writes for the associated index partition.

static final String DEFAULT_MASTER_QUEUE_CAPACITY

static final String MASTER_CHUNK_SIZE

The desired size of the chunks that the master will draw from its queue.

static final String DEFAULT_MASTER_CHUNK_SIZE

static final String MASTER_CHUNK_TIMEOUT_NANOS

The time in nanoseconds that the master will combine smaller chunks so that it can satisfy the desired masterChunkSize.

static final String DEFAULT_MASTER_CHUNK_TIMEOUT_NANOS
static final String SINK_POLL_TIMEOUT_NANOS

The time in nanoseconds that the sink will wait inside of the IAsynchronousIterator when it polls the iterator for a chunk. This value should be relatively small so that the sink remains responsive rather than blocking inside of the IAsynchronousIterator for long periods of time.

static final String DEFAULT_SINK_POLL_TIMEOUT_NANOS
static final String SINK_QUEUE_CAPACITY

The capacity of the internal queue for the per-sink output buffer.

static final String DEFAULT_SINK_QUEUE_CAPACITY

static final String SINK_CHUNK_SIZE

The desired size of the chunks that will be written by the sink.

static final String DEFAULT_SINK_CHUNK_SIZE

static final String SINK_CHUNK_TIMEOUT_NANOS

The maximum amount of time in nanoseconds that a sink will combine smaller chunks so that it can satisfy the desired sinkChunkSize (default "9223372036854775807"). A large timeout forces the sink to wait until SINK_CHUNK_SIZE elements have accumulated before writing on the index partition. This makes it much easier to adjust the performance since you simply adjust the SINK_CHUNK_SIZE.

static final String DEFAULT_SINK_CHUNK_TIMEOUT_NANOS
static final String SINK_IDLE_TIMEOUT_NANOS

The time in nanoseconds after which an idle sink will be closed (default "9223372036854775807"). If the idle timeout is LT the SINK_CHUNK_TIMEOUT_NANOS, then a sink will remain open as long as new chunks appear and are combined within the idle timeout; otherwise the sink will decide that it is idle, flush its last chunk, and close itself. If this is Long.MAX_VALUE then the sink will never identify itself as idle and will only be closed if the master is closed or the sink has received a StaleLocatorException for the index partition on which the sink is writing.
static final String DEFAULT_SINK_IDLE_TIMEOUT_NANOS
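A minimal sketch of the idle-timeout rule described for SINK_IDLE_TIMEOUT_NANOS. The helper and class are hypothetical, not the library's API; only the Long.MAX_VALUE semantics come from the text above:

```java
public class SinkIdleDemo {
    /** True if a sink configured with the given idle timeout should flush
     *  its last chunk and close after nanosSinceLastChunk without input. */
    static boolean isIdle(long nanosSinceLastChunk, long idleTimeoutNanos) {
        // Long.MAX_VALUE means the sink never identifies itself as idle;
        // it is closed only when the master closes or a StaleLocatorException
        // is received for the index partition.
        if (idleTimeoutNanos == Long.MAX_VALUE) return false;
        return nanosSinceLastChunk >= idleTimeoutNanos;
    }

    public static void main(String[] args) {
        long idleTimeout = 5_000_000_000L; // 5 seconds, arbitrary sample value
        System.out.println(isIdle(6_000_000_000L, idleTimeout));    // true
        System.out.println(isIdle(6_000_000_000L, Long.MAX_VALUE)); // false
    }
}
```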
static final String SCATTER_SPLIT_ENABLED

Boolean option indicates whether or not scatter splits are performed (default DEFAULT_SCATTER_SPLIT_ENABLED). A scatter split distributes an index across the IDataServices in the federation. This is normally very useful.

Sometimes a scatter split is not the "right" thing for an index. An example would be an index where you have to do a LOT of synchronous RPC rather than using asynchronous index writes. In this case, the synchronous RPC can be a bottleneck unless the "chunk" size of the writes is large. This is especially true when writes on other indices must wait for the outcome of the synchronous RPC, e.g., foreign keys.

See also: OverflowManager.Options#SCATTER_SPLIT_ENABLED.

static final String DEFAULT_SCATTER_SPLIT_ENABLED
static final String SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD

The percentage of the nominal index partition size at which a scatter split is triggered when there is only a single index partition for a given scale-out index (default DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD). The scatter split will break the index into multiple partitions and distribute those index partitions across the federation in order to allow more resources to be brought to bear on the scale-out index. The value must be LT the nominal index partition split point or normal index splits will take precedence and a scatter split will never be performed. The allowable range is therefore constrained to (0.1 : 1.0).

static final String DEFAULT_SCATTER_SPLIT_PERCENT_OF_SPLIT_THRESHOLD
static final String SCATTER_SPLIT_DATA_SERVICE_COUNT

The #of data services on which the index will be scattered, or ZERO (0) to use all discovered data services (default "0").

static final String DEFAULT_SCATTER_SPLIT_DATA_SERVICE_COUNT

static final String SCATTER_SPLIT_INDEX_PARTITION_COUNT

The #of index partitions to generate when an index is scatter split. The index partitions are distributed across up to SCATTER_SPLIT_DATA_SERVICE_COUNT discovered data services. When ZERO (0), the scatter split will generate (NDATA_SERVICES x 2) index partitions, where NDATA_SERVICES is either SCATTER_SPLIT_DATA_SERVICE_COUNT or the #of discovered data services when that option is ZERO (0).

The "ideal" number of index partitions is generally between (NCORES x NDATA_SERVICES / NINDICES) and (NCORES x NDATA_SERVICES). When there are NCORES x NDATA_SERVICES index partitions, each core is capable of servicing a distinct index partition, assuming that the application and the "schema" are capable of driving the data service writes with that concurrency. However, if you have NINDICES and the application drives writes on all index partitions of all indices at the same rate, then a 1:1 allocation of index partitions to cores would be "ideal".

The "right" answer also depends on the data scale. If you have far less data than can fill that many index partitions to 200M each, then you should adjust the scatter split to use fewer index partitions or fewer data services.

Finally, the higher the scatter, the more you will need to use asynchronous index writes in order to obtain high throughput with sustained index writes.
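The sizing guidance above can be made concrete. The formulas (NDATA_SERVICES x 2 partitions when the count option is ZERO, and an ideal range of NCORES x NDATA_SERVICES / NINDICES up to NCORES x NDATA_SERVICES) come from the text; the sample cluster numbers and helper methods are arbitrary illustrations:

```java
public class ScatterSplitSizing {
    /** #of index partitions generated by a scatter split when the
     *  partition-count option is ZERO (0): NDATA_SERVICES x 2. */
    static int defaultPartitionCount(int nDataServices) {
        return nDataServices * 2;
    }

    /** Lower bound of the "ideal" partition range:
     *  NCORES x NDATA_SERVICES / NINDICES. */
    static int idealLowerBound(int nCores, int nDataServices, int nIndices) {
        return nCores * nDataServices / nIndices;
    }

    public static void main(String[] args) {
        int nDataServices = 4, nCores = 8, nIndices = 2; // arbitrary sample cluster
        System.out.println(defaultPartitionCount(nDataServices));             // 8
        System.out.println(idealLowerBound(nCores, nDataServices, nIndices)); // 16
        System.out.println(nCores * nDataServices); // upper bound of range: 32
    }
}
```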
static final String DEFAULT_SCATTER_SPLIT_INDEX_PARTITION_COUNT
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.