public static interface AbstractTripleStore.Options extends AbstractResource.Options, InferenceEngine.Options, Options, KeyBuilder.Options, DataLoader.Options, FullTextIndex.Options, GeoSpatialConfigOptions.Options
Modifier and Type | Field and Description |
---|---|
static String |
AXIOMS_CLASS
The
Axioms model that will be used (default
). |
static String |
BLOBS_THRESHOLD
The threshold (in character length) at which an RDF
Value
will be inserted into the LexiconKeyOrder.BLOBS index rather
than the LexiconKeyOrder.TERM2ID and
LexiconKeyOrder.ID2TERM indices (default
"256"). |
static String |
BLOBS_THRESHOLD_DISABLE
The constant that may be used to disable the BLOBS index.
|
static String |
BLOOM_FILTER
Optional property controls whether or not a bloom filter is
maintained for the SPO statement index.
|
static String |
BOTTOM_UP_EVALUATION
If this option is set to false, turn off the ASTBottomUpOptimizer.
|
static String |
CLOSURE_CLASS
The name of the
BaseClosure class that will be used (default
). |
static String |
COMPUTE_CLOSURE_FOR_SIDS
If this option is set to false, do not compute closure for sids.
|
static String |
CONSTRAIN_XXXC_SHARDS
Boolean option determines whether or not an
XXXCShardSplitHandler is applied (scale-out only, default
"true"). |
static String |
DATATYPES_TO_TEXT_INDEX
List of datatypes, which will be put into full text index even if
TEXT_INDEX_DATATYPE_LITERALS is not enabled
(default ""). |
static String |
DEFAULT_AXIOMS_CLASS |
static String |
DEFAULT_BLOBS_THRESHOLD |
static String |
DEFAULT_BLOOM_FILTER |
static String |
DEFAULT_BOTTOM_UP_EVALUATION |
static String |
DEFAULT_CLOSURE_CLASS |
static String |
DEFAULT_COMPUTE_CLOSURE_FOR_SIDS |
static String |
DEFAULT_CONSTRAIN_XXXC_SHARDS |
static String |
DEFAULT_DATATYPES_TO_TEXT_INDEX |
static String |
DEFAULT_ENABLE_RAW_RECORDS_SUPPORT |
static String |
DEFAULT_EXTENSION_FACTORY_CLASS |
static String |
DEFAULT_HISTORY_SERVICE |
static String |
DEFAULT_HISTORY_SERVICE_MIN_RELEASE_AGE |
static String |
DEFAULT_INLINE_BNODES |
static String |
DEFAULT_INLINE_DATE_TIMES |
static String |
DEFAULT_INLINE_DATE_TIMES_TIMEZONE |
static String |
DEFAULT_INLINE_TEXT_LITERALS |
static String |
DEFAULT_INLINE_URI_FACTORY_CLASS |
static String |
DEFAULT_INLINE_XSD_DATATYPE_LITERALS |
static String |
DEFAULT_JUSTIFY |
static String |
DEFAULT_LEXICON |
static String |
DEFAULT_MAX_INLINE_STRING_LENGTH
Note that there is an interaction when this is enabled with the full
text indexer.
|
static String |
DEFAULT_ONE_ACCESS_PATH |
static String |
DEFAULT_QUADS |
static String |
DEFAULT_QUADS_MODE |
static String |
DEFAULT_REJECT_INVALID_XSD_VALUES |
static String |
DEFAULT_STATEMENT_IDENTIFIERS |
static String |
DEFAULT_STORE_BLANK_NODES |
static String |
DEFAULT_SUBJECT_CENTRIC_TEXT_INDEX
Deprecated.
|
static String |
DEFAULT_SUBJECT_CENTRIC_TEXT_INDEXER_CLASS |
static String |
DEFAULT_TERM_CACHE_CAPACITY |
static String |
DEFAULT_TERMID_BITS_TO_REVERSE |
static String |
DEFAULT_TEXT_INDEX |
static String |
DEFAULT_TEXT_INDEX_DATATYPE_LITERALS |
static String |
DEFAULT_TEXT_INDEXER_CLASS |
static String |
DEFAULT_TRIPLES_MODE |
static String |
DEFAULT_TRIPLES_MODE_WITH_PROVENANCE |
static String |
DEFAULT_VALUE_FACTORY_CLASS |
static String |
DEFAULT_VOCABULARY_CLASS
Note: The default
Vocabulary class may be changed from time
to time as additional VocabularyDecl are created and bundled
into a new default Vocabulary . |
static String |
ENABLE_RAW_RECORDS_SUPPORT
If this option is set to false, turn off using raw records
to store the lexical forms of the RDF Values.
|
static String |
EXTENSION_FACTORY_CLASS
The name of the
IExtensionFactory class. |
static String |
HISTORY_SERVICE
When
true a HISTORY SERVICE and its associated index
will be maintained. |
static String |
HISTORY_SERVICE_MIN_RELEASE_AGE
The minimum amount of history (in milliseconds) that will be retained
by the
HISTORY_SERVICE (default
). |
static String |
INLINE_BNODES
Set up database to inline bnodes directly into the statement indices
rather than using the lexicon to map them to term identifiers and
back.
|
static String |
INLINE_DATE_TIMES
Set up database to inline date/times directly into the statement
indices rather than using the lexicon to map them to term identifiers
and back (default "true").
|
static String |
INLINE_DATE_TIMES_TIMEZONE
The default time zone to be used to a) encode inline xsd:dateTime
literals that do not have a time zone specified and b) decode
xsd:dateTime literals from the statement indices where they are
stored as UTC milliseconds since the epoch (default
"GMT").
|
static String |
INLINE_TEXT_LITERALS
Inline ANY literal having fewer than
MAX_INLINE_TEXT_LENGTH
characters (default "false"). |
static String |
INLINE_URI_FACTORY_CLASS
The name of the
IInlineURIFactory class. |
static String |
INLINE_XSD_DATATYPE_LITERALS
Set up database to inline XSD datatype literals corresponding to
primitives (boolean) and numerics (byte, short, int, etc) directly
into the statement indices (default
"true").
|
static String |
JUSTIFY
When
true (default ),
proof chains for entailments generated by forward chaining are stored
in the database. |
static String |
LEXICON
Boolean option (default
true ) enables support for the
lexicon (the forward and backward term indices). |
static String |
MAX_INLINE_TEXT_LENGTH
The maximum length of a String value which may be inlined into the
statement indices (default "0"
).
|
static String |
ONE_ACCESS_PATH
Boolean option (default
false ) disables all but a
single statement index (aka access path). |
static String |
QUADS
Boolean option determines whether the KB instance will be a quad
store or a triple store.
|
static String |
QUADS_MODE
Set up database in quads mode.
|
static String |
RDR_HISTORY_CLASS
The name of the
RDRHistory class. |
static String |
REJECT_INVALID_XSD_VALUES
When
true AND is
true , literals having an xsd datatype URI which can not
be validated against that datatype will be rejected (default
DEFAULT_REJECT_INVALID_XSD_VALUES ). |
static String |
STATEMENT_IDENTIFIERS
Boolean option (default "false")
enables support for statement identifiers.
|
static String |
STORE_BLANK_NODES
Boolean option (default "false") controls
whether or not we store blank nodes in the forward mapping of the
lexicon (this is also known as the "told bnodes" mode).
|
static String |
SUBJECT_CENTRIC_TEXT_INDEX
Deprecated.
Feature was never completed due to scalability issues.
See BLZG-1548, BLZG-563.
|
static String |
SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
The name of the
ITextIndexer class. |
static String |
TERM_CACHE_CAPACITY
Integer option whose value is the capacity of the term cache.
|
static String |
TERMID_BITS_TO_REVERSE
Option affects how evenly the assigned term identifiers are
distributed, which has a pronounced effect on the ID2TERM and statement
indices for scale-out deployments.
|
static String |
TEXT_INDEX
Boolean option (default "true") enables support
for a full text index that may be used to lookup literals by tokens
found in the text of those literals.
|
static String |
TEXT_INDEX_DATATYPE_LITERALS
Boolean option enables support for a full text index that may be used
to lookup datatype literals by tokens found in the text of those
literals (default "true").
|
static String |
TEXT_INDEXER_CLASS
The name of the
IValueCentricTextIndexer class. |
static String |
TRIPLES_MODE
Set up database in triples mode, no provenance.
|
static String |
TRIPLES_MODE_WITH_PROVENANCE
Set up database in triples mode with provenance.
|
static String |
VALUE_FACTORY_CLASS
The name of the
BigdataValueFactory class. |
static String |
VOCABULARY_CLASS
The name of the class that will establish the pre-defined
Vocabulary for the database (default
). |
CHUNK_CAPACITY, CHUNK_OF_CHUNKS_CAPACITY, CHUNK_TIMEOUT, DEFAULT_CHUNK_CAPACITY, DEFAULT_CHUNK_OF_CHUNKS_CAPACITY, DEFAULT_CHUNK_TIMEOUT, DEFAULT_FORCE_SERIAL_EXECUTION, DEFAULT_FULLY_BUFFERED_READ_THRESHOLD, DEFAULT_MAX_PARALLEL_SUBQUERIES, FORCE_SERIAL_EXECUTION, FULLY_BUFFERED_READ_THRESHOLD, MAX_PARALLEL_SUBQUERIES
DEFAULT_ENABLE_OWL_FUNCTIONAL_AND_INVERSE_FUNCTIONAL_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, DEFAULT_FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_HAS_VALUE, DEFAULT_FORWARD_CHAIN_OWL_INVERSE_OF, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, DEFAULT_FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, DEFAULT_FORWARD_CHAIN_OWL_SYMMETRIC_PROPERTY, DEFAULT_FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, DEFAULT_FORWARD_RDF_TYPE_RDFS_RESOURCE, ENABLE_OWL_FUNCTIONAL_AND_INVERSE_FUNCTIONAL_PROPERTY, FORWARD_CHAIN_OWL_EQUIVALENT_CLASS, FORWARD_CHAIN_OWL_EQUIVALENT_PROPERTY, FORWARD_CHAIN_OWL_HAS_VALUE, FORWARD_CHAIN_OWL_INVERSE_OF, FORWARD_CHAIN_OWL_SAMEAS_CLOSURE, FORWARD_CHAIN_OWL_SAMEAS_PROPERTIES, FORWARD_CHAIN_OWL_SYMMETRIC_PROPERTY, FORWARD_CHAIN_OWL_TRANSITIVE_PROPERTY, FORWARD_CHAIN_RDF_TYPE_RDFS_RESOURCE
ALTERNATE_ROOT_BLOCK, BUFFER_MODE, CREATE, CREATE_TEMP_FILE, CREATE_TIME, DEFAULT_BUFFER_MODE, DEFAULT_CREATE, DEFAULT_CREATE_TEMP_FILE, DEFAULT_DELETE_ON_CLOSE, DEFAULT_DELETE_ON_EXIT, DEFAULT_DOUBLE_SYNC, DEFAULT_FILE_LOCK_ENABLED, DEFAULT_FORCE_ON_COMMIT, DEFAULT_FORCE_WRITES, DEFAULT_HALOG_COMPRESSOR, DEFAULT_HISTORICAL_INDEX_CACHE_CAPACITY, DEFAULT_HISTORICAL_INDEX_CACHE_TIMEOUT, DEFAULT_HOT_CACHE_SIZE, DEFAULT_HOT_CACHE_THRESHOLD, DEFAULT_INITIAL_EXTENT, DEFAULT_LIVE_INDEX_CACHE_CAPACITY, DEFAULT_LIVE_INDEX_CACHE_TIMEOUT, DEFAULT_MAXIMUM_EXTENT, DEFAULT_MINIMUM_EXTENSION, DEFAULT_READ_CACHE_BUFFER_COUNT, DEFAULT_READ_ONLY, DEFAULT_USE_DIRECT_BUFFERS, DEFAULT_VALIDATE_CHECKSUM, DEFAULT_WRITE_CACHE_BUFFER_COUNT, DEFAULT_WRITE_CACHE_COMPACTION_THRESHOLD, DEFAULT_WRITE_CACHE_ENABLED, DEFAULT_WRITE_CACHE_MIN_CLEAN_LIST_SIZE, DELETE_ON_CLOSE, DELETE_ON_EXIT, DOUBLE_SYNC, FILE, FILE_LOCK_ENABLED, FORCE_ON_COMMIT, FORCE_WRITES, HALOG_COMPRESSOR, HISTORICAL_INDEX_CACHE_CAPACITY, HISTORICAL_INDEX_CACHE_TIMEOUT, HOT_CACHE_SIZE, HOT_CACHE_THRESHOLD, IGNORE_BAD_ROOT_BLOCK, INITIAL_EXTENT, JNL, LIVE_INDEX_CACHE_CAPACITY, LIVE_INDEX_CACHE_TIMEOUT, MAXIMUM_EXTENT, MEM_MAX_EXTENT, MINIMUM_EXTENSION, minimumInitialExtent, minimumMinimumExtension, OFFSET_BITS, OTHER_MAX_EXTENT, READ_CACHE_BUFFER_COUNT, READ_ONLY, RW_MAX_EXTENT, SEG, TMP_DIR, UPDATE_ICU_VERSION, USE_DIRECT_BUFFERS, VALIDATE_CHECKSUM, WRITE_CACHE_BUFFER_COUNT, WRITE_CACHE_COMPACTION_THRESHOLD, WRITE_CACHE_ENABLED, WRITE_CACHE_MIN_CLEAN_LIST_SIZE
COLLATOR, DECOMPOSITION, STRENGTH, USER_COUNTRY, USER_LANGUAGE, USER_VARIANT
BUFFER_CAPACITY, CLOSURE, COMMIT, DEFAULT_BUFFER_CAPACITY, DEFAULT_CLOSURE, DEFAULT_COMMIT, DEFAULT_DUMP_JOURNAL, DEFAULT_DURABLE_QUEUES, DEFAULT_FLUSH, DEFAULT_GZIP_BUFFER_SIZE, DEFAULT_IGNORE_INVALID_FILES, DEFAULT_QUEUE_CAPACITY, DEFAULT_VERBOSE, DUMP_JOURNAL, DURABLE_QUEUES, FLUSH, GZIP_BUFFER_SIZE, IGNORE_INVALID_FILES, QUEUE_CAPACITY, VERBOSE
DATATYPE_HANDLING, DEFAULT_DATATYPE_HANDLING, DEFAULT_PRESERVE_BNODE_IDS, DEFAULT_STOP_AT_FIRST_ERROR, DEFAULT_VERIFY_DATA, PRESERVE_BNODE_IDS, STOP_AT_FIRST_ERROR, VERIFY_DATA
ANALYZER_FACTORY_CLASS, DEFAULT_ANALYZER_FACTORY_CLASS, DEFAULT_FIELDS_ENABLED, DEFAULT_HIT_CACHE_SIZE, DEFAULT_HIT_CACHE_TIMEOUT_MILLIS, DEFAULT_INDEXER_COLLATOR_STRENGTH, DEFAULT_INDEXER_TIMEOUT, DEFAULT_OVERWRITE, FIELDS_ENABLED, HIT_CACHE_SIZE, HIT_CACHE_TIMEOUT_MILLIS, INDEXER_COLLATOR_STRENGTH, INDEXER_TIMEOUT, OVERWRITE
DEFAULT_GEO_SPATIAL, DEFAULT_GEO_SPATIAL_DEFAULT_DATATYPE, DEFAULT_GEO_SPATIAL_INCLUDE_BUILTIN_DATATYPES, GEO_SPATIAL, GEO_SPATIAL_DATATYPE_CONFIG, GEO_SPATIAL_DEFAULT_DATATYPE, GEO_SPATIAL_INCLUDE_BUILTIN_DATATYPES, GEO_SPATIAL_LITERAL_V1_LAT_LON_CONFIG, GEO_SPATIAL_LITERAL_V1_LAT_LON_TIME_CONFIG
static final String LEXICON
Boolean option (default
true
) enables support for the
lexicon (the forward and backward term indices). When
false
, the lexicon indices are not registered. This can
be safely turned off for the TempTripleStore
when only the
statement indices are to be used.
You can control how the triple store will interpret the RDF URIs, and
literals using the KeyBuilder.Options
. For example:
// Force ASCII key comparisons.
properties.setProperty(Options.COLLATOR, CollatorEnum.ASCII.toString());
or
// Force identical unicode comparisons (assuming default COLLATOR setting).
properties.setProperty(Options.STRENGTH, StrengthEnum.IDENTICAL.toString());
LexiconRelation
,
KeyBuilder.Options
static final String DEFAULT_LEXICON
static final String STORE_BLANK_NODES
When false
blank node semantics are enforced, you CAN
NOT unify blank nodes based on their IDs in the lexicon, and
AbstractTripleStore.getBNodeCount()
is disabled.
When true
, you are able to violate blank node semantics
and force unification of blank nodes by assigning the ID from the RDF
interchange syntax to the blank node. RIO has an option that will
allow you to do this. When this option is also true
,
then you will in fact be able to resolve pre-existing blank nodes
using their identifiers. The tradeoff is time and space: if you have
a LOT of documents using blank nodes then you might want to disable
this option in order to spend less time writing the forward lexicon
index (and it will also take up less space).
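The difference between the two settings can be illustrated with a toy model. This is a minimal sketch, not the Blazegraph lexicon: the class `BlankNodeSketch`, its `forwardIndex` map, and the per-document scope map are all hypothetical stand-ins. It only shows that, in "told bnodes" mode, the same blank node ID resolves to the same term identifier across documents, while otherwise IDs are unified only within a single document.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of blank node resolution; NOT the Blazegraph lexicon. */
final class BlankNodeSketch {
    private final boolean storeBlankNodes;
    // Hypothetical stand-in for the forward lexicon index (blank node ID -> term identifier).
    private final Map<String, Long> forwardIndex = new HashMap<>();
    private long nextTermId = 1;

    BlankNodeSketch(boolean storeBlankNodes) {
        this.storeBlankNodes = storeBlankNodes;
    }

    /** Starts a new document; blank node IDs are always unified within one document. */
    Map<String, Long> newDocumentScope() {
        return new HashMap<>();
    }

    /** Resolves a blank node ID to a term identifier under the configured mode. */
    long resolve(String bnodeId, Map<String, Long> documentScope) {
        // Told bnodes: consult the persistent index. Otherwise: only the document scope.
        Map<String, Long> scope = storeBlankNodes ? forwardIndex : documentScope;
        return scope.computeIfAbsent(bnodeId, id -> nextTermId++);
    }
}
```

With `storeBlankNodes = true` the persistent map grows with every distinct blank node ID, which mirrors the time and space tradeoff described above.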
static final String DEFAULT_STORE_BLANK_NODES
static final String TERMID_BITS_TO_REVERSE
For the scale-out triple store, the term identifiers are formed by placing the index partition identifier in the high word and the local counter for the index partition into the low word. The effect of this option is to cause the low N bits of the local counter value to be reversed and written into the high N bits of the term identifier (the other bits are shifted down to make room for this). Regardless of the configured value for this option, all bits of both the partition identifier and the local counter are preserved.
Normally, the low bits of a sequential counter will vary the most rapidly. By reversing the localCounter and placing some of the reversed bits into the high bits of the term identifier we cause the term identifiers to be uniformly (but not randomly) distributed. This is much like using a hash function without collisions or a random number generator that does not produce duplicates. When the value of this option is ZERO (0), no bits are reversed so the high bits of the term identifiers directly reflect the partition identifier and the low bits are assigned sequentially by the local counter within each TERM2ID index partition.
The use of a non-zero value for this option can easily cause the write load on the index partitions for the ID2TERM and statement indices to be perfectly balanced. However, using too many bits has some negative consequences on locality of operations within an index partition (since the distribution of the keys will be approximately uniform, leading to poor cache performance, more copy-on-write for the B+Tree, and both more IO and faster growth in the journal for writes, since there will be more leaves made dirty on average by each bulk write).
The use of a non-zero value for this option also directly affects the degree of scatter for bulk read or write operations. As more bits are used, it becomes increasingly likely that each bulk read or write operation will on average touch all index partitions. This is because the number of low order local counter bits reversed and rotated into the high bits of the term identifier places an approximate bound on the number of index partitions of the ID2TERM or a statement index that will be touched by a scattered read or write. However, that number will continue to grow slowly over time as new partition identifiers are introduced (the partition identifiers appear next in the encoded term identifier and therefore determine the degree of locality or scatter once the quickly varying high bits have had their say).
The "right" value really depends on the expected scale of the knowledge base. If you estimate that you will have 50 x 200M index partitions for the statement indices, then SQRT(50) =~ 7 would be a good choice.
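The bit manipulation described above can be sketched as follows. This is an illustration written from the prose only, not the code of TermIdEncoder: the class name `TermIdBitsSketch`, the 32-bit partition identifier and 32-bit local counter, and the allowed range of N are assumptions.

```java
/** Illustrative only; see TermIdEncoder for the real encoding. */
final class TermIdBitsSketch {

    /** Packs (partitionId, localCounter), then moves the reversed low n counter bits to the top. */
    static long encode(int partitionId, int localCounter, int n) {
        checkBits(n);
        // Partition identifier in the high word, local counter in the low word.
        long v = ((long) partitionId << 32) | (localCounter & 0xFFFFFFFFL);
        long lowMask = (1L << n) - 1;
        // Long.reverse maps bit 0 to bit 63, so the reversed low n bits land in the high n bits.
        long reversedLow = Long.reverse(v & lowMask);
        // The remaining bits are shifted down by n to make room; nothing is lost.
        return reversedLow | (v >>> n);
    }

    /** Inverse of encode: recovers the packed (partitionId, localCounter) value. */
    static long decode(long termId, int n) {
        checkBits(n);
        long lowMask = (1L << n) - 1;
        long low = Long.reverse(termId) & lowMask;
        return (termId << n) | low;
    }

    private static void checkBits(int n) {
        if (n < 0 || n > 31) {
            throw new IllegalArgumentException("n must be in [0, 31]");
        }
    }
}
```

With `n = 0` the result is just the packed value, so identifiers are sequential within a partition. With `n = 2`, consecutive counters 0, 1, 2, 3 land in four different quarters of the identifier space, which is the scatter effect the text describes.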
TermIdEncoder
static final String DEFAULT_TERMID_BITS_TO_REVERSE
static final String TERM_CACHE_CAPACITY
Value
s by
their term identifier.
static final String DEFAULT_TERM_CACHE_CAPACITY
static final String VOCABULARY_CLASS
The name of the class that will establish the pre-defined
Vocabulary
for the database (default
). The class MUST extend
BaseVocabulary
. This option is ignored if the lexicon is
disabled.
The Vocabulary
is initialized by
AbstractTripleStore.create()
. Its state is stored in the
global row store under the TripleStoreSchema.VOCABULARY
property. The named Vocabulary
class will be used to
instantiate a consistent vocabulary mapping each time a view of the
AbstractTripleStore
is materialized. This depends on the
named Vocabulary
class having a stable behavior. Thus the
BaseVocabulary
class builds in protection against version
changes and will refuse to materialize a view of the
AbstractTripleStore
if the Vocabulary
would not be
consistent.
The BaseVocabulary
class is designed for easy and modular
extension. You can trivially define a concrete instance of this class
which provides any (reasonable) number of VocabularyDecl
instances. Each VocabularyDecl
declares the namespace(s) and
the URI
s for some ontology. A number of such classes have
been created and are combined by the
DEFAULT_VOCABULARY_CLASS
. You can create your own
VocabularyDecl
classes and combine them within your own
Vocabulary
, but it must extend BaseVocabulary
.
Note: There is an interaction between the Vocabulary
and
IExtension
s. The IDatatypeURIResolver
requires that
URIs used by an IExtension
are pre-declared by the
Vocabulary
.
static final String DEFAULT_VOCABULARY_CLASS
Note: The default
Vocabulary
class may be changed from time
to time as additional VocabularyDecl
are created and bundled
into a new default Vocabulary
. However, a deployed concrete
instance of the default Vocabulary
class MUST NOT be modified
since that could introduce inconsistencies into the URI to IV mapping
which it provides for AbstractTripleStore
s created using that
class.
static final String AXIOMS_CLASS
The
Axioms
model that will be used (default
). The value is the name of the
class that will be instantiated by
AbstractTripleStore.create()
. The class must extend
BaseAxioms
. This option is ignored if the lexicon is
disabled. Use NoAxioms
to disable inference.
static final String DEFAULT_AXIOMS_CLASS
static final String CLOSURE_CLASS
The name of the
BaseClosure
class that will be used (default
). The value is the name of
the class that will be used to generate the Program
that
computes the closure of the database. The class must extend
BaseClosure
. This option is ignored if inference is
disabled.
There are two pre-defined "programs" used to compute and maintain
closure. The FullClosure
program is a simple fix point of the
RDFS+ entailments, except for the
foo rdf:type rdfs:Resource
entailments which are
normally generated at query time. The FastClosure
program
breaks nearly all cycles in the RDFS rules and runs nearly entirely
as a sequence of IRule
s, including several custom rules.
It is far easier to modify the FullClosure
program since any
new rules can just be dropped into place. Modifying the
FastClosure
program requires careful consideration of the
entailments computed at each stage in order to determine where a new
rule would fit in.
Note: When support for owl:sameAs
, etc. processing is
enabled, some of the entailments are computed by rules run during
forward closure and some of the entailments are computed by rules run
at query time. Both FastClosure
and FullClosure
are
aware of this and handle it correctly (e.g., as configured).
static final String DEFAULT_CLOSURE_CLASS
static final String ONE_ACCESS_PATH
Boolean option (default
false
) disables all but a
single statement index (aka access path).
Note: The main purpose of the option is to make it possible to turn
off the other access paths for special bulk load purposes. The use of
this option is NOT compatible with either the application of the
InferenceEngine
or high-level query.
Note: You may want to explicitly enable or disable the bloom filter
for this. Normally a single access path (SPO) is used for a temporary
store. Temporary stores tend to be smaller, so if you will also be
doing point tests on the temporary store then you probably want to
use the BLOOM_FILTER
. Otherwise it may be turned off to
realize some (minimal) performance gain.
static final String DEFAULT_ONE_ACCESS_PATH
static final String BLOOM_FILTER
AbstractBTree
s. While the mutable BTree
s might
occasionally grow too large to support a bloom filter, data is
periodically migrated onto immutable IndexSegment
s which have
perfect fit bloom filters. This means that the bloom filter
scales-out, but not up.
Note: The SPO access path is used any time we have an access path that corresponds to a point test. Therefore this is the only index for which it makes sense to maintain a bloom filter.
If you are going to do a lot of small commits, then please DO NOT
enable the bloom filter for the AbstractTripleStore
. The
bloom filter takes 1 MB each time you commit on the SPO/SPOC index.
The bloom filter has limited value in any case for scale-up since its
nominal error rate will be exceeded at ~2M triples. This concern does
not apply for scale-out, where the bloom filter is always a good
idea.
static final String DEFAULT_BLOOM_FILTER
static final String JUSTIFY
When
true
(default ),
proof chains for entailments generated by forward chaining are stored
in the database. This option is required for truth maintenance when
retracting assertions.
If you will not be retracting statements from the database then you
can specify false
for a significant performance boost
during writes and a smaller profile on the disk.
This option does not affect query performance since the justifications are maintained in a distinct index and are only used when retracting assertions.
static final String DEFAULT_JUSTIFY
static final String STATEMENT_IDENTIFIERS
Statement identifiers are assigned consistently when Statement
s are mapped into the database. This is done using an extension of
the term:id
index to map the statement as if it were a
term onto a unique statement identifier. While the statement
identifier is assigned canonically by the term:id
index,
it is stored redundantly in the value position for each of the
statement indices. While the statement identifier is, in fact, a term
identifier, the reverse mapping is NOT stored in the id:term index
and you CAN NOT translate from a statement identifier back to the
original statement.
bigdata supports an RDF/XML interchange extension for the interchange
of triples with statement identifiers that may be used as
blank nodes to make statements about statements. See BD
and
RDFXMLParser
.
Statement identifiers add some latency when loading data since they increase the size of the writes on the terms index (and also its space requirements since all statements are also replicated in the terms index). However, if you are doing concurrent data load then the added latency is nicely offset by the parallelism.
The main benefit for statement identifiers is that they provide a mechanism for statement level provenance. This is critical for some applications.
An alternative approach to provenance within RDF is to use the concatenation of the subject, predicate, and object (or a hash of their concatenation) as the value in the context position. While this approach can be used with any quad store, it is less transparent and requires twice the amount of data on the disk since you need an additional three statement indices to cover the quad access paths.
The provenance mode (SIDs) IS NOT compatible with the QUADS
mode. You may use either one, but not both in the same KB instance.
There are examples for using the provenance mode online.
static final String DEFAULT_STATEMENT_IDENTIFIERS
static final String QUADS
STATEMENT_IDENTIFIERS
option determines whether or not the
provenance mode is enabled.
static final String DEFAULT_QUADS
static final String TRIPLES_MODE
QUADS
= false
STATEMENT_IDENTIFIERS
= false
static final String DEFAULT_TRIPLES_MODE
static final String TRIPLES_MODE_WITH_PROVENANCE
QUADS
= false
STATEMENT_IDENTIFIERS
= true
static final String DEFAULT_TRIPLES_MODE_WITH_PROVENANCE
static final String QUADS_MODE
QUADS
= true
STATEMENT_IDENTIFIERS
= false
AXIOMS_CLASS
= com.bigdata.rdf.store.AbstractTripleStore.NoAxioms
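The three modes above amount to fixed combinations of QUADS and STATEMENT_IDENTIFIERS (plus AXIOMS_CLASS for quads). The sketch below builds them as plain `java.util.Properties`. The property key strings are assumptions made so that the example compiles on its own; real code should use the `AbstractTripleStore.Options` constants. The class `KbModeSketch` and the `"false"` fallbacks in `checkCompatible` are hypothetical. The NoAxioms class name is copied verbatim from the QUADS_MODE description.

```java
import java.util.Properties;

/** Sketch of the three KB modes as property sets; key names are assumed, not authoritative. */
final class KbModeSketch {
    // ASSUMED keys; prefer AbstractTripleStore.Options.QUADS etc. in real code.
    static final String QUADS = "com.bigdata.rdf.store.AbstractTripleStore.quads";
    static final String STATEMENT_IDENTIFIERS =
            "com.bigdata.rdf.store.AbstractTripleStore.statementIdentifiers";
    static final String AXIOMS_CLASS = "com.bigdata.rdf.store.AbstractTripleStore.axiomsClass";

    static Properties triplesMode() {
        return mode(false, false);
    }

    static Properties triplesModeWithProvenance() {
        return mode(false, true);
    }

    static Properties quadsMode() {
        Properties p = mode(true, false);
        // Class name as given in the QUADS_MODE description.
        p.setProperty(AXIOMS_CLASS, "com.bigdata.rdf.store.AbstractTripleStore.NoAxioms");
        return p;
    }

    /** Rejects the combination that the documentation declares incompatible. */
    static void checkCompatible(Properties p) {
        boolean quads = Boolean.parseBoolean(p.getProperty(QUADS, "false"));
        boolean sids = Boolean.parseBoolean(p.getProperty(STATEMENT_IDENTIFIERS, "false"));
        if (quads && sids) {
            throw new IllegalArgumentException("SIDs (provenance mode) and QUADS are mutually exclusive");
        }
    }

    private static Properties mode(boolean quads, boolean sids) {
        Properties p = new Properties();
        p.setProperty(QUADS, Boolean.toString(quads));
        p.setProperty(STATEMENT_IDENTIFIERS, Boolean.toString(sids));
        return p;
    }
}
```

Calling `checkCompatible` before creating a KB instance is one way to fail fast on a hand-edited configuration that enables both options.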
static final String DEFAULT_QUADS_MODE
static final String VALUE_FACTORY_CLASS
The name of the
BigdataValueFactory
class. The implementation
MUST declare a method with the following signature which will be used
as a canonicalizing factory for the instances of that class.
public static BigdataValueFactory getInstance(final String namespace)
DEFAULT_VALUE_FACTORY_CLASS
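A canonicalizing factory with the required `getInstance(String)` shape might look like the sketch below. It is a pattern illustration only: `NamespaceScopedFactory` is a hypothetical class that does not implement BigdataValueFactory, and the use of a `ConcurrentHashMap` as the canonicalizing cache is an assumption about one reasonable way to meet the contract.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical canonicalizing factory: one instance per namespace. */
final class NamespaceScopedFactory {
    // Canonicalizing cache; a real implementation may prefer weak references.
    private static final ConcurrentMap<String, NamespaceScopedFactory> CACHE =
            new ConcurrentHashMap<>();

    private final String namespace;

    private NamespaceScopedFactory(String namespace) {
        this.namespace = namespace;
    }

    /** Returns the single shared instance for the given namespace. */
    public static NamespaceScopedFactory getInstance(final String namespace) {
        if (namespace == null) {
            throw new IllegalArgumentException("namespace must not be null");
        }
        return CACHE.computeIfAbsent(namespace, NamespaceScopedFactory::new);
    }

    String getNamespace() {
        return namespace;
    }
}
```

Two calls with the same namespace return the same object, so values created through the factory can be compared by their owning factory.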
static final String DEFAULT_VALUE_FACTORY_CLASS
static final String TEXT_INDEX
static final String DEFAULT_TEXT_INDEX
@Deprecated static final String SUBJECT_CENTRIC_TEXT_INDEX
true
) enables support for a full
text index that may be used to lookup literals by tokens found in the
text of those literals.
@Deprecated static final String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEX
static final String TEXT_INDEX_DATATYPE_LITERALS
xsd:string
,
xsd:int
, etc. If disabled, only plain, xsd:string
and rdf:langString
(language tagged literals) will be indexed.
static final String DEFAULT_TEXT_INDEX_DATATYPE_LITERALS
static final String DATATYPES_TO_TEXT_INDEX
List of datatypes that will be put into the full text index even if
TEXT_INDEX_DATATYPE_LITERALS
is not enabled
(default "").
static final String DEFAULT_DATATYPES_TO_TEXT_INDEX
static final String TEXT_INDEXER_CLASS
The name of the
IValueCentricTextIndexer
class. The implementation MUST
declare a method with the following signature which will be used to
locate instances of that class.
static public ITextIndexer getInstance(final IIndexManager indexManager, final String namespace, final Long timestamp, final Properties properties)
DEFAULT_TEXT_INDEXER_CLASS
static final String DEFAULT_TEXT_INDEXER_CLASS
static final String SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
The name of the
ITextIndexer
class. The implementation MUST
declare a method with the following signature which will be used to
locate instances of that class.
static public ITextIndexer getInstance(final IIndexManager indexManager, final String namespace, final Long timestamp, final Properties properties)
DEFAULT_TEXT_INDEXER_CLASS
static final String DEFAULT_SUBJECT_CENTRIC_TEXT_INDEXER_CLASS
static final String BLOBS_THRESHOLD
The threshold (in character length) at which an RDF
Value
will be inserted into the LexiconKeyOrder.BLOBS
index rather
than the LexiconKeyOrder.TERM2ID
and
LexiconKeyOrder.ID2TERM
indices (default
"256").
The LexiconKeyOrder.BLOBS
index is capable of storing very
large literals but has more IO scatter due to the hash code component
of the key for that index. Therefore smaller RDF Value
s
should be inserted into the LexiconKeyOrder.TERM2ID
and
LexiconKeyOrder.ID2TERM
indices while very large RDF
Value
s MUST be inserted into the
LexiconKeyOrder.BLOBS
index.
The LexiconKeyOrder.TERM2ID
index keys are Unicode sort codes
based on the RDF Value
s. This threshold essentially limits
the maximum length of the keys in the LexiconKeyOrder.TERM2ID
index.
Note: The BLOBS index MAY be disabled entirely by setting this
property to BLOBS_THRESHOLD_DISABLE
. However, this is generally not
advised since it implies that large literals will appear as keys
in the TERM2ID index.
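The routing decision implied by the threshold can be sketched as a simple length test. This is an assumption-laden illustration: the class `BlobsThresholdSketch` and its `Target` enum are hypothetical, and treating a value whose length equals the threshold as a BLOB (`>=`) is a guess based on the phrase "the threshold at which", not a confirmed detail of the implementation.

```java
/** Hypothetical routing rule for the BLOBS threshold; the >= boundary is an assumption. */
final class BlobsThresholdSketch {
    enum Target { TERM2ID_AND_ID2TERM, BLOBS }

    /** Chooses the lexicon index family for a value of the given lexical form. */
    static Target route(String lexicalForm, int blobsThreshold) {
        return lexicalForm.length() >= blobsThreshold
                ? Target.BLOBS
                : Target.TERM2ID_AND_ID2TERM;
    }
}
```

With the documented default of 256, a 255-character literal would stay in the TERM2ID and ID2TERM indices under this rule, while a 256-character literal would go to the BLOBS index.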
static final String DEFAULT_BLOBS_THRESHOLD
static final String BLOBS_THRESHOLD_DISABLE
static final String INLINE_XSD_DATATYPE_LITERALS
Note: xsd:dateTime
inlining is controlled by a distinct
option. See INLINE_DATE_TIMES
.
Note: xsd:string
inlining and the inlining of non-xsd
literals are controlled by INLINE_TEXT_LITERALS
and
MAX_INLINE_TEXT_LENGTH
.
static final String DEFAULT_INLINE_XSD_DATATYPE_LITERALS
static final String INLINE_TEXT_LITERALS
Inline ANY literal having fewer than
MAX_INLINE_TEXT_LENGTH
characters (default "false").
Note: This option exists mainly to support a scale-out design in which everything is inlined into the statement indices. This design is similar to the YARS2 system with its ISAM files and has the advantage that little or nothing is stored within the lexicon.
Inlining of large literals via this option is NOT compatible with
TEXT_INDEX
. The problem is that we need to index literals
which are inlined as well as those which are not inlined. While the
full text index does support this, indexing fully inline literals
only makes sense for reasonably short literals. This is because the
IV
of the inlined literal (a) embeds its (compressed) Unicode
representation; and (b) is replicated for each token within that
literal. For large literals, this causes a substantial expansion in
the full text index.
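The expansion can be made concrete with a rough estimate. The sketch below assumes whitespace tokenization and ignores compression, so it overstates and simplifies what the real full text index does; `InlineTextIndexCostSketch` is a hypothetical helper. It only shows the shape of the cost: one copy of the literal per token.

```java
/** Back-of-envelope estimate of full text index growth for inlined literals. */
final class InlineTextIndexCostSketch {

    /** Characters written if the whole literal is replicated once per token. */
    static long estimatedIndexedChars(String literal) {
        String trimmed = literal.trim();
        if (trimmed.isEmpty()) {
            return 0;
        }
        // ASSUMPTION: tokens are whitespace separated; real analyzers differ.
        long tokens = trimmed.split("\\s+").length;
        return tokens * literal.length();
    }
}
```

Because both the token count and the literal length grow with the text, the estimate grows roughly quadratically, which is why inlining only makes sense for reasonably short literals.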
static final String DEFAULT_INLINE_TEXT_LITERALS
static final String MAX_INLINE_TEXT_LENGTH
XSDStringExtension
is registered by the
DefaultExtensionFactory
when GT ZERO (0).
Note: URIs may be readily inlined using this mechanism without
causing an interaction with the full text index since they are not
indexed by the full text index. However, inlining literals in this
manner causes the Unicode representation of the literal to be
duplicated within the full text index for each token in that literal.
See TEXT_INDEX
and INLINE_TEXT_LITERALS
.
DefaultExtensionFactory
static final String DEFAULT_MAX_INLINE_STRING_LENGTH
static final String INLINE_BNODES
See STORE_BLANK_NODES
.
static final String DEFAULT_INLINE_BNODES
static final String INLINE_DATE_TIMES
INLINE_DATE_TIMES_TIMEZONE
static final String DEFAULT_INLINE_DATE_TIMES
static final String INLINE_DATE_TIMES_TIMEZONE
INLINE_DATE_TIMES
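The role of the default time zone can be illustrated with `java.time`. This is a sketch of the idea only, not the Blazegraph inlining code: `InlineDateTimeSketch` is hypothetical, and it handles only lexical forms without a zone suffix. A zone-less xsd:dateTime is interpreted in the configured zone and stored as UTC milliseconds since the epoch; decoding applies the same zone in reverse.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;

/** Illustrates encoding a zone-less dateTime as UTC millis using a default zone. */
final class InlineDateTimeSketch {

    /** Interprets a zone-less lexical form (e.g. 2019-01-01T00:00:00) in the default zone. */
    static long encode(String lexicalForm, ZoneId defaultZone) {
        return LocalDateTime.parse(lexicalForm)
                .atZone(defaultZone)
                .toInstant()
                .toEpochMilli();
    }

    /** Converts stored UTC millis back to a local date/time in the default zone. */
    static LocalDateTime decode(long utcMillis, ZoneId defaultZone) {
        return Instant.ofEpochMilli(utcMillis)
                .atZone(defaultZone)
                .toLocalDateTime();
    }
}
```

The same literal encodes to different stored values under different zones, so changing the configured zone after data has been loaded would change how existing values are interpreted.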
static final String DEFAULT_INLINE_DATE_TIMES_TIMEZONE
static final String EXTENSION_FACTORY_CLASS
The name of the
IExtensionFactory
class. The implementation
MUST declare a constructor that accepts an
IDatatypeURIResolver
as its only argument. The
IExtension
s constructed by the factory need a resolver to
resolve datatype URIs to term identifiers in the database.
DEFAULT_EXTENSION_FACTORY_CLASS
static final String DEFAULT_EXTENSION_FACTORY_CLASS
static final String REJECT_INVALID_XSD_VALUES
When
true
AND is
true
, literals having an xsd datatype URI which can not
be validated against that datatype will be rejected (default
DEFAULT_REJECT_INVALID_XSD_VALUES
). For example, when
true
abc^^xsd:int
would be rejected. When
false
the literal will be accepted, but it will not be
inlined with the rest of the literals for that value space and will
typically encounter a SPARQL type error during query evaluation.
static final String DEFAULT_REJECT_INVALID_XSD_VALUES
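The two outcomes described for REJECT_INVALID_XSD_VALUES can be sketched for xsd:int. This is a simplified model: `XsdIntValidationSketch` is hypothetical, inlining is assumed to be enabled, and `Integer.parseInt` only approximates the xsd:int lexical space.

```java
/** Models accept/reject behavior for xsd:int literals; not the Blazegraph implementation. */
final class XsdIntValidationSketch {
    enum Outcome { INLINED, ACCEPTED_NOT_INLINED }

    static Outcome handle(String lexicalForm, boolean rejectInvalidXsdValues) {
        try {
            // Approximation of xsd:int validation.
            Integer.parseInt(lexicalForm.trim());
            return Outcome.INLINED;
        } catch (NumberFormatException e) {
            if (rejectInvalidXsdValues) {
                throw new IllegalArgumentException("Invalid xsd:int literal: " + lexicalForm, e);
            }
            // Accepted, but kept out of the inline value space for xsd:int.
            return Outcome.ACCEPTED_NOT_INLINED;
        }
    }
}
```

For the example in the text, `handle("abc", true)` throws, while `handle("abc", false)` accepts the literal without inlining it.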
static final String CONSTRAIN_XXXC_SHARDS
Boolean option determines whether or not an
XXXCShardSplitHandler
is applied (scale-out only, default
"true").
When true
, shards whose SPOKeyOrder
name ends
with "C" are constrained such that all quads for the same triple will
be co-located on the same shard. This constraint allows certain
optimizations for default graph handling.
This constraint may be used if you do not expect to have more than ~200MB worth of distinct graphs within which the same triple may be asserted. This is a soft constraint as larger shards are permitted, but performance will degrade if this constraint forces some shards to be many times larger than their nominal capacity.
XXXCShardSplitHandler
static final String DEFAULT_CONSTRAIN_XXXC_SHARDS
static final String HISTORY_SERVICE
When
true
a HISTORY SERVICE and its associated index
will be maintained.
static final String DEFAULT_HISTORY_SERVICE
static final String HISTORY_SERVICE_MIN_RELEASE_AGE
The minimum amount of history (in milliseconds) that will be retained
by the
HISTORY_SERVICE
(default
). The head of the
index will be pruned during update to remove tuples associated with
older commit points.
static final String DEFAULT_HISTORY_SERVICE_MIN_RELEASE_AGE
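Pruning the head of a history index by HISTORY_SERVICE_MIN_RELEASE_AGE can be sketched with a sorted map keyed by commit time. This is a toy model under stated assumptions: `HistoryPruneSketch` is hypothetical, the index is represented as a `NavigableMap` from commit time (milliseconds) to an opaque entry, and entries exactly at the cutoff are retained.

```java
import java.util.NavigableMap;

/** Toy model of pruning history older than a minimum release age. */
final class HistoryPruneSketch {

    /** Removes entries with commit time strictly before (now - minReleaseAge); returns the count removed. */
    static int prune(NavigableMap<Long, String> historyByCommitTime,
                     long nowMillis, long minReleaseAgeMillis) {
        long cutoff = nowMillis - minReleaseAgeMillis;
        // headMap is a live view, so clearing it removes the entries from the backing map.
        NavigableMap<Long, String> head = historyByCommitTime.headMap(cutoff, false);
        int removed = head.size();
        head.clear();
        return removed;
    }
}
```

Because the map is ordered by commit time, the entries to drop are always a prefix, which matches the description of pruning the head of the index during update.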
static final String BOTTOM_UP_EVALUATION
If this option is set to false, turn off the
ASTBottomUpOptimizer.
static final String DEFAULT_BOTTOM_UP_EVALUATION
static final String INLINE_URI_FACTORY_CLASS
The name of the
IInlineURIFactory
class.
DEFAULT_EXTENSION_FACTORY_CLASS
static final String DEFAULT_INLINE_URI_FACTORY_CLASS
static final String RDR_HISTORY_CLASS
The name of the
RDRHistory
class. Null by default.
static final String COMPUTE_CLOSURE_FOR_SIDS
static final String DEFAULT_COMPUTE_CLOSURE_FOR_SIDS
static final String ENABLE_RAW_RECORDS_SUPPORT
static final String DEFAULT_ENABLE_RAW_RECORDS_SUPPORT
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.