public class LexiconRelation extends AbstractRelation<BigdataValue> implements IDatatypeURIResolver
LexiconRelation handles everything related to the indices that map external RDF Values onto IVs (internal values), and provides methods for the efficient materialization of external RDF Values from IVs.
Nested classes/interfaces inherited from class AbstractResource: AbstractResource.Options

| Modifier and Type | Field and Description |
|---|---|
| static String | NAME_LEXICON_RELATION: Constant for the LexiconRelation namespace component. |

Inherited fields: indexManager

| Constructor and Description |
|---|
| LexiconRelation(AbstractTripleStore container, IIndexManager indexManager, String namespace, Long timestamp, Properties properties) |
| LexiconRelation(IIndexManager indexManager, String namespace, Long timestamp, Properties properties): Note: The term:id and id:term indices MUST use unisolated write operations to ensure consistency without write-write conflicts. |

| Modifier and Type | Method and Description |
|---|---|
| long | addTerms(BigdataValue[] values, int numTerms, boolean readOnly): Batch insert of terms into the database. |
| Iterator<org.openrdf.model.Value> | blobsIterator() |
| void | buildSubjectCentricTextIndex(): Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
| static void | clearTermCacheFactory(String namespace): Clear all term caches for the supplied namespace. |
| void | create(): Create any logically contained resources (relations, indices). |
| long | delete(IChunkedOrderedIterator<BigdataValue> itr): Note: this method is part of the mutation API. |
| void | destroy(): Destroy any logically contained resources (relations, indices). |
| protected Class<IExtensionFactory> | determineExtensionFactoryClass() |
| protected Class<IInlineURIFactory> | determineInlineURIFactoryClass() |
| protected Class<ISubjectCentricTextIndexer> | determineSubjectCentricTextIndexerClass() |
| protected Class<IValueCentricTextIndexer> | determineTextIndexerClass() |
| protected Class<BigdataValueFactory> | determineValueFactoryClass() |
| boolean | exists() |
| IIndex | getBlobsIndex() |
| protected IndexMetadata | getBlobsIndexMetadata(String name): Return the IndexMetadata for the TERMS index. |
| AbstractTripleStore | getContainer(): Strengthens the return type. |
| Class<BigdataValue> | getElementClass(): Return the class for the generic type of this relation. |
| IIndex | getId2TermIndex() |
| protected IndexMetadata | getId2TermIndexMetadata(String name): Return the IndexMetadata for the ID2TERM index. |
| IIndex | getIndex(IKeyOrder<? extends BigdataValue> keyOrder): Overridden to use a local cache of the index reference. |
| Set<String> | getIndexNames(): Return the fully qualified name of each index maintained by this relation. |
| TimeZone | getInlineDateTimesTimeZone(): Return the default time zone to be used for inlining. |
| IV | getInlineIV(org.openrdf.model.Value value): Attempt to convert the value to an inline internal value. |
| IV | getIV(org.openrdf.model.Value value): Deprecated. Not even the unit tests should be doing this. |
| IKeyOrder<BigdataValue> | getKeyOrder(IPredicate<BigdataValue> p): Return the IKeyOrder for the predicate corresponding to the perfect access path. |
| Iterator<IKeyOrder<BigdataValue>> | getKeyOrders(): Return the IKeyOrders corresponding to the registered indices for this relation. |
| ILexiconConfiguration<BigdataValue> | getLexiconConfiguration(): Return the lexiconConfiguration instance. |
| int | getMaxInlineStringLength(): Return the maximum length of a string value that may be inlined into the statement indices. |
| LexiconKeyOrder | getPrimaryKeyOrder(): Return the IKeyOrder for the primary index for the relation. |
| IValueCentricTextIndexer<?> | getSearchEngine(): A factory returning the softly held singleton for the FullTextIndex. |
| ISubjectCentricTextIndexer<?> | getSubjectCentricSearchEngine(): Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
| BigdataValue | getTerm(IV iv): Note: BNodes are not stored in the reverse lexicon and are recognized using AbstractTripleStore#isBNode(long). |
| IIndex | getTerm2IdIndex() |
| protected IndexMetadata | getTerm2IdIndexMetadata(String name): Return the IndexMetadata for the TERM2ID index. |
| int | getTermIdBitsToReverse(): The number of low bits from the term identifier that are reversed and rotated into the high bits when it is assigned. |
| Map<IV<?,?>,BigdataValue> | getTerms(Collection<IV<?,?>> ivs): Batch resolution of internal values to BigdataValues. |
| Map<IV<?,?>,BigdataValue> | getTerms(Collection<IV<?,?>> ivsUnmodifiable, int termsChunksSize, int blobsChunkSize): Batch resolution of internal values to BigdataValues. |
| BigdataValueFactory | getValueFactory(): The canonical BigdataValueFactoryImpl reference (JVM wide) for the lexicon namespace. |
| LexiconRelation | init(): The default implementation only logs the event. |
| long | insert(IChunkedOrderedIterator<BigdataValue> itr): Note: this method is part of the mutation API. |
| boolean | isBlob(org.openrdf.model.Value v) |
| boolean | isInlineDateTimes(): Return true if xsd:datetime literals are being inlined into the statement indices. |
| boolean | isInlineLiterals(): Return true if datatype literals are being inlined into the statement indices. |
| boolean | isStoreBlankNodes(): true iff blank nodes are being stored in the lexicon's forward index. |
| boolean | isSubjectCentricTextIndex(): Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
| boolean | isTextIndex(): true iff the (value-centric) full text index is enabled. |
| IAccessPath<BigdataValue> | newAccessPath(IIndexManager localIndexManager, IPredicate<BigdataValue> predicate, IKeyOrder<BigdataValue> keyOrder): Necessary for lexicon joins, which the query planner injects into query plans as needed. |
| BigdataValue | newElement(List<BOp> a, IBindingSet bindingSet): Note: this method is part of the mutation API. |
| Iterator<IV> | prefixScan(org.openrdf.model.Literal lit): A scan of all literals having the given literal as a prefix. |
| Iterator<IV> | prefixScan(org.openrdf.model.Literal[] lits): A scan of all literals having any of the given literals as a prefix. |
| void | rebuildTextIndex(boolean forceCreate): Utility method to (re-)build the full text index. |
| BigdataURI | resolve(org.openrdf.model.URI uri): Returns a fully resolved datatype URI with the IV set. |

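The getTermIdBitsToReverse() entry in the table above describes reversing the low bits of a term identifier and rotating them into the high bits. The sketch below shows one way that transform can be written with plain long arithmetic; the class and method names are hypothetical and the code is not taken from the Blazegraph sources. A common motivation for this kind of transform is to spread consecutively assigned identifiers across the key space instead of clustering them at one end.

```java
/**
 * Hypothetical sketch of the transform described for getTermIdBitsToReverse():
 * the low nbits of an identifier are reversed and rotated into the high bits.
 * Not Blazegraph's implementation.
 */
final class TermIdBitReversal {
    private TermIdBitReversal() {}

    private static void checkBits(int nbits) {
        if (nbits < 0 || nbits > 63) {
            throw new IllegalArgumentException("nbits must be in [0, 63]: " + nbits);
        }
    }

    /** Moves the low nbits of id, in reversed order, into the top nbits; shifts the rest down. */
    static long encode(long id, int nbits) {
        checkBits(nbits);
        if (nbits == 0) {
            return id;
        }
        long low = id & ((1L << nbits) - 1);
        // Long.reverse maps bit i to bit 63 - i, so the low field lands, reversed, in the top nbits.
        return Long.reverse(low) | (id >>> nbits);
    }

    /** Inverse of encode for the same nbits. */
    static long decode(long encoded, int nbits) {
        checkBits(nbits);
        if (nbits == 0) {
            return encoded;
        }
        long highMask = -1L << (64 - nbits);
        long low = Long.reverse(encoded & highMask); // undo the reversal
        long rest = encoded & ~highMask;             // original bits nbits..63, shifted down
        return (rest << nbits) | low;
    }
}
```

With nbits = 4, the identifiers 1, 2 and 3 encode to 0x8000000000000000L, 0x4000000000000000L and 0xC000000000000000L, so neighboring identifiers differ in their leading bits rather than their trailing bits.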
Inherited methods: getAccessPath, getAccessPath, getAccessPath, getFQN, getFQN, getFQN, getIndex, getIndex, newIndexMetadata
Inherited methods: acquireExclusiveLock, assertWritable, getBareProperties, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getCommitTime, getContainerNamespace, getExecutorService, getFullyBufferedReadThreshold, getIndexManager, getMaxParallelSubqueries, getNamespace, getProperties, getProperty, getProperty, getTimestamp, isForceSerialExecution, isReadOnly, toString, unlock
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Inherited methods: getExecutorService, getIndexManager
Inherited methods: getContainerNamespace, getNamespace, getTimestamp

public static final transient String NAME_LEXICON_RELATION
Constant for the LexiconRelation namespace component.
Note: To obtain the fully qualified name of an index in the
LexiconRelation you need to append a "." to the relation's
namespace, then this constant, then a "." and then the local name of the
index.
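A minimal sketch of that naming recipe. The value "lex" for NAME_LEXICON_RELATION is an assumption here (in practice, read the constant from LexiconRelation), and the namespace "kb" and local name "TERM2ID" used to exercise it are illustrative only.

```java
import java.util.Objects;

/** Hypothetical helper that follows the naming recipe in the note above. */
final class LexiconIndexNames {
    /** ASSUMPTION: value of LexiconRelation.NAME_LEXICON_RELATION; verify against your version. */
    static final String NAME_LEXICON_RELATION = "lex";

    private LexiconIndexNames() {}

    /** Returns namespace + "." + NAME_LEXICON_RELATION + "." + localName. */
    static String fullyQualifiedIndexName(String namespace, String localName) {
        Objects.requireNonNull(namespace, "namespace");
        Objects.requireNonNull(localName, "localName");
        return namespace + "." + NAME_LEXICON_RELATION + "." + localName;
    }
}
```

Under these assumptions, fullyQualifiedIndexName("kb", "TERM2ID") yields "kb.lex.TERM2ID".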
public LexiconRelation(IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
Parameters: indexManager, namespace, timestamp, properties
public LexiconRelation(AbstractTripleStore container, IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
protected Class<BigdataValueFactory> determineValueFactoryClass()
protected Class<IValueCentricTextIndexer> determineTextIndexerClass()
protected Class<ISubjectCentricTextIndexer> determineSubjectCentricTextIndexerClass()
protected Class<IExtensionFactory> determineExtensionFactoryClass()
protected Class<IInlineURIFactory> determineInlineURIFactoryClass()
public BigdataValueFactory getValueFactory()
The canonical BigdataValueFactoryImpl reference (JVM wide) for the lexicon namespace.
public AbstractTripleStore getContainer()
Strengthens the return type.
Overrides: getContainer in class AbstractResource<IRelation<BigdataValue>>
Returns: The container, or null if there is no container.
public boolean exists()
public LexiconRelation init()
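Description copied from class: AbstractResource
The default implementation only logs the event.
Specified by: init in interface ILocatableResource<IRelation<BigdataValue>>
Overrides: init in class AbstractResource<IRelation<BigdataValue>>
public void create()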
Description copied from interface: IMutableResource
Create any logically contained resources (relations, indices). There is no presumption that ILocatableResource.init() is suitable for invocation from IMutableResource.create(). Instead, you are responsible for invoking ILocatableResource.init() from this method IFF it is appropriate to reuse its initialization logic.
Specified by: create in interface IMutableResource<IRelation<BigdataValue>>
Overrides: create in class AbstractResource<IRelation<BigdataValue>>
public void destroy()
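Description copied from interface: IMutableResource
Destroy any logically contained resources (relations, indices).
Specified by: destroy in interface IMutableResource<IRelation<BigdataValue>>
Overrides: destroy in class AbstractResource<IRelation<BigdataValue>>
public final boolean isInlineLiterals()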
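Returns: true if datatype literals are being inlined into the statement indices.
public final int getMaxInlineStringLength()
Return the maximum length of a string value that may be inlined into the statement indices.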
public final boolean isInlineDateTimes()
Returns: true if xsd:datetime literals are being inlined into the statement indices.
public final TimeZone getInlineDateTimesTimeZone()
Return the default time zone to be used for inlining.
public final int getTermIdBitsToReverse()
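Returns: The number of low bits from the term identifier that are reversed and rotated into the high bits when it is assigned.
See Also: AbstractTripleStore.Options#TERMID_BITS_TO_REVERSE
public final boolean isStoreBlankNodes()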
Returns: true iff blank nodes are being stored in the lexicon's forward index.
See Also: AbstractTripleStore.Options#STORE_BLANK_NODES
public final boolean isTextIndex()
Returns: true iff the (value-centric) full text index is enabled.
See Also: AbstractTripleStore.Options#TEXT_INDEX
@Deprecated public final boolean isSubjectCentricTextIndex()
Returns: true iff the subject-centric full text index is enabled.
See Also: AbstractTripleStore.Options#SUBJECT_CENTRIC_TEXT_INDEX
public IIndex getIndex(IKeyOrder<? extends BigdataValue> keyOrder)
Overridden to use a local cache of the index reference.
Specified by: getIndex in interface IRelation<BigdataValue>
Overrides: getIndex in class AbstractRelation<BigdataValue>
Parameters: keyOrder - The natural index order.
Returns: The index, or null iff the index does not exist as of the timestamp for this view of the relation.
See Also: AbstractRelation.getIndex(String)
public final IIndex getTerm2IdIndex()
public final IIndex getId2TermIndex()
public final IIndex getBlobsIndex()
public IValueCentricTextIndexer<?> getSearchEngine()
A factory returning the softly held singleton for the FullTextIndex.
See Also: AbstractTripleStore.Options#TEXT_INDEX
@Deprecated public ISubjectCentricTextIndexer<?> getSubjectCentricSearchEngine()
Returns: The FullTextIndex representing the subject-centric full text index.
See Also: AbstractTripleStore.Options#TEXT_INDEX
protected IndexMetadata getTerm2IdIndexMetadata(String name)
Return the IndexMetadata for the TERM2ID index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
protected IndexMetadata getId2TermIndexMetadata(String name)
Return the IndexMetadata for the ID2TERM index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
protected IndexMetadata getBlobsIndexMetadata(String name)
Return the IndexMetadata for the TERMS index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
public Set<String> getIndexNames()
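Description copied from interface: IRelation
Return the fully qualified name of each index maintained by this relation.
Specified by: getIndexNames in interface IRelation<BigdataValue>
public Iterator<IKeyOrder<BigdataValue>> getKeyOrders()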
Description copied from interface: IRelation
Return the IKeyOrders corresponding to the registered indices for this relation. [rather than getIndexNames?]
Specified by: getKeyOrders in interface IRelation<BigdataValue>
public LexiconKeyOrder getPrimaryKeyOrder()
Description copied from interface: IRelation
Return the IKeyOrder for the primary index for the relation.
Specified by: getPrimaryKeyOrder in interface IRelation<BigdataValue>
public BigdataValue newElement(List<BOp> a, IBindingSet bindingSet)
Note: this method is part of the mutation API.
Specified by: newElement in interface IRelation<BigdataValue>
Parameters: a - An ordered list of variables and/or constants.
bindingSet - A set of bindings.
Throws: UnsupportedOperationException
public Class<BigdataValue> getElementClass()
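Description copied from interface: IRelation
Return the class for the generic type of this relation.
Specified by: getElementClass in interface IRelation<BigdataValue>
public long delete(IChunkedOrderedIterator<BigdataValue> itr)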
Note: this method is part of the mutation API.
Specified by: delete in interface IMutableRelation<BigdataValue>
Parameters: itr - An iterator visiting the elements to be removed. Existing elements in the relation having a key equal to the key formed from the visited elements will be removed from the relation.
Throws: UnsupportedOperationException
public long insert(IChunkedOrderedIterator<BigdataValue> itr)
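Note: this method is part of the mutation API.
Specified by: insert in interface IMutableRelation<BigdataValue>
Parameters: itr - An iterator visiting the elements to be written.
Throws: UnsupportedOperationException
public Iterator<IV> prefixScan(org.openrdf.model.Literal lit)
A scan of all literals having the given literal as a prefix.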
Parameters: lit - A literal.
Returns: An iterator visiting the IVs of the matching Literals.
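To illustrate what a prefix scan over a sorted forward index looks like, here is a sketch over a NavigableMap standing in for TERM2ID. It assumes the keys are the literals' plain lexical forms compared as Java Strings; the real index keys are encoded sort keys, so this shows the access pattern only, not Blazegraph's key encoding. PrefixScanSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;

/** Hypothetical prefix scan over a sorted map standing in for the TERM2ID index. */
final class PrefixScanSketch {
    private PrefixScanSketch() {}

    /** Ids of all keys that start with prefix, in key order (a key-range scan). */
    static List<Long> prefixScan(NavigableMap<String, Long> term2id, String prefix) {
        List<Long> ids = new ArrayList<>();
        // Keys sharing a prefix are contiguous in sorted order, starting at the prefix itself.
        for (Map.Entry<String, Long> e : term2id.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the prefix range; no later key can match
            }
            ids.add(e.getValue());
        }
        return ids;
    }

    /** Ids of all keys that start with any of the prefixes; each id is reported once. */
    static List<Long> prefixScanAny(NavigableMap<String, Long> term2id, List<String> prefixes) {
        Set<Long> ids = new LinkedHashSet<>();
        for (String p : prefixes) {
            ids.addAll(prefixScan(term2id, p));
        }
        return new ArrayList<>(ids);
    }
}
```

Because the scan walks the key order starting at the prefix, a match is always at the start of the key and contiguous; a free text index query would have to check those two conditions separately to keep the same semantics.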
TODO Prefix scan only visits the TERM2ID index (blobs and inline
literals will not be observed). This should be mapped onto a free
text index query instead. In order to have the same semantics we
must also verify that (a) the prefix match is at the start of the
literal; and (b) the match is contiguous.
public Iterator<IV> prefixScan(org.openrdf.model.Literal[] lits)
A scan of all literals having any of the given literals as a prefix.
Parameters: lits - An array of literals.
Returns: An iterator visiting the IVs of the matching Literals.
TODO Prefix scan only visits the TERM2ID index (blobs and inline
literals will not be observed). This should be mapped onto a free
text index query instead. In order to have the same semantics we
must also verify that (a) the prefix match is at the start of the
literal; and (b) the match is contiguous.
public BigdataURI resolve(org.openrdf.model.URI uri)
Returns a fully resolved datatype URI with the IV set.
IExtensions handle encoding and decoding of inline literals for
custom datatypes; however, to do so they need the IV for the
custom datatype. By passing an instance of this interface to the
IExtension, it will be able to resolve its datatype URI(s) and
cache them for future use.
The URIs used by IExtensions MUST be pre-declared by the
Vocabulary.
This interface is implemented by the LexiconRelation.
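The resolve-and-cache pattern described above can be sketched as follows. DatatypeResolver, VocabularyResolver and CachingExtension are hypothetical stand-ins for IDatatypeURIResolver, the Vocabulary and an IExtension, and a long stands in for the IV. Only URIs pre-declared in the vocabulary resolve, mirroring the requirement stated above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for IDatatypeURIResolver: datatype URI to internal id. */
interface DatatypeResolver {
    long resolve(String datatypeUri);
}

/** Hypothetical resolver backed by a pre-declared vocabulary. */
final class VocabularyResolver implements DatatypeResolver {
    private final Map<String, Long> declared;
    private int lookups; // number of resolve() calls, to show the extension's cache working

    VocabularyResolver(Map<String, Long> declared) {
        this.declared = new HashMap<>(declared);
    }

    @Override
    public long resolve(String datatypeUri) {
        lookups++;
        Long id = declared.get(datatypeUri);
        if (id == null) {
            throw new IllegalArgumentException("Not declared by the vocabulary: " + datatypeUri);
        }
        return id;
    }

    int lookups() {
        return lookups;
    }
}

/** Hypothetical extension that resolves each datatype URI once and caches the id. */
final class CachingExtension {
    private final DatatypeResolver resolver;
    private final Map<String, Long> cache = new HashMap<>(); // not thread-safe; a sketch

    CachingExtension(DatatypeResolver resolver) {
        this.resolver = resolver;
    }

    long datatypeId(String datatypeUri) {
        // The first call goes to the resolver; later calls are served from the cache.
        return cache.computeIfAbsent(datatypeUri, resolver::resolve);
    }
}
```

Resolving lazily and caching keeps the lexicon out of the hot encode/decode path once each datatype has been seen.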
Specified by: resolve in interface IDatatypeURIResolver
See Also: IDatatypeURIResolver
public boolean isBlob(org.openrdf.model.Value v)
Parameters: v - The value.
Returns: true if it is a "large value" according to the configuration of the lexicon.
See Also: AbstractTripleStore.Options#BLOBS_THRESHOLD
public long addTerms(BigdataValue[] values, int numTerms, boolean readOnly)
Batch insert of terms into the database.
Note: Duplicate BigdataValue references and BigdataValues
that already have an assigned term identifier are ignored by this
operation.
Note: This implementation is designed to use unisolated batch writes on the terms and ids indices that guarantee consistency.
If the full text index is enabled, then the terms will also be inserted into the full text index.
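A sketch of the deduplication rules in the notes above, against an in-memory map standing in for the forward index. Term, TermAdder and the meaning of the returned count are hypothetical; the sketch writes one term at a time instead of using unisolated batch writes, and it does not touch a full text index.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for BigdataValue: a lexical form plus a lazily assigned id. */
final class Term {
    final String lexical;
    Long id; // null until a term identifier is assigned

    Term(String lexical) {
        this.lexical = lexical;
    }
}

/** Hypothetical sketch of the addTerms contract against an in-memory forward index. */
final class TermAdder {
    private final Map<String, Long> term2id; // stand-in for the TERM2ID index
    private long nextId;

    TermAdder(Map<String, Long> term2id) {
        this.term2id = term2id;
        // Start after the largest id already present so new ids never collide.
        this.nextId = term2id.values().stream().mapToLong(Long::longValue).max().orElse(0L) + 1;
    }

    /** Returns how many terms were newly written (a hypothetical return value). */
    long addTerms(Term[] values, int numTerms, boolean readOnly) {
        Set<Term> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        long inserted = 0;
        for (int i = 0; i < numTerms; i++) {
            Term t = values[i];
            if (t.id != null || !seen.add(t)) {
                continue; // already has an id, or a duplicate reference in this batch
            }
            Long existing = term2id.get(t.lexical);
            if (existing != null) {
                t.id = existing; // known term: just resolve it
            } else if (!readOnly) {
                t.id = nextId++; // unknown term: assign an id and write it
                term2id.put(t.lexical, t.id);
                inserted++;
            } // readOnly and unknown: left without an id
        }
        return inserted;
    }
}
```

Duplicate references are detected by identity, while distinct but equal values are handled by the index lookup, so both end up with the same identifier.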
Parameters: values - An array whose elements [0:numTerms-1] will be inserted.
numTerms - The number of terms to insert.
readOnly - When true, unknown terms will not be inserted into the database. Otherwise unknown terms are inserted into the database.
public void rebuildTextIndex(boolean forceCreate)
Utility method to (re-)build the full text index.
AbstractTripleStore for this operation.
AbstractTripleStore.Options#TEXT_INDEX must be enabled. This
operation is only supported when the IValueCentricTextIndexer uses the
FullTextIndex class.
Parameters: forceCreate - When true, a new text index will be created for a namespace that did not previously have one.
public final Map<IV<?,?>,BigdataValue> getTerms(Collection<IV<?,?>> ivs)
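Batch resolution of internal values to BigdataValues.
Parameters: ivs - A collection of internal values. This may be an unmodifiable collection.
Returns: A map from internal value to BigdataValue. If an internal value was not resolved then the map will not contain an entry for that internal value.
@Deprecated public void buildSubjectCentricTextIndex()
Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563.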
AbstractTripleStore for this
operation. AbstractTripleStore.Options#TEXT_INDEX must be
enabled. This operation is only supported when the ITextIndexer
uses the FullTextIndex class.
The subject-based full text index is one that rolls up the normal object-based full text index into a similarly structured index that captures relevancy across subjects. Instead of object-based entries, its entries have the form (t,s) => s.len, termWeight, where s is the subject's IV. The term weight has the same interpretation, but it is computed across all literals that are linked to that subject and that contain the given token. This index essentially pre-computes the (?s ?p ?o) join that sometimes follows the (?o bd:search "xyz") request.
Truth Maintenance
We will need to perform truth maintenance on the subject-centric text index; that is, the index will need to be updated as statements are added and removed (to the extent that those statements involve a literal in the object position). Adding a statement is the easier case because we will never need to remove entries from the index; we can simply write over them with new relevance values. All that is involved with truth maintenance for adding a statement is taking a post-commit snapshot of the subject in the statement and running it through the indexer (a "subject-refresh").
The same "subject-refresh" will be necessary for truth maintenance for removal, but an additional step will be necessary beforehand - the index entries associated with the deleted subject/object (tokens+subject) will need to be removed in case the token appears only in the removed literal. After this pruning step the subject can be refreshed in the index exactly the same as for truth maintenance on add.
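The add and remove procedures above can be sketched with in-memory maps. This illustrates the plan for a feature that was never completed; it is not an implementation of it. SubjectTextIndexSketch is hypothetical, tokenization is a lower-cased whitespace split, and the weight is a plain token count standing in for termWeight.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of subject-centric index maintenance; weights are plain token counts. */
final class SubjectTextIndexSketch {
    private final Map<String, List<String>> literalsBySubject = new HashMap<>(); // stand-in for statements
    private final Map<String, Map<String, Integer>> index = new HashMap<>();     // token -> subject -> weight

    private static List<String> tokenize(String literal) {
        List<String> tokens = new ArrayList<>();
        for (String tok : literal.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!tok.isEmpty()) {
                tokens.add(tok);
            }
        }
        return tokens;
    }

    /** Subject-refresh: recompute and overwrite the entries for every token of s. */
    private void refresh(String s) {
        Map<String, Integer> counts = new HashMap<>();
        for (String literal : literalsBySubject.getOrDefault(s, List.of())) {
            for (String tok : tokenize(literal)) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        counts.forEach((tok, w) -> index.computeIfAbsent(tok, k -> new HashMap<>()).put(s, w));
    }

    void addStatement(String s, String literal) {
        literalsBySubject.computeIfAbsent(s, k -> new ArrayList<>()).add(literal);
        refresh(s); // add: overwriting entries is enough, nothing needs removing
    }

    void removeStatement(String s, String literal) {
        List<String> literals = literalsBySubject.get(s);
        if (literals == null || !literals.remove(literal)) {
            return; // statement was not present
        }
        // Prune the (token, s) entries for the removed literal's tokens, then refresh s.
        for (String tok : tokenize(literal)) {
            Map<String, Integer> bySubject = index.get(tok);
            if (bySubject != null) {
                bySubject.remove(s);
            }
        }
        refresh(s);
    }

    int weight(String token, String s) {
        Map<String, Integer> bySubject = index.get(token);
        return bySubject == null ? 0 : bySubject.getOrDefault(s, 0);
    }
}
```

The pruning step matters for tokens that appear only in the removed literal: without it, a token such as "red" below would keep a stale entry for the subject, because the refresh only overwrites tokens that still occur.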
It looks like the right place to hook in truth maintenance for add is
AbstractTripleStore.addStatements(AbstractTripleStore, boolean, IChunkedOrderedIterator, com.bigdata.relation.accesspath.IElementFilter)
after the ISPOs are added to the SPORelation. Likewise, the place to hook
in truth maintenance for delete is
AbstractTripleStore.removeStatements(IChunkedOrderedIterator, boolean)
after the ISPOs are removed from the SPORelation.
public final Map<IV<?,?>,BigdataValue> getTerms(Collection<IV<?,?>> ivsUnmodifiable, int termsChunksSize, int blobsChunkSize)
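Batch resolution of internal values to BigdataValues.
Parameters: ivsUnmodifiable - A collection of internal values.
Returns: A map from internal value to BigdataValue. If an internal value was not resolved then the map will not contain an entry for that internal value.
See Also: getTerms(Collection)
public static void clearTermCacheFactory(String namespace)
Clear all term caches for the supplied namespace.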
public final BigdataValue getTerm(IV iv)
Note: BNodes are not stored in the reverse lexicon and are
recognized using AbstractTripleStore#isBNode(long).
Note: Statement identifiers (when enabled) are not stored in the reverse
lexicon and are recognized using
AbstractTripleStore#isStatement(IV). If the term identifier is
recognized as being, in fact, a statement identifier, then it is
externalized as a BNode. This fits rather well with the notion
in a quad store that the context position may be either a URI or
a BNode and the fact that you can use BNodes to "stamp"
statement identifiers.
Note: Handles both unisolatable and isolatable indices.
Note: Sets BigdataValue.getIV() as a side-effect.
Note: this always mints a new BNode instance when the term
identifier identifies a BNode or a Statement.
Returns: The BigdataValue -or- null iff there is no BigdataValue for that term identifier in the lexicon.
public final IV getIV(org.openrdf.model.Value value)
Deprecated. Not even the unit tests should be doing this.
Note: If BigdataValue.getIV() is set, then returns that value
immediately. Next, try to get an inline internal value for the value.
Otherwise looks up the termId in the index and
sets the term identifier as a side-effect.
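The lookup order in this note (cached IV first, then an inline IV, then the index) can be sketched as below. SketchIV, SketchValue and IvResolver are hypothetical, and the rule that lexical forms parsing as a long are inlined is an assumed stand-in for the lexicon configuration's real inlining decision.

```java
import java.util.Map;

/** Hypothetical stand-in for an IV: either an inline value or a term identifier. */
final class SketchIV {
    final boolean inline;
    final long payload;

    SketchIV(boolean inline, long payload) {
        this.inline = inline;
        this.payload = payload;
    }

    @Override
    public String toString() {
        return (inline ? "inline:" : "termId:") + payload;
    }
}

/** Hypothetical stand-in for BigdataValue, with a settable cached IV. */
final class SketchValue {
    final String lexical;
    SketchIV iv; // cached IV, set as a side effect of resolution

    SketchValue(String lexical) {
        this.lexical = lexical;
    }
}

final class IvResolver {
    private final Map<String, Long> term2id; // stand-in for the TERM2ID index

    IvResolver(Map<String, Long> term2id) {
        this.term2id = term2id;
    }

    /** ASSUMED inlining rule: lexical forms that parse as a long become inline IVs. */
    static SketchIV tryInline(String lexical) {
        try {
            return new SketchIV(true, Long.parseLong(lexical));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    SketchIV getIV(SketchValue v) {
        if (v.iv != null) {
            return v.iv; // 1. already cached on the value
        }
        SketchIV iv = tryInline(v.lexical); // 2. try an inline IV
        if (iv == null) {
            Long id = term2id.get(v.lexical); // 3. fall back to an index lookup
            if (id == null) {
                return null; // not in the lexicon
            }
            iv = new SketchIV(false, id);
        }
        v.iv = iv; // side effect: cache the IV on the value
        return iv;
    }
}
```

Each call that misses the cache costs an index probe, which is why resolving many values one at a time is discouraged in favor of batch methods.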
See Also: getTerms(Collection): use this method to resolve Values to their IVs efficiently.
public final IV getInlineIV(org.openrdf.model.Value value)
Attempt to convert the value to an inline internal value.
If the value is a BigdataValue and this method is successful, then the IV will be set as a side-effect on the BigdataValue.
Parameters: value - The value to convert.
Returns: The inline internal value, or null if it cannot be converted.
See Also: ILexiconConfiguration.createInlineIV(Value)
public Iterator<org.openrdf.model.Value> blobsIterator()
public ILexiconConfiguration<BigdataValue> getLexiconConfiguration()
Return the lexiconConfiguration instance. Used to determine how to encode and decode terms in the key space.
public IKeyOrder<BigdataValue> getKeyOrder(IPredicate<BigdataValue> p)
Return the IKeyOrder for the predicate corresponding to the
perfect access path. A perfect access path is one where the bound values
in the predicate form a prefix in the key space of the corresponding
index.
This implementation examines the predicate, looking at the
LexiconKeyOrder.SLOT_IV and LexiconKeyOrder.SLOT_TERM
slots and chooses the appropriate index based on the IV and/or
Value which it finds bound. When both slots are bound it prefers
the index for the IV => Value mapping as that index will
be faster (ID2TERM has a shorter key and higher fan-out than TERM2ID).
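The slot-based choice described above reduces to a small decision. It is sketched here with a hypothetical LexIndex enum and with each slot reduced to a bound/unbound flag; the real method inspects the LexiconKeyOrder.SLOT_IV and LexiconKeyOrder.SLOT_TERM slots of the predicate.

```java
/** Hypothetical stand-in for the two lexicon key orders named in the text. */
enum LexIndex { ID2TERM, TERM2ID }

final class LexKeyOrderChooser {
    private LexKeyOrderChooser() {}

    /**
     * Picks the index giving a perfect access path, or null if neither slot is bound.
     * ID2TERM wins whenever the IV is bound (shorter key, higher fan-out).
     */
    static LexIndex choose(boolean ivBound, boolean termBound) {
        if (ivBound) {
            return LexIndex.ID2TERM; // covers "both bound" as well
        }
        if (termBound) {
            return LexIndex.TERM2ID;
        }
        return null; // no index provides a perfect access path
    }
}
```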
Specified by: getKeyOrder in interface IRelation<BigdataValue>
Returns: The IKeyOrder for the perfect access path -or- null if there is no index which provides a perfect access path for that predicate.
public IAccessPath<BigdataValue> newAccessPath(IIndexManager localIndexManager, IPredicate<BigdataValue> predicate, IKeyOrder<BigdataValue> keyOrder)
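Necessary for lexicon joins, which the query planner injects into query plans as needed. A LexPredicate can be used to
perform either a forward (BigdataValue to IV) or reverse
(IV to BigdataValue) lookup. Either lookup will cache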
the BigdataValue on the IV as a side effect.
Note: If you query with IV or BigdataValue which is
already cached (either on one another or in the termsCache) then the
cached value will be returned (fast path).
Note: Blank nodes will not unify with themselves unless you are using told blank node semantics.
Note: This has the side effect of caching materialized
BigdataValues on IVs using
IVCache.setValue(BigdataValue) for use in downstream operators that
need materialized values to evaluate properly. The query planner is
responsible for managing when we materialize and cache values. This keeps
us from wiring BigdataValue onto IVs all the
time.
The lexicon has a single TERMS index. The keys are BlobIVs formed
from the VTE of the BigdataValue,
BigdataValue#hashCode(), and a collision counter. The value is
the BigdataValue as serialized by the
BigdataValueSerializer.
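The key layout above, and the collision-bucket lookup used when a BigdataValue is given and its IV is sought, can be sketched with a sorted map. TermsIndexSketch is hypothetical: its keys are fixed-width hex strings (2 digits of VTE, 8 of hashCode, 4 of collision counter) rather than Blazegraph's byte[] keys, the VTE codes and counter width are assumptions, and a String stands in for the serialized BigdataValue. Fixed-width hex keeps String order identical to the order of the unsigned fields, so one (VTE, hashCode) bucket is a single contiguous key range.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical TERMS-style index: key = VTE + hashCode + collision counter. */
final class TermsIndexSketch {
    private final NavigableMap<String, String> terms = new TreeMap<>();

    /** Bucket prefix: 2 hex digits of VTE, then 8 hex digits of the (unsigned) hashCode. */
    private static String prefix(int vte, String value) {
        return String.format("%02x%08x", vte & 0xff, value.hashCode());
    }

    /** Key-range scan covering every counter in one collision bucket. */
    private NavigableMap<String, String> bucket(String p) {
        return terms.subMap(p + "0000", true, p + "ffff", true);
    }

    /** Inserts value if absent and returns its key, appending the next collision counter. */
    String insert(int vte, String value) {
        String p = prefix(vte, value);
        int counter = 0;
        for (Map.Entry<String, String> e : bucket(p).entrySet()) {
            if (e.getValue().equals(value)) {
                return e.getKey(); // already present
            }
            counter++;
        }
        if (counter > 0xffff) {
            throw new IllegalStateException("collision bucket full");
        }
        String key = p + String.format("%04x", counter);
        terms.put(key, value);
        return key;
    }

    /** Scans the collision bucket and filters for the value; null if it is absent. */
    String lookup(int vte, String value) {
        for (Map.Entry<String, String> e : bucket(prefix(vte, value)).entrySet()) {
            if (e.getValue().equals(value)) {
                return e.getKey();
            }
        }
        return null;
    }
}
```

For example, "Aa" and "BB" have the same String.hashCode(), so with VTE 2 they share the bucket prefix "0200000840" and receive the keys "02000008400000" and "02000008400001"; a lookup has to scan both entries and compare values to tell them apart.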
There are four possible ways to query this index using the
LexPredicate.
1. IV is given and its BigdataValue will be sought.
2. BigdataValue is given and its IV will be sought. This case requires a key-range scan with a filter. It has to scan the collision bucket and filter for the specified Value. We get the collision bucket by creating a prefix key for the Value (using its VTE and hashCode). This will either return the IV for that Value or nothing.

Overrides: newAccessPath in class AbstractRelation<BigdataValue>
Parameters: localIndexManager - The local index manager (optional, except when there is a request for a shard-local access path in scale-out).
predicate - The predicate used to request the access path.
keyOrder - The index which the access path will use.
See Also: LexAccessPatternEnum, LexPredicate, LexiconKeyOrder

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.