public class LexiconRelation extends AbstractRelation<BigdataValue> implements IDatatypeURIResolver
LexiconRelation handles all things related to the indices mapping external RDF Values onto IVs (internal values) and provides methods for efficient materialization of external RDF Values from IVs.
Nested classes/interfaces inherited from class AbstractResource: AbstractResource.Options
Modifier and Type | Field and Description |
---|---|
static String | NAME_LEXICON_RELATION Constant for the LexiconRelation namespace component. |
Inherited field: indexManager
Constructor and Description |
---|
LexiconRelation(AbstractTripleStore container, IIndexManager indexManager, String namespace, Long timestamp, Properties properties) |
LexiconRelation(IIndexManager indexManager, String namespace, Long timestamp, Properties properties) Note: The term:id and id:term indices MUST use unisolated write operations to ensure consistency without write-write conflicts. |
Modifier and Type | Method and Description |
---|---|
long | addTerms(BigdataValue[] values, int numTerms, boolean readOnly) Batch insert of terms into the database. |
Iterator<org.openrdf.model.Value> | blobsIterator() |
void | buildSubjectCentricTextIndex() Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
static void | clearTermCacheFactory(String namespace) Clear all term caches for the supplied namespace. |
void | create() Create any logically contained resources (relations, indices). |
long | delete(IChunkedOrderedIterator<BigdataValue> itr) Note: this method is part of the mutation API. |
void | destroy() Destroy any logically contained resources (relations, indices). |
protected Class<IExtensionFactory> | determineExtensionFactoryClass() |
protected Class<IInlineURIFactory> | determineInlineURIFactoryClass() |
protected Class<ISubjectCentricTextIndexer> | determineSubjectCentricTextIndexerClass() |
protected Class<IValueCentricTextIndexer> | determineTextIndexerClass() |
protected Class<BigdataValueFactory> | determineValueFactoryClass() |
boolean | exists() |
IIndex | getBlobsIndex() |
protected IndexMetadata | getBlobsIndexMetadata(String name) Return the IndexMetadata for the TERMS index. |
AbstractTripleStore | getContainer() Strengthens the return type. |
Class<BigdataValue> | getElementClass() Return the class for the generic type of this relation. |
IIndex | getId2TermIndex() |
protected IndexMetadata | getId2TermIndexMetadata(String name) Return the IndexMetadata for the ID2TERM index. |
IIndex | getIndex(IKeyOrder<? extends BigdataValue> keyOrder) Overridden to use a local cache of the index reference. |
Set<String> | getIndexNames() Return the fully qualified name of each index maintained by this relation. |
TimeZone | getInlineDateTimesTimeZone() Return the default time zone to be used for inlining. |
IV | getInlineIV(org.openrdf.model.Value value) Attempt to convert the value to an inline internal value. |
IV | getIV(org.openrdf.model.Value value) Deprecated. Not even the unit tests should be doing this. |
IKeyOrder<BigdataValue> | getKeyOrder(IPredicate<BigdataValue> p) Return the IKeyOrder for the predicate corresponding to the perfect access path. |
Iterator<IKeyOrder<BigdataValue>> | getKeyOrders() Return the IKeyOrders corresponding to the registered indices for this relation. |
ILexiconConfiguration<BigdataValue> | getLexiconConfiguration() Return the lexiconConfiguration instance. |
int | getMaxInlineStringLength() Return the maximum length of a string value that may be inlined into the statement indices. |
LexiconKeyOrder | getPrimaryKeyOrder() Return the IKeyOrder for the primary index for the relation. |
IValueCentricTextIndexer<?> | getSearchEngine() A factory returning the softly held singleton for the FullTextIndex. |
ISubjectCentricTextIndexer<?> | getSubjectCentricSearchEngine() Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
BigdataValue | getTerm(IV iv) Note: BNodes are not stored in the reverse lexicon and are recognized using AbstractTripleStore#isBNode(long). |
IIndex | getTerm2IdIndex() |
protected IndexMetadata | getTerm2IdIndexMetadata(String name) Return the IndexMetadata for the TERM2ID index. |
int | getTermIdBitsToReverse() The number of low bits from the term identifier that are reversed and rotated into the high bits when it is assigned. |
Map<IV<?,?>,BigdataValue> | getTerms(Collection<IV<?,?>> ivs) Batch resolution of internal values to BigdataValues. |
Map<IV<?,?>,BigdataValue> | getTerms(Collection<IV<?,?>> ivsUnmodifiable, int termsChunksSize, int blobsChunkSize) Batch resolution of internal values to BigdataValues. |
BigdataValueFactory | getValueFactory() The canonical BigdataValueFactoryImpl reference (JVM wide) for the lexicon namespace. |
LexiconRelation | init() The default implementation only logs the event. |
long | insert(IChunkedOrderedIterator<BigdataValue> itr) Note: this method is part of the mutation API. |
boolean | isBlob(org.openrdf.model.Value v) |
boolean | isInlineDateTimes() Return true if xsd:dateTime literals are being inlined into the statement indices. |
boolean | isInlineLiterals() Return true if datatype literals are being inlined into the statement indices. |
boolean | isStoreBlankNodes() true iff blank nodes are being stored in the lexicon's forward index. |
boolean | isSubjectCentricTextIndex() Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563. |
boolean | isTextIndex() true iff the (value-centric) full text index is enabled. |
IAccessPath<BigdataValue> | newAccessPath(IIndexManager localIndexManager, IPredicate<BigdataValue> predicate, IKeyOrder<BigdataValue> keyOrder) Required for lexicon joins, which the query planner injects into query plans as needed. |
BigdataValue | newElement(List<BOp> a, IBindingSet bindingSet) Note: this method is part of the mutation API. |
Iterator<IV> | prefixScan(org.openrdf.model.Literal lit) A scan of all literals having the given literal as a prefix. |
Iterator<IV> | prefixScan(org.openrdf.model.Literal[] lits) A scan of all literals having any of the given literals as a prefix. |
void | rebuildTextIndex(boolean forceCreate) Utility method to (re-)build the full text index. |
BigdataURI | resolve(org.openrdf.model.URI uri) Returns a fully resolved datatype URI with the IV set. |
Methods inherited from class AbstractRelation: getAccessPath, getAccessPath, getAccessPath, getFQN, getFQN, getFQN, getIndex, getIndex, newIndexMetadata
Methods inherited from class AbstractResource: acquireExclusiveLock, assertWritable, getBareProperties, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getCommitTime, getContainerNamespace, getExecutorService, getFullyBufferedReadThreshold, getIndexManager, getMaxParallelSubqueries, getNamespace, getProperties, getProperty, getProperty, getTimestamp, isForceSerialExecution, isReadOnly, toString, unlock
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface IRelation: getExecutorService, getIndexManager
Methods inherited from interface ILocatableResource: getContainerNamespace, getNamespace, getTimestamp
public static final transient String NAME_LEXICON_RELATION
Constant for the LexiconRelation namespace component.
Note: To obtain the fully qualified name of an index in the LexiconRelation, append a "." to the relation's namespace, then this constant, then a "." and then the local name of the index.
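As a minimal sketch of this naming rule, the helper below concatenates the parts in the stated order. The class and method names are hypothetical, the constant's value "lex" is an assumption made for illustration, and "kb" and "TERM2ID" are example inputs (TERM2ID is assumed to be a valid local index name).

```java
// Hypothetical helper illustrating the index naming rule; not part of the Blazegraph API.
final class LexiconIndexNames {

    // Assumed value of LexiconRelation.NAME_LEXICON_RELATION (illustrative only).
    static final String NAME_LEXICON_RELATION = "lex";

    private LexiconIndexNames() {
    }

    /** Returns namespace + "." + NAME_LEXICON_RELATION + "." + localName. */
    static String fullyQualifiedIndexName(String namespace, String localName) {
        return namespace + "." + NAME_LEXICON_RELATION + "." + localName;
    }

    public static void main(String[] args) {
        // "kb" and "TERM2ID" are example inputs only.
        System.out.println(fullyQualifiedIndexName("kb", "TERM2ID"));
    }
}
```

Running main prints kb.lex.TERM2ID.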
public LexiconRelation(IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
Parameters: indexManager, namespace, timestamp, properties
public LexiconRelation(AbstractTripleStore container, IIndexManager indexManager, String namespace, Long timestamp, Properties properties)
protected Class<BigdataValueFactory> determineValueFactoryClass()
protected Class<IValueCentricTextIndexer> determineTextIndexerClass()
protected Class<ISubjectCentricTextIndexer> determineSubjectCentricTextIndexerClass()
protected Class<IExtensionFactory> determineExtensionFactoryClass()
protected Class<IInlineURIFactory> determineInlineURIFactoryClass()
public BigdataValueFactory getValueFactory()
The canonical BigdataValueFactoryImpl reference (JVM wide) for the lexicon namespace.
public AbstractTripleStore getContainer()
Overrides: getContainer in class AbstractResource<IRelation<BigdataValue>>
Returns: The container, or null if there is no container.
public boolean exists()
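public LexiconRelation init()
Description copied from class: AbstractResource
The default implementation only logs the event.
Specified by: init in interface ILocatableResource<IRelation<BigdataValue>>
Overrides: init in class AbstractResource<IRelation<BigdataValue>>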
public void create()
Description copied from interface: IMutableResource
Create any logically contained resources (relations, indices). Do not assume that ILocatableResource.init() is suitable for invocation from IMutableResource.create(). Instead, you are responsible for invoking ILocatableResource.init() from this method IFF it is appropriate to reuse its initialization logic.
Specified by: create in interface IMutableResource<IRelation<BigdataValue>>
Overrides: create in class AbstractResource<IRelation<BigdataValue>>
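The contract above means that create() reuses init() only by calling it explicitly. The sketch below shows that pattern with hypothetical, simplified interfaces (LocatableResource, MutableResource) standing in for ILocatableResource and IMutableResource; it does not reproduce what LexiconRelation.create() actually does.

```java
// Hypothetical, simplified stand-ins for ILocatableResource and IMutableResource.
interface LocatableResource {
    void init();
}

interface MutableResource extends LocatableResource {
    void create();

    void destroy();
}

// Toy resource whose create() explicitly reuses init(), which the contract
// permits only when that initialization logic is appropriate at creation time.
final class ToyResource implements MutableResource {
    private boolean created;
    private int initCalls;

    @Override
    public void init() {
        initCalls++; // e.g. set up references to indices that already exist
    }

    @Override
    public void create() {
        created = true; // e.g. register the indices this resource owns
        init();         // not implied by the framework: create() calls it itself
    }

    @Override
    public void destroy() {
        created = false;
    }

    boolean isCreated() {
        return created;
    }

    int initCalls() {
        return initCalls;
    }
}
```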
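public void destroy()
Description copied from interface: IMutableResource
Destroy any logically contained resources (relations, indices).
Specified by: destroy in interface IMutableResource<IRelation<BigdataValue>>
Overrides: destroy in class AbstractResource<IRelation<BigdataValue>>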
public final boolean isInlineLiterals()
Returns: true if datatype literals are being inlined into the statement indices.
public final int getMaxInlineStringLength()
public final boolean isInlineDateTimes()
Returns: true if xsd:dateTime literals are being inlined into the statement indices.
public final TimeZone getInlineDateTimesTimeZone()
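public final int getTermIdBitsToReverse()
The number of low bits from the term identifier that are reversed and rotated into the high bits when it is assigned.
See Also: AbstractTripleStore.Options#TERMID_BITS_TO_REVERSE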
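To make the bit manipulation concrete, here is a sketch that assumes "reversed and rotated into the high bits" means: take the low n bits, reverse their order, place them in the high n bits, and shift the remaining bits down by n. The class and method names are hypothetical and this is not Blazegraph's implementation.

```java
// Hypothetical sketch of the TERMID_BITS_TO_REVERSE idea; not Blazegraph's code.
final class TermIdBitReversal {

    private TermIdBitReversal() {
    }

    /**
     * Reverses the low n bits of id and moves them into the high n bits;
     * the remaining bits shift down by n.
     */
    static long reverseLowBits(long id, int n) {
        if (n < 0 || n >= Long.SIZE) {
            throw new IllegalArgumentException("n must be in [0, 63]: " + n);
        }
        long lowMask = (1L << n) - 1; // n == 0 yields an empty mask
        long low = id & lowMask;      // the bits that will move
        // Long.reverse maps bit i to bit 63 - i, so the n low bits land,
        // in reversed order, in the n high bits.
        return Long.reverse(low) | (id >>> n);
    }

    public static void main(String[] args) {
        for (long id = 1; id <= 4; id++) {
            System.out.printf("%d -> 0x%016x%n", id, reverseLowBits(id, 2));
        }
    }
}
```

With n = 2, main maps the identifiers 1 through 4 to 0x8000000000000000, 0x4000000000000000, 0xc000000000000000 and 0x0000000000000001. Consecutively assigned identifiers therefore land far apart in the key space, which is presumably the intent of the option.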
public final boolean isStoreBlankNodes()
Returns: true iff blank nodes are being stored in the lexicon's forward index.
See Also: AbstractTripleStore.Options#STORE_BLANK_NODES
public final boolean isTextIndex()
Returns: true iff the (value-centric) full text index is enabled.
See Also: AbstractTripleStore.Options#TEXT_INDEX
@Deprecated public final boolean isSubjectCentricTextIndex()
Returns: true iff the subject-centric full text index is enabled.
See Also: AbstractTripleStore.Options#SUBJECT_CENTRIC_TEXT_INDEX
public IIndex getIndex(IKeyOrder<? extends BigdataValue> keyOrder)
Overridden to use a local cache of the index reference.
Specified by: getIndex in interface IRelation<BigdataValue>
Overrides: getIndex in class AbstractRelation<BigdataValue>
Parameters: keyOrder - The natural index order.
Returns: The index, or null iff the index does not exist as of the timestamp for this view of the relation.
See Also: AbstractRelation.getIndex(String)
public final IIndex getTerm2IdIndex()
public final IIndex getId2TermIndex()
public final IIndex getBlobsIndex()
public IValueCentricTextIndexer<?> getSearchEngine()
A factory returning the softly held singleton for the FullTextIndex.
See Also: AbstractTripleStore.Options#TEXT_INDEX
@Deprecated public ISubjectCentricTextIndexer<?> getSubjectCentricSearchEngine()
A factory returning the softly held singleton for the FullTextIndex representing the subject-centric full text index.
See Also: AbstractTripleStore.Options#TEXT_INDEX
protected IndexMetadata getTerm2IdIndexMetadata(String name)
Return the IndexMetadata for the TERM2ID index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
protected IndexMetadata getId2TermIndexMetadata(String name)
Return the IndexMetadata for the ID2TERM index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
protected IndexMetadata getBlobsIndexMetadata(String name)
Return the IndexMetadata for the TERMS index.
Parameters: name - The name of the index.
Returns: The IndexMetadata.
public Set<String> getIndexNames()
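Description copied from interface: IRelation
Return the fully qualified name of each index maintained by this relation.
Specified by: getIndexNames in interface IRelation<BigdataValue>
public Iterator<IKeyOrder<BigdataValue>> getKeyOrders()
Description copied from interface: IRelation
Return the IKeyOrders corresponding to the registered indices for this relation. [rather than getIndexNames?]
Specified by: getKeyOrders in interface IRelation<BigdataValue>
public LexiconKeyOrder getPrimaryKeyOrder()
Description copied from interface: IRelation
Return the IKeyOrder for the primary index for the relation.
Specified by: getPrimaryKeyOrder in interface IRelation<BigdataValue>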
public BigdataValue newElement(List<BOp> a, IBindingSet bindingSet)
Specified by: newElement in interface IRelation<BigdataValue>
Parameters:
a - An ordered list of variables and/or constants.
bindingSet - A set of bindings.
Throws: UnsupportedOperationException
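public Class<BigdataValue> getElementClass()
Description copied from interface: IRelation
Return the class for the generic type of this relation.
Specified by: getElementClass in interface IRelation<BigdataValue>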
public long delete(IChunkedOrderedIterator<BigdataValue> itr)
Specified by: delete in interface IMutableRelation<BigdataValue>
Parameters: itr - An iterator visiting the elements to be removed. Existing elements in the relation having a key equal to the key formed from the visited elements will be removed from the relation.
Throws: UnsupportedOperationException
public long insert(IChunkedOrderedIterator<BigdataValue> itr)
Specified by: insert in interface IMutableRelation<BigdataValue>
Parameters: itr - An iterator visiting the elements to be written.
Throws: UnsupportedOperationException
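public Iterator<IV> prefixScan(org.openrdf.model.Literal lit)
A scan of all literals having the given literal as a prefix.
Parameters: lit - A literal.
Returns: An iterator visiting the IVs of the matching Literals.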
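A prefix scan over a sorted index can stop at the first key that no longer starts with the prefix, because all keys sharing a prefix are contiguous in key order. The sketch below shows this with a TreeMap in which plain strings stand in for the encoded TERM2ID keys and Long ids stand in for IVs; the names are hypothetical, and real TERM2ID keys are encoded sort keys whose prefix behavior may differ from String ordering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy prefix scan over a sorted term -> id map (stand-in for a TERM2ID key-range scan).
final class PrefixScanSketch {

    private PrefixScanSketch() {
    }

    /** Returns the ids of all keys that start with prefix, in key order. */
    static List<Long> prefixScan(NavigableMap<String, Long> term2id, String prefix) {
        List<Long> out = new ArrayList<>();
        // tailMap starts at the first key >= prefix; matching keys are contiguous from there.
        for (Map.Entry<String, Long> e : term2id.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // first key past the prefix range ends the scan
            }
            out.add(e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> term2id = new TreeMap<>();
        term2id.put("apple", 1L);
        term2id.put("applesauce", 2L);
        term2id.put("apricot", 3L);
        term2id.put("banana", 4L);
        System.out.println(prefixScan(term2id, "apple")); // [1, 2]
    }
}
```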
TODO Prefix scan only visits the TERM2ID index (blobs and inline literals will not be observed). This should be mapped onto a free text index query instead. In order to have the same semantics we must also verify that (a) the prefix match is at the start of the literal; and (b) the match is contiguous.
public Iterator<IV> prefixScan(org.openrdf.model.Literal[] lits)
A scan of all literals having any of the given literals as a prefix.
Parameters: lits - An array of literals.
Returns: An iterator visiting the IVs of the matching Literals.
TODO Prefix scan only visits the TERM2ID index (blobs and inline literals will not be observed). This should be mapped onto a free text index query instead. In order to have the same semantics we must also verify that (a) the prefix match is at the start of the literal; and (b) the match is contiguous.
public BigdataURI resolve(org.openrdf.model.URI uri)
Returns a fully resolved datatype URI with the IV set.
IExtensions handle encoding and decoding of inline literals for custom datatypes; however, to do so they need the IV for the custom datatype. By passing an instance of this interface to the IExtension, it will be able to resolve its datatype URI(s) and cache them for future use.
The URIs used by IExtensions MUST be pre-declared by the Vocabulary.
This interface is implemented by the LexiconRelation.
Specified by: resolve in interface IDatatypeURIResolver
See Also: IDatatypeURIResolver
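The resolve-and-cache handshake between an extension and the resolver can be sketched as follows. DatatypeResolver, VocabularyResolver and CustomDatatypeExtension are hypothetical simplifications of IDatatypeURIResolver and IExtension, a long id stands in for the IV, and the URIs used in the usage example are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for IDatatypeURIResolver: maps a datatype URI to an id.
interface DatatypeResolver {
    long resolve(String datatypeUri);
}

// Toy resolver backed by a pre-declared vocabulary; undeclared URIs are rejected.
final class VocabularyResolver implements DatatypeResolver {
    private final Map<String, Long> vocabulary;
    private int lookups;

    VocabularyResolver(Map<String, Long> declared) {
        this.vocabulary = new HashMap<>(declared);
    }

    @Override
    public long resolve(String datatypeUri) {
        lookups++;
        Long id = vocabulary.get(datatypeUri);
        if (id == null) {
            throw new IllegalStateException("Datatype not pre-declared: " + datatypeUri);
        }
        return id;
    }

    int lookups() {
        return lookups;
    }
}

// Toy extension: resolves its datatype once at construction, then reuses the cached id.
final class CustomDatatypeExtension {
    private final long datatypeId;

    CustomDatatypeExtension(DatatypeResolver resolver, String datatypeUri) {
        this.datatypeId = resolver.resolve(datatypeUri);
    }

    /** Pretend inline encoding: the cached datatype id followed by the payload. */
    long[] encode(long payload) {
        return new long[] {datatypeId, payload};
    }
}
```

Encoding many literals costs a single resolver lookup, and an undeclared datatype fails fast, mirroring the requirement that the Vocabulary pre-declare these URIs.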
public boolean isBlob(org.openrdf.model.Value v)
Parameters: v - The value.
Returns: true if it is a "large value" according to the configuration of the lexicon.
See Also: AbstractTripleStore.Options#BLOBS_THRESHOLD
public long addTerms(BigdataValue[] values, int numTerms, boolean readOnly)
Batch insert of terms into the database.
Note: Duplicate BigdataValue references and BigdataValues that already have an assigned term identifier are ignored by this operation.
Note: This implementation is designed to use unisolated batch writes on the terms and ids indices that guarantee consistency.
If the full text index is enabled, then the terms will also be inserted into the full text index.
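The batch semantics described above (skip values that already carry an identifier, process duplicate references once, and only look up unknown terms in read-only mode) can be sketched with an in-memory map. ToyLexicon and its Term class are hypothetical; returning the number of newly assigned ids is a choice made for this sketch and is not necessarily what the real long return value means.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Toy in-memory lexicon illustrating the addTerms contract; not Blazegraph's code.
final class ToyLexicon {

    /** Hypothetical value: a lexical form plus an id that is null until assigned. */
    static final class Term {
        final String label;
        Long id;

        Term(String label) {
            this.label = label;
        }
    }

    private final Map<String, Long> term2id = new HashMap<>();
    private long nextId = 1;

    /**
     * Resolves or assigns ids for terms[0..numTerms-1]. Terms that already carry an id
     * are skipped and duplicate references are handled once. In read-only mode unknown
     * terms are looked up but never inserted, so their id stays null.
     *
     * @return the number of newly assigned ids (a choice made for this sketch)
     */
    long addTerms(Term[] terms, int numTerms, boolean readOnly) {
        Map<Term, Boolean> seen = new IdentityHashMap<>();
        long inserted = 0;
        for (int i = 0; i < numTerms; i++) {
            Term t = terms[i];
            if (t.id != null || seen.put(t, Boolean.TRUE) != null) {
                continue; // already resolved, or a duplicate reference
            }
            Long id = term2id.get(t.label);
            if (id == null && !readOnly) {
                id = nextId++;
                term2id.put(t.label, id);
                inserted++;
            }
            t.id = id; // side effect: cache the id on the value (may remain null)
        }
        return inserted;
    }
}
```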
Parameters:
values - An array whose elements [0:numTerms-1] will be inserted.
numTerms - The number of terms to insert.
readOnly - When true, unknown terms will not be inserted into the database. Otherwise unknown terms are inserted into the database.
public void rebuildTextIndex(boolean forceCreate)
Utility method to (re-)build the full text index. You must be using the unisolated view of the AbstractTripleStore for this operation. AbstractTripleStore.Options#TEXT_INDEX must be enabled. This operation is only supported when the IValueCentricTextIndexer uses the FullTextIndex class.
Parameters: forceCreate - When true, a new text index will be created for a namespace that did not have one before.
public final Map<IV<?,?>,BigdataValue> getTerms(Collection<IV<?,?>> ivs)
Batch resolution of internal values to BigdataValues.
Parameters: ivs - A collection of internal values. This may be an unmodifiable collection.
Returns: A map from each internal value to its BigdataValue. If an internal value was not resolved then the map will not contain an entry for that internal value.
@Deprecated public void buildSubjectCentricTextIndex()
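Deprecated. Feature was never completed due to scalability issues. See BLZG-1548, BLZG-563.
You must be using the unisolated view of the AbstractTripleStore for this operation. AbstractTripleStore.Options#TEXT_INDEX must be enabled. This operation is only supported when the ITextIndexer uses the FullTextIndex class.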
The subject-based full text index rolls up the normal object-based full text index into a similarly structured index that captures relevance across subjects. Instead of entries keyed by token and literal, entries take the form (t,s) => s.len, termWeight, where s is the subject's IV. The term weight has the same interpretation, but it is computed across all literals which are linked to that subject and which contain the given token. This index pre-computes the (?s ?p ?o) join that sometimes follows the (?o bd:search "xyz") request.
Truth Maintenance
We will need to perform truth maintenance on the subject-centric text index; that is, the index will need to be updated as statements are added and removed (to the extent that those statements involve a literal in the object position). Adding a statement is the easier case because we never need to remove entries from the index; we can simply overwrite them with new relevance values. All that is involved in truth maintenance for adding a statement is taking a post-commit snapshot of the subject in the statement and running it through the indexer (a "subject-refresh").
The same "subject-refresh" will be necessary for truth maintenance on removal, but an additional step is needed beforehand: the index entries associated with the deleted subject/object (tokens+subject) must be removed in case a token appears only in the removed literal. After this pruning step the subject can be refreshed in the index exactly as for truth maintenance on add.
It looks like the right place to hook in truth maintenance for add is
AbstractTripleStore.addStatements(AbstractTripleStore, boolean, IChunkedOrderedIterator, com.bigdata.relation.accesspath.IElementFilter)
after the ISPOs are added to the SPORelation. Likewise, the place to hook
in truth maintenance for delete is
AbstractTripleStore.removeStatements(IChunkedOrderedIterator, boolean)
after the ISPOs are removed from the SPORelation.
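The subject-refresh and pruning steps can be illustrated with a small in-memory model. Everything here is hypothetical: subjects and literals are plain strings, tokens come from a naive split on non-word characters, and the weight is assumed to be a simple occurrence count summed over a subject's literals rather than the relevance formula the real index would use.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy subject-centric text index: token -> (subject -> weight).
// Assumption: weight = number of token occurrences across the subject's literals.
final class SubjectTextIndexSketch {

    private final Map<String, List<String>> literalsBySubject = new HashMap<>();
    private final Map<String, Map<String, Integer>> index = new HashMap<>();

    private static List<String> tokenize(String literal) {
        List<String> tokens = new ArrayList<>();
        for (String t : literal.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Subject-refresh: recompute this subject's weights and overwrite its entries. */
    private void refresh(String subject) {
        Map<String, Integer> counts = new HashMap<>();
        for (String literal : literalsBySubject.getOrDefault(subject, Collections.<String>emptyList())) {
            for (String token : tokenize(literal)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            index.computeIfAbsent(e.getKey(), k -> new HashMap<>()).put(subject, e.getValue());
        }
    }

    /** Truth maintenance on add: record the statement, then refresh the subject. */
    void addStatement(String subject, String literal) {
        literalsBySubject.computeIfAbsent(subject, k -> new ArrayList<>()).add(literal);
        refresh(subject);
    }

    /** Truth maintenance on remove: prune (token, subject) entries, then refresh. */
    void removeStatement(String subject, String literal) {
        List<String> literals = literalsBySubject.get(subject);
        if (literals == null || !literals.remove(literal)) {
            return; // the statement was not present
        }
        for (String token : tokenize(literal)) {
            Map<String, Integer> bySubject = index.get(token);
            if (bySubject != null) {
                bySubject.remove(subject);
                if (bySubject.isEmpty()) {
                    index.remove(token);
                }
            }
        }
        refresh(subject); // re-adds tokens that still occur in the remaining literals
    }

    int weight(String token, String subject) {
        Map<String, Integer> bySubject = index.get(token);
        return bySubject == null ? 0 : bySubject.getOrDefault(subject, 0);
    }
}
```

Pruning before the refresh is what removes a token that appeared only in the deleted literal; a refresh alone would only overwrite entries for tokens that still occur.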
public final Map<IV<?,?>,BigdataValue> getTerms(Collection<IV<?,?>> ivsUnmodifiable, int termsChunksSize, int blobsChunkSize)
Batch resolution of internal values to BigdataValues.
Parameters: ivsUnmodifiable - A collection of internal values.
Returns: A map from each internal value to its BigdataValue. If an internal value was not resolved then the map will not contain an entry for that internal value.
See Also: getTerms(Collection)
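Chunked batch resolution, with unresolved ids simply left out of the result map, can be sketched as below. ChunkedResolver is hypothetical, uses Long ids and String terms in place of IVs and BigdataValues, and applies a single chunk size, whereas the overload documented here takes separate termsChunksSize and blobsChunkSize values.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of chunked batch resolution; not Blazegraph's implementation.
final class ChunkedResolver {

    private ChunkedResolver() {
    }

    /** Resolves ids in chunks of at most chunkSize; unresolved ids get no map entry. */
    static Map<Long, String> getTerms(Collection<Long> ids, Map<Long, String> id2term, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive: " + chunkSize);
        }
        Map<Long, String> result = new HashMap<>();
        List<Long> chunk = new ArrayList<>(chunkSize);
        for (Long id : ids) { // read-only iteration, so an unmodifiable input is fine
            chunk.add(id);
            if (chunk.size() == chunkSize) {
                resolveChunk(chunk, id2term, result);
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            resolveChunk(chunk, id2term, result); // trailing partial chunk
        }
        return result;
    }

    // Stands in for one batched index read per chunk.
    private static void resolveChunk(List<Long> chunk, Map<Long, String> id2term, Map<Long, String> result) {
        for (Long id : chunk) {
            String term = id2term.get(id);
            if (term != null) {
                result.put(id, term);
            }
        }
    }
}
```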
public static void clearTermCacheFactory(String namespace)
public final BigdataValue getTerm(IV iv)
Note: BNodes are not stored in the reverse lexicon and are recognized using AbstractTripleStore#isBNode(long).
Note: Statement identifiers (when enabled) are not stored in the reverse lexicon and are recognized using AbstractTripleStore#isStatement(IV). If the term identifier is recognized as being, in fact, a statement identifier, then it is externalized as a BNode. This fits rather well with the notion in a quad store that the context position may be either a URI or a BNode and the fact that you can use BNodes to "stamp" statement identifiers.
Note: Handles both unisolatable and isolatable indices.
Note: Sets BigdataValue.getIV() as a side-effect.
Note: This always mints a new BNode instance when the term identifier identifies a BNode or a Statement.
Returns: The BigdataValue -or- null iff there is no BigdataValue for that term identifier in the lexicon.
public final IV getIV(org.openrdf.model.Value value)
Note: If BigdataValue.getIV() is set, then this returns that value immediately. Next, it tries to get an inline internal value for the value. Otherwise it looks up the termId in the index and sets the term identifier as a side-effect.
See Also: getTerms(Collection), Use this method to resolve Values to their IVs efficiently.
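The three-step resolution order described for getIV can be sketched as follows. IvResolutionSketch is hypothetical: IVs are modeled as strings, and the rule that decimal integers fitting in an int are inlined is an assumption made only to give the inline step something to do.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of the getIV resolution order; not Blazegraph's implementation.
final class IvResolutionSketch {

    /** Hypothetical value that can carry a cached internal value. */
    static final class Value {
        final String lexical;
        String iv; // cached internal value; null until resolved

        Value(String lexical) {
            this.lexical = lexical;
        }
    }

    private final Map<String, Long> term2id;
    private int indexLookups;

    IvResolutionSketch(Map<String, Long> term2id) {
        this.term2id = new HashMap<>(term2id);
    }

    String getIV(Value v) {
        if (v.iv != null) {
            return v.iv;                      // 1. already set on the value: return at once
        }
        String inline = tryInline(v.lexical); // 2. inline encoding needs no index access
        if (inline != null) {
            v.iv = inline;
            return inline;
        }
        indexLookups++;                       // 3. fall back to the forward index
        Long id = term2id.get(v.lexical);
        if (id == null) {
            return null;                      // unknown term: nothing to cache
        }
        v.iv = "term:" + id;                  // side effect: cache on the value
        return v.iv;
    }

    // Assumption for this sketch: decimal integers that fit in an int are inlined.
    private static String tryInline(String lexical) {
        try {
            return "inline:" + Integer.parseInt(lexical);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    int indexLookups() {
        return indexLookups;
    }
}
```

Resolving the same value twice touches the index only once, because the second call takes the cached fast path.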
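public final IV getInlineIV(org.openrdf.model.Value value)
Attempt to convert the value to an inline internal value. If the value is a BigdataValue and this method is successful, then the IV will be set as a side-effect on the BigdataValue.
Parameters: value - The value to convert.
Returns: The inline internal value, or null if it cannot be converted.
See Also: ILexiconConfiguration.createInlineIV(Value)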
public Iterator<org.openrdf.model.Value> blobsIterator()
public ILexiconConfiguration<BigdataValue> getLexiconConfiguration()
Return the lexiconConfiguration instance. Used to determine how to encode and decode terms in the key space.
public IKeyOrder<BigdataValue> getKeyOrder(IPredicate<BigdataValue> p)
Return the IKeyOrder for the predicate corresponding to the perfect access path. A perfect access path is one where the bound values in the predicate form a prefix in the key space of the corresponding index.
This implementation examines the predicate, looking at the LexiconKeyOrder.SLOT_IV and LexiconKeyOrder.SLOT_TERM slots, and chooses the appropriate index based on the IV and/or Value which it finds bound. When both slots are bound it prefers the index for the IV => Value mapping, as that index will be faster (ID2TERM has a shorter key and higher fan-out than TERM2ID).
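The selection rule reduces to a small decision table. The LexIndex enum and KeyOrderChooser class below are hypothetical simplifications that take two booleans instead of inspecting an IPredicate's SLOT_IV and SLOT_TERM slots.

```java
// Hypothetical simplification of the getKeyOrder rule; not the Blazegraph API.
enum LexIndex { ID2TERM, TERM2ID }

final class KeyOrderChooser {

    private KeyOrderChooser() {
    }

    /**
     * @param ivBound   whether the IV slot of the predicate is bound
     * @param termBound whether the Value (term) slot of the predicate is bound
     * @return the index giving a perfect access path, or null if neither slot is bound
     */
    static LexIndex choose(boolean ivBound, boolean termBound) {
        if (ivBound) {
            // Preferred even when both slots are bound: shorter keys, higher fan-out.
            return LexIndex.ID2TERM;
        }
        if (termBound) {
            return LexIndex.TERM2ID;
        }
        return null; // no index provides a perfect access path
    }
}
```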
Specified by: getKeyOrder in interface IRelation<BigdataValue>
Returns: The IKeyOrder for the perfect access path -or- null if there is no index which provides a perfect access path for that predicate.
public IAccessPath<BigdataValue> newAccessPath(IIndexManager localIndexManager, IPredicate<BigdataValue> predicate, IKeyOrder<BigdataValue> keyOrder)
Required for lexicon joins, which the query planner injects into query plans as needed. A LexPredicate can be used to perform either a forward (BigdataValue to IV) or reverse (IV to BigdataValue) lookup. Either lookup will cache the BigdataValue on the IV as a side effect.
Note: If you query with an IV or BigdataValue which is already cached (either on one another or in the termsCache), then the cached value will be returned (fast path).
Note: Blank nodes will not unify with themselves unless you are using told blank node semantics.
Note: This has the side effect of caching materialized BigdataValues on IVs using IVCache.setValue(BigdataValue) for use in downstream operators that need materialized values to evaluate properly. The query planner is responsible for managing when we materialize and cache values. This keeps us from wiring BigdataValues onto IVs all the time.
The lexicon has a single TERMS index. The keys are BlobIVs formed from the VTE of the BigdataValue, BigdataValue#hashCode(), and a collision counter. The value is the BigdataValue as serialized by the BigdataValueSerializer.
There are four possible ways to query this index using the LexPredicate.
The IV is given and its BigdataValue will be sought.
The BigdataValue is given and its IV will be sought. This case requires a key-range scan with a filter. It has to scan the collision bucket and filter for the specified Value. We get the collision bucket by creating a prefix key for the Value (using its VTE and hashCode). This will either return the IV for that Value or nothing.
Overrides: newAccessPath in class AbstractRelation<BigdataValue>
Parameters:
localIndexManager - The local index manager (optional, except when there is a request for a shard local access path in scale-out).
predicate - The predicate used to request the access path.
keyOrder - The index which the access path will use.
See Also: LexAccessPatternEnum, LexPredicate, LexiconKeyOrder
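The TERMS key layout and the collision-bucket lookup described above can be sketched with a sorted map of byte[] keys. TermsIndexSketch is hypothetical: it assumes a one-byte VTE, a four-byte hash code and a two-byte collision counter, uses String.hashCode() in place of BigdataValue#hashCode(), and stores plain strings instead of serialized BigdataValues.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Toy model of TERMS-style keys: [VTE (1 byte)][hash code (4 bytes)][collision counter (2 bytes)].
// Field widths, the String payload and String.hashCode() are assumptions for this sketch.
final class TermsIndexSketch {

    private static final int PREFIX_LENGTH = 1 + Integer.BYTES;       // VTE + hash code
    private static final int KEY_LENGTH = PREFIX_LENGTH + Short.BYTES; // + collision counter

    // Keys are compared lexicographically as unsigned bytes.
    private final TreeMap<byte[], String> terms =
            new TreeMap<byte[], String>(TermsIndexSketch::compareUnsigned);

    private static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.length, b.length); // a shorter prefix sorts first
    }

    private static byte[] bucketPrefix(byte vte, String value) {
        return ByteBuffer.allocate(PREFIX_LENGTH).put(vte).putInt(value.hashCode()).array();
    }

    private static boolean inBucket(byte[] key, byte[] prefix) {
        if (key.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (key[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    /** Key-range scan over the collision bucket, filtered for the value; null if absent. */
    byte[] lookup(byte vte, String value) {
        byte[] prefix = bucketPrefix(vte, value);
        for (Map.Entry<byte[], String> e : terms.tailMap(prefix, true).entrySet()) {
            if (!inBucket(e.getKey(), prefix)) {
                break; // scanned past the end of the bucket
            }
            if (e.getValue().equals(value)) {
                return e.getKey();
            }
        }
        return null;
    }

    /** Returns the existing key, or appends the value to its bucket with the next counter. */
    byte[] add(byte vte, String value) {
        byte[] existing = lookup(vte, value);
        if (existing != null) {
            return existing;
        }
        byte[] prefix = bucketPrefix(vte, value);
        int counter = 0;
        for (byte[] key : terms.tailMap(prefix, true).keySet()) {
            if (!inBucket(key, prefix)) {
                break;
            }
            counter++;
        }
        if (counter > Short.MAX_VALUE) {
            throw new IllegalStateException("collision bucket is full");
        }
        byte[] key = ByteBuffer.allocate(KEY_LENGTH).put(prefix).putShort((short) counter).array();
        terms.put(key, value);
        return key;
    }
}
```

The strings "Aa" and "BB" share a String hash code, so with the same VTE they fall into one bucket and receive collision counters 0 and 1; a lookup for either scans only that bucket.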
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.