Type Parameters:
F - The generic type of the source Statements added to the buffer by the callers.
G - The generic type of the BigdataStatements stored in the buffer.

public abstract class AbstractStatementBuffer<F extends org.openrdf.model.Statement,G extends BigdataStatement> extends Object implements IStatementBuffer<F>

Converts Statements into BigdataStatements, including resolving term identifiers (or adding entries to the lexicon for unknown terms) as required. The class does not write the converted BigdataStatements onto the database, but that can easily be done using a resolving iterator pattern.

Modifier and Type | Class and Description |
---|---|
static class | AbstractStatementBuffer.StatementBuffer2<F extends org.openrdf.model.Statement,G extends BigdataStatement>: Loads Statements into an RDF database. |
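
The overview above describes a buffer that accumulates converted statements and hands each full batch to a subclass hook, with flush() draining the remainder. The following is a minimal sketch of that template-method shape, assuming plain Java types in place of the Blazegraph classes; MiniStatementBuffer, CollectingBuffer, and the use of String statements are hypothetical illustrations, not the Blazegraph implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Blazegraph code): a bounded buffer that drains each
// full batch through a subclass hook, mirroring add()/overflow()/flush().
abstract class MiniStatementBuffer<G> {

    private final int capacity;                       // cf. the ctor's capacity
    private final List<G> buffer = new ArrayList<>(); // cf. statementBuffer
    private long counter = 0;                         // cumulative value returned by flush()

    MiniStatementBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Buffers one statement, draining the buffer first if it is already full. */
    public void add(G stmt) {
        if (buffer.size() == capacity) {
            overflow();
        }
        buffer.add(stmt);
    }

    /** Hands a copy of the buffered batch to the subclass, then empties the buffer. */
    protected final void overflow() {
        if (buffer.isEmpty()) {
            return;
        }
        counter += handleProcessedStatements(new ArrayList<>(buffer));
        buffer.clear();
    }

    /** Drains anything still buffered and returns the cumulative counter. */
    public long flush() {
        overflow();
        return counter;
    }

    public int size() {
        return buffer.size();
    }

    /** Subclass hook: consume one batch and report how many statements it handled. */
    protected abstract int handleProcessedStatements(List<G> batch);
}

// Hypothetical subclass that "writes" each batch into an in-memory list.
class CollectingBuffer extends MiniStatementBuffer<String> {
    final List<List<String>> batches = new ArrayList<>();

    CollectingBuffer(int capacity) {
        super(capacity);
    }

    @Override
    protected int handleProcessedStatements(List<String> batch) {
        batches.add(batch);
        return batch.size();
    }
}
```

Keeping overflow() final and exposing only the abstract hook means every subclass shares the same batching and counting logic; a subclass only decides where each batch goes.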

Modifier and Type | Field and Description |
---|---|
protected static boolean | DEBUG |
protected static boolean | INFO |
protected static org.apache.log4j.Logger | log |
protected boolean | readOnly: When true, Values will be resolved against the LexiconRelation and Statements will be resolved against the SPORelation, but unknown Values and unknown Statements WILL NOT be inserted into the corresponding relations. |
protected G[] | statementBuffer: Buffer for accepted BigdataStatements. |

Constructor and Description |
---|
AbstractStatementBuffer(AbstractTripleStore db, boolean readOnly, int capacity) |

Modifier and Type | Method and Description |
---|---|
void | add(F e): Imposes a canonical mapping on the subject, predicate, and objects of the given Statements and stores a new BigdataStatement instance in the internal buffer. |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o): Add an "explicit" statement to the buffer with a "null" context. |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c): Add an "explicit" statement to the buffer. |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c, StatementEnum type): Add a statement to the buffer. |
protected void | clear(): Clears the state associated with the BigdataStatements in the internal buffer but does not discard the blank nodes or deferred statements. |
protected BigdataValue | convertValue(org.openrdf.model.Value value): Return a canonical BigdataValue instance representing the given value. |
long | flush(): Converts any buffered statements and any deferred statements and then invokes overflow() to flush anything remaining in the buffer. |
AbstractTripleStore | getDatabase(): The database from the ctor. |
AbstractTripleStore | getStatementStore(): Note: Returns the same value as getDatabase() since the distinction is not captured by this class. |
BigdataValueFactory | getValueFactory(): The ValueFactory for Statements and Values created by this class. |
protected abstract int | handleProcessedStatements(G[] a): Invoked by overflow(). |
boolean | isEmpty(): true if there are no buffered statements and no buffered deferred statements. |
protected void | overflow(): Invoked each time the statementBuffer buffer would overflow. |
protected void | processBufferedValues(): Efficiently resolves/adds term identifiers for the buffered BigdataValues. |
protected void | processDeferredStatements(): Processes any BigdataStatements in the deferredStatementBuffer, adding them to the statementBuffer, which may cause the latter to overflow(). |
void | reset(): Discards all state (term map, bnodes, deferred statements, the buffered statements, and the counter whose value is reported by flush()). |
void | setBNodeMap(Map<String,BigdataBNode> bnodes): Set the canonicalizing map for blank nodes based on their ID. |
int | size(): #of buffered statements plus the #of buffered statements that are being deferred. |
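
The method summary distinguishes several pieces of state: the main statement buffer, a separate buffer of deferred statements, a blank-node map, and the counter reported by flush(). The sketch below models only that bookkeeping, assuming a hypothetical BufferState class with plain strings as statements; it omits overflow handling and leaves the rule for which statements get deferred to the caller. It shows size() counting both buffers, clear() keeping blank nodes and deferred statements, and reset() discarding everything.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the state lifecycle summarized above; not Blazegraph code.
class BufferState {
    final List<String> statements = new ArrayList<>();   // main buffer
    final List<String> deferred = new ArrayList<>();     // deferred statements
    final Map<String, Object> bnodes = new HashMap<>();  // blank-node map
    long counter = 0;                                    // value reported by flush()

    /** Buffered plus deferred statements, as described for size(). */
    int size() {
        return statements.size() + deferred.size();
    }

    boolean isEmpty() {
        return statements.isEmpty() && deferred.isEmpty();
    }

    /** Moves deferred statements into the main buffer (cf. processDeferredStatements()). */
    void processDeferredStatements() {
        statements.addAll(deferred);
        deferred.clear();
    }

    /** Drops the main buffer only; blank nodes and deferred statements survive. */
    void clear() {
        statements.clear();
    }

    /** Discards all state, including the counter. */
    void reset() {
        statements.clear();
        deferred.clear();
        bnodes.clear();
        counter = 0;
    }
}
```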

protected static final org.apache.log4j.Logger log
protected static final boolean INFO
protected static final boolean DEBUG

protected final boolean readOnly
When true, Values will be resolved against the LexiconRelation and Statements will be resolved against the SPORelation, but unknown Values and unknown Statements WILL NOT be inserted into the corresponding relations.

protected final G[] statementBuffer
Buffer for accepted BigdataStatements. This buffer is cleared each time it would overflow.

public AbstractStatementBuffer(AbstractTripleStore db, boolean readOnly, int capacity)
Parameters:
db - The database against which the Values will be resolved (or added). If this database supports statement identifiers, then statement identifiers for the converted statements will be resolved (or added) to the lexicon.
readOnly - When true, Values (and statement identifiers iff enabled) will be resolved against the LexiconRelation, but entries WILL NOT be inserted into the LexiconRelation for unknown Values (or for statement identifiers for unknown Statements when statement identifiers are enabled).
capacity - The capacity of the backing buffer.

public AbstractTripleStore getDatabase()
The database from the ctor.
Specified by:
getDatabase in interface IStatementBuffer<F extends org.openrdf.model.Statement>

public AbstractTripleStore getStatementStore()
Note: Returns the same value as getDatabase() since the distinction is not captured by this class. This MUST be overridden in derived classes which make this distinction.
Specified by:
getStatementStore in interface IStatementBuffer<F extends org.openrdf.model.Statement>
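
The note above says getStatementStore() simply returns getDatabase() unless a derived class captures the distinction. A minimal sketch of that override pattern, assuming hypothetical Store, BaseBuffer, and SplitBuffer types in place of AbstractTripleStore and the Blazegraph buffer classes:

```java
// Hypothetical stand-in for a triple store; not the Blazegraph AbstractTripleStore.
class Store {
    final String name;
    Store(String name) { this.name = name; }
}

// Base behaviour: no distinction between the database and the statement store.
class BaseBuffer {
    private final Store db;
    BaseBuffer(Store db) { this.db = db; }

    Store getDatabase() { return db; }

    /** Same value as getDatabase() unless a subclass overrides it. */
    Store getStatementStore() { return getDatabase(); }
}

// Hypothetical subclass that keeps statements in a separate store, so it
// must override getStatementStore() to expose that distinction.
class SplitBuffer extends BaseBuffer {
    private final Store statementStore;

    SplitBuffer(Store db, Store statementStore) {
        super(db);
        this.statementStore = statementStore;
    }

    @Override
    Store getStatementStore() { return statementStore; }
}
```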

public BigdataValueFactory getValueFactory()
The ValueFactory for Statements and Values created by this class.

public void setBNodeMap(Map<String,BigdataBNode> bnodes)
Description copied from interface: IStatementBuffer
Set the canonicalizing map for blank nodes based on their ID. This allows the same map to be shared across multiple IStatementBuffer instances. For example, the BigdataSail does this so that the same bnode map is used throughout the life of a SailConnection.

While RIO provides blank node correlation within a given source, it does NOT provide blank node correlation across sources. You need to use this method to do that.

Note: It is reasonable to expect that the bnodes map is used by concurrent threads. For this reason, the map SHOULD be thread-safe. This can be accomplished by using either Collections.synchronizedMap(Map) or a ConcurrentHashMap. However, implementations MUST still synchronize on the map reference across operations which conditionally insert into the map, in order to make that update atomic and thread-safe. Otherwise, a race condition exists for the conditional insert and different threads could get incoherent answers.
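
The note above requires synchronizing on the map reference around any conditional insert, even when the map itself is thread-safe. The sketch below illustrates that rule with a hypothetical BNodeCanonicalizer and a placeholder BNode class standing in for BigdataBNode. It uses Collections.synchronizedMap, whose internal lock is the returned map reference, so the explicit synchronized block and the map's own locking share one monitor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Placeholder for BigdataBNode (hypothetical, for illustration only).
class BNode {
    final String id;
    BNode(String id) { this.id = id; }
}

// Hypothetical helper: several buffers share one map, and each conditional
// insert runs while holding the map's own lock.
class BNodeCanonicalizer {
    private final Map<String, BNode> bnodes;

    BNodeCanonicalizer(Map<String, BNode> bnodes) { this.bnodes = bnodes; }

    /** A thread-safe map whose monitor is the returned map reference itself. */
    static Map<String, BNode> newSharedMap() {
        return Collections.synchronizedMap(new HashMap<>());
    }

    /** Returns the shared BNode for id, creating it at most once. */
    BNode canonical(String id) {
        // Each get/put is thread-safe on its own, but check-then-insert is not;
        // without this block two threads could both see "absent" and insert
        // different BNodes for the same id.
        synchronized (bnodes) {
            BNode b = bnodes.get(id);
            if (b == null) {
                b = new BNode(id);
                bnodes.put(id, b);
            }
            return b;
        }
    }
}
```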
Specified by:
setBNodeMap in interface IStatementBuffer<F extends org.openrdf.model.Statement>
Parameters:
bnodes - The blank nodes map.

protected BigdataValue convertValue(org.openrdf.model.Value value)
Return a canonical BigdataValue instance representing the given value. The scope of the canonical instance is until the next internal buffer overflow (for URIs and Literals) or until flush() (for BNodes, since blank nodes are global for a given source). The purpose of the canonicalizing mapping is to reduce the buffered BigdataValues to the minimum variety required to represent the buffered BigdataStatements, which improves throughput significantly (40%) when resolving terms to the corresponding term identifiers using the LexiconRelation.

Note: This is not a true canonicalizing map when statement identifiers are used, since values used in deferred statements will be held over until the buffer is flush()ed. This relaxation of the canonicalizing mapping is not a problem, since the purpose of the mapping is to provide better throughput and nothing relies on a pure canonicalization of the Values.
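
The canonicalizing mapping described above can be sketched as a map from each value to the first equal instance seen, with two scopes: URIs and literals are forgotten at each overflow, while blank nodes are kept until flush(). This is a minimal illustration under assumptions: the hypothetical ValueCanonicalizer uses a Term class in place of BigdataValue and omits term-identifier resolution entirely.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a BigdataValue: a kind plus a lexical form.
final class Term {
    enum Kind { URI, LITERAL, BNODE }

    final Kind kind;
    final String label;

    Term(Kind kind, String label) { this.kind = kind; this.label = label; }

    @Override public boolean equals(Object o) {
        if (!(o instanceof Term)) return false;
        Term t = (Term) o;
        return kind == t.kind && label.equals(t.label);
    }

    @Override public int hashCode() { return Objects.hash(kind, label); }
}

// Hypothetical canonicalizer with the two scopes described above.
class ValueCanonicalizer {
    private final Map<Term, Term> terms = new HashMap<>();  // URIs and literals
    private final Map<Term, Term> bnodes = new HashMap<>(); // blank nodes

    /** Returns the first-seen equal instance, or null if value is null. */
    Term convert(Term value) {
        if (value == null) return null; // e.g. an undefined context
        Map<Term, Term> scope = value.kind == Term.Kind.BNODE ? bnodes : terms;
        return scope.computeIfAbsent(value, v -> v);
    }

    /** URIs and literals are only canonical until the next overflow. */
    void onOverflow() { terms.clear(); }

    /** Blank nodes stay canonical until flush(). */
    void onFlush() { terms.clear(); bnodes.clear(); }

    int distinctBuffered() { return terms.size() + bnodes.size(); }
}
```

Because equal values collapse to one instance, a later bulk lookup sees fewer distinct values, which is presumably where the throughput gain described above comes from.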
Parameters:
value - A value.
Returns:
A BigdataValue for the target BigdataValueFactory. This will be null iff the value is null (allows for the context to be undefined).

public boolean isEmpty()
true if there are no buffered statements and no buffered deferred statements.

public int size()
#of buffered statements plus the #of buffered statements that are being deferred.

public void add(F e)
Imposes a canonical mapping on the subject, predicate, and objects of the given Statements and stores a new BigdataStatement instance in the internal buffer. If the given statement is a BigdataStatement, then its StatementEnum will be used. Otherwise, the new statement will be StatementEnum.Explicit.

Note: Unlike the Values, a canonicalizing mapping is NOT imposed for the statements. This is because, unlike the Values, there tends to be little duplication in Statements when processing RDF.
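
The rule above for choosing the statement type can be sketched as follows. This is an illustration under assumptions, not the Blazegraph code: StmtType stands in for StatementEnum, Stmt for Statement, TypedStmt for BigdataStatement, and the hypothetical TypeSelection.typeFor helper returns the type that add(F e) would record.

```java
// Hypothetical stand-ins for StatementEnum, Statement and BigdataStatement.
enum StmtType { EXPLICIT, INFERRED, AXIOM }

class Stmt {
    final String s, p, o;
    Stmt(String s, String p, String o) { this.s = s; this.p = p; this.o = o; }
}

// A statement that already carries a type, as a BigdataStatement does.
class TypedStmt extends Stmt {
    final StmtType type;

    TypedStmt(String s, String p, String o, StmtType type) {
        super(s, p, o);
        this.type = type;
    }
}

final class TypeSelection {
    private TypeSelection() {}

    /** Keeps the caller's type for typed statements; everything else is EXPLICIT. */
    static StmtType typeFor(Stmt e) {
        return (e instanceof TypedStmt) ? ((TypedStmt) e).type : StmtType.EXPLICIT;
    }
}
```

Preserving the incoming type is what lets a caller load axioms as axioms, while untyped input defaults to explicit statements.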
Specified by:
add in interface IStatementBuffer<F extends org.openrdf.model.Statement>
Specified by:
add in interface IBuffer<F extends org.openrdf.model.Statement>
Parameters:
e - The statement. If e implements BigdataStatement, then its StatementEnum will be used (this makes it possible to load axioms into the database as axioms), but the term identifiers on e's values will be ignored.

public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o)
Description copied from interface: IStatementBuffer
Add an "explicit" statement to the buffer with a "null" context.
Specified by:
add in interface IStatementBuffer<F extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.

public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c)
Description copied from interface: IStatementBuffer
Add an "explicit" statement to the buffer.
Specified by:
add in interface IStatementBuffer<F extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
c - The context (optional).

public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c, StatementEnum type)
Description copied from interface: IStatementBuffer
Add a statement to the buffer.

Note: The context parameter (c) is NOT used. The database at this time is either a triple store or a triple store with statement identifiers, and in neither case is the context used.
Specified by:
add in interface IStatementBuffer<F extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
c - The context (optional).
type - The statement type (optional).

protected void processBufferedValues()
Efficiently resolves/adds term identifiers for the buffered BigdataValues.

If readOnly is true, then the term identifier for unknown values will remain IRawTripleStore#NULL.
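
The read-only behavior described for processBufferedValues() (and for the readOnly field) can be sketched with an in-memory lexicon. Everything here is hypothetical: the Lexicon class, the use of 0L as the NULL identifier, and sequential identifier assignment are assumptions chosen for illustration, not how the LexiconRelation works.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory lexicon; not the Blazegraph LexiconRelation.
class Lexicon {
    static final long NULL = 0L;                 // assumed "no identifier" marker
    private final Map<String, Long> ids = new HashMap<>();
    private long next = 1L;                      // next identifier to assign

    /**
     * Resolves each term to an identifier in one batch. When readOnly is true,
     * unknown terms are NOT added and keep the NULL identifier.
     */
    Map<String, Long> resolve(List<String> terms, boolean readOnly) {
        Map<String, Long> out = new LinkedHashMap<>();
        for (String t : terms) {
            Long id = ids.get(t);
            if (id == null && !readOnly) {
                id = next++;                     // add the unknown term
                ids.put(t, id);
            }
            out.put(t, id == null ? NULL : id);
        }
        return out;
    }

    int size() { return ids.size(); }
}
```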

protected void processDeferredStatements()
Processes any BigdataStatements in the deferredStatementBuffer, adding them to the statementBuffer, which may cause the latter to overflow().

protected final void overflow()
Invoked each time the statementBuffer buffer would overflow.

This method is responsible for bulk resolving / adding the buffered BigdataValues against the db and adding the fully resolved BigdataStatements to the queue on which the iterator is reading.

protected abstract int handleProcessedStatements(G[] a)
Invoked by overflow().
Parameters:
a - An array of processed BigdataStatements.
Returns:
… counter reported by flush().

public long flush()
Converts any buffered statements and any deferred statements and then invokes overflow() to flush anything remaining in the buffer.

public void reset()
Discards all state (term map, bnodes, deferred statements, the buffered statements, and the counter whose value is reported by flush()).

protected void clear()
Clears the state associated with the BigdataStatements in the internal buffer but does not discard the blank nodes or deferred statements.

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.