public class StatementBuffer<S extends org.openrdf.model.Statement> extends Object implements IStatementBuffer<S>, ICounterSetAccess
A write buffer for absorbing the output of a RIO parser or other Statement source and writing that output onto an AbstractTripleStore using the batch API.
Note: there is a LOT of Value duplication in parsed RDF and we get a significant reward for reducing Values to only the distinct Values during processing. On the other hand, there is little Statement duplication. Hence we pay an unnecessary overhead if we try to make the statements distinct in the buffer.

Note: This also provides an explanation for why neither this class nor writes of SPOs do better when "distinct" statements is turned on - the "Value" objects in that case are only represented by long integers and duplication in their values does not impose a burden on either the heap or the index writers. In contrast, the duplication of Values in the StatementBuffer imposes a burden on both the heap and the index writers.
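The typical pattern is to construct the buffer over the target store, add statements, and call flush() at the end of the source. The sketch below assumes an existing AbstractTripleStore; the capacity and the example URIs are illustrative only, not values prescribed by this class.

```java
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.ValueFactory;
import org.openrdf.model.impl.ValueFactoryImpl;

import com.bigdata.rdf.rio.StatementBuffer;
import com.bigdata.rdf.store.AbstractTripleStore;

// Minimal usage sketch: buffer a few statements and flush them onto the store.
public class StatementBufferExample {

    public static long loadOne(final AbstractTripleStore store) {

        // Buffer up to 100,000 statements before an incremental write is forced
        // (the capacity is an arbitrary example value).
        final StatementBuffer<Statement> buffer =
                new StatementBuffer<Statement>(store, 100000);

        final ValueFactory vf = ValueFactoryImpl.getInstance();
        final URI alice = vf.createURI("http://example.org/alice");
        final URI knows = vf.createURI("http://example.org/knows");
        final URI bob = vf.createURI("http://example.org/bob");

        // Statements are buffered; the buffer writes incrementally on overflow.
        buffer.add(alice, knows, bob);

        // Signal the end of the source and write any remaining buffered statements.
        return buffer.flush();
    }
}
```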
Modifier and Type | Class and Description |
---|---|
static interface | StatementBuffer.IWrittenSPOArray. Note: The use of this interface is NOT encouraged. |
Modifier and Type | Field and Description |
---|---|
protected AbstractTripleStore | database: The database that will be used to resolve terms. |
protected StatementBuffer.IWrittenSPOArray | didWriteCallback |
protected int | numBNodes |
protected int | numLiterals |
protected int | numSIDs: The #of blank nodes which appear in the context position and zero (0) if statement identifiers are not enabled. |
protected int | numStmts: #of valid entries in stmts. |
protected int | numURIs |
protected int | numValues: #of valid entries in values. |
protected BigdataStatement[] | stmts: Buffer for parsed RDF Statements. |
protected BigdataValueFactory | valueFactory |
protected BigdataValue[] | values: Buffer for parsed RDF Values. |
Constructor and Description |
---|
StatementBuffer(AbstractTripleStore database, int capacity) |
StatementBuffer(AbstractTripleStore database, int capacity, int queueCapacity) |
StatementBuffer(TempTripleStore statementStore, AbstractTripleStore database, int capacity, int queueCapacity): Create a buffer that writes on a TempTripleStore when it is flush()ed. |
Modifier and Type | Method and Description |
---|---|
protected void | _clear(): Invoked by incrementalWrite() to clear terms and statements which have been written in preparation for buffering more writes. |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o): Add an "explicit" statement to the buffer (flushes on overflow, no context). |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c): Add an "explicit" statement to the buffer (flushes on overflow). |
void | add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c, StatementEnum type): Add a statement to the buffer (core impl, flushes on overflow). |
void | add(org.openrdf.model.Statement e): Add a statement to the buffer. |
protected void | finalize(): Added to ensure that the FutureTask is cancelled in case the caller does not shutdown the StatementBuffer normally. |
long | flush(): Signals the end of a source and causes all buffered statements to be written. |
int | getCapacity(): The maximum #of Statements, URIs, Literals, or BNodes that the buffer can hold. |
CounterSet | getCounters(): Return performance counters. |
AbstractTripleStore | getDatabase(): The database that will be used to resolve terms. |
int | getQueueCapacity(): The capacity of the optional queue used to overlap the parser with the index writer -or- ZERO (0) iff the queue is disabled and index writes will be synchronous and alternate with the parser (the historical behavior). |
AbstractTripleStore | getStatementStore(): The optional store into which statements will be inserted when non-null. |
protected void | handleStatement(org.openrdf.model.Resource _s, org.openrdf.model.URI _p, org.openrdf.model.Value _o, org.openrdf.model.Resource _c, StatementEnum type): Adds the values and the statement into the buffer. |
protected void | incrementalWrite(): Batch insert buffered data (terms and statements) into the store. |
boolean | isEmpty(): True iff there are no elements in the buffer. |
boolean | nearCapacity(): Returns true if the bufferQueue has less than three slots remaining for any of the value arrays (URIs, Literals, or BNodes) or if there are no slots remaining in the statements array. |
void | reset(): Clears all buffered data, including the canonicalizing mapping for blank nodes and deferred provenance statements. |
void | setBNodeMap(Map<String,BigdataBNode> bnodes): Set the canonicalizing map for blank nodes based on their ID. |
void | setChangeLog(IChangeLog changeLog): Set an IChangeLog listener that will be notified about each statement actually written onto the backing store. |
void | setReadOnly(): When invoked, the StatementBuffer will resolve terms against the lexicon, but not enter new terms into the lexicon. |
int | size(): The #of elements currently in the buffer. |
String | toString() |
protected final BigdataValue[] values
Buffer for parsed RDF Values.

protected final BigdataStatement[] stmts
Buffer for parsed RDF Statements.

protected int numValues
#of valid entries in values.

protected int numStmts
#of valid entries in stmts.

protected int numURIs

protected int numLiterals

protected int numBNodes

protected int numSIDs
The #of blank nodes which appear in the context position and zero (0) if statement identifiers are not enabled.

protected final AbstractTripleStore database
The database that will be used to resolve terms. When the statementStore is null, statements will be written into this store as well.

protected final BigdataValueFactory valueFactory

protected StatementBuffer.IWrittenSPOArray didWriteCallback
public StatementBuffer(AbstractTripleStore database, int capacity)
Create a buffer that converts Sesame Value objects to SPOs and writes on the database when it is flush()ed. This may be used to perform an efficient batch write of Sesame Values or Statements onto the database. If you already have SPOs then use IRawTripleStore.addStatements(IChunkedOrderedIterator, IElementFilter) and friends.
Parameters:
database - The database into which the terms and statements will be inserted.
capacity - The #of statements that the buffer can hold.

public StatementBuffer(AbstractTripleStore database, int capacity, int queueCapacity)
public StatementBuffer(TempTripleStore statementStore, AbstractTripleStore database, int capacity, int queueCapacity)
Create a buffer that writes on a TempTripleStore when it is flush()ed. This variant is used during truth maintenance since the terms are written on the database lexicon but the statements are asserted against the TempTripleStore.
Parameters:
statementStore - The store into which the statements will be inserted (optional). When null, both statements and terms will be inserted into the database. This optional argument provides the ability to load statements into a temporary store while the terms are resolved against the main database. This facility is used during incremental load+close operations.
database - The database. When statementStore is null, both terms and statements will be inserted into the database.
capacity - The #of statements that the buffer can hold.
queueCapacity - The capacity of the blocking queue used by the StatementBuffer -or- ZERO (0) to disable the blocking queue and perform synchronous writes (default is statements). The blocking queue holds parsed data pending writes onto the backing store and makes it possible for the parser to race ahead while the writer is blocked writing onto the database indices. (added blocking queue)
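The sketch below illustrates this variant under stated assumptions: the TempTripleStore and AbstractTripleStore instances already exist, and the capacity and queueCapacity values are arbitrary examples rather than recommended settings.

```java
import org.openrdf.model.Resource;
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.Value;

import com.bigdata.rdf.rio.StatementBuffer;
import com.bigdata.rdf.store.AbstractTripleStore;
import com.bigdata.rdf.store.TempTripleStore;

// Sketch: buffer statements into a TempTripleStore while terms are resolved
// against the main database. The stores and the capacity/queueCapacity values
// are illustrative assumptions, not defaults of this class.
public class TempStoreLoadExample {

    public static long loadIntoTempStore(final TempTripleStore tempStore,
            final AbstractTripleStore database,
            final Resource s, final URI p, final Value o) {

        final StatementBuffer<Statement> buffer = new StatementBuffer<Statement>(
                tempStore, // statementStore: statements are asserted here
                database,  // database: terms are written on this lexicon
                100000,    // capacity: statements buffered before an incremental write
                10);       // queueCapacity: non-zero lets the parser run ahead of the index writer

        buffer.add(s, p, o); // buffered; the statement will be written on the TempTripleStore

        return buffer.flush(); // end of source: write anything still buffered
    }
}
```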
public final AbstractTripleStore getStatementStore()
The optional store into which statements will be inserted when non-null.
Specified by:
getStatementStore in interface IStatementBuffer<S extends org.openrdf.model.Statement>
public final AbstractTripleStore getDatabase()
The database that will be used to resolve terms. When getStatementStore() is null, statements will be written into this store as well.
Specified by:
getDatabase in interface IStatementBuffer<S extends org.openrdf.model.Statement>
public int getCapacity()
The maximum #of Statements, URIs, Literals, or BNodes that the buffer can hold.

public int getQueueCapacity()
The capacity of the optional queue used to overlap the parser with the index writer -or- ZERO (0) iff the queue is disabled and index writes will be synchronous and alternate with the parser (the historical behavior).
See Also:
BLZG-1552
public boolean isEmpty()
True iff there are no elements in the buffer.
Specified by:
isEmpty in interface IBuffer<S extends org.openrdf.model.Statement>

public int size()
The #of elements currently in the buffer.
Specified by:
size in interface IBuffer<S extends org.openrdf.model.Statement>

public CounterSet getCounters()
Return performance counters.
Specified by:
getCounters in interface ICounterSetAccess
public void setReadOnly()
When invoked, the StatementBuffer will resolve terms against the lexicon, but not enter new terms into the lexicon. This mode can be used to efficiently resolve terms to SPOs.
public void setChangeLog(IChangeLog changeLog)
Set an IChangeLog listener that will be notified about each statement actually written onto the backing store.
Parameters:
changeLog - The change log listener.

protected void finalize() throws Throwable
Added to ensure that the FutureTask is cancelled in case the caller does not shutdown the StatementBuffer normally.

public long flush()
Signals the end of a source and causes all buffered statements to be written.
Note: The source limits the scope within which blank nodes are co-referenced by their IDs. Calling this method will flush the buffer, cause any deferred statements to be written, and cause the canonicalizing mapping for blank nodes to be discarded.
Specified by:
flush in interface IBuffer<S extends org.openrdf.model.Statement>
Returns:
The #of statements written on the backing IRelation.
See Also:
IMutableRelation
public void reset()
Clears all buffered data, including the canonicalizing mapping for blank nodes and deferred provenance statements.

public void setBNodeMap(Map<String,BigdataBNode> bnodes)
Set the canonicalizing map for blank nodes based on their ID. This map may be shared across IStatementBuffer instances. For example, the BigdataSail does this so that the same bnode map is used throughout the life of a SailConnection.
While RIO provides blank node correlation within a given source, it does NOT provide blank node correlation across sources. You need to use this method to do that.

Note: It is reasonable to expect that the bnodes map is used by concurrent threads. For this reason, the map SHOULD be thread-safe. This can be accomplished either using Collections.synchronizedMap(Map) or a ConcurrentHashMap. However, implementations MUST still be synchronized on the map reference across operations which conditionally insert into the map in order to make that update atomic and thread-safe. Otherwise a race condition exists for the conditional insert and different threads could get incoherent answers.
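The sketch below shows one way to share a canonicalizing map across two buffers and to perform the conditional insert atomically; the buffers, the value factory, and the helper method are hypothetical, and only setBNodeMap(Map) is part of this class's API.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.openrdf.model.Statement;

import com.bigdata.rdf.model.BigdataBNode;
import com.bigdata.rdf.model.BigdataValueFactory;
import com.bigdata.rdf.rio.StatementBuffer;

// Sketch: share one canonicalizing blank node map across several buffers so that
// blank node IDs are correlated across sources. The buffers, value factory, and
// helper method are hypothetical; only setBNodeMap(Map) is part of this API.
public class SharedBNodeMapExample {

    public static Map<String, BigdataBNode> shareMap(
            final StatementBuffer<Statement> buffer1,
            final StatementBuffer<Statement> buffer2) {

        // A thread-safe map, as recommended above.
        final Map<String, BigdataBNode> bnodes =
                new ConcurrentHashMap<String, BigdataBNode>();

        buffer1.setBNodeMap(bnodes);
        buffer2.setBNodeMap(bnodes);

        return bnodes;
    }

    // The conditional insert is synchronized on the map reference so that the
    // get/put pair is atomic, even though the map itself is thread-safe.
    public static BigdataBNode lookupOrCreate(final Map<String, BigdataBNode> bnodes,
            final String id, final BigdataValueFactory valueFactory) {
        synchronized (bnodes) {
            BigdataBNode b = bnodes.get(id);
            if (b == null) {
                b = valueFactory.createBNode(id);
                bnodes.put(id, b);
            }
            return b;
        }
    }
}
```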
Specified by:
setBNodeMap in interface IStatementBuffer<S extends org.openrdf.model.Statement>
Parameters:
bnodes - The blank nodes map.

protected void _clear()
Invoked by incrementalWrite() to clear terms and statements which have been written in preparation for buffering more writes. This does NOT discard either the canonicalizing mapping for blank nodes or any deferred statements.

protected void incrementalWrite()
Batch insert buffered data (terms and statements) into the store.
public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o)
Add an "explicit" statement to the buffer (flushes on overflow, no context).
Specified by:
add in interface IStatementBuffer<S extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.

public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c)
Add an "explicit" statement to the buffer (flushes on overflow).
Specified by:
add in interface IStatementBuffer<S extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
c - The context (optional).

public void add(org.openrdf.model.Resource s, org.openrdf.model.URI p, org.openrdf.model.Value o, org.openrdf.model.Resource c, StatementEnum type)
Add a statement to the buffer (core impl, flushes on overflow).
Specified by:
add in interface IStatementBuffer<S extends org.openrdf.model.Statement>
Parameters:
s - The subject.
p - The predicate.
o - The object.
type - The statement type.
c - The context (optional).

public void add(org.openrdf.model.Statement e)
Description copied from interface: IStatementBuffer
Add a statement to the buffer.
Specified by:
add in interface IStatementBuffer<S extends org.openrdf.model.Statement>
Specified by:
add in interface IBuffer<S extends org.openrdf.model.Statement>
Parameters:
e - The statement. If the statement implements BigdataStatement then the StatementEnum will be used (this makes it possible to load axioms into the database as axioms) but the term identifiers on the statement's values will be ignored.
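As a sketch of how this method is typically fed, the example below wires a Sesame/RIO parser to the buffer through an RDFHandlerBase. The store, input stream, base URI, capacity, and RDF format are assumptions for illustration.

```java
import java.io.InputStream;

import org.openrdf.model.Statement;
import org.openrdf.rio.RDFFormat;
import org.openrdf.rio.RDFHandlerException;
import org.openrdf.rio.RDFParser;
import org.openrdf.rio.Rio;
import org.openrdf.rio.helpers.RDFHandlerBase;

import com.bigdata.rdf.rio.StatementBuffer;
import com.bigdata.rdf.store.AbstractTripleStore;

// Sketch: feed a Sesame/RIO parser into a StatementBuffer. The store, stream,
// base URI, capacity, and RDF format are assumptions for illustration.
public class RioLoadExample {

    public static long load(final AbstractTripleStore store, final InputStream in,
            final String baseURI) throws Exception {

        final StatementBuffer<Statement> buffer =
                new StatementBuffer<Statement>(store, 100000/* capacity */);

        final RDFParser parser = Rio.createParser(RDFFormat.RDFXML);
        parser.setRDFHandler(new RDFHandlerBase() {
            @Override
            public void handleStatement(final Statement st) throws RDFHandlerException {
                buffer.add(st); // buffered; incremental writes happen on overflow
            }
        });

        parser.parse(in, baseURI);

        return buffer.flush(); // end of source: write anything still buffered
    }
}
```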
public boolean nearCapacity()
Returns true if the bufferQueue has less than three slots remaining for any of the value arrays (URIs, Literals, or BNodes) or if there are no slots remaining in the statements array.
protected void handleStatement(org.openrdf.model.Resource _s, org.openrdf.model.URI _p, org.openrdf.model.Value _o, org.openrdf.model.Resource _c, StatementEnum type)
Adds the values and the statement into the buffer.
Parameters:
_s - The subject.
_p - The predicate.
_o - The object.
_c - The context (may be null).
type - The statement type.
Throws:
IndexOutOfBoundsException - if the buffer capacity is exceeded.
See Also:
nearCapacity()
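Because handleStatement(...) is the protected hook through which every buffered statement passes, a subclass can observe statements before they are buffered. The sketch below is a hypothetical subclass, not part of the API, that simply counts statements and then delegates to the superclass.

```java
import org.openrdf.model.Resource;
import org.openrdf.model.Statement;
import org.openrdf.model.URI;
import org.openrdf.model.Value;

import com.bigdata.rdf.model.StatementEnum;
import com.bigdata.rdf.rio.StatementBuffer;
import com.bigdata.rdf.store.AbstractTripleStore;

// Hypothetical subclass illustrating the protected extension point; it counts
// each statement entering the buffer and then delegates to the superclass.
public class CountingStatementBuffer extends StatementBuffer<Statement> {

    private long nHandled = 0L;

    public CountingStatementBuffer(final AbstractTripleStore database, final int capacity) {
        super(database, capacity);
    }

    @Override
    protected void handleStatement(final Resource _s, final URI _p, final Value _o,
            final Resource _c, final StatementEnum type) {
        nHandled++; // observe the statement before it is buffered
        super.handleStatement(_s, _p, _o, _c, type);
    }

    public long getHandledCount() {
        return nHandled;
    }
}
```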
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.