public class ParseOp extends PipelineOp
ChunkedResolutionOp and the InsertStatementsOp.
DataLoader.ClosureEnum and DataLoader.CommitEnum shape the way in which
the update plan is generated. They are not options on the ParseOp itself.
We need to set up the assertion and retraction buffers so that they have the appropriate scope. Alternatively, for database-at-once closure, we do not set up those buffers and instead recompute the closure of the database afterwards.
The assertion buffers might be populated after the IV resolution step and before we write on the indices. We would then compute the fixed point of the closure over the delta and write the result onto the database. We should also be able to specify that some sources contain data to be removed (INSERT DATA and REMOVE DATA, or UNLOAD src). The operation should combine assertions and retractions for efficiency.
See DataLoader.
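The "fixed point of the closure over the delta" step can be illustrated with a toy example. This is only a sketch under stated assumptions: statements are modeled as three-element string lists, the only inference rule is transitivity of a single predicate named subClassOf, and the class DeltaClosureSketch with its methods is hypothetical. It is not Blazegraph's inference engine or its buffer API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in Blazegraph. */
final class DeltaClosureSketch {

    /** Assumed single transitive predicate used by the toy rule. */
    static final String SUB = "subClassOf";

    /**
     * Returns the statements to write when the delta is asserted: the new
     * statements in the delta plus everything they newly entail, given what
     * the database already holds. Each statement is a [s, p, o] list.
     */
    static Set<List<String>> closeDelta(Set<List<String>> database,
                                        Set<List<String>> delta) {
        Set<List<String>> known = new HashSet<>(database);
        Set<List<String>> added = new LinkedHashSet<>();
        Deque<List<String>> work = new ArrayDeque<>();
        for (List<String> t : delta) {
            if (known.add(t)) { // skip statements already in the database
                added.add(t);
                work.add(t);
            }
        }
        // Iterate until no new statement is derived (the fixed point).
        while (!work.isEmpty()) {
            List<String> t = work.poll();
            if (!SUB.equals(t.get(1))) continue;
            // Snapshot, because derive() grows the known set.
            for (List<String> u : new ArrayList<>(known)) {
                if (!SUB.equals(u.get(1))) continue;
                if (t.get(2).equals(u.get(0))) derive(t.get(0), u.get(2), known, added, work);
                if (u.get(2).equals(t.get(0))) derive(u.get(0), t.get(2), known, added, work);
            }
        }
        return added;
    }

    private static void derive(String s, String o, Set<List<String>> known,
                               Set<List<String>> added, Deque<List<String>> work) {
        List<String> inferred = List.of(s, SUB, o);
        if (known.add(inferred)) {
            added.add(inferred);
            work.add(inferred);
        }
    }
}
```

Only statements touched by the delta are revisited, which is the point of closing over the delta rather than recomputing the closure of the whole database.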
TODO Add an operator which handles a zip archive, creating a LOAD
for each resource in that archive. Recursive directory processing is
similar. Both should result in multiple ParseOp instances which can
run in parallel. Those ParseOp instances will feed the IV
resolution, optional TM, and statement writer operations.
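As a rough sketch of the first half of that idea, the helper below enumerates a zip archive and yields one source URI per resource, which a planner could then turn into one LOAD (and one ParseOp) each. The class ZipSourcesSketch and the choice of the jar: URI form are assumptions made for illustration; no such operator exists yet.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/** Hypothetical helper; not part of Blazegraph. */
final class ZipSourcesSketch {

    /**
     * Returns one "jar:" URI string per regular file in the archive.
     * Directory entries are skipped because they carry no data to parse.
     */
    static List<String> sourceUris(Path zip) throws IOException {
        List<String> uris = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip.toFile())) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                if (entry.isDirectory()) continue;
                // Assumed addressing scheme: jar:<archive-uri>!/<entry-name>
                uris.add("jar:" + zip.toUri() + "!/" + entry.getName());
            }
        }
        return uris;
    }
}
```

Each returned URI is independent of the others, so the resulting parse steps could be scheduled in parallel.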
If we can make the SOURCE_URI a value expression, then we could flow
solutions into the LOAD operation which would be the bindings for
the source URI. Very nice! Then we could hash partition the LOAD
operator across a cluster and do a parallel load very easily. If the
source for those solutions was the parse of a single RDF file (or
streamed URI) containing the files to be loaded then we could also
gain the indirection necessary to load large numbers of files in
parallel on a cluster.
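The hash partitioning mentioned here can be sketched in a few lines. This assumes that source URIs arrive as plain strings and that nodes are numbered 0 to nodeCount - 1; the class HashPartitionSketch is hypothetical and says nothing about how the query engine actually routes solutions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of hash partitioning LOAD sources across nodes. */
final class HashPartitionSketch {

    /** Maps a source URI to a node index in [0, nodeCount). */
    static int partitionOf(String sourceUri, int nodeCount) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(sourceUri.hashCode(), nodeCount);
    }

    /** Groups the URIs into one bucket per node. */
    static List<List<String>> partition(List<String> sourceUris, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String uri : sourceUris) {
            buckets.get(partitionOf(uri, nodeCount)).add(uri);
        }
        return buckets;
    }
}
```

Because the node index depends only on the URI, every node can compute the same assignment without coordination.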
TODO In at least the SIDS mode, we need to do some special
operations when the statement buffer is flushed. That statement
buffer could either be fed directly by the ParseOp or
indirectly through solutions modeling statements flowing through the
query engine. I am inclined to the latter for better parallelism.
Even though there is more stuff on the heap and more latency within
the stages, I think that we will get more out of the increased
parallelism.
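To make the "special operations on flush" idea concrete, here is a minimal sketch of a bounded buffer that hands each full chunk to a callback. The class FlushingBufferSketch and its callback are hypothetical; this is not the StatementBuffer API, only an illustration of where a flush hook would sit regardless of which side feeds the buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical buffer with a flush hook; not Blazegraph's StatementBuffer. */
final class FlushingBufferSketch<T> {

    private final int capacity;
    private final Consumer<List<T>> onFlush;
    private final List<T> buffer = new ArrayList<>();

    FlushingBufferSketch(int capacity, Consumer<List<T>> onFlush) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.onFlush = onFlush;
    }

    /** Buffers one item and flushes automatically when the buffer is full. */
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= capacity) {
            flush();
        }
    }

    /** Hands the buffered items to the hook as one chunk, then clears the buffer. */
    void flush() {
        if (buffer.isEmpty()) return;
        List<T> chunk = new ArrayList<>(buffer);
        buffer.clear();
        onFlush.accept(chunk); // mode-specific work (for example SIDS handling) would go here
    }
}
```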
TODO Any annotation here should be configurable from the LoadGraph
AST node and (ideally) the SPARQL UPDATE syntax.
FIXME This does not handle SIDS. The StatementBuffer logic
needs to get into InsertStatementsOp for that to work, or
the plan needs to be slightly different and hit a different insert
operator for statements altogether.
FIXME This does not handle Truth Maintenance.
See Also: PresortRioLoader, StatementBuffer, DataLoader, DataLoader.Options, RDFParserOptions, DataLoader.ClosureEnum, DataLoader.CommitEnum, Serialized Form
Modifier and Type | Class and Description |
---|---|
static interface | ParseOp.Annotations - Note: BOp.Annotations#TIMEOUT is respected to limit the read time on an HTTP connection. |
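The effect of a read timeout on an HTTP source can be sketched with the standard java.net API. This assumes the timeout is available as a long in milliseconds and that a non-positive or very large value means "no limit"; the class ReadTimeoutSketch is hypothetical and does not show how ParseOp reads the annotation.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;

/** Hypothetical helper showing a bounded read time on a URL connection. */
final class ReadTimeoutSketch {

    /**
     * Opens (but does not yet connect) a connection to the source and applies
     * the read timeout when it fits in an int number of milliseconds.
     */
    static URLConnection open(String sourceUri, long timeoutMillis) throws IOException {
        URLConnection conn = URI.create(sourceUri).toURL().openConnection();
        if (timeoutMillis > 0 && timeoutMillis <= Integer.MAX_VALUE) {
            // A read that blocks longer than this throws SocketTimeoutException.
            conn.setReadTimeout((int) timeoutMillis);
        }
        return conn;
    }
}
```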
Modifier and Type | Field and Description |
---|---|
protected static Var<?> | c - The s, p, o, and c variable names. |
protected static Var<?> | o - The s, p, o, and c variable names. |
protected static Var<?> | p - The s, p, o, and c variable names. |
protected static Var<?> | s - The s, p, o, and c variable names. |
DEFAULT_INITIAL_CAPACITY
Constructor and Description |
---|
ParseOp(BOp[] args, Map<String,Object> annotations) |
ParseOp(ParseOp op) |
Modifier and Type | Method and Description |
---|---|
FutureTask<Void> | eval(BOpContext<IBindingSet> context) - Return a FutureTask which computes the operator against the evaluation context. |
ParserStats | newStats() - Return a new object which can be used to collect statistics on the operator evaluation. |
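Because eval returns a FutureTask that the caller must execute, the caller can hook its completion. The sketch below shows one way to do that with plain java.util.concurrent: wrap the returned task in a second FutureTask whose done() method fires afterwards. The class EvalHookSketch and its stand-in eval method are hypothetical and only mimic the shape of the contract, not the operator itself.

```java
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of the "caller executes the FutureTask" contract. */
final class EvalHookSketch {

    /** Stand-in for an operator's eval(): builds the task but does not run it. */
    static FutureTask<Void> eval(AtomicInteger workCounter) {
        return new FutureTask<Void>(() -> {
            workCounter.incrementAndGet(); // pretend this is the operator's work
            return null;
        });
    }

    /** Caller side: wrap the task so that onDone fires once it has run. */
    static void runWithHook(FutureTask<Void> task, Runnable onDone) {
        FutureTask<Void> hooked = new FutureTask<Void>(task, null) {
            @Override
            protected void done() {
                onDone.run();
            }
        };
        // Runs on the calling thread here; submitting to an Executor works too.
        // Any failure inside the operator remains visible through task.get().
        hooked.run();
    }
}
```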
assertAtOnceJavaHeapOp, assertMaxParallelOne, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isReorderSolutions, isSharedState
__replaceArg, _clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
annotationsEqual, annotationsToString, annotationsToString, annotationValueToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, mutation, shortenName, toShortString, toString, toString
protected static final Var<?> s
protected static final Var<?> p
protected static final Var<?> o
protected static final Var<?> c
public ParseOp(ParseOp op)
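public ParserStats newStats()
Description copied from class: PipelineOp
Return a new object which can be used to collect statistics on the operator evaluation.
Overrides: newStats in class PipelineOp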
public FutureTask<Void> eval(BOpContext<IBindingSet> context)
Description copied from class: PipelineOp
Return a FutureTask which computes the operator against the
evaluation context. The caller is responsible for executing the
FutureTask (this gives them the ability to hook the completion of
the computation).
Specified by: eval in class PipelineOp
Parameters: context - The evaluation context.
Returns: The FutureTask which will compute the operator's evaluation.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.