public class BigdataGASEngine extends GASEngine
IGASEngine
for dynamic activation of vertices. This implementation
maintains a frontier and lazily initializes the vertex state when the vertex
is visited for the first time. This is appropriate for algorithms, such as
BFS, that use a dynamic frontier.
The first way to compute an analytic over a dynamic graph is to specify the
timestamp of the view as ITx.READ_COMMITTED
. The view of the graph
in each round will be automatically advanced to the most recently
committed view of that graph. Thus, if there are concurrent commits, each
time the IGASProgram
is executed within a given round of evaluation,
it will see the most recently committed state of the data graph.
The second way to compute an analytic over a dynamic graph is to explicitly
change the view before each round. This can be achieved by tunneling the
BigdataGASEngine.BigdataGraphAccessor
interface from
IGASProgram.nextRound(IGASContext)
. If you take this approach, then
you could explicitly walk through an iterator over the commit record index
and update the timestamp of the view. This approach allows you to replay
historical committed states of the graph at a known one-to-one rate (one
graph state per round of the GAS computation).
TODO Algorithms that need to visit all vertices in each round (CC, BC, PR)
can be more optimially executed by a different implementation strategy. The
vertex state should be arranged in a dense map (maybe an array) and presized.
For example, this could be done on the first pass when we identify a vertex
index for each distinct V in visitation order.
TODO Vectored expansion with conditional materialization of attribute values
could be achieved using CONSTRUCT. This would force URI materialization as
well. If we drop down one level, then we can push in the frontier and avoid
the materialization. Or we can just write an operator that accepts a frontier
and returns the new frontier and which maintains an internal map containing
both the visited vertices, the vertex state, and the edge state.
TODO Some computations could be maintained and accelerated. A great example
is Shortest Path (as per RDF3X). Reachability queries for a hierarchy can
also be maintained and accelerated (again, RDF3X using a ferrari index).
TODO Option to materialize Literals (or to declare the set of literals of
interest) [Note: We can also require that people inline all URIs and Literals
if they need to have them materialized, but a materialization filter for
Gather and Scatter would be nice if it can be selective for just those
attributes or vertex identifiers that matter).
TODO DYNAMIC GRAPHS: Another possibility would be to replay a history log,
explicitly making changes to the graph. In order to provide high concurrency
for readers, this would require a shadowing of the graph (effectively,
shadowing the indices). That might be achieved by replaying the changes into
a version fork of the graph and then using a read-only view of the fork. This
is basically a retroactive variant of replaying the commit points from the
commit record index. I am not sure if it has much to recommend it.
The thing that is interesting about the history index, is that it captures just the delta. Actually computing the delta between two commit points is none-trivial without the history index. However, I am not sure how we can leverage that delta in an interesting fashion for dynamic graphs.
Modifier and Type | Class and Description |
---|---|
static class |
BigdataGASEngine.BigdataGraphAccessor |
Constructor and Description |
---|
BigdataGASEngine(IIndexManager indexManager,
int nthreads) |
BigdataGASEngine(org.openrdf.sail.Sail sail,
int nthreads)
Convenience constructor
|
Modifier and Type | Method and Description |
---|---|
boolean |
getSortFrontier()
Returns
true since the IOs will be vectored if the
frontier is sorted. |
<VS,ES,ST> IGASState<VS,ES,ST> |
newGASState(IGraphAccessor graphAccessor,
IGASProgram<VS,ES,ST> gasProgram) |
getGASThreadPool, getNThreads, getSchedulerClass, newFrontierStrategy, newGASContext, newScheduler, newStaticFrontier, setSchedulerClass, shutdown, shutdownNow
public BigdataGASEngine(org.openrdf.sail.Sail sail, int nthreads)
sail
- The sail (must be a BigdataSail
).nthreads
- The number of threads to use for the SCATTER and GATHER
phases.public BigdataGASEngine(IIndexManager indexManager, int nthreads)
indexManager
- The index manager.nthreads
- The number of threads to use for the SCATTER and GATHER
phases.
TODO Scale-out: The IIndexmanager
MAY be an
IBigdataFederation
. The BigdataGASEngine
would
automatically use remote indices. However, for proper
scale-out we want to partition the work and the VS/ES so that
would imply a different IGASEngine
design.public <VS,ES,ST> IGASState<VS,ES,ST> newGASState(IGraphAccessor graphAccessor, IGASProgram<VS,ES,ST> gasProgram)
newGASState
in class GASEngine
public boolean getSortFrontier()
true
since the IOs will be vectored if the
frontier is sorted.Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.