See: Description
Interface | Description |
---|---|
IJoinMaster |
Interface exported by the
JoinMasterTask . |
Class | Description |
---|---|
DistributedJoinMasterTask |
Implementation for distributed join execution.
|
DistributedJoinTask |
Implementation used by scale-out deployments.
|
JoinMasterTask |
Master providing efficient distributed evaluation of
IRule s. |
JoinStats |
Statistics about processing for a single join dimension as reported by a
single
JoinTask . |
JoinTask |
Consumes
IBindingSet chunks from the previous join dimension. |
JoinTaskFactoryTask |
A factory for
DistributedJoinTask s. |
JoinTaskSink | |
LocalJoinMasterTask |
Implementation for local join execution on a
Journal . |
LocalJoinTask | |
UnsyncLocalOutputBuffer<E extends IBindingSet> |
Keeps track of the chunks of binding sets that are generated on the caller's
JoinStats . |
This package implements a pipeline join. The basic design has a join task running locally on the data service for each index partition touched by the rule and a join master to setup the execution of the 1st join dimension. The intermediate binding sets are "pipelined" from one join dimension to the next. When the index is NOT partitioned, the fan-in and fan-out are both ONE (1). When the index is partitioned, the fan-in and fan-out depend on the data and can be greater than one. The master is executed by the client. For query, the buffer on which the solutions will be written is proxied and passed along until it reaches the last join dimension, which then writes on the proxied query buffer. For mutation, the join tasks in the last join dimension obtain their own solution buffers that write on the scale-out view of the mutable relation identified by the head of the rule -- this means that the generated solutions do not pass through the master for mutation operations, but only for query. Since the master is the client (this is required for query, but optional for mutation), the solutions in effect are gathered to the client which is exactly what we want. The pipeline join is MUCH more efficient than nested subquery for the scale-out architecture. There are serveral reasons why: (a) the index reads are purely local; (b) the join tasks are able to order the access paths to be joined with the incoming binding sets in order, which results in ordered reads on the index partition and also allows us to trivally eliminate duplicate reads on the same access path; (c) there are far fewer RMI calls involved; and (d) each join task executes in its own thread and on the machine where the index partition resides on which it will read -- the computation is therefore perfectly mapped to the data.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.