com.bigdata.relation.rule.eval.pipeline (Blazegraph Database Platform 2.1.5 API)

Interface Summary
Interface Description

IJoinMaster
Interface exported by the JoinMasterTask.

Interface Summary
Interface	Description
IJoinMaster	Interface exported by the `JoinMasterTask`.

Class Summary
Class	Description
DistributedJoinMasterTask	Implementation for distributed join execution.
DistributedJoinTask	Implementation used by scale-out deployments.
JoinMasterTask	Master providing efficient distributed evaluation of `IRule`s.
JoinStats	Statistics about processing for a single join dimension as reported by a single `JoinTask`.
JoinTask	Consumes `IBindingSet` chunks from the previous join dimension.
JoinTaskFactoryTask	A factory for `DistributedJoinTask`s.
JoinTaskSink	An object used by a `JoinTask` to write on another `JoinTask` providing a sink for a specific index partition.
LocalJoinMasterTask	Implementation for local join execution on a `Journal`.
LocalJoinTask	`JoinTask` implementation for a `Journal`.
UnsyncLocalOutputBuffer<E extends IBindingSet>	Keeps track of the chunks of binding sets that are generated on the caller's `JoinStats`.

Package com.bigdata.relation.rule.eval.pipeline Description

This package implements a pipeline join. The basic design has a join task running locally on the data service for each index partition touched by the rule and a join master to setup the execution of the 1st join dimension. The intermediate binding sets are "pipelined" from one join dimension to the next. When the index is NOT partitioned, the fan-in and fan-out are both ONE (1). When the index is partitioned, the fan-in and fan-out depend on the data and can be greater than one. The master is executed by the client. For query, the buffer on which the solutions will be written is proxied and passed along until it reaches the last join dimension, which then writes on the proxied query buffer. For mutation, the join tasks in the last join dimension obtain their own solution buffers that write on the scale-out view of the mutable relation identified by the head of the rule -- this means that the generated solutions do not pass through the master for mutation operations, but only for query. Since the master is the client (this is required for query, but optional for mutation), the solutions in effect are gathered to the client which is exactly what we want. The pipeline join is MUCH more efficient than nested subquery for the scale-out architecture. There are serveral reasons why: (a) the index reads are purely local; (b) the join tasks are able to order the access paths to be joined with the incoming binding sets in order, which results in ordered reads on the index partition and also allows us to trivally eliminate duplicate reads on the same access path; (c) there are far fewer RMI calls involved; and (d) each join task executes in its own thread and on the machine where the index partition resides on which it will read -- the computation is therefore perfectly mapped to the data.