public class HTreeHashJoinOp<E> extends HashJoinOp<E> implements ISingleThreadedOp
A hash join against an IAccessPath based on the HTree and
suitable for very large intermediate result sets. Source solutions are
buffered on the HTree on each evaluation pass. When the memory demand
of the HTree is not bounded, the hash join will run a single pass
over the IAccessPath for the target IPredicate. For some
queries, this can be more efficient than probing as-bound instances of the
target IPredicate using a nested indexed join, such as
PipelineJoin. This can also be more efficient on a cluster where the
key range scan of the target IPredicate will be performed using
predominately sequential IO.
If the PipelineOp.Annotations#MAX_MEMORY annotation is specified then
an evaluation pass over the target IAccessPath will be triggered if,
after having buffered some chunk of solutions on the HTree, the
memory demand of the HTree exceeds the capacity specified by that
annotation. This "blocked" evaluation trades off multiple scans of the target
IPredicate against the memory demand of the intermediate result set.
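The trade-off described above can be sketched in plain Java. This is an illustration under stated assumptions, not the Blazegraph implementation: `BlockedHashJoinSketch`, `Row`, `join`, `scanTarget`, and `targetScans` are hypothetical names, an in-heap `HashMap` stands in for the HTree, and `maxBuffered` counts buffered rows rather than bytes of native memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of blocked evaluation; not the Blazegraph code. */
final class BlockedHashJoinSketch {

    /** A solution reduced to a join key plus a payload (assumed type). */
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    /** Number of passes over the target made by the most recent join() call. */
    static int targetScans;

    /**
     * Buffers source chunks in a hash index and scans the target whenever the
     * buffered row count exceeds maxBuffered (a stand-in for MAX_MEMORY).
     */
    static List<String> join(List<List<Row>> sourceChunks, List<Row> target, long maxBuffered) {
        targetScans = 0;
        List<String> out = new ArrayList<>();
        Map<String, List<Row>> index = new HashMap<>();
        long buffered = 0;
        for (List<Row> chunk : sourceChunks) {
            for (Row s : chunk) {
                index.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
                buffered++;
            }
            // The threshold is checked only after a whole chunk has been buffered.
            if (buffered > maxBuffered) {
                scanTarget(index, target, out);
                index.clear();
                buffered = 0;
            }
        }
        // Final pass; with maxBuffered == Long.MAX_VALUE this is the only pass.
        if (buffered > 0) {
            scanTarget(index, target, out);
        }
        return out;
    }

    /** One sequential pass over the target, probing the buffered source solutions. */
    private static void scanTarget(Map<String, List<Row>> index, List<Row> target, List<String> out) {
        targetScans++;
        for (Row t : target) {
            List<Row> matches = index.get(t.key);
            if (matches == null) continue;
            for (Row s : matches) {
                out.add(s.key + ":" + s.value + "," + t.value);
            }
        }
    }

    public static void main(String[] args) {
        List<Row> target = Arrays.asList(new Row("a", "t1"), new Row("c", "t2"));
        List<List<Row>> chunks = Arrays.asList(
                Arrays.asList(new Row("a", "s1"), new Row("b", "s2")),
                Arrays.asList(new Row("a", "s3")));
        System.out.println(join(chunks, target, Long.MAX_VALUE) + " scans=" + targetScans);
        System.out.println(join(chunks, target, 1L) + " scans=" + targetScans);
    }
}
```

In this sketch an unbounded limit yields a single target scan, while a small limit yields the same joined solutions at the cost of additional scans.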
The source solutions presented to a hash join MUST have bindings for the
HashJoinAnnotations.JOIN_VARS in order to join (they can still
succeed as optionals if the join variables are not bound).
For an OPTIONAL join, source solutions which did not join must still be written out, so the operator has to track which source solutions joined. While it is easy enough to associate a flag or counter with each source solution when running on the JVM heap, updating that flag or counter when the data are on a persistent index is more expensive.
Another approach is to build up a second hash index (a "join set") of the solutions which joined and then do a scan over the original hash index, writing out any solution which is not in the joinSet. This is also expensive since we could wind up double buffering the source solutions.
Both approaches also require us to scan the total multiset of the source solutions in order to detect and write out any optional solutions. I've gone with the joinSet approach here as it avoids the complexity associated with updating a per-solution counter in the hash index.
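A minimal sketch of the joinSet idea, assuming in-heap collections in place of the HTree. The names `OptionalJoinSetSketch`, `Row`, and `optionalJoin` are hypothetical, and the identity-based set used here is a simplification of the second hash index described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the joinSet approach; not the Blazegraph code. */
final class OptionalJoinSetSketch {

    /** A solution reduced to a join key plus a payload (assumed type). */
    static final class Row {
        final String key;
        final String value;
        Row(String key, String value) { this.key = key; this.value = value; }
    }

    static List<String> optionalJoin(List<Row> source, List<Row> target) {
        // Hash index over ALL source solutions (insertion-ordered for determinism).
        Map<String, List<Row>> index = new LinkedHashMap<>();
        for (Row s : source) {
            index.computeIfAbsent(s.key, k -> new ArrayList<>()).add(s);
        }

        // The "join set": every source solution that joined at least once.
        // Identity semantics keep duplicate solutions in the multiset distinct.
        Set<Row> joinSet = Collections.newSetFromMap(new IdentityHashMap<>());

        List<String> out = new ArrayList<>();
        for (Row t : target) {
            List<Row> matches = index.get(t.key);
            if (matches == null) continue;
            for (Row s : matches) {
                out.add(s.key + ":" + s.value + "," + t.value);
                joinSet.add(s);
            }
        }

        // Scan the original index and write out each solution not in the joinSet.
        for (List<Row> bucket : index.values()) {
            for (Row s : bucket) {
                if (!joinSet.contains(s)) {
                    out.add(s.key + ":" + s.value + ",UNBOUND");
                }
            }
        }
        return out;
    }
}
```

The final loop visits every buffered source solution, which is the full scan of the source multiset that the text notes is required by either approach.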
Finally, note that "blocked" evaluation is not possible with OPTIONAL because
we must have ALL solutions on hand in order to decide which solutions did not
join. Therefore PipelineOp.Annotations#MAX_MEMORY must be set to
Long.MAX_VALUE when the IPredicate is
IPredicate.Annotations#OPTIONAL.
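The constraint above could be enforced with a guard like the following. This is a sketch only: `OptionalMaxMemoryCheckSketch` and `check` are hypothetical names, and whether the real operator throws an exception at construction time is an assumption.

```java
/** Hypothetical guard for the OPTIONAL / MAX_MEMORY constraint; not the Blazegraph code. */
final class OptionalMaxMemoryCheckSketch {

    /**
     * Rejects a bounded memory limit for an OPTIONAL join, since blocked
     * evaluation would discard solutions before it is known whether they joined.
     *
     * @param optional  true if the target predicate is OPTIONAL (assumed flag)
     * @param maxMemory the configured memory limit, Long.MAX_VALUE meaning unbounded
     */
    static void check(boolean optional, long maxMemory) {
        if (optional && maxMemory != Long.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "OPTIONAL requires an unbounded memory limit, got " + maxMemory);
        }
    }
}
```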
See Also: HTreeHashJoinUtility, Serialized Form

| Modifier and Type | Class and Description |
|---|---|
| static interface | HTreeHashJoinOp.Annotations |
Inherited fields: DEFAULT_INITIAL_CAPACITY

| Constructor and Description |
|---|
| HTreeHashJoinOp(BOp[] args, Map<String,Object> annotations) |
| HTreeHashJoinOp(BOp[] args, NV... annotations) |
| HTreeHashJoinOp(HTreeHashJoinOp<E> op) |

| Modifier and Type | Method and Description |
|---|---|
| protected IHashJoinUtility | newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType) Return the instance of the IHashJoinUtility to be used by this operator. |
| protected boolean | runHashJoin(BOpContext<?> context, IHashJoinUtility state) Return true if ChunkTask#doHashJoin() should be executed in a given operator ChunkTask invocation. |

Inherited methods (grouped by declaring class):
eval, getPredicate, isOptional, newStats
assertAtOnceJavaHeapOp, assertMaxParallelOne, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isReorderSolutions, isSharedState
__replaceArg, _clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
annotationsEqual, annotationsToString, annotationsToString, annotationValueToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, mutation, shortenName, toShortString, toString, toString

public HTreeHashJoinOp(HTreeHashJoinOp<E> op)
Parameters:
op -

protected IHashJoinUtility newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType)
Description copied from class: HashJoinOp
Return the instance of the IHashJoinUtility to be used by this operator. This method is invoked once, the first time this operator is evaluated. The returned IHashJoinUtility reference is attached to the IQueryAttributes and accessed there on subsequent evaluation passes for this operator.
newState in class HashJoinOp<E>
Parameters:
context - The BOpEvaluationContext
namedSetRef - Metadata to identify the named solution set.
joinType - The type of join.

protected boolean runHashJoin(BOpContext<?> context, IHashJoinUtility state)
Return true if ChunkTask#doHashJoin() should be executed in a given operator ChunkTask invocation.
The HTreeHashJoinOp runs the hash join either exactly once (at-once evaluation) or once a target memory threshold has been exceeded (blocked evaluation).
runHashJoin in class HashJoinOp<E>
Parameters:
context - The operator evaluation context.
state - The IHashJoinUtility instance.

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.