public class HTreeHashJoinOp<E> extends HashJoinOp<E> implements ISingleThreadedOp
A hash join against an IAccessPath based on the HTree and suitable for very
large intermediate result sets. Source solutions are buffered on the HTree on
each evaluation pass. When the memory demand of the HTree is not bounded, the
hash join will run a single pass over the IAccessPath for the target
IPredicate. For some queries, this can be more efficient than probing as-bound
instances of the target IPredicate using a nested indexed join, such as
PipelineJoin. This can also be more efficient on a cluster, where the key range
scan of the target IPredicate will be performed using predominantly sequential
IO.
If the PipelineOp.Annotations#MAX_MEMORY annotation is specified then an
evaluation pass over the target IAccessPath will be triggered if, after having
buffered some chunk of solutions on the HTree, the memory demand of the HTree
exceeds the capacity specified by that annotation. This "blocked" evaluation
trades off multiple scans of the target IPredicate against the memory demand of
the intermediate result set.
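The trade-off behind blocked evaluation can be illustrated with a minimal, self-contained sketch. This is not the Blazegraph implementation: the class BlockedHashJoinSketch, its join method, and the maxBuffered parameter are hypothetical, a java.util.HashMap stands in for the HTree, and a count of buffered solutions stands in for the memory demand that MAX_MEMORY bounds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of blocked hash join evaluation; not Blazegraph code. */
final class BlockedHashJoinSketch {

    /** Number of scans over the target made by the most recent join() call. */
    static int scans = 0;

    /**
     * Joins source keys against target keys. Source keys are buffered in a hash
     * index; whenever the buffer reaches maxBuffered entries (an assumed stand-in
     * for the MAX_MEMORY capacity) a full scan of the target is triggered and the
     * buffer is cleared.
     */
    static List<String> join(List<String> source, List<String> target, int maxBuffered) {
        scans = 0;
        List<String> out = new ArrayList<>();
        Map<String, Integer> buffer = new HashMap<>(); // join key -> multiplicity
        int buffered = 0;
        for (String s : source) {
            buffer.merge(s, 1, Integer::sum);
            buffered++;
            if (buffered >= maxBuffered) {
                scanTarget(buffer, target, out); // one evaluation pass
                buffer.clear();
                buffered = 0;
            }
        }
        if (buffered > 0) {
            scanTarget(buffer, target, out); // final pass over what remains
        }
        return out;
    }

    /** One sequential scan of the target, probing the buffered hash index. */
    private static void scanTarget(Map<String, Integer> buffer, List<String> target,
            List<String> out) {
        scans++;
        for (String t : target) {
            int n = buffer.getOrDefault(t, 0);
            for (int i = 0; i < n; i++) {
                out.add(t);
            }
        }
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "c", "d", "e");
        List<String> target = Arrays.asList("b", "d", "x");
        // Unbounded buffer: a single pass over the target.
        System.out.println(join(source, target, Integer.MAX_VALUE) + " scans=" + scans);
        // Buffer limited to 2 solutions: three passes over the target.
        System.out.println(join(source, target, 2) + " scans=" + scans);
    }
}
```

With an unbounded buffer the target is scanned once. With a buffer of two solutions the same five source solutions require three scans, which is the cost that blocked evaluation accepts in exchange for bounded memory.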
The source solutions presented to a hash join MUST have bindings for the
HashJoinAnnotations.JOIN_VARS in order to join (they can still succeed as
optionals if the join variables are not bound).
Handling OPTIONAL requires tracking which source solutions joined. While it is
easy enough to associate a flag or counter with each source solution when
running on the JVM heap, updating that flag or counter when the data are on a
persistent index is more expensive. Another approach is to build up a second
hash index (a "join set") of the solutions which joined and then scan the
original hash index, writing out any solution which is not in the join set.
This is also expensive, since we could wind up double buffering the source
solutions. Both approaches also require a scan of the total multiset of the
source solutions in order to detect and write out any optional solutions. I've
gone with the join set approach here, as it reduces the complexity associated
with updating a per-solution counter in the hash index.
Finally, note that "blocked" evaluation is not possible with OPTIONAL because
we must have ALL solutions on hand in order to decide which solutions did not
join. Therefore PipelineOp.Annotations#MAX_MEMORY must be set to Long.MAX_VALUE
when the IPredicate is IPredicate.Annotations#OPTIONAL.
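The join set approach to OPTIONAL described above can be sketched as follows. This is an illustration under stated assumptions, not the HTreeHashJoinUtility code: the class JoinSetOptionalSketch and its optionalJoin method are hypothetical, in-memory collections stand in for the persistent hash indices, and each source solution is reduced to the value of a single join variable and identified by its position in the input list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of the join set approach to OPTIONAL; not Blazegraph code. */
final class JoinSetOptionalSketch {

    /**
     * Performs an optional (left outer) hash join. Each output entry has the form
     * "position=matchedKey" for a joined source solution, or "position=null" for a
     * source solution that did not join and is written out as an optional.
     */
    static List<String> optionalJoin(List<String> sourceKeys, List<String> targetKeys) {
        // Hash index: join key -> positions of the source solutions with that key.
        Map<String, List<Integer>> index = new HashMap<>();
        for (int i = 0; i < sourceKeys.size(); i++) {
            index.computeIfAbsent(sourceKeys.get(i), k -> new ArrayList<>()).add(i);
        }

        // Second index (the join set): positions of the source solutions that joined.
        Set<Integer> joinSet = new HashSet<>();
        List<String> out = new ArrayList<>();

        // Single pass over the target, probing the hash index.
        for (String t : targetKeys) {
            for (int pos : index.getOrDefault(t, Collections.<Integer>emptyList())) {
                out.add(pos + "=" + t);
                joinSet.add(pos);
            }
        }

        // Scan the complete multiset of source solutions (here by position, as a
        // stand-in for scanning the original hash index) and write out every
        // solution that is not in the join set.
        for (int i = 0; i < sourceKeys.size(); i++) {
            if (!joinSet.contains(i)) {
                out.add(i + "=null");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("a", "b", "b", "c");
        List<String> target = Arrays.asList("b", "z");
        System.out.println(optionalJoin(source, target)); // [1=b, 2=b, 0=null, 3=null]
    }
}
```

The final loop is why all source solutions must be on hand before the optionals can be written: a solution is known not to have joined only after the target has been scanned against the entire source multiset.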
See Also: HTreeHashJoinUtility, Serialized Form
Modifier and Type | Class and Description |
---|---|
static interface | HTreeHashJoinOp.Annotations |
DEFAULT_INITIAL_CAPACITY
Constructor and Description |
---|
HTreeHashJoinOp(BOp[] args, Map<String,Object> annotations) |
HTreeHashJoinOp(BOp[] args, NV... annotations) |
HTreeHashJoinOp(HTreeHashJoinOp<E> op) |
Modifier and Type | Method and Description |
---|---|
protected IHashJoinUtility | newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType) Return the instance of the IHashJoinUtility to be used by this operator. |
protected boolean | runHashJoin(BOpContext<?> context, IHashJoinUtility state) Return true if ChunkTask#doHashJoin() should be executed in a given operator ChunkTask invocation. |
eval, getPredicate, isOptional, newStats
assertAtOnceJavaHeapOp, assertMaxParallelOne, getChunkCapacity, getChunkOfChunksCapacity, getChunkTimeout, getMaxMemory, getMaxParallel, isAtOnceEvaluation, isBlockedEvaluation, isLastPassRequested, isPipelinedEvaluation, isReorderSolutions, isSharedState
__replaceArg, _clearProperty, _set, _setProperty, annotations, annotationsCopy, annotationsEqual, annotationsRef, argIterator, args, argsCopy, arity, clearAnnotations, clearProperty, deepCopy, deepCopy, get, getProperty, setArg, setProperty, setUnboundProperty, toArray, toArray
annotationsEqual, annotationsToString, annotationsToString, annotationValueToString, checkArgs, clone, equals, getEvaluationContext, getId, getProperty, getRequiredProperty, hashCode, indent, isController, mutation, shortenName, toShortString, toString, toString
public HTreeHashJoinOp(HTreeHashJoinOp<E> op)
op -
protected IHashJoinUtility newState(BOpContext<IBindingSet> context, INamedSolutionSetRef namedSetRef, JoinTypeEnum joinType)
Description copied from class: HashJoinOp
Return the instance of the IHashJoinUtility to be used by this operator. This
method is invoked once, the first time this operator is evaluated. The returned
IHashJoinUtility reference is attached to the IQueryAttributes and accessed
there on subsequent evaluation passes for this operator.
newState in class HashJoinOp<E>
context - The BOpEvaluationContext
namedSetRef - Metadata to identify the named solution set.
joinType - The type of join.
protected boolean runHashJoin(BOpContext<?> context, IHashJoinUtility state)
Return true if ChunkTask#doHashJoin() should be executed in a given operator
ChunkTask invocation.
The HTreeHashJoinOp runs the hash join either exactly once (at-once evaluation)
or once a target memory threshold has been exceeded (blocked evaluation).
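The decision described above might be sketched as a simple predicate. This is an assumption about its shape, not the actual method body: the class RunHashJoinDecisionSketch, the method shouldRunHashJoin, and all three parameters are hypothetical, and the sketch assumes that at-once evaluation runs the join when the operator sees its last chunk of source solutions.

```java
/** Hypothetical illustration of the at-once versus blocked decision; not Blazegraph code. */
final class RunHashJoinDecisionSketch {

    /**
     * @param lastInvocation true when no more source solutions will arrive (assumed signal)
     * @param usedMemory     bytes currently buffered on the hash index (assumed measure)
     * @param maxMemory      the configured capacity; Long.MAX_VALUE means unbounded
     * @return true if the hash join should be run in this invocation
     */
    static boolean shouldRunHashJoin(boolean lastInvocation, long usedMemory, long maxMemory) {
        // At-once: with an unbounded capacity the threshold test can never fire,
        // so the join runs only on the last invocation, that is, exactly once.
        // Blocked: the join also runs whenever the capacity has been exceeded.
        return lastInvocation || usedMemory > maxMemory;
    }

    public static void main(String[] args) {
        System.out.println(shouldRunHashJoin(false, 2048L, 1024L));          // true
        System.out.println(shouldRunHashJoin(false, 2048L, Long.MAX_VALUE)); // false
        System.out.println(shouldRunHashJoin(true, 2048L, Long.MAX_VALUE));  // true
    }
}
```

Under this reading, setting the capacity to Long.MAX_VALUE reduces the operator to at-once evaluation, because no buffered size can exceed it.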
runHashJoin in class HashJoinOp<E>
context - The operator evaluation context.
state - The IHashJoinUtility instance.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.