public interface IHashJoinUtility
This interface also supports DISTINCT SOLUTIONS filters. For this use case, the
caller uses the filterSolutions(ICloseableIterator, BOpStats, IBuffer)
method.
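The following Java sketch illustrates the idea only. A DISTINCT filter keeps a set of already-seen solutions and writes a solution onto the sink only the first time it appears. This is not Blazegraph's implementation. The class DistinctFilterSketch is hypothetical, a solution is modeled as a Map from variable name to value instead of an IBindingSet, and distinctness is assumed to be judged on a given list of variables.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of a DISTINCT SOLUTIONS filter. A solution is modeled as a
 * Map from variable name to bound value, not as Blazegraph's IBindingSet.
 */
final class DistinctFilterSketch {

    /** Projected solutions seen so far; stands in for the hash index. */
    private final Set<Map<String, String>> seen = new HashSet<>();

    /** Variables that define distinctness (assumed to be the projected variables). */
    private final List<String> vars;

    DistinctFilterSketch(List<String> vars) {
        this.vars = List.copyOf(vars);
    }

    /**
     * Drains one chunk of source solutions and writes each projected solution onto the
     * sink only the first time it is seen. Returns the number written by this call.
     */
    long filterSolutions(List<Map<String, String>> chunk, List<Map<String, String>> sink) {
        long written = 0;
        for (Map<String, String> solution : chunk) {
            // Project onto the distinct variables so other bindings do not affect equality.
            Map<String, String> projected = new LinkedHashMap<>();
            for (String v : vars) {
                String value = solution.get(v);
                if (value != null) {
                    projected.put(v, value);
                }
            }
            // Set.add returns false for a duplicate, which is then dropped.
            if (seen.add(projected)) {
                sink.add(projected);
                written++;
            }
        }
        return written;
    }
}
```

Because the set of seen solutions persists across calls, duplicates are suppressed across chunks as well as within a single chunk.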
Modifier and Type | Method and Description |
---|---|
long | acceptSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats): Buffer solutions on a hash index. |
long | filterSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats, IBuffer<IBindingSet> sink): Filter solutions, writing only the DISTINCT solutions onto the sink. |
IVariable<?> | getAskVar(): The variable bound based on whether or not a solution survives an "EXISTS" graph pattern (optional). |
IConstraint[] | getConstraints(): The join constraints (optional). |
JoinTypeEnum | getJoinType(): Return the type-safe enumeration indicating what kind of operation is to be performed. |
IVariable<?>[] | getJoinVars(): The join variables. |
long | getRightSolutionCount(): Return the number of solutions in the hash index. |
IVariable<?>[] | getSelectVars(): The variables to be retained (optional; all variables are retained if not specified). |
void | hashJoin(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer): Do a hash join between a stream of source solutions (left) and a hash index (right). |
void | hashJoin2(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints): Variant hash join method that allows the caller to impose different or additional constraints. |
ICloseableIterator<IBindingSet> | indexScan(): Return a BytesTrie.Iterator that visits all solutions in the index (index scan). |
boolean | isEmpty(): Return true iff there are no solutions in the hash index. |
boolean | isOutputDistinctJoinVars(): Returns true if the projection outputs the distinct join vars (in that case, the variables delivered by getSelectVars() will be ignored and might even be uninitialized). |
void | mergeJoin(IHashJoinUtility[] others, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints, boolean optional): Perform an N-way merge join. |
void | outputJoinSet(IBuffer<IBindingSet> out): Output the solutions which joined. |
void | outputOptionals(IBuffer<IBindingSet> outputBuffer): Identify and output the optional solutions. |
void | outputSolutions(IBuffer<IBindingSet> out): Output the solutions buffered in the hash index. |
void | release(): Discard the hash index. |
void | saveSolutionSet(): Checkpoint the generated hash index such that it becomes safe for concurrent readers. |
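JoinTypeEnum getJoinType()
Return the type-safe enumeration indicating what kind of operation is to be performed.
IVariable<?> getAskVar()
The variable bound based on whether or not a solution survives an "EXISTS" graph pattern (optional). See HashJoinAnnotations.ASK_VAR.
IVariable<?>[] getJoinVars()
The join variables. See HashJoinAnnotations.JOIN_VARS.
IVariable<?>[] getSelectVars()
The variables to be retained (optional; all variables are retained if not specified). See JoinAnnotations.SELECT.
boolean isOutputDistinctJoinVars()
Returns true if the projection outputs the distinct join vars (in that case, the variables delivered by getSelectVars() will be ignored and might even be uninitialized). See HashJoinAnnotations.OUTPUT_DISTINCT_JVs.
IConstraint[] getConstraints()
The join constraints (optional). See JoinAnnotations.CONSTRAINTS.
boolean isEmpty()
Return true iff there are no solutions in the hash index.
long getRightSolutionCount()
Return the number of solutions in the hash index.
void release()
Discard the hash index.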
long acceptSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats)
Buffer solutions on a hash index.
When optional:=true, solutions which do not have a binding for one or more of the join variables will be inserted into the hash index anyway, using hashCode:=1. This allows such solutions to be discovered when the hash index and the set of solutions which did join are scanned to identify the optional solutions.
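A minimal sketch of the hashCode:=1 rule, assuming solutions are modeled as Map<String, String> and the hash code is computed from the join variable values in the order given. The class HashIndexSketch is hypothetical. In particular, dropping solutions with an unbound join variable when the join is not optional is an assumption inferred from the word "anyway" above, not documented behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of acceptSolutions(): buffering solutions in a hash index keyed
 * by a hash of their join variable values. Not Blazegraph's implementation.
 */
final class HashIndexSketch {

    /** Stand-in for the hash index: hash code -> solutions sharing that hash code. */
    final Map<Integer, List<Map<String, String>>> buckets = new HashMap<>();

    private final List<String> joinVars;
    private final boolean optional;

    HashIndexSketch(List<String> joinVars, boolean optional) {
        this.joinVars = List.copyOf(joinVars);
        this.optional = optional;
    }

    /** Hash code for the solution, or null if it is not buffered (assumed for non-optional joins). */
    private Integer hashCodeOf(Map<String, String> solution) {
        int h = 1;
        for (String v : joinVars) {
            String value = solution.get(v);
            if (value == null) {
                // Unbound join variable: keep the solution under hashCode 1 only when optional.
                return optional ? Integer.valueOf(1) : null;
            }
            h = 31 * h + value.hashCode();
        }
        return h;
    }

    /** Buffers a chunk of solutions; returns how many were buffered by this sketch. */
    long acceptSolutions(List<Map<String, String>> chunk) {
        long accepted = 0;
        for (Map<String, String> solution : chunk) {
            Integer h = hashCodeOf(solution);
            if (h == null) {
                continue;
            }
            buckets.computeIfAbsent(h, k -> new ArrayList<>()).add(solution);
            accepted++;
        }
        return accepted;
    }
}
```

Bucket 1 may also receive fully bound solutions whose hash happens to be 1; that aliasing is harmless because the join step still compares the actual bindings.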
Parameters:
itr - The source from which the solutions will be drained.
stats - The statistics to be updated as the solutions are buffered on the hash index.
long filterSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats, IBuffer<IBindingSet> sink)
Filter solutions, writing only the DISTINCT solutions onto the sink.
Parameters:
itr - The source solutions.
stats - The stats to be updated.
sink - The sink.
void hashJoin(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer)
Do a hash join between a stream of source solutions (left) and a hash index (right).
Note: Some JoinTypeEnums have side-effects on the join state. For these joins, once this method has been invoked for the final time, you must then invoke either outputOptionals(IBuffer) (Optional or NotExists) or outputJoinSet(IBuffer) (Exists).
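The following hypothetical Java sketch (HashJoinProbeSketch, not a Blazegraph class) illustrates the probe step and one kind of side-effect on the join state. Each right solution that joins is recorded in a joinSet, which a later outputOptionals-style pass could consult. Solutions are modeled as Map<String, String>, join variables are assumed to be bound, and join constraints and projection are omitted.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical sketch of probing a hash index (right) with left solutions. */
final class HashJoinProbeSketch {

    /** Right solutions bucketed by the hash of their join variable values. */
    final Map<Integer, List<Map<String, String>>> index = new HashMap<>();

    /** Side-effect on the join state: right solutions that joined at least once (by identity). */
    final Set<Map<String, String>> joinSet = Collections.newSetFromMap(new IdentityHashMap<>());

    private final List<String> joinVars;

    HashJoinProbeSketch(List<String> joinVars) {
        this.joinVars = List.copyOf(joinVars);
    }

    private int hash(Map<String, String> solution) {
        int h = 1;
        for (String v : joinVars) {
            h = 31 * h + Objects.hashCode(solution.get(v));
        }
        return h;
    }

    /** Buffers one right solution in the hash index. */
    void addRight(Map<String, String> right) {
        index.computeIfAbsent(hash(right), k -> new ArrayList<>()).add(right);
    }

    /** Merges two solutions, or returns null if they bind a shared variable differently. */
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new LinkedHashMap<>(left);
        for (Map.Entry<String, String> e : right.entrySet()) {
            String prior = out.putIfAbsent(e.getKey(), e.getValue());
            if (prior != null && !prior.equals(e.getValue())) {
                return null;
            }
        }
        return out;
    }

    /** Joins each left solution against the matching bucket and writes the results onto out. */
    void hashJoin(List<Map<String, String>> left, List<Map<String, String>> out) {
        for (Map<String, String> l : left) {
            // Only solutions with the same hash code can join ...
            for (Map<String, String> r : index.getOrDefault(hash(l), List.of())) {
                // ... but a shared hash code may be an alias, so compare the actual bindings.
                Map<String, String> joined = merge(l, r);
                if (joined != null) {
                    out.add(joined);
                    joinSet.add(r);
                }
            }
        }
    }
}
```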
Parameters:
leftItr - A stream of chunks of solutions to be joined against the hash index (left).
stats - The statistics to be updated as solutions are drained from the leftItr (optional). When left is the pipeline, BOpStats.chunksIn and BOpStats.unitsIn should be updated by passing in the BOpStats object. When left is a hash index (i.e., for a hash join against an access path), you should pass null, since chunksIn and unitsIn are updated as the HashIndexOp builds the hash index rather than when it executes the join against the access path.
outputBuffer - Where to write the solutions which join.
void hashJoin2(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints)
Variant hash join method that allows the caller to impose different or additional constraints.
Note: Some JoinTypeEnums have side-effects on the join state. For these joins, once this method has been invoked for the final time, you must then invoke either outputOptionals(IBuffer) (Optional or NotExists) or outputJoinSet(IBuffer) (Exists).
Parameters:
leftItr - A stream of chunks of solutions to be joined against the hash index (left).
stats - The statistics to be updated as solutions are drained from the leftItr.
outputBuffer - Where to write the solutions which join.
constraints - Constraints attached to this join (optional). Any constraints specified here are combined with those specified in the constructor.
void mergeJoin(IHashJoinUtility[] others, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints, boolean optional)
Perform an N-way merge join.
The merge join takes a set of solution sets in the same order and having the same join variables. It examines the next solution of each solution set and compares them to find the earliest one. For each solution set whose next solution has the same values for the join variables as that earliest solution, it outputs the cross product and advances the iterator on that solution set.
The iterators draining the source solution sets need to be synchronized such that we consider only solutions having the same hash code in each cycle of the MERGE JOIN. The synchronization step is different depending on whether or not the MERGE JOIN is OPTIONAL.
If the MERGE JOIN is REQUIRED, then we want to synchronize the source solution iterators on the next lowest key (aka hash code) which they all have in common.
If the MERGE JOIN is OPTIONAL, then we want to synchronize the source solution iterators on the next lowest key (aka hash code) which appears for any source iterator. Solutions will not be drawn from iterators not having that key in that pass.
Note that each hash code may be an alias for solutions having different values for their join variables. Such solutions will not join. However, only solutions having the same values for the hash code can join. Thus, by proceeding with synchronized iterators and operating only on solutions having the same hash code in each round, we will consider all solutions which COULD join with one another in each round.
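The synchronization rules above can be sketched on hash codes alone. In this hypothetical Java sketch, each source is a list of hash codes in ascending order (one entry per solution), and syncKeys returns the key chosen in each round. Forming the cross product of the solutions that share a key, and evaluating join constraints, are left out. MergeJoinSyncSketch is not a Blazegraph class.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of MERGE JOIN iterator synchronization on sorted hash codes. */
final class MergeJoinSyncSketch {

    /** Returns the key chosen in each round, using the REQUIRED or the OPTIONAL rule. */
    static List<Integer> syncKeys(List<List<Integer>> sources, boolean optional) {
        List<Integer> rounds = new ArrayList<>();
        if (sources.isEmpty()) {
            return rounds;
        }
        int[] pos = new int[sources.size()];
        while (true) {
            Integer key = optional ? minHead(sources, pos) : nextCommon(sources, pos);
            if (key == null) {
                return rounds; // no further rounds are possible
            }
            rounds.add(key);
            // Draw every solution having this key, but only from sources positioned on it;
            // in a real merge join these are the solutions whose cross product is formed.
            for (int i = 0; i < sources.size(); i++) {
                List<Integer> s = sources.get(i);
                while (pos[i] < s.size() && s.get(pos[i]).equals(key)) {
                    pos[i]++;
                }
            }
        }
    }

    /** OPTIONAL: the lowest key at the head of any source that is not exhausted. */
    private static Integer minHead(List<List<Integer>> sources, int[] pos) {
        Integer min = null;
        for (int i = 0; i < sources.size(); i++) {
            List<Integer> s = sources.get(i);
            if (pos[i] < s.size() && (min == null || s.get(pos[i]) < min)) {
                min = s.get(pos[i]);
            }
        }
        return min;
    }

    /** REQUIRED: advance lagging sources to the next lowest key they all share, if any. */
    private static Integer nextCommon(List<List<Integer>> sources, int[] pos) {
        while (true) {
            int max = Integer.MIN_VALUE;
            for (int i = 0; i < sources.size(); i++) {
                List<Integer> s = sources.get(i);
                if (pos[i] >= s.size()) {
                    return null; // one source is exhausted, so no common key remains
                }
                max = Math.max(max, s.get(pos[i]));
            }
            boolean allEqual = true;
            for (int i = 0; i < sources.size(); i++) {
                List<Integer> s = sources.get(i);
                while (pos[i] < s.size() && s.get(pos[i]) < max) {
                    pos[i]++;
                }
                if (pos[i] >= s.size()) {
                    return null;
                }
                if (s.get(pos[i]) != max) {
                    allEqual = false;
                }
            }
            if (allEqual) {
                return max;
            }
        }
    }
}
```

For sources [1, 3, 3, 5] and [2, 3, 5, 7], the REQUIRED rule visits keys 3 and 5, while the OPTIONAL rule visits 1, 2, 3, 5 and 7.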
Note: If the solutions are not in a stable and mutually consistent order
by hash code in the hash indices then the solutions in each hash index
MUST be SORTED before proceeding. (The HTree
maintains solutions
in such an order but the JVM collections do not.)
Parameters:
others - The other solution sets to be joined. All instances must be of the same concrete type as this.
outputBuffer - Where to write the solutions.
constraints - The join constraints.
optional - true iff the join is optional.
void saveSolutionSet()
Checkpoint the generated hash index such that it becomes safe for concurrent readers.
void outputOptionals(IBuffer<IBindingSet> outputBuffer)
Optionals are identified using a joinSet containing each right solution which joined with at least one left solution. The total set of right solutions is then scanned once. For each right solution, we probe the joinSet. If the right solution did not join, then it is output now as an optional join.
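A minimal sketch of that final scan, assuming the joinSet compares right solutions by identity so that two equal but distinct right solutions are tracked separately. OutputOptionalsSketch and its Map-based solution model are hypothetical, not Blazegraph's API.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of identifying optional solutions with a joinSet. */
final class OutputOptionalsSketch {

    /** Creates a joinSet that compares right solutions by identity. */
    static Set<Map<String, String>> newJoinSet() {
        return Collections.newSetFromMap(new IdentityHashMap<>());
    }

    /**
     * Scans every right solution once and writes those that never joined onto
     * outputBuffer. Returns the number of optional solutions written.
     */
    static long outputOptionals(List<Map<String, String>> rightSolutions,
                                Set<Map<String, String>> joinSet,
                                List<Map<String, String>> outputBuffer) {
        long written = 0;
        for (Map<String, String> right : rightSolutions) {
            // Probe the joinSet; a right solution that is absent did not join.
            if (!joinSet.contains(right)) {
                outputBuffer.add(right);
                written++;
            }
        }
        return written;
    }
}
```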
Parameters:
outputBuffer - Where to write the optional solutions.
void outputSolutions(IBuffer<IBindingSet> out)
Output the solutions buffered in the hash index.
Parameters:
out - Where to write the solutions.
ICloseableIterator<IBindingSet> indexScan()
Return a BytesTrie.Iterator that visits all solutions in the index (index scan). The visited solutions MAY contain variables that would not be projected out of the hash join.
Note: This is very nearly the same as outputSolutions(IBuffer), except that the latter only outputs the projected variables and writes onto an IBuffer rather than returning an ICloseableIterator.
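To make the difference concrete, here is a hypothetical sketch (ScanVsOutputSketch, with solutions modeled as Map<String, String>) in which indexScan returns the stored solutions unchanged while outputSolutions projects each one onto the select variables. A null select list is assumed to mean that all variables are retained, mirroring the getSelectVars() description.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch contrasting an index scan with projected output. */
final class ScanVsOutputSketch {

    private final List<Map<String, String>> solutions; // contents of the hash index
    private final List<String> selectVars;             // null: retain all variables

    ScanVsOutputSketch(List<Map<String, String>> solutions, List<String> selectVars) {
        this.solutions = List.copyOf(solutions);
        this.selectVars = selectVars == null ? null : List.copyOf(selectVars);
    }

    /** Visits the solutions as stored, including variables that would not be projected. */
    Iterator<Map<String, String>> indexScan() {
        return solutions.iterator();
    }

    /** Writes each solution onto out, keeping only the projected (select) variables. */
    void outputSolutions(List<Map<String, String>> out) {
        for (Map<String, String> s : solutions) {
            if (selectVars == null) {
                out.add(s);
                continue;
            }
            Map<String, String> projected = new LinkedHashMap<>();
            for (String v : selectVars) {
                String value = s.get(v);
                if (value != null) {
                    projected.put(v, value);
                }
            }
            out.add(projected);
        }
    }
}
```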
Returns:
The BytesTrie.Iterator.
void outputJoinSet(IBuffer<IBindingSet> out)
Output the solutions which joined.
Parameters:
out - Where to write the solutions.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.