public class JVMHashJoinUtility extends Object implements IHashJoinUtility
Modifier and Type | Field and Description |
---|---|
protected IVariable<?> |
askVar |
protected IConstraint[] |
constraints
The join constraints (optional).
|
static IHashJoinUtilityFactory |
factory
Singleton
IHashJoinUtilityFactory that can be used to create a
new JVMHashJoinUtility . |
protected JoinTypeEnum |
joinType
The type of join to be performed.
|
protected IVariable<?>[] |
joinVars
The join variables.
|
protected CAT |
nJoinsConsidered
The #of solution pairs considered for a join.
|
protected CAT |
nleftConsidered
The #of left solutions considered for a join.
|
protected CAT |
nrightConsidered
The #of right solutions considered for a join.
|
protected AtomicBoolean |
open
true until the state is discarded by release() . |
protected boolean |
outputDistinctJVs
True if the hash join utility class is to output the distinct join
variables.
|
protected CAT |
rightSolutionCount
The #of solutions accepted into the hash index.
|
protected AtomicReference<JVMHashIndex> |
rightSolutionsRef
The hash index.
|
protected IVariable<?>[] |
selectVars
The variables to be retained (aka projected out) (optional, all variables
are retained if not specified).
|
Constructor and Description |
---|
JVMHashJoinUtility(PipelineOp op,
JoinTypeEnum joinType) |
Modifier and Type | Method and Description |
---|---|
long |
acceptSolutions(ICloseableIterator<IBindingSet[]> itr,
BOpStats stats)
Buffer solutions on a hash index.
|
long |
filterSolutions(ICloseableIterator<IBindingSet[]> itr,
BOpStats stats,
IBuffer<IBindingSet> sink)
Filter solutions, writing only the DISTINCT solutions onto the sink.
|
IVariable<?> |
getAskVar()
The variable bound based on whether or not a solution survives an
"EXISTS" graph pattern (optional).
|
IConstraint[] |
getConstraints()
The join constraints (optional).
|
JoinTypeEnum |
getJoinType()
Return the type safe enumeration indicating what kind of operation is to
be performed.
|
IVariable<?>[] |
getJoinVars()
The join variables.
|
protected long |
getNoJoinVarsLimit() |
long |
getRightSolutionCount()
Return the #of solutions in the hash index.
|
protected JVMHashIndex |
getRightSolutions() |
IVariable<?>[] |
getSelectVars()
The variables to be retained (optional, all variables are retained if
not specified).
|
void |
hashJoin(ICloseableIterator<IBindingSet[]> leftItr,
BOpStats stats,
IBuffer<IBindingSet> outputBuffer)
Do a hash join between a stream of source solutions (left) and a hash
index (right).
|
void |
hashJoin2(ICloseableIterator<IBindingSet[]> leftItr,
BOpStats stats,
IBuffer<IBindingSet> outputBuffer,
IConstraint[] constraints)
Variant hash join method allows the caller to impose different
constraints or additional constraints.
|
ICloseableIterator<IBindingSet> |
indexScan()
Return a
BytesTrie.Iterator that visits all solutions in the index (index
scan). |
boolean |
isEmpty()
Return
true iff there are no solutions in the hash index. |
boolean |
isOutputDistinctJoinVars()
Returns true if the projection outputs the distinct join vars (in
that case, the variables delivered by
IHashJoinUtility.getSelectVars() will
be ignored, might even be uninitialized). |
void |
mergeJoin(IHashJoinUtility[] others,
IBuffer<IBindingSet> outputBuffer,
IConstraint[] constraints,
boolean optional)
Perform an N-way merge join.
|
void |
outputJoinSet(IBuffer<IBindingSet> outputBuffer)
Output the solutions which joined.
|
void |
outputOptionals(IBuffer<IBindingSet> outputBuffer)
Identify and output the optional solutions.
|
protected void |
outputSolution(IBuffer<IBindingSet> outputBuffer,
IBindingSet outSolution)
Output a solution.
|
void |
outputSolutions(IBuffer<IBindingSet> out)
Output the solutions buffered in the hash index.
|
void |
release()
Discard the hash index.
|
void |
saveSolutionSet()
Checkpoint the generated hash index such that it becomes safe for
concurrent readers.
|
String |
toString()
Human readable representation of the
IHashJoinUtility metadata
(but not the solutions themselves). |
public static final IHashJoinUtilityFactory factory
IHashJoinUtilityFactory that can be used to create a new JVMHashJoinUtility.
protected final AtomicBoolean open
true until the state is discarded by release().
protected final JoinTypeEnum joinType
protected final IVariable<?> askVar
HashJoinAnnotations.ASK_VAR
protected final IVariable<?>[] joinVars
protected final IVariable<?>[] selectVars
protected boolean outputDistinctJVs
protected final IConstraint[] constraints
protected final AtomicReference<JVMHashIndex> rightSolutionsRef
Note: There is no separate "joinSet". Instead, the JVMHashIndex.SolutionHit
class provides a join hit counter.
protected final CAT rightSolutionCount
protected final CAT nleftConsidered
protected final CAT nrightConsidered
protected final CAT nJoinsConsidered
public JVMHashJoinUtility(PipelineOp op, JoinTypeEnum joinType)
op
- The operator whose annotations will inform construction of the
hash index. The HTreeAnnotations may be specified for
this operator and will control the initialization of the
various HTree instances.
joinType
- The type of join to be performed.
JVMHashJoinAnnotations
public String toString()
Human readable representation of the IHashJoinUtility metadata
(but not the solutions themselves).
public JoinTypeEnum getJoinType()
IHashJoinUtility
getJoinType
in interface IHashJoinUtility
public IVariable<?> getAskVar()
IHashJoinUtility
getAskVar
in interface IHashJoinUtility
HashJoinAnnotations.ASK_VAR
public IVariable<?>[] getJoinVars()
IHashJoinUtility
getJoinVars
in interface IHashJoinUtility
HashJoinAnnotations.JOIN_VARS
public IVariable<?>[] getSelectVars()
IHashJoinUtility
getSelectVars
in interface IHashJoinUtility
JoinAnnotations.SELECT
public boolean isOutputDistinctJoinVars()
IHashJoinUtility
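Returns true if the projection outputs the distinct join vars (in
that case, the variables delivered by IHashJoinUtility.getSelectVars()
will be ignored, might even be uninitialized). See
HashJoinAnnotations.OUTPUT_DISTINCT_JVs.
isOutputDistinctJoinVars
in interface IHashJoinUtility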
public IConstraint[] getConstraints()
IHashJoinUtility
getConstraints
in interface IHashJoinUtility
JoinAnnotations.CONSTRAINTS
public boolean isEmpty()
IHashJoinUtility
Return true iff there are no solutions in the hash index.
isEmpty
in interface IHashJoinUtility
protected long getNoJoinVarsLimit()
protected JVMHashIndex getRightSolutions()
public long getRightSolutionCount()
IHashJoinUtility
getRightSolutionCount
in interface IHashJoinUtility
public void release()
IHashJoinUtility
release
in interface IHashJoinUtility
public long acceptSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats)
IHashJoinUtility
When optional:=true, solutions which do not have a binding
for one or more of the join variables will be inserted into the hash
index anyway using hashCode:=1. This allows the solutions to
be discovered when we scan the hash index and the set of solutions which
did join to identify the optional solutions.
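The buffering rule above can be illustrated with a minimal, self-contained sketch. It does not use the Blazegraph types: solutions are modelled as plain `Map<String, String>` instances, and the names `AcceptSolutionsSketch`, `bindings`, `bucketKey`, and `accept` are hypothetical. Dropping a solution with an unbound join variable when the join is not optional is an assumption of this sketch, not something the text above states.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AcceptSolutionsSketch {

    /** Hash code for solutions with an unbound join variable (value taken from the text above). */
    static final int UNBOUND_HASH = 1;

    /** Hypothetical helper: builds a solution from name/value pairs. */
    static Map<String, String> bindings(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put(kv[i], kv[i + 1]);
        }
        return m;
    }

    /**
     * Computes the bucket key from the as-bound join variables. When a join
     * variable is unbound, returns UNBOUND_HASH for an optional join and
     * null (meaning "drop", an assumption of this sketch) otherwise.
     */
    static Integer bucketKey(Map<String, String> solution, String[] joinVars, boolean optional) {
        int h = 17;
        for (String v : joinVars) {
            String value = solution.get(v);
            if (value == null) {
                return optional ? Integer.valueOf(UNBOUND_HASH) : null;
            }
            h = 31 * h + value.hashCode();
        }
        return h;
    }

    /** Buffers the source solutions on the index and returns the number accepted. */
    static long accept(List<Map<String, String>> source, String[] joinVars, boolean optional,
            Map<Integer, List<Map<String, String>>> index) {
        long accepted = 0;
        for (Map<String, String> solution : source) {
            Integer key = bucketKey(solution, joinVars, optional);
            if (key == null) {
                continue; // unbound join variable and the join is not optional
            }
            index.computeIfAbsent(key, k -> new ArrayList<>()).add(solution);
            accepted++;
        }
        return accepted;
    }
}
```

Because every solution with an unbound join variable lands in the same bucket, a later scan of the index can still find it when the optional solutions are identified.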
acceptSolutions
in interface IHashJoinUtility
itr
- The source from which the solutions will be drained.
stats
- The statistics to be updated as the solutions are buffered on
the hash index.
public long filterSolutions(ICloseableIterator<IBindingSet[]> itr, BOpStats stats, IBuffer<IBindingSet> sink)
IHashJoinUtility
filterSolutions
in interface IHashJoinUtility
itr
- The source solutions.
stats
- The stats to be updated.
sink
- The sink.
public void hashJoin(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer)
IHashJoinUtility
Note: Some JoinTypeEnums have side-effects on the join state. For
these joins, once this method has been invoked for the final time, you must
then invoke either IHashJoinUtility.outputOptionals(IBuffer) (Optional or
NotExists) or IHashJoinUtility.outputJoinSet(IBuffer) (Exists).
hashJoin
in interface IHashJoinUtility
leftItr
- A stream of chunks of solutions to be joined against the hash
index (left).
stats
- The statistics to be updated as solutions are drained from the
leftItr (optional). When left is the pipeline, BOpStats.chunksIn and
BOpStats.unitsIn should be updated by passing in the BOpStats
object. When left is a hash index (i.e., for a hash join against an
access path), you should pass null, since the chunksIn and unitsIn
are updated as the HashIndexOp builds the hash index rather than when
it executes the join against the access path.
outputBuffer
- Where to write the solutions which join.
public void hashJoin2(ICloseableIterator<IBindingSet[]> leftItr, BOpStats stats, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints)
Note: Some JoinTypeEnums have side-effects on the join state. For
these joins, once this method has been invoked for the final time, you must
then invoke either IHashJoinUtility.outputOptionals(IBuffer) (Optional or
NotExists) or IHashJoinUtility.outputJoinSet(IBuffer) (Exists).
For each source solution materialized, the hash table is probed using the as-bound join variables for that source solution. A join hit counter is carried for each solution in the hash index and is used to support OPTIONAL joins.
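The probe step described above can be sketched as follows. This is a simplified illustration, not the JVMHashJoinUtility code: solutions are plain maps, the `Hit` class is a hypothetical stand-in for the per-solution join hit counter, and all method names are assumptions of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class HashJoinProbeSketch {

    /** A right-hand solution plus the number of left solutions it has joined with. */
    static final class Hit {
        final Map<String, String> solution;
        int nhits = 0;
        Hit(Map<String, String> solution) { this.solution = solution; }
    }

    /** Hypothetical helper: builds a solution from name/value pairs. */
    static Map<String, String> bindings(String... kv) {
        Map<String, String> m = new HashMap<>();
        for (int i = 0; i < kv.length; i += 2) m.put(kv[i], kv[i + 1]);
        return m;
    }

    /** Builds the probe key from the as-bound values of the join variables. */
    static List<String> key(Map<String, String> solution, String[] joinVars) {
        List<String> k = new ArrayList<>();
        for (String v : joinVars) k.add(solution.get(v));
        return k;
    }

    /** Builds the hash index over the right-hand solutions. */
    static Map<List<String>, List<Hit>> build(List<Map<String, String>> right, String[] joinVars) {
        Map<List<String>, List<Hit>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(key(r, joinVars), k -> new ArrayList<>()).add(new Hit(r));
        }
        return index;
    }

    /** Merges two solutions; returns null when a shared variable has conflicting values. */
    static Map<String, String> merge(Map<String, String> left, Map<String, String> right) {
        Map<String, String> out = new HashMap<>(left);
        for (Map.Entry<String, String> e : right.entrySet()) {
            String prev = out.putIfAbsent(e.getKey(), e.getValue());
            if (prev != null && !prev.equals(e.getValue())) return null;
        }
        return out;
    }

    /** Probes the index once per left solution and counts a hit on each right solution that joins. */
    static List<Map<String, String>> probe(List<Map<String, String>> left,
            Map<List<String>, List<Hit>> index, String[] joinVars) {
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            List<Hit> bucket = index.get(key(l, joinVars));
            if (bucket == null) continue;
            for (Hit hit : bucket) {
                Map<String, String> joined = merge(l, hit.solution);
                if (joined == null) continue;
                hit.nhits++; // remembered so that non-joining solutions can be found later
                out.add(joined);
            }
        }
        return out;
    }
}
```

Keeping the counter on the indexed solution means no second data structure is needed to tell which right solutions joined; a solution whose counter is still zero after the last probe did not join.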
hashJoin2
in interface IHashJoinUtility
leftItr
- A stream of chunks of solutions to be joined against the hash
index (left).
stats
- The statistics to be updated as solutions are drained from the
leftItr.
outputBuffer
- Where to write the solutions which join.
constraints
- Constraints attached to this join (optional). Any constraints
specified here are combined with those specified in the
constructor.
public void saveSolutionSet()
This implementation is a NOP since the underlying Java collection class is thread-safe for concurrent readers.
saveSolutionSet
in interface IHashJoinUtility
protected void outputSolution(IBuffer<IBindingSet> outputBuffer, IBindingSet outSolution)
outputBuffer
- Where to write the solution.
outSolution
- The solution.
public void outputOptionals(IBuffer<IBindingSet> outputBuffer)
IHashJoinUtility
Optionals are identified using a joinSet containing each right solution which joined with at least one left solution. The total set of right solutions is then scanned once. For each right solution, we probe the joinSet. If the right solution did not join, then it is output now as an optional join.
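The scan described above might look like the following sketch. It is generic over the solution type and uses only JDK collections; `OutputOptionalsSketch`, `newJoinSet`, and `optionals` are hypothetical names. Using an identity-based set is an assumption of the sketch, chosen so that two right solutions with equal bindings are still tracked separately.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class OutputOptionalsSketch {

    /** Creates an identity-based joinSet (assumption: solutions are tracked by reference). */
    static <S> Set<S> newJoinSet() {
        return Collections.newSetFromMap(new IdentityHashMap<S, Boolean>());
    }

    /**
     * Scans the right solutions once and returns those that are not in the
     * joinSet, i.e. those that did not join with any left solution.
     */
    static <S> List<S> optionals(List<S> rightSolutions, Set<S> joinSet) {
        List<S> out = new ArrayList<>();
        for (S right : rightSolutions) {
            if (!joinSet.contains(right)) {
                out.add(right); // did not join: output now as an optional solution
            }
        }
        return out;
    }
}
```

A caller would add each right solution to the joinSet when it joins during the hash join, then call `optionals` once after the final probe.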
outputOptionals
in interface IHashJoinUtility
outputBuffer
- Where to write the optional solutions.
public ICloseableIterator<IBindingSet> indexScan()
IHashJoinUtility
Return a BytesTrie.Iterator that visits all solutions in the index (index
scan). The visited solutions MAY contain variables that would not be
projected out of the hash join.
Note: This is very nearly the same as IHashJoinUtility.outputSolutions(IBuffer)
except that the latter only outputs the projected variables and it writes
onto an IBuffer rather than returning an ICloseableIterator.
indexScan
in interface IHashJoinUtility
BytesTrie.Iterator.
public void outputSolutions(IBuffer<IBindingSet> out)
IHashJoinUtility
outputSolutions
in interface IHashJoinUtility
out
- Where to write the solutions.
public void outputJoinSet(IBuffer<IBindingSet> outputBuffer)
IHashJoinUtility
outputJoinSet
in interface IHashJoinUtility
outputBuffer
- Where to write the solutions.
public void mergeJoin(IHashJoinUtility[] others, IBuffer<IBindingSet> outputBuffer, IConstraint[] constraints, boolean optional)
IHashJoinUtility
The merge join takes a set of solution sets in the same order and having the same join variables. It examines the next solution in order for each solution set and compares them. For each solution set which reported a solution having the same join variables as that earliest solution, it outputs the cross product and advances the iterator on that solution set.
The iterators draining the source solution sets need to be synchronized such that we consider only solutions having the same hash code in each cycle of the MERGE JOIN. The synchronization step is different depending on whether or not the MERGE JOIN is OPTIONAL.
If the MERGE JOIN is REQUIRED, then we want to synchronize the source solution iterators on the next lowest key (aka hash code) which they all have in common.
If the MERGE JOIN is OPTIONAL, then we want to synchronize the source solution iterators on the next lowest key (aka hash code) which appears for any source iterator. Solutions will not be drawn from iterators not having that key in that pass.
Note that each hash code may be an alias for solutions having different values for their join variables. Such solutions will not join. However, only solutions having the same values for the hash code can join. Thus, by proceeding with synchronized iterators and operating only on solutions having the same hash code in each round, we will consider all solutions which COULD join with one another in each round.
Note: If the solutions are not in a stable and mutually consistent order
by hash code in the hash indices then the solutions in each hash index
MUST be SORTED before proceeding. (The HTree maintains solutions
in such an order but the JVM collections do not.)
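The two synchronization rules can be sketched on sorted sets of hash codes. This is an illustration under stated assumptions, not the mergeJoin implementation: each source is reduced to a `NavigableSet<Integer>` of the hash codes it contains (already sorted, as the note above requires), and the names `MergeJoinSyncSketch`, `syncKeys`, `lowestAny`, and `lowestCommon` are hypothetical. The sketch only computes the sequence of keys on which the iterators would synchronize; comparing join variable values and emitting the cross product are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

public class MergeJoinSyncSketch {

    /** Returns, in ascending order, the hash codes visited by the merge join rounds. */
    static List<Integer> syncKeys(List<NavigableSet<Integer>> sources, boolean optional) {
        List<Integer> rounds = new ArrayList<>();
        Integer last = null; // key of the previous round, or null before the first round
        while (true) {
            Integer next = optional ? lowestAny(sources, last) : lowestCommon(sources, last);
            if (next == null) return rounds;
            rounds.add(next);
            last = next;
        }
    }

    /** First key of the source strictly after 'last' (or its first key when 'last' is null). */
    static Integer after(NavigableSet<Integer> source, Integer last) {
        if (last == null) return source.isEmpty() ? null : source.first();
        return source.higher(last);
    }

    /** OPTIONAL: the next lowest key that appears in any source. */
    static Integer lowestAny(List<NavigableSet<Integer>> sources, Integer last) {
        Integer best = null;
        for (NavigableSet<Integer> s : sources) {
            Integer c = after(s, last);
            if (c != null && (best == null || c < best)) best = c;
        }
        return best;
    }

    /** REQUIRED: the next lowest key that all sources have in common. */
    static Integer lowestCommon(List<NavigableSet<Integer>> sources, Integer last) {
        if (sources.isEmpty()) return null;
        Integer candidate = after(sources.get(0), last);
        while (candidate != null) {
            boolean allAgree = true;
            for (NavigableSet<Integer> s : sources) {
                Integer c = s.ceiling(candidate);
                if (c == null) return null; // one source is exhausted: no further common key
                if (c > candidate) {
                    candidate = c; // skip ahead and re-check every source
                    allAgree = false;
                    break;
                }
            }
            if (allAgree) return candidate;
        }
        return null;
    }
}
```

In the OPTIONAL case, a source that does not contain the key for a given round simply contributes no solutions in that round.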
mergeJoin
in interface IHashJoinUtility
others
- The other solution sets to be joined. All instances must be of
the same concrete type as this.
outputBuffer
- Where to write the solutions.
constraints
- The join constraints.
optional
- true iff the join is optional.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.