public class PipelineJoinStats extends BaseJoinStats
Modifier and Type | Field and Description |
---|---|
CAT | inputSolutions: The #of input solutions consumed (not just accepted). |
CAT | outputSolutions: The #of output solutions generated. |
accessPathChunksIn, accessPathCount, accessPathDups, accessPathRangeCount, accessPathUnitsIn
chunksIn, chunksOut, elapsed, mutationCount, opCount, typeErrors, unitsIn, unitsOut
Constructor and Description |
---|
PipelineJoinStats() |
Modifier and Type | Method and Description |
---|---|
void | add(BOpStats o): Combine the statistics (addition), but do NOT add to self. |
double | getJoinHitRatio(): The estimated join hit ratio. |
protected void | toString(StringBuilder sb): Extension hook for BOpStats.toString(). |
public final CAT inputSolutions
Note: This counter is highly correlated with BOpStats.unitsIn but is
incremented only when we begin evaluation of the IAccessPath associated
with a specific input solution.
When PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS is true,
multiple input binding sets can be mapped onto the same IAccessPath and
this counter will be incremented by the #of such input binding sets.
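
The counting rule above can be illustrated with a minimal sketch. It is not the Blazegraph implementation: the class `CoalesceCountSketch`, the method `evaluate`, and the use of plain strings as as-bound access path keys are all hypothetical names chosen for illustration. It only shows that coalescing reduces the number of access paths evaluated while the input counter still advances by the number of binding sets in each group.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not part of the Blazegraph API. */
final class CoalesceCountSketch {

    /**
     * Groups input binding sets by their (assumed) as-bound access path key.
     * Returns { access paths evaluated, inputSolutions counted }.
     */
    public static long[] evaluate(List<String> asBoundKeys) {
        // Binding sets with the same key share one access path.
        Map<String, Integer> groups = new LinkedHashMap<>();
        for (String key : asBoundKeys) {
            groups.merge(key, 1, Integer::sum);
        }
        long accessPathsEvaluated = 0;
        long inputSolutions = 0;
        for (int groupSize : groups.values()) {
            accessPathsEvaluated++;       // one evaluation per distinct access path
            inputSolutions += groupSize;  // counter advances by the #of binding sets in the group
        }
        return new long[] { accessPathsEvaluated, inputSolutions };
    }

    public static void main(String[] args) {
        long[] r = evaluate(Arrays.asList("x=1", "x=2", "x=1"));
        System.out.println(r[0] + " access paths, " + r[1] + " input solutions");
        // prints "2 access paths, 3 input solutions"
    }
}
```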
public final CAT outputSolutions
The #of output solutions generated. This is used to compute
getJoinHitRatio(). Of necessity, updates to inputSolutions slightly lead
updates to outputSolutions.
Note: This counter is highly correlated with BOpStats.unitsOut.
public double getJoinHitRatio()
The estimated join hit ratio. This is outputSolutions / inputSolutions.
It is ZERO (0) when inputSolutions is ZERO (0).
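
As a rough sketch of the computation just described, the ratio can be written with an explicit guard for the zero-input case. The class `JoinHitRatioSketch` is a hypothetical stand-in, and `java.util.concurrent.atomic.LongAdder` is used in place of `CAT` on the assumption that `CAT` behaves as a concurrent counter; the real class may differ.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical stand-in for the statistics object; not the Blazegraph class. */
final class JoinHitRatioSketch {

    // LongAdder is an assumption standing in for the CAT counter type.
    public final LongAdder inputSolutions = new LongAdder();
    public final LongAdder outputSolutions = new LongAdder();

    /** outputSolutions / inputSolutions, or ZERO (0) when no input has been consumed. */
    public double getJoinHitRatio() {
        final long in = inputSolutions.sum();
        if (in == 0) {
            return 0d; // avoid division by zero
        }
        return outputSolutions.sum() / (double) in;
    }

    public static void main(String[] args) {
        JoinHitRatioSketch stats = new JoinHitRatioSketch();
        System.out.println(stats.getJoinHitRatio()); // 0.0 (no input yet)
        stats.inputSolutions.add(4);
        stats.outputSolutions.add(10);
        System.out.println(stats.getJoinHitRatio()); // 2.5
    }
}
```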
The join hit ratio is always accurate when the join is fully executed.
However, when a cutoff join is used to estimate the join hit ratio, a
measurement error can be introduced into the join hit ratio if
PipelineJoin.Annotations.COALESCE_DUPLICATE_ACCESS_PATHS is true,
PipelineOp.Annotations.MAX_PARALLEL is GT ONE (1), or
PipelineJoin.Annotations.MAX_PARALLEL_CHUNKS is GT ZERO (0).
When access paths are coalesced, there is an inner loop over the input
solutions mapped onto the same access path. This inner loop causes
inputSolutions to be incremented by the #of input solutions coalesced onto
that access path before any outputSolutions are counted. Coalescing access
paths can therefore cause the join hit ratio to be underestimated: there
may appear to be more input solutions consumed than were actually applied
to produce output solutions if the join was cut off while processing a set
of input solutions which were identified as using the same as-bound access
path.
The worst case can introduce substantial error into the estimated join
hit ratio. Consider a cutoff of 100. If one input solution generates 100
output solutions and two input solutions are mapped onto the same access
path, then the input count will be 2 and the output count will be 100,
which gives a reported join hit ratio of 100/2 when the actual join hit
ratio is 100/1.
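
The worst case above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the PipelineJoin code: `CoalescedCutoffSketch` and `runCutoffJoin` are hypothetical names, and the model simply counts every coalesced input up front and then stops producing output once the cutoff is reached.

```java
/** Hypothetical simulation of a cutoff join over one coalesced group of inputs. */
final class CoalescedCutoffSketch {

    /** Returns { inputSolutions, outputSolutions } as the counters would read at cutoff. */
    public static long[] runCutoffJoin(int coalescedInputs, int outputsPerInput, int cutoff) {
        // All inputs sharing the access path are counted before any output is produced.
        long inputSolutions = coalescedInputs;
        long outputSolutions = 0;
        outer:
        for (int i = 0; i < coalescedInputs; i++) {
            for (int j = 0; j < outputsPerInput; j++) {
                outputSolutions++;
                if (outputSolutions >= cutoff) {
                    break outer; // the join is cut off here
                }
            }
        }
        return new long[] { inputSolutions, outputSolutions };
    }

    public static void main(String[] args) {
        // Two inputs coalesced, 100 outputs per input, cutoff of 100.
        long[] c = runCutoffJoin(2, 100, 100);
        System.out.println("reported = " + (c[1] / (double) c[0])); // reported = 50.0
        // Only the first input contributed output before the cutoff, so the
        // actual ratio for the work performed is 100/1.
    }
}
```

With a single, uncoalesced input (`runCutoffJoin(1, 100, 100)`) the same model reports 100/1, which matches the actual ratio.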
A similar problem can occur if PipelineOp.Annotations.MAX_PARALLEL or
PipelineJoin.Annotations.MAX_PARALLEL_CHUNKS is GT ONE (1) since input
count can be incremented by the #of threads before any output solutions
are generated. Estimation error can also occur if multiple join tasks are
run in parallel for different chunks of input solutions.
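public void add(BOpStats o)
Description copied from class: BOpStats
Combine the statistics (addition), but do NOT add to self.
Overrides: add in class BaseJoinStats
Parameters: o - Another statistics object.
protected void toString(StringBuilder sb)
Description copied from class: BOpStats
Extension hook for BOpStats.toString().
Overrides: toString in class BaseJoinStats
Parameters: sb - Where to write the additional state.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.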