public class ASTRangeConstraintOptimizer extends Object implements IASTOptimizer
StatementPatternNode
s.
Datatype constraints on a variable always bind on the Object position of a
statement as it is only in the Object position that a Literal may be bound.
Due to the nature of the indices, which are organized in key ranges for each
of the DTE
s, a datatype constraint implies one or more key ranges of
a statement index, depending on whether the datatype constraint is "ground"
(a concrete datatype) or non-ground (a set of datatypes, any of which are
legal given the value expressions in which the variable may appear). This
requires reasoning about datatype constraints, including the most restrictive
hierarchy of implicit conversions which are permitted by the value
expressions in which the variable with the range constraint is used.
The most direct way to specify a datatype constraint is to explicit specify a
datatype using FunctionRegistry.DATATYPE
. However, even in this case
the datatype constraint may be part of an OR, in which case the effective
datatype constraint is the UNION of the specified datatypes. Likewise, a
value expression which will provably fail (e.g., by causing a type error) for
a datatype effectively provides an exclusion for that datatype.
Range constraints are identified by the comparator operators (GT, GTE, LT,
LTE). Range constraints apply to all datatypes which the variable may take
on. The most effectively constraint occurs when the datatype and the range
are both known. For example, the variable has a known datatype constraint of
xsd:float
and a value range constraint of (4.:5.]
.
Datatype and range constraints have the most utility where we would otherwise be 0-bound statement index (OS(C)P). The utility of the constraint decreases as the #of known bound components of the key increases. E.g., a datatype or value range constraint on PO(C)S provides less utility than one one OS(C)P, etc. This is because the datatype / value range constraint does less to improve the selectivity of the predicate as the #of known bound key components increases.
See Default graphs always uses SCAN + FILTER and lacks efficient PARALLEL SUBQUERY code path 407 for more on this issue.
ILexiconConfiguration
is such that NO values consistent with
the datatype constraint may appear outside of that implied key constraint.
For example, it must not be possible that values of that datatype could
appear in either the TERMS or BLOBS index. Thus, applying datatype
constraints to non-inline IVs requires reasoning about the
ILexiconConfiguration
.
The following schema was excerpted from AbstractIV
. See that class
for more detailed and authoriative information about the encoding of
IV
s.
[valueType] : 2 bits (Literal) [inline] : 1 bit (true) [extension] : 1 bit (false unless datatype is an extension type) [dataTypeCode] : 4 bits (the DTE code) ---- extensionIV : IFF the datatype is an extension type. ---- natural encoding of the value for the specified DTE.Value ranges constraints apply only to inline
IV
s. This means that
the valueType will always be VTE.LITERAL
and the bit flag
inline:=true. The extension bit will either be set or cleared depending on
whether the datatype is an extension type. (Reasoning about extension types
would require an extension to how they are declared.)
The value range constraint, if any, is applied after the optional extension
IV
. This means that all statements which satisfy the DTE
, the
optional extensionIV, and the value range constraint will actually have the
correct datatype.
TODO Static optimizer. The static optimizer must put the joins into an order
which is consistent with the selectivity of the predicates. When a predicate
has a datatype and/or value range constraint, then that MUST be considered by
the static optimizer in order to produce a join ordering which benefits from
the added selectivity of that constraint.
The static optimizer needs to treat SPs with attached datatype and/or range
constraints as "somewhat" more bound. When the statement index would begin
with O, e.g., OSP or OSCP, the SP is effectively 1-bound rather than 0-bound.
If the datatype is ground, then the range count for that datatype and value
range is the range count of interest for the purposes of the static join
order optimizer. If the datatype is non-ground, then the sum of the fast
range counts across each possible ground datatype for the value range
constraint (when cast to the appropriate DTE
) gives the effective
range count.
TODO Identify implicit datatype constraints by examining the in scope value
expression(s) in which each variable appears and determining which datatypes
are (in)consistent with those value expression(s).
TODO Historically, the range constraints were attached as RangeBOp AND left
in place as normal FILTERs. This is because the range constraints were not
integrated into the optimizers in any depth. If the RangeBOp would up
attached to a JOIN where it could be imposed, then it was. If not, then the
FILTERs would handle the constraint eventually. The code in RangeBOp reflects
this practice.
TODO Integrate code to attach RangeBOps to predicates. (This code is from the
old BigdataEvaluationStrategyImpl3 class. It should be moved into an
IASTOptimizer for recognizing range constraints.)
TODO Ranges can be just upper or lower. They do not need to be both. They can
even be an excluded middle. Ranges can also be just a datatype constraint,
since that implies the key range in the OS(C)P index in which the variable
may take on that datatype.
TODO The big thing about handling range constraints is make sure that we
query each part of the OS(C)P index which corresponds to a datatype which
could have a value legally promotable within the context in which the
LT/LTE/GT/GTE filter(s) occur. For example, x>5 && x<10 needs to do a
key-range scan for xsd:int, xsd:integer, .... The big win comes when we can
recognize a datatype constraint at the same time such that we only touch one
part of the index for that key range of values.
TODO Each GT(E)/LT(E) constraint should be broken down into a separate filter
so we can apply one even when the other might depend on a variable which is
not yet bound.
SPOKeyOrder.getFromKey(IKeyBuilder, IPredicate)
,
(lift range
constraints onto access path)
,
(default graph join
optimization)
Constructor and Description |
---|
ASTRangeConstraintOptimizer() |
Modifier and Type | Method and Description |
---|---|
QueryNodeWithBindingSet |
optimize(AST2BOpContext context,
QueryNodeWithBindingSet input)
Optimize the AST.
|
public QueryNodeWithBindingSet optimize(AST2BOpContext context, QueryNodeWithBindingSet input)
IASTOptimizer
optimize
in interface IASTOptimizer
context
- The evaluation context.input
- The input to the optimizer, consisting of a queryNode and
input binding set.Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.