public class IVSolutionSetEncoder extends Object implements IBindingSetEncoder
Encodes IBindingSets whose bound values are IVs
and their cached BigdataValues. The IVs and the cached
BigdataValues are efficiently and compactly represented in a format
suitable for chunked messages or streaming. Decode is a fast online process.
Both encode and decode require the maintenance of a map from each IV
having a cached BigdataValue to that cached value.
nbound nvars ncached (namespace) var[0] ... var[nvars-1] bitmap-for-bound-variables bitmap-for-IVs-with-cached-Values IV[0] ... IV[nbound-1] Value[0] ... Value[ncached-1]
where nbound is the #of bindings in the binding set. When zero,
the rest of the record is omitted.
where nvars is the #of new variables in this binding set. The
"schema" used to encode the bindings is based on the ordered set of variables
for which bindings are observed. The encoder writes this information out
incrementally. The decoder builds up this information as it decodes
solutions.
where ncached is the #of bindings in the binding set for which
there is a cached BigdataValue which has not already been written
into a previous record. Even if the IV has a cached
BigdataValue, if the IV has been previously written into a
record then the IV is NOT recorded in this record with a cached Value.
Further, if the IV appears more than once in a given record, the
cached value is only marked in the bitmap for the first such occurrence and
the cached value is only written into the record once.
where namespace is the namespace of the lexicon relation. This
is written out for the first solution having an IVCache association.
It is assumed that all Values are BigdataValue for the same
lexicon relation. If no solutions have an IVCache association, then
the namespace will never be written into the encoded output.
where var is the name of a variable for which a binding was
first observed for the current solution. The names of the variables are
written in the order in which they are first observed. This forms the
implicit "schema" required to decode the IV[].
where bitmap-for-bound-variables is zero or more bytes providing
a bit map indicating those variables which are bound in this solution out of
the total set of variables which have been observed in the solutions
presented to this encoder.
where bitmap-for-IVs-with-cached-Values is zero or more bytes
providing a bit map indicating which IVs are associated with cached values
written into the record. Whether or not an IV has a cached value must be
decided by the caller after processing the record and consulting an
(IV,Value) cache which they maintain over the set of records processed to
date. Cached values are written out (and the bit set) only the first time a
given IV with a cached Value is observed.
where IV[n] is an IV as encoded by IVUtility.
where Value[n] is a BigdataValue, that is, an RDF Value serialized using the
BigdataValueSerializer for the namespace of the lexicon.
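The incremental "schema" and the bound-variable bitmap described above can be illustrated with a small, self-contained sketch. It is a toy model under stated assumptions, not the Blazegraph wire format: the class name SchemaBitmapSketch is hypothetical, variables are plain strings, IVs are stood in for by long ids, and the ncached, namespace, and cached-Value parts of the record are left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the incremental schema and bound-variable bitmap (NOT the real format). */
final class SchemaBitmapSketch {

    // Variable name -> bit index, in the order in which variables are first observed.
    private final Map<String, Integer> schema = new LinkedHashMap<>();

    /** Encodes one solution given as (variable name -> stand-in IV id). */
    byte[] encode(Map<String, Long> solution) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(solution.size()); // nbound
            if (!solution.isEmpty()) {       // when zero, the rest is omitted
                writeBody(out, solution);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    private void writeBody(DataOutputStream out, Map<String, Long> solution)
            throws IOException {
        // Extend the schema with variables seen for the first time.
        List<String> newVars = new ArrayList<>();
        for (String name : solution.keySet()) {
            if (!schema.containsKey(name)) {
                schema.put(name, schema.size());
                newVars.add(name);
            }
        }
        out.writeShort(newVars.size()); // nvars
        for (String name : newVars) {
            out.writeUTF(name);         // var[0] ... var[nvars-1]
        }
        // One bit per variable observed so far; set when bound in this solution.
        byte[] bitmap = new byte[(schema.size() + 7) / 8];
        for (String name : solution.keySet()) {
            int bit = schema.get(name);
            bitmap[bit >> 3] |= (byte) (1 << (bit & 7));
        }
        out.write(bitmap);
        // Stand-in IVs, written in schema order for the bound variables only.
        for (String name : schema.keySet()) {
            Long iv = solution.get(name);
            if (iv != null) {
                out.writeLong(iv);
            }
        }
    }
}
```

In this sketch, a second solution that binds only a previously seen variable writes nvars = 0 and no variable names, so later records are smaller than the first one that introduced the variables.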
The namespace of the lexicon determines the
BigdataValueFactory and BigdataValueSerializer used to decode
and materialize the cached BigdataValues. This information can be
sent before the records if it is not known to the caller.
The decoder materializes the cached values into a map (either a HashMap or HTree, as appropriate for the data scale) as the records are processed. Only one solution needs to be decoded at a time, but the decoder must maintain the (IV,Value) cache across all decoded records. There is no need to indicate the #of records, but IChunkMessage#getSolutionCount() in fact reports exactly that information.
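The first-occurrence rule for cached values and the decoder-side (IV,Value) cache can be sketched as follows. This is an illustrative model only: IvValueCacheSketch, EncodedRecord, Encoder, and Decoder are hypothetical names, IVs are modelled as long ids and Values as strings, and a HashMap stands in for whichever map (HashMap or HTree) suits the data scale.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model: a cached value travels only with the first record that mentions its IV. */
final class IvValueCacheSketch {

    /** One record: the stand-in IVs of a solution plus the values shipped with it. */
    static final class EncodedRecord {
        final long[] ivs;
        final Map<Long, String> newValues; // its size plays the role of ncached

        EncodedRecord(long[] ivs, Map<Long, String> newValues) {
            this.ivs = ivs;
            this.newValues = newValues;
        }
    }

    static final class Encoder {
        // IVs whose cached value was already written into some record.
        private final Set<Long> alreadySent = new HashSet<>();

        EncodedRecord encode(long[] ivs, Map<Long, String> cachedValues) {
            Map<Long, String> fresh = new LinkedHashMap<>();
            for (long iv : ivs) {
                String value = cachedValues.get(iv);
                // add() returns false for repeats, within or across records.
                if (value != null && alreadySent.add(iv)) {
                    fresh.put(iv, value);
                }
            }
            return new EncodedRecord(ivs.clone(), fresh);
        }
    }

    static final class Decoder {
        // Must be kept across all decoded records.
        private final Map<Long, String> cache = new HashMap<>();

        List<String> decode(EncodedRecord record) {
            cache.putAll(record.newValues);
            List<String> values = new ArrayList<>();
            for (long iv : record.ivs) {
                values.add(cache.get(iv)); // null if the IV never had a cached value
            }
            return values;
        }
    }
}
```

Records must be decoded in the order in which they were encoded, because a later record relies on values delivered by an earlier one.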
Each solution can be turned into an IBindingSet at the time that it
is decoded. If we use a standard ListBindingSet, then we need to
resolve each IV against the IV cache, setting its RDF Value
as a side effect before returning the IBindingSet to the caller. If we do a
custom IBindingSet implementation, then the cached
BigdataValue could be lazily materialized by hooking
IVCache.getValue(). Either way, the life cycle of the materialized
objects will be very short unless they are propagated into new solutions.
Short life cycle objects entail very little heap burden.
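The lazy alternative mentioned above can be pictured with a minimal sketch in which the cache lookup is deferred until the value is first requested. LazyValueSketch and LazyIv are hypothetical names and do not reproduce the IVCache API; IVs are again modelled as long ids and Values as strings.

```java
import java.util.Map;

/** Toy model of lazy materialization against an (IV,Value) cache. */
final class LazyValueSketch {

    /** Stand-in for an IV whose value is resolved on first access. */
    static final class LazyIv {
        private final long id;
        private final Map<Long, String> cache;
        private String value; // stays null until first requested
        int lookups = 0;      // counts cache lookups, for illustration only

        LazyIv(long id, Map<Long, String> cache) {
            this.id = id;
            this.cache = cache;
        }

        String getValue() {
            if (value == null) {
                value = cache.get(id); // resolve against the decoder's cache
                lookups++;
            }
            return value;
        }
    }
}
```

With the eager approach the lookup happens for every binding when the solution is decoded; with this lazy variant it happens only for bindings whose value is actually read.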
NOTE: the IVSolutionSetEncoder may give us *DIFFERENT* representations for
the same binding set, depending on its internal state. This is relevant
insofar as we cannot perform safe equality checks over encoded values
(the IVBindingSetEncoder, by contrast, does provide this guarantee).
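A deliberately tiny model shows why state-dependent encoding defeats equality checks. StatefulEncodingSketch is a hypothetical class, not the encoder itself: it writes a variable's name the first time the variable is seen and only its index afterwards, which is enough to make two encodings of the same binding differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model: the output for a binding depends on what was encoded before it. */
final class StatefulEncodingSketch {

    // Variables in first-observed order (the implicit "schema").
    private final List<String> seen = new ArrayList<>();

    /** Writes the name for a new variable, or its schema index for a known one. */
    String encode(String variable, long iv) {
        int index = seen.indexOf(variable);
        if (index < 0) {
            seen.add(variable);
            return "new:" + variable + "=" + iv;
        }
        return "#" + index + "=" + iv;
    }
}
```

Encoding the binding (x, 1) twice with one instance of this sketch yields "new:x=1" and then "#0=1", so comparing encoded forms would wrongly report two different solutions.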
| Constructor and Description |
|---|
| IVSolutionSetEncoder() |
| Modifier and Type | Method and Description |
|---|---|
| void | encodeSolution(DataOutputBuffer out, IBindingSet bset) Encode the solution on the stream. |
| byte[] | encodeSolution(IBindingSet bset) |
| byte[] | encodeSolution(IBindingSet bset, boolean updateCacheIsIgnored) Encode the solution as an IV[]. |
| void | flush() Flush any updates. |
| boolean | isValueCache() Return true iff the IVCache associations are preserved by the encoder. |
| void | release() Release the state associated with the IVBindingSetEncoder. |
| String | toString() |
public void encodeSolution(DataOutputBuffer out, IBindingSet bset)
Encode the solution on the stream.
Parameters:
out - The stream.
bset - The solution.
public byte[] encodeSolution(IBindingSet bset)
Specified by: encodeSolution in interface IBindingSetEncoder
Parameters:
bset - The solution to be encoded.
public byte[] encodeSolution(IBindingSet bset, boolean updateCacheIsIgnored)
Encode the solution as an IV[].
Note: The IVCache associations may be buffered by this method.
Use IBindingSetEncoder.flush() to vector any buffered associations.
TODO We typically use a ListBindingSet. If the
IBindingSet is large enough, then it would be more efficient to
create an IVariable to IV map within this method since we
have to lookup bindings by variables more than once.
Specified by: encodeSolution in interface IBindingSetEncoder
Parameters:
bset - The solution to be encoded.
updateCacheIsIgnored - When true, updates are accumulated for the
IV to BigdataValue cache. You must still use
IBindingSetEncoder.flush() to vector the accumulated updates.
If you are only generating the encoding in order to resolve a
key in a hash index, then you would use false
since you do not need to maintain the IVCache
association for the given IBindingSet.
public void release()
Release the state associated with the IVBindingSetEncoder.
Specified by: release in interface IBindingSetEncoder
public void flush()
Flush any updates to the IVCache associations.
Specified by: flush in interface IBindingSetEncoder
public boolean isValueCache()
Return true iff the IVCache associations are preserved by the encoder.
Specified by: isValueCache in interface IBindingSetEncoder
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.