public class IVSolutionSetEncoder extends Object implements IBindingSetEncoder
Encoder for IBindingSets whose bound values are IVs and their cached BigdataValues. The IVs and the cached BigdataValues are efficiently and compactly represented in a format suitable for chunked messages or streaming. Decode is a fast online process. Both encode and decode require the maintenance of a map from each IV having a cached BigdataValue to that cached value.
The record format is:

nbound nvars ncached (namespace) var[0] ... var[nvars-1] bitmap-for-bound-variables bitmap-for-IVs-with-cached-Values IV[0] ... IV[nbound-1] Value[0] ... Value[ncached-1]

where
nbound is the number of bindings in the binding set. When zero, the rest of the record is omitted.
where nvars is the number of new variables in this binding set. The "schema" used to encode the bindings is based on the ordered set of variables for which bindings are observed. The encoder writes this information out incrementally, and the decoder builds it up as it decodes solutions.
where ncached is the number of bindings in the binding set for which there is a cached BigdataValue that has not already been written into a previous record. Even if the IV has a cached BigdataValue, if that IV has been previously written into a record, then it is NOT recorded in this record with a cached Value. Further, if the IV appears more than once in a given record, the cached value is only marked in the bitmap for the first such occurrence, and the cached value is only written into the record once.
where namespace is the namespace of the lexicon relation. It is written out for the first solution having an IVCache association. All Values are assumed to be BigdataValues for the same lexicon relation. If no solution has an IVCache association, then the namespace is never written into the encoded output.
where var is the name of a variable for which a binding was first observed for the current solution. The names of the variables are written in the order in which they are first observed. This forms the implicit "schema" required to decode the IV[].
where bitmap-for-bound-variables is zero or more bytes providing a bit map that indicates which variables are bound in this solution, out of the total set of variables observed in the solutions presented to this encoder so far.
where bitmap-for-IVs-with-cached-Values is zero or more bytes providing a bit map that indicates which IVs are associated with cached values written into the record. Whether or not an IV has a cached value must be decided by the caller after processing the record and consulting an (IV,Value) cache that it maintains over the set of records processed to date. Cached values are written out (and the bit set) only the first time a given IV with a cached Value is observed.
where IV[n] is an IV as encoded by IVUtility.
where Value[n] is a cached BigdataValue: an RDF Value serialized using the BigdataValueSerializer for the namespace of the lexicon.
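To make the field definitions concrete, here is a minimal toy encoder that follows the same record layout. It is a sketch under simplifying assumptions, not the Blazegraph implementation: IVs are modeled as plain longs (not the IVUtility encoding), cached values as Strings written with DataOutputStream.writeUTF (not the BigdataValueSerializer), the three counts as 4-byte ints, and the class name ToySolutionEncoder is hypothetical. It shows the incremental schema (nvars and var[]), the two bitmaps, the rule that each cached value is written only once across all records, and the once-only namespace.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntPredicate;

/**
 * Hypothetical toy encoder for the record layout above. IVs are longs, cached
 * values are Strings, counts are 4-byte ints; none of this is Blazegraph's API.
 */
class ToySolutionEncoder {
    private final String namespace;
    private final List<String> schema = new ArrayList<>();        // variables, first-observed order
    private final Set<Long> valueAlreadyWritten = new HashSet<>(); // IVs whose value was emitted
    private boolean namespaceWritten = false;

    ToySolutionEncoder(String namespace) {
        this.namespace = namespace;
    }

    /** Encodes one solution; cache maps an IV to its cached value, if any. */
    byte[] encode(Map<String, Long> bset, Map<Long, String> cache) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(bset.size());                          // nbound
        if (bset.isEmpty()) {
            out.flush();
            return bytes.toByteArray();                     // rest of record omitted
        }

        // Variables never seen in an earlier solution extend the schema (var[]).
        List<String> newVars = new ArrayList<>();
        for (String v : bset.keySet()) {
            if (!schema.contains(v)) newVars.add(v);
        }
        schema.addAll(newVars);

        // Walk bound variables in schema order, deciding which IVs carry a value.
        List<Long> ivs = new ArrayList<>();
        List<Boolean> carriesValue = new ArrayList<>();
        List<String> values = new ArrayList<>();
        for (String v : schema) {
            Long iv = bset.get(v);
            if (iv == null) continue;                       // unbound in this solution
            String value = cache.get(iv);
            // add() returns false if the value went out in an earlier record or
            // earlier in this one, so each cached value is written exactly once.
            boolean write = value != null && valueAlreadyWritten.add(iv);
            ivs.add(iv);
            carriesValue.add(write);
            if (write) values.add(value);
        }

        out.writeInt(newVars.size());                       // nvars
        out.writeInt(values.size());                        // ncached
        if (!values.isEmpty() && !namespaceWritten) {       // (namespace), written once
            out.writeUTF(namespace);
            namespaceWritten = true;
        }
        for (String v : newVars) out.writeUTF(v);           // var[0..nvars-1]
        out.write(bitmap(schema.size(), i -> bset.containsKey(schema.get(i))));
        out.write(bitmap(ivs.size(), i -> carriesValue.get(i)));
        for (long iv : ivs) out.writeLong(iv);              // IV[0..nbound-1]
        for (String value : values) out.writeUTF(value);    // Value[0..ncached-1]
        out.flush();
        return bytes.toByteArray();
    }

    /** ceil(nbits / 8) bytes with bit i set iff isSet.test(i); zero bytes when nbits == 0. */
    static byte[] bitmap(int nbits, IntPredicate isSet) {
        byte[] bits = new byte[(nbits + 7) / 8];
        for (int i = 0; i < nbits; i++) {
            if (isSet.test(i)) bits[i / 8] |= (byte) (1 << (i % 8));
        }
        return bits;
    }
}
```

Encoding the same solution twice with one encoder yields a much shorter second record, since neither the variable names nor the cached value are repeated. The linear schema.contains() scan is kept for brevity; a real encoder would index the schema.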
The lexicon namespace identifies the BigdataValueFactory and BigdataValueSerializer used to decode and materialize the cached BigdataValues. This information can be sent before the records if it is not known to the caller.
The decoder materializes the cached values into a map (either a HashMap or an HTree, as appropriate for the data scale) as the records are processed. Only one solution needs to be decoded at a time, but the decoder must maintain the (IV,Value) cache across all decoded records. The encoding does not need to indicate the number of records, although IChunkMessage#getSolutionCount() reports exactly that information.
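A minimal sketch of the decoder-side discipline described above, assuming records have already been parsed into (IV, optional value) entries so that the byte layout can be ignored. ToyDecoderCache and its Entry type are hypothetical, IVs are modeled as longs and values as Strings, and a plain HashMap stands in for the HashMap-or-HTree choice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical decoder-side cache. Each record is modeled as already-parsed
 * entries; only the cache discipline across records is shown.
 */
class ToyDecoderCache {
    /** One bound IV; value is non-null only if this record carried it. */
    static final class Entry {
        final long iv;
        final String value;

        Entry(long iv, String value) {
            this.iv = iv;
            this.value = value;
        }
    }

    // Survives across records. A HashMap here; an HTree could replace it at scale.
    private final Map<Long, String> cache = new HashMap<>();

    /** Decodes one record, returning the materialized value (or null) per entry. */
    List<String> decode(List<Entry> record) {
        List<String> resolved = new ArrayList<>();
        for (Entry e : record) {
            if (e.value != null) {
                cache.put(e.iv, e.value);   // first record that carried this IV's value
            }
            resolved.add(cache.get(e.iv));  // later records resolve through the cache
        }
        return resolved;
    }

    int cacheSize() {
        return cache.size();
    }
}
```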
Each solution can be turned into an IBindingSet at the time that it is decoded. If we use a standard ListBindingSet, then we need to resolve each IV against the IV cache, setting its RDF Value as a side effect before returning the IBindingSet to the caller. If we use a custom IBindingSet implementation, then the cached BigdataValue could be lazily materialized by hooking IVCache.getValue(). Either way, the life cycle of the materialized objects will be very short unless they are propagated into new solutions, and short-lived objects impose very little heap burden.
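The two materialization strategies can be contrasted in a few lines. This is a hypothetical sketch, not the Blazegraph classes: a solution is a map from variable name to an IV modeled as a long, the (IV,Value) cache is a Map<Long, String>, eager() stands in for resolving every IV before returning a ListBindingSet, and LazyIV stands in for an IV whose IVCache.getValue() has been hooked to consult the decoder's cache on first use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch contrasting eager and lazy materialization of cached values. */
class MaterializationSketch {

    /** Eager: resolve every IV against the cache before handing the solution out. */
    static Map<String, String> eager(Map<String, Long> solution, Map<Long, String> cache) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : solution.entrySet()) {
            out.put(e.getKey(), cache.get(e.getValue()));
        }
        return out;
    }

    /** Lazy: an IV-like handle that looks up its value only when asked. */
    static final class LazyIV {
        private final long iv;
        private final Map<Long, String> cache; // shared (IV,Value) map owned by the decoder
        private String value;                  // memoized on first successful lookup
        int lookups = 0;                       // counts cache probes, for illustration only

        LazyIV(long iv, Map<Long, String> cache) {
            this.iv = iv;
            this.cache = cache;
        }

        String getValue() {
            if (value == null) {
                lookups++;
                value = cache.get(iv);
            }
            return value;
        }
    }
}
```

The lazy variant pays for a lookup only when a value is actually requested, which suits the short life cycle of decoded solutions.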
NOTE: the IVSolutionSetEncoder may produce *DIFFERENT* representations for the same binding set, depending on its internal state. This matters because we cannot perform safe equality checks over the encoded values (the IVBindingSetEncoder provides this guarantee).
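The following hypothetical sketch shows why a stateful encoding defeats equality checks. StatefulSchemaEncoder models only the incremental-schema part of the record (IVs as longs, variable names via writeUTF): a variable name is written only the first time it is observed, so the same binding set encodes to different bytes on the first and second call.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stateful encoder: writes a variable name only the first time it is seen. */
class StatefulSchemaEncoder {
    private final List<String> schema = new ArrayList<>(); // grows as new variables appear

    byte[] encode(Map<String, Long> bset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        List<String> newVars = new ArrayList<>();
        for (String v : bset.keySet()) {
            if (!schema.contains(v)) newVars.add(v);
        }
        schema.addAll(newVars);
        out.writeInt(newVars.size());               // depends on encoder state
        for (String v : newVars) out.writeUTF(v);   // only on first observation
        for (String v : schema) {
            if (bset.containsKey(v)) out.writeLong(bset.get(v));
        }
        out.flush();
        return bytes.toByteArray();
    }
}
```

Because the bytes depend on what the encoder has already seen, such encodings must not be compared or used as hash keys; that requires an encoding that is a pure function of the binding set.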
Constructor and Description |
---|
IVSolutionSetEncoder() |
Modifier and Type | Method and Description |
---|---|
void | encodeSolution(DataOutputBuffer out, IBindingSet bset): Encode the solution on the stream. |
byte[] | encodeSolution(IBindingSet bset) |
byte[] | encodeSolution(IBindingSet bset, boolean updateCacheIsIgnored): Encode the solution as an IV[]. |
void | flush(): Flush any updates. |
boolean | isValueCache(): Return true iff the IVCache associations are preserved by the encoder. |
void | release(): Release the state associated with the IVBindingSetEncoder. |
String | toString() |
public void encodeSolution(DataOutputBuffer out, IBindingSet bset)
Encode the solution on the stream.
Parameters:
out - The stream.
bset - The solution.

public byte[] encodeSolution(IBindingSet bset)
Description copied from interface: IBindingSetEncoder
Specified by:
encodeSolution in interface IBindingSetEncoder
Parameters:
bset - The solution to be encoded.

public byte[] encodeSolution(IBindingSet bset, boolean updateCacheIsIgnored)
Encode the solution as an IV[].
Note: The IVCache associations may be buffered by this method. Use IBindingSetEncoder.flush() to vector any buffered associations.
TODO We typically use a ListBindingSet. If the IBindingSet is large enough, then it would be more efficient to create an IVariable to IV map within this method, since we have to look up bindings by variable more than once.
Specified by:
encodeSolution in interface IBindingSetEncoder
Parameters:
bset - The solution to be encoded.
updateCacheIsIgnored - When true, updates are accumulated for the IV to BigdataValue cache. You must still use IBindingSetEncoder.flush() to vector the accumulated updates. If you are only generating the encoding in order to resolve a key in a hash index, then you would use false, since you do not need to maintain the IVCache association for the given IBindingSet.
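The TODO above can be illustrated with a hypothetical sketch: a list-backed binding set (modeled here as a List of variable/IV pairs, not the real ListBindingSet or IVariable types) answers each lookup by a linear scan, whereas building a variable-to-IV HashMap once costs a single pass and makes every later lookup constant time on average. BindingIndexSketch and Binding are illustrative names only.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the TODO: index a list-backed binding set once. */
class BindingIndexSketch {
    /** One (variable, IV) pair; IVs are modeled as longs. */
    static final class Binding {
        final String var;
        final long iv;

        Binding(String var, long iv) {
            this.var = var;
            this.iv = iv;
        }
    }

    /** What each lookup costs without an index: a linear scan of the list. */
    static Long scan(List<Binding> bset, String var) {
        for (Binding b : bset) {
            if (b.var.equals(var)) return b.iv;
        }
        return null; // unbound
    }

    /** One pass builds the variable-to-IV map; later lookups are O(1) on average. */
    static Map<String, Long> index(List<Binding> bset) {
        Map<String, Long> byVar = new HashMap<>();
        for (Binding b : bset) byVar.put(b.var, b.iv);
        return byVar;
    }
}
```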
public void release()
Description copied from interface: IBindingSetEncoder
Release the state associated with the IVBindingSetEncoder.
Specified by:
release in interface IBindingSetEncoder
public void flush()
Description copied from interface: IBindingSetEncoder
Flush any updates, vectoring any buffered IVCache associations.
Specified by:
flush in interface IBindingSetEncoder
public boolean isValueCache()
Return true iff the IVCache associations are preserved by the encoder.
Specified by:
isValueCache in interface IBindingSetEncoder
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.