public class SparseRowStore extends Object implements IRowStoreConstants
A client-side class that uses an IIndex to provide an efficient data model in which a logical row is stored as one or more entries in the IIndex. Operations are provided for the atomic read and write of a logical row. While the scan operations are always consistent (they will never reveal data from a row that is undergoing concurrent modification), they do NOT cause concurrent atomic row writes to block. This means that rows that would be visited by a scan MAY be modified before the scan reaches them, in which case the client will see the updates.
The SparseRowStore requires that you declare the KeyType for the primary key so that it may impose a consistent total ordering over the generated keys in the index.
There is no intrinsic reason why column values must be strongly typed. Therefore, by default column values are loosely typed. However, column values MAY be constrained by a Schema.
This class builds keys using the sparse row store design pattern. Each logical row is modeled as an ordered set of index entries whose keys are formed as:
[schemaName][primaryKey][columnName][timestamp]
and whose values are the value of the given column for that primary key.
Timestamps are either generated by the application, in which case they define the semantics of a write-write conflict, or assigned on write by the index. In the latter case, write-write conflicts never arise. Regardless of how timestamps are generated, including the timestamp in the key requires that applications specify filters that are applied during row scans to limit the data points actually returned as part of the row. For example, a filter might return, for every column of some primary key, only the most recent value written no later than a given timestamp.
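The key layout above can be sketched with an in-memory sorted map. This is a hypothetical model, not the Blazegraph key encoder: `SparseRowKeySketch`, `Key`, `rowStart` and `scan` are illustrative names, the primary key is assumed to be a `long`, and field-by-field comparison stands in for comparing encoded byte[] keys. It shows that all entries of a logical row sort contiguously (columns grouped together, each column's revisions ordered by timestamp), so a primary-key range [fromKey, toKey) maps onto one contiguous key range.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory model of the [schemaName][primaryKey][columnName][timestamp]
// key layout. Field-by-field comparison stands in for comparing encoded byte[] keys.
class SparseRowKeySketch {

    static final class Key implements Comparable<Key> {
        final String schema;
        final long primaryKey; // assumption: a long-valued primary key
        final String column;
        final long timestamp;

        Key(String schema, long primaryKey, String column, long timestamp) {
            this.schema = schema;
            this.primaryKey = primaryKey;
            this.column = column;
            this.timestamp = timestamp;
        }

        @Override
        public int compareTo(Key o) {
            int c = schema.compareTo(o.schema);
            if (c == 0) c = Long.compare(primaryKey, o.primaryKey);
            if (c == 0) c = column.compareTo(o.column);
            if (c == 0) c = Long.compare(timestamp, o.timestamp);
            return c;
        }

        @Override
        public String toString() {
            return "[" + schema + "][" + primaryKey + "][" + column + "][" + timestamp + "]";
        }
    }

    // Smallest possible key of a logical row: empty column name, minimum timestamp.
    static Key rowStart(String schema, long primaryKey) {
        return new Key(schema, primaryKey, "", Long.MIN_VALUE);
    }

    // All entries of the logical rows whose primary key lies in [fromKey, toKey).
    static List<Key> scan(NavigableMap<Key, Object> index, String schema, long fromKey, long toKey) {
        return new ArrayList<>(index.subMap(rowStart(schema, fromKey), true,
                rowStart(schema, toKey), false).keySet());
    }

    static NavigableMap<Key, Object> demoIndex() {
        NavigableMap<Key, Object> index = new TreeMap<>();
        // Inserted out of order on purpose; the map keeps the keys sorted.
        index.put(new Key("employee", 14, "Name", 0), "row 14");
        index.put(new Key("employee", 12, "Name", 0), "Bryan Thompson");
        index.put(new Key("employee", 13, "Name", 0), "row 13");
        index.put(new Key("employee", 12, "Employer", 1), "SYSTAP");
        index.put(new Key("employee", 12, "Employer", 0), "SAIC");
        return index;
    }

    public static void main(String[] args) {
        for (Key k : scan(demoIndex(), "employee", 12, 14)) {
            System.out.println(k);
        }
        // [employee][12][Employer][0]
        // [employee][12][Employer][1]
        // [employee][12][Name][0]
        // [employee][13][Name][0]
    }
}
```

Because each logical row occupies one contiguous key range, a row can be read with a single range scan, which is also why dynamic sharding must not place a split point inside a logical row.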
For example, assume a logical row stored as the following index entries (key : value):
[employee][12][DateOfHire][t0] : [4/30/02]
[employee][12][DateOfHire][t1] : [4/30/05]
[employee][12][Employer][t0] : [SAIC]
[employee][12][Employer][t1] : [SYSTAP]
[employee][12][Id][t0] : [12]
[employee][12][Name][t0] : [Bryan Thompson]
In order to read the logical row as it stood after the update at t0, the caller would specify a toTime that is greater than t0 but no greater than t1 (toTime is exclusive). The values read in this example would be {<DateOfHire, t0, 4/30/02>, <Employer, t0, SAIC>, <Id, t0, 12>, <Name, t0, Bryan Thompson>}.
Likewise, in order to read the logical row as it stood after the update at t1, the caller would specify a toTime greater than t1. The values read in this example would be {<DateOfHire, t1, 4/30/05>, <Employer, t1, SYSTAP>, <Id, t0, 12>, <Name, t0, Bryan Thompson>}. Notice that values written at t0 and not overwritten or deleted by t1 are present in the resulting logical row.
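The example above can be reproduced with a small in-memory model of an as-of read. This is a sketch, not the SparseRowStore implementation: `AsOfReadSketch`, `Tpv` and `readAsOf` are hypothetical names, t0 and t1 are stood in for by 10 and 20, and the read keeps, for each column, the value with the greatest timestamp in the half-open window [fromTime, toTime), treating a null value as a deletion.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AsOfReadSketch {

    // One timestamped property value (hypothetical stand-in for an ITPV).
    static final class Tpv {
        final String name;
        final long timestamp;
        final Object value; // null marks a deleted value

        Tpv(String name, long timestamp, Object value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static final long T0 = 10; // assumed stand-in for t0
    static final long T1 = 20; // assumed stand-in for t1

    // The index entries of the employee row from the example.
    static List<Tpv> exampleRow() {
        List<Tpv> row = new ArrayList<>();
        row.add(new Tpv("DateOfHire", T0, "4/30/02"));
        row.add(new Tpv("DateOfHire", T1, "4/30/05"));
        row.add(new Tpv("Employer", T0, "SAIC"));
        row.add(new Tpv("Employer", T1, "SYSTAP"));
        row.add(new Tpv("Id", T0, 12));
        row.add(new Tpv("Name", T0, "Bryan Thompson"));
        return row;
    }

    // For each column keep the newest value with fromTime <= timestamp < toTime,
    // then drop columns whose newest value is a deletion (null).
    static Map<String, Object> readAsOf(List<Tpv> row, long fromTime, long toTime) {
        Map<String, Tpv> newest = new TreeMap<>(); // sorted by column name
        for (Tpv tpv : row) {
            if (tpv.timestamp < fromTime || tpv.timestamp >= toTime) {
                continue; // outside the half-open time window
            }
            Tpv seen = newest.get(tpv.name);
            if (seen == null || tpv.timestamp > seen.timestamp) {
                newest.put(tpv.name, tpv);
            }
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Tpv tpv : newest.values()) {
            if (tpv.value != null) {
                result.put(tpv.name, tpv.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // toTime is exclusive, so T0 + 1 admits t0 but not t1.
        System.out.println(readAsOf(exampleRow(), Long.MIN_VALUE, T0 + 1));
        // prints {DateOfHire=4/30/02, Employer=SAIC, Id=12, Name=Bryan Thompson}
        System.out.println(readAsOf(exampleRow(), Long.MIN_VALUE, T1 + 1));
        // prints {DateOfHire=4/30/05, Employer=SYSTAP, Id=12, Name=Bryan Thompson}
    }
}
```

In this model, passing T0 itself as toTime returns an empty map, because the window excludes its upper bound.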
Note: Very large objects should be stored in the BigdataFileSystem
(distributed, atomic, versioned, chunked file system) and the identifier for
that object can then be stored in the row store.
SparseRowStore. A caching layer in the web app could be used to reduce any hotspots.

Modifier and Type | Class and Description |
---|---|
static interface | SparseRowStore.Options: Options for the SparseRowStore. |
Modifier and Type | Field and Description |
---|---|
protected static org.apache.log4j.Logger | log |
Fields inherited from interface IRowStoreConstants: AUTO_TIMESTAMP, AUTO_TIMESTAMP_UNIQUE, CURRENT_ROW, MAX_TIMESTAMP, MIN_TIMESTAMP
Constructor and Description |
---|
SparseRowStore(IIndex ndx): Create a client-side abstraction that treats an IIndex as a SparseRowStore. |
Modifier and Type | Method and Description |
---|---|
ITPS | delete(Schema schema, Object primaryKey): Atomic delete of all property values for the current logical row. |
ITPS | delete(Schema schema, Object primaryKey, long fromTime, long toTime, long writeTime, INameFilter filter): Atomic delete of all property values for the logical row. |
Object | get(Schema schema, Object primaryKey, String name): Return the current binding for the named property. |
IIndex | getIndex(): The backing index. |
List<String> | getNamespaces(long tx): List of namespaces defined in the row store. |
Iterator<? extends ITPS> | rangeIterator(Schema schema): A logical row scan. |
Iterator<? extends ITPS> | rangeIterator(Schema schema, Object fromKey, Object toKey): A logical row scan. |
Iterator<? extends ITPS> | rangeIterator(Schema schema, Object fromKey, Object toKey, INameFilter filter): A logical row scan. |
Iterator<? extends ITPS> | rangeIterator(Schema schema, Object fromKey, Object toKey, int capacity, long fromTime, long toTime, INameFilter nameFilter): A logical row scan. |
Map<String,Object> | read(Schema schema, Object primaryKey): Read the most recent logical row from the index. |
Map<String,Object> | read(Schema schema, Object primaryKey, INameFilter filter): Read the most recent logical row from the index. |
ITPS | read(Schema schema, Object primaryKey, long fromTime, long toTime, INameFilter filter): Read a logical row from the index. |
Map<String,Object> | write(Schema schema, Map<String,Object> propertySet): Atomic write with atomic read-back of the post-update state of the logical row. |
Map<String,Object> | write(Schema schema, Map<String,Object> propertySet, long writeTime): Atomic write with atomic read-back of the post-update state of the logical row. |
TPS | write(Schema schema, Map<String,Object> propertySet, long writeTime, INameFilter filter, IPrecondition precondition): Atomic write with atomic read of the then-current post-condition state of the logical row. |
TPS | write(Schema schema, Map<String,Object> propertySet, long fromTime, long toTime, long writeTime, INameFilter filter, IPrecondition precondition): Atomic write with atomic read of the post-condition state of the logical row. |
public SparseRowStore(IIndex ndx)
Create a client-side abstraction that treats an IIndex as a SparseRowStore.
Note: When creating the backing index you MUST specify the split handler to ensure that dynamic sharding does not break logical rows, e.g.:
md.setSplitHandler(LogicalRowSplitHandler.INSTANCE);

Note: The JDK RuleBasedCollator embeds nul bytes in the Unicode sort keys. This makes them unsuitable for the row store, which cannot locate the start of the column name if there are embedded nuls in the primaryKey. Therefore, if you are using CollatorEnum.JDK as your default collator, then you MUST override the IndexMetadata for the row store to use either an ASCII collator or the ICU collator. In general, the ICU collator is superior to the JDK collator and is used by default. The ASCII collator is not ideal since non-ASCII distinctions will be lost, but it is better than being unable to decode the data in the row store.
Parameters:
ndx - The index.

public IIndex getIndex()
The backing index.
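The collator note on the constructor can be illustrated with a hypothetical key layout in which the primary key's sort key is followed by a single 0x00 terminator and then the column name. The layout, the class `NulTerminatorSketch` and its methods are assumptions made for illustration, and the hand-written byte arrays merely stand in for collator sort keys; the real row store encoding may differ. A decoder that locates the column name by scanning for the first 0x00 byte works for a clean sort key but returns the wrong column name once the sort key itself contains a nul.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class NulTerminatorSketch {

    // Hypothetical layout: [primaryKey sort key][0x00][column name as ASCII].
    static byte[] encode(byte[] primaryKeySortKey, String column) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(primaryKeySortKey, 0, primaryKeySortKey.length);
        out.write(0); // terminator marking the end of the primary key
        byte[] col = column.getBytes(StandardCharsets.US_ASCII);
        out.write(col, 0, col.length);
        return out.toByteArray();
    }

    // Finds the column name by scanning for the first 0x00 byte.
    static String decodeColumn(byte[] key) {
        for (int i = 0; i < key.length; i++) {
            if (key[i] == 0) {
                return new String(key, i + 1, key.length - i - 1, StandardCharsets.US_ASCII);
            }
        }
        throw new IllegalArgumentException("no terminator found");
    }

    public static void main(String[] args) {
        byte[] clean = {0x21, 0x33, 0x45};         // sort key without an embedded nul
        byte[] withNul = {0x21, 0x00, 0x33, 0x45}; // sort key with an embedded nul
        System.out.println(decodeColumn(encode(clean, "Name")));                  // prints Name
        System.out.println(decodeColumn(encode(withNul, "Name")).equals("Name")); // prints false
    }
}
```

With the embedded nul, the decoder stops inside the primary key and treats the remaining sort-key bytes as part of the column name.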
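public Object get(Schema schema, Object primaryKey, String name)
Return the current binding for the named property.
Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
name - The property name.
Returns:
The current value of the named property -or- null iff the property is not bound.

public Map<String,Object> read(Schema schema, Object primaryKey)
Read the most recent logical row from the index.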
Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
Returns:
The logical row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values excluded due to their timestamps, and no property values excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.

public Map<String,Object> read(Schema schema, Object primaryKey, INameFilter filter)
Read the most recent logical row from the index.
Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
filter - An optional filter.
Returns:
The logical row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values excluded due to their timestamps, and no property values excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.

public ITPS read(Schema schema, Object primaryKey, long fromTime, long toTime, INameFilter filter)
Read a logical row from the index.
Parameters:
schema - The Schema governing the logical row.
primaryKey - The primary key that identifies the logical row.
fromTime - The first timestamp for which timestamped property values will be accepted.
toTime - The first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
filter - An optional filter that may be used to select values for property names accepted by the filter.
Returns:
The timestamped property values for the logical row -or- null IFF there are no property values for that logical row (including no deleted property values, no property values excluded due to their timestamps, and no property values excluded due to a property name filter). A null return is a strong guarantee that NO data existed in the row store at the time of the read for the given schema and primaryKey.
Throws:
IllegalArgumentException - if the schema is null.
IllegalArgumentException - if the primaryKey is null.
IllegalArgumentException - if the fromTime and/or toTime are invalid.
See Also:
ITimestampPropertySet#asMap(), which returns the most current bindings; ITimestampPropertySet#asMap(long), which returns the most current bindings as of the specified timestamp; IRowStoreConstants.CURRENT_ROW; IRowStoreConstants.MIN_TIMESTAMP; IRowStoreConstants.MAX_TIMESTAMP
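The return contract of read(schema, primaryKey, fromTime, toTime, filter) can be illustrated with an in-memory sketch. Everything here is hypothetical: `WindowReadSketch` and `Tpv` stand in for ITPS and ITPV, a java.util.function.Predicate stands in for INameFilter, and treating fromTime >= toTime as invalid is an assumption about what makes a window invalid. It shows that the raw result keeps deleted (null-valued) entries, so a row whose only value was deleted yields a non-null result whose simplified map is empty, while null is returned only when nothing at all matched.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

class WindowReadSketch {

    // Hypothetical stand-in for an ITPV: name, timestamp and value (null = deleted).
    static final class Tpv {
        final String name;
        final long timestamp;
        final Object value;

        Tpv(String name, long timestamp, Object value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Returns every stored value with fromTime <= timestamp < toTime whose name the
    // filter accepts (a null filter accepts all), deleted entries included, or null
    // iff nothing matched.
    static List<Tpv> read(List<Tpv> stored, long fromTime, long toTime, Predicate<String> filter) {
        if (fromTime >= toTime) {
            throw new IllegalArgumentException("fromTime must be less than toTime");
        }
        List<Tpv> hits = new ArrayList<>();
        for (Tpv t : stored) {
            boolean inWindow = t.timestamp >= fromTime && t.timestamp < toTime;
            boolean accepted = filter == null || filter.test(t.name);
            if (inWindow && accepted) {
                hits.add(t);
            }
        }
        return hits.isEmpty() ? null : hits;
    }

    // Simplifies the result to the newest binding per name; deleted names disappear.
    static Map<String, Object> asMap(List<Tpv> tpvs) {
        Map<String, Tpv> newest = new LinkedHashMap<>();
        for (Tpv t : tpvs) {
            Tpv seen = newest.get(t.name);
            if (seen == null || t.timestamp > seen.timestamp) {
                newest.put(t.name, t);
            }
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Tpv t : newest.values()) {
            if (t.value != null) {
                out.put(t.name, t.value);
            }
        }
        return out;
    }

    static List<Tpv> demoRow() {
        List<Tpv> row = new ArrayList<>();
        row.add(new Tpv("Name", 5, "Bryan Thompson"));
        row.add(new Tpv("Name", 8, null)); // the Name value was deleted at time 8
        return row;
    }

    public static void main(String[] args) {
        List<Tpv> all = read(demoRow(), 0, 10, null);
        System.out.println(all.size());           // prints 2
        System.out.println(asMap(all).isEmpty()); // prints true
        System.out.println(read(demoRow(), 0, 10, n -> n.equals("Employer"))); // prints null
    }
}
```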
public Map<String,Object> write(Schema schema, Map<String,Object> propertySet)
Atomic write with atomic read-back of the post-update state of the logical row.
Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column.
Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.
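The two notes above can be modeled with a hypothetical append-only row: each write records one timestamped entry per property in the propertySet, a null value records a deletion, and because the primary key is written on every update, the timestamps of the primary-key entries list the row's revisions. `RowWriteSketch` and its methods are illustrative names, not part of the SparseRowStore API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowWriteSketch {

    static final class Tpv {
        final String name;
        final long timestamp;
        final Object value; // null records a deletion

        Tpv(String name, long timestamp, Object value) {
            this.name = name;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    private final String primaryKeyName;
    private final List<Tpv> entries = new ArrayList<>();

    RowWriteSketch(String primaryKeyName) {
        this.primaryKeyName = primaryKeyName;
    }

    // Records one entry per property, null values included; the primary key must be present.
    void write(Map<String, Object> propertySet, long writeTime) {
        if (propertySet.get(primaryKeyName) == null) {
            throw new IllegalArgumentException("propertySet must contain the primary key");
        }
        for (Map.Entry<String, Object> e : propertySet.entrySet()) {
            entries.add(new Tpv(e.getKey(), writeTime, e.getValue()));
        }
    }

    // Timestamps of the primary-key entries, i.e. one per row revision.
    List<Long> revisions() {
        List<Long> out = new ArrayList<>();
        for (Tpv t : entries) {
            if (t.name.equals(primaryKeyName)) {
                out.add(t.timestamp);
            }
        }
        return out;
    }

    // Newest binding per property; properties whose newest value is null are omitted.
    Map<String, Object> current() {
        Map<String, Tpv> newest = new LinkedHashMap<>();
        for (Tpv t : entries) {
            Tpv seen = newest.get(t.name);
            if (seen == null || t.timestamp > seen.timestamp) {
                newest.put(t.name, t);
            }
        }
        Map<String, Object> out = new LinkedHashMap<>();
        for (Tpv t : newest.values()) {
            if (t.value != null) {
                out.put(t.name, t.value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        RowWriteSketch row = new RowWriteSketch("Id");
        Map<String, Object> first = new HashMap<>();
        first.put("Id", 12);
        first.put("Employer", "SAIC");
        row.write(first, 10);

        Map<String, Object> second = new HashMap<>(); // HashMap permits null values
        second.put("Id", 12);
        second.put("Employer", null); // request deletion of Employer
        row.write(second, 20);

        System.out.println(row.current());   // prints {Id=12}
        System.out.println(row.revisions()); // prints [10, 20]
    }
}
```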
Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row.

public Map<String,Object> write(Schema schema, Map<String,Object> propertySet, long writeTime)
Atomic write with atomic read-back of the post-update state of the logical row.
Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.

public TPS write(Schema schema, Map<String,Object> propertySet, long writeTime, INameFilter filter, IPrecondition precondition)
Atomic write with atomic read of the then-current post-condition state of the logical row.
Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column. A null will be written under the key for the column value with a new timestamp. This is interpreted as a deleted property value when the row is simplified as a Map. If you examine the ITPS you can see the ITPV with the null value and the timestamp of the delete.
Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.
Note: If the caller specified a timestamp, then that timestamp is used by the atomic read. If the timestamp was assigned by the server, then the server-assigned timestamp is used by the atomic read.
Note: You can verify pre-conditions for the logical row on the server. Among other things, this could be used to reject an update if someone has modified the logical row since you last read some value.
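The pre-condition mechanism behaves like a server-side compare-and-set. The sketch below is hypothetical: `PreconditionSketch`, its `Precondition` interface (standing in for IPrecondition) and `Result` (standing in for TPS and its isPreconditionOk() flag) are illustrative, timestamps are omitted, and a synchronized method stands in for the server's atomic read-check-write. It shows a client whose update is rejected because another writer changed the value it had read.

```java
import java.util.HashMap;
import java.util.Map;

class PreconditionSketch {

    // Hypothetical stand-in for IPrecondition: inspects the pre-condition state.
    interface Precondition {
        boolean accept(Map<String, Object> preState);
    }

    // Hypothetical stand-in for TPS: the returned row plus whether the write ran.
    static final class Result {
        final Map<String, Object> row;
        final boolean preconditionOk;

        Result(Map<String, Object> row, boolean preconditionOk) {
            this.row = row;
            this.preconditionOk = preconditionOk;
        }
    }

    private final Map<String, Object> row = new HashMap<>();

    // Read, check and write as one atomic step; returns the pre-condition state
    // when the check fails and the post-condition state when it succeeds.
    synchronized Result write(Map<String, Object> propertySet, Precondition precondition) {
        if (precondition != null && !precondition.accept(new HashMap<>(row))) {
            return new Result(new HashMap<>(row), false);
        }
        for (Map.Entry<String, Object> e : propertySet.entrySet()) {
            if (e.getValue() == null) {
                row.remove(e.getKey()); // null deletes the property
            } else {
                row.put(e.getKey(), e.getValue());
            }
        }
        return new Result(new HashMap<>(row), true);
    }

    // Builds a property set for row 12 with one extra property.
    static Map<String, Object> props(String name, Object value) {
        Map<String, Object> m = new HashMap<>();
        m.put("Id", 12);
        m.put(name, value);
        return m;
    }

    public static void main(String[] args) {
        PreconditionSketch store = new PreconditionSketch();
        store.write(props("Employer", "SAIC"), null);

        // Client A reads Employer = SAIC; then client B updates it.
        Object seenByA = "SAIC";
        store.write(props("Employer", "SYSTAP"), null);

        // Client A's update only applies if Employer is still what A read.
        Result r = store.write(props("Employer", "Other"),
                pre -> seenByA.equals(pre.get("Employer")));
        System.out.println(r.preconditionOk);      // prints false
        System.out.println(r.row.get("Employer")); // prints SYSTAP
    }
}
```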
Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row. The primaryKey as identified by the Schema MUST be present in the propertySet.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be returned (this has no effect on the atomic write).
precondition - When present, the pre-condition state of the row will be read and offered to the IPrecondition. If the IPrecondition fails, then the atomic write will NOT be performed and the pre-condition state of the row will be returned. If the IPrecondition succeeds, then the atomic write will be performed and the post-condition state of the row will be returned. Use TPS.isPreconditionOk() to determine whether or not the write was performed.
Returns:
The post-condition state of the logical row -or- null iff there is no data for the primaryKey (per the contract for an atomic read). If an optional IPrecondition was specified and the IPrecondition was NOT satisfied, then the write operation was NOT performed and the result is the pre-condition state of the logical row (which, again, will be null IFF there is NO data for the primaryKey).
See Also:
ITPS.getWriteTimestamp()
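The difference between AUTO_TIMESTAMP and AUTO_TIMESTAMP_UNIQUE can be illustrated with a minimal single-process timestamp source. This is a hypothetical sketch, not how the server assigns timestamps: `UniqueTimestampSketch` is an illustrative name, the clock is injected so the example is deterministic, and federation-wide uniqueness would additionally require coordination across processes. Two reads of a clock within the same tick collide, whereas a source that returns max(clock, last + 1) never repeats a value.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class UniqueTimestampSketch {

    private final LongSupplier clock;
    private final AtomicLong last = new AtomicLong(Long.MIN_VALUE);

    UniqueTimestampSketch(LongSupplier clock) {
        this.clock = clock;
    }

    // Returns a timestamp strictly greater than every timestamp returned before,
    // even if the clock has not advanced (lock-free compare-and-set loop).
    long next() {
        while (true) {
            long prev = last.get();
            long candidate = Math.max(clock.getAsLong(), prev + 1);
            if (last.compareAndSet(prev, candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        LongSupplier frozenClock = () -> 1000L; // a clock that does not advance
        UniqueTimestampSketch unique = new UniqueTimestampSketch(frozenClock);
        System.out.println(frozenClock.getAsLong() + " " + frozenClock.getAsLong()); // prints 1000 1000
        System.out.println(unique.next() + " " + unique.next());                   // prints 1000 1001
    }
}
```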
public TPS write(Schema schema, Map<String,Object> propertySet, long fromTime, long toTime, long writeTime, INameFilter filter, IPrecondition precondition)
Atomic write with atomic read of the post-condition state of the logical row.
Note: In order to cause a column value for a row to be deleted you MUST specify a null column value for that column. A null will be written under the key for the column value with a new timestamp. This is interpreted as a deleted property value when the row is simplified as a Map. If you examine the ITPS you can see the ITPV with the null value and the timestamp of the delete.
Note: The value of the primaryKey is written each time the logical row is updated, and the timestamp associated with the value for the primaryKey property tells you the timestamp of each row revision.
Note: If the caller specified a timestamp, then that timestamp is used by the atomic read. If the timestamp was assigned by the server, then the server-assigned timestamp is used by the atomic read.
Note: You can verify pre-conditions for the logical row on the server. Among other things, this could be used to reject an update if someone has modified the logical row since you last read some value.
Parameters:
schema - The Schema governing the logical row.
propertySet - The column names and values for that row. The primaryKey as identified by the Schema MUST be present in the propertySet.
fromTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will be accepted.
toTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
writeTime - The timestamp to use for the row -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be returned (this has no effect on the atomic write).
precondition - When present, the pre-condition state of the row will be read and offered to the IPrecondition. If the IPrecondition fails, then the atomic write will NOT be performed and the pre-condition state of the row will be returned. If the IPrecondition succeeds, then the atomic write will be performed and the post-condition state of the row will be returned. Use TPS.isPreconditionOk() to determine whether or not the write was performed.
Returns:
The post-condition state of the logical row -or- null IFF there is NO data for the primaryKey. If an optional IPrecondition was specified and the IPrecondition was NOT satisfied, then the write operation was NOT performed and the result is the pre-condition state of the logical row (which, again, will be null IFF there is NO data for the primaryKey).
Throws:
UnsupportedOperationException - if a property has an auto-increment type and the ValueType of the property does not support auto-increment.
UnsupportedOperationException - if a property has an auto-increment type but there is no successor in the value space of that property.
See Also:
ITPS.getWriteTimestamp()
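The two UnsupportedOperationException cases can be illustrated with a hypothetical successor function for auto-increment properties. `AutoIncrementSketch.successor` is an illustrative name, and supporting only Integer and Long is an assumption made for the sketch, not a statement about which ValueTypes the row store supports. It throws for a value type with no notion of a successor and for a value that is already the maximum of its type.

```java
class AutoIncrementSketch {

    // Returns the next value after `current`, mirroring the two failure modes:
    // an unsupported value type, or no successor in the value space.
    static Object successor(Object current) {
        if (current instanceof Integer) {
            int v = (Integer) current;
            if (v == Integer.MAX_VALUE) {
                throw new UnsupportedOperationException("no successor for Integer.MAX_VALUE");
            }
            return v + 1;
        }
        if (current instanceof Long) {
            long v = (Long) current;
            if (v == Long.MAX_VALUE) {
                throw new UnsupportedOperationException("no successor for Long.MAX_VALUE");
            }
            return v + 1;
        }
        String type = current == null ? "null" : current.getClass().getSimpleName();
        throw new UnsupportedOperationException("auto-increment not supported for " + type);
    }

    public static void main(String[] args) {
        System.out.println(successor(41)); // prints 42
        System.out.println(successor(7L)); // prints 8
        try {
            successor("abc");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage()); // prints auto-increment not supported for String
        }
    }
}
```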
public ITPS delete(Schema schema, Object primaryKey)
Atomic delete of all property values for the current logical row.
Parameters:
schema - The schema.
primaryKey - The primary key for the logical row.

public ITPS delete(Schema schema, Object primaryKey, long fromTime, long toTime, long writeTime, INameFilter filter)
Atomic delete of all property values for the logical row. The property values are read atomically, each property value that is read is then overwritten with a null, and the read property values are returned.
Parameters:
schema - The schema.
primaryKey - The primary key for the logical row.
fromTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will be accepted.
toTime - During pre-condition and post-condition reads, the first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
writeTime - The timestamp that will be written into the "deleted" entries -or- IRowStoreConstants.AUTO_TIMESTAMP if the timestamp will be generated by the server -or- IRowStoreConstants.AUTO_TIMESTAMP_UNIQUE if a federation-wide unique timestamp will be generated by the server.
filter - An optional filter used to select the property values that will be deleted.
Returns:
The property values that were read and deleted. ITPS.getWriteTimestamp() will report the timestamp assigned to the deleted entries used to overwrite these property values in the store.

public Iterator<? extends ITPS> rangeIterator(Schema schema)
A logical row scan.
Parameters:
schema - The Schema governing the logical row.

public Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey)
A logical row scan.
Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for the lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for the upper bound (exclusive) of the key range -or- null iff there is no upper bound.

public Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey, INameFilter filter)
A logical row scan.
Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for the lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for the upper bound (exclusive) of the key range -or- null iff there is no upper bound.
filter - An optional filter.

public Iterator<? extends ITPS> rangeIterator(Schema schema, Object fromKey, Object toKey, int capacity, long fromTime, long toTime, INameFilter nameFilter)
A logical row scan.
Parameters:
schema - The Schema governing the logical row.
fromKey - The value of the primary key for the lower bound (inclusive) of the key range -or- null iff there is no lower bound.
toKey - The value of the primary key for the upper bound (exclusive) of the key range -or- null iff there is no upper bound.
capacity - When non-zero, this is the maximum number of logical rows that will be read atomically. This is only an upper bound; the actual number of logical rows in an atomic read depends on a variety of factors.
fromTime - The first timestamp for which timestamped property values will be accepted.
toTime - The first timestamp for which timestamped property values will NOT be accepted -or- IRowStoreConstants.CURRENT_ROW to accept only the most current binding whose timestamp is GTE fromTime.
nameFilter - An optional filter used to select the properties of interest.

public List<String> getNamespaces(long tx)
List of namespaces defined in the row store.
Parameters:
tx - The transaction identifier -or- timestamp if the IIndexManager is not a Journal.

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.