public class UnisolatedReadWriteIndex extends Object implements IIndex, ILinearList, IReadWriteLockManager
A view onto an unisolated index partition which enforces the constraint that
either concurrent readers -or- a single writer may have access to the
unisolated index at any given time. This provides the maximum possible
concurrency for an unisolated index using an internal ReadWriteLock
to coordinate threads.
The possible concurrency with this approach is higher than that provided by
the IConcurrencyManager
since the latter only allows a single process
access to the unisolated index while this class can allow multiple readers
concurrent access to the same unisolated index. The use of this class
is NOT compatible with the IConcurrencyManager
(the
IConcurrencyManager
does not respect the locks managed by this
class).
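The reader/writer constraint described above can be sketched with the standard `java.util.concurrent.locks.ReentrantReadWriteLock`. This is only an illustration of the locking discipline, not this class's implementation: `LockedIndexView` and its `TreeMap` backing are hypothetical stand-ins for the view and the underlying index.

```java
import java.util.TreeMap;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a view over a mutable index that is not thread-safe.
final class LockedIndexView {
    // Stands in for the underlying unisolated index (assumption for this sketch).
    private final TreeMap<String, String> index = new TreeMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Readers share the read lock, so lookups may run concurrently.
    String lookup(String key) {
        lock.readLock().lock();
        try {
            return index.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // A writer takes the exclusive write lock; readers and other writers wait.
    String insert(String key, String value) {
        lock.writeLock().lock();
        try {
            return index.put(key, value); // previous value, or null if none
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedIndexView view = new LockedIndexView();
        view.insert("a", "1");
        Thread r1 = new Thread(() -> view.lookup("a"));
        Thread r2 = new Thread(() -> view.lookup("a"));
        Thread w = new Thread(() -> view.insert("b", "2"));
        r1.start(); r2.start(); w.start();
        r1.join(); r2.join(); w.join();
        System.out.println(view.lookup("a") + " " + view.lookup("b"));
    }
}
```

Each operation holds the lock only for its own duration, which mirrors the "single operation" lock scope described for this class.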
This class does NOT handle deadlock detection. However, it does not expose
the underlying lock and the scope of the acquired lock should always be
restricted to a single operation as defined by IIndex
. If you
circumvent this by writing and submitting an IIndexProcedure
that
attempts an operation on another UnisolatedReadWriteIndex
then a
deadlock MAY occur.
The point test methods on this class (get, contains, lookup, remove) have
relatively high overhead since they need to acquire and release the lock per
point test. If you need to do a bunch of point tests, then submit an
IIndexProcedure
that will run against the underlying index once it
has acquired the appropriate lock -- point tests from within the
IIndexProcedure
will be very efficient.
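The cost difference between per-call locking and a batched procedure can be illustrated with a small sketch. Everything here is hypothetical and uses only the JDK: `BatchedPointTests`, `submitReadOnly` and the `lockAcquisitions` counter are illustrative names, and a `java.util.function.Function` stands in for an index procedure.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;

// Hypothetical sketch: counts lock acquisitions for point tests vs. one batch.
final class BatchedPointTests {
    private final Map<String, String> index = new TreeMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final AtomicInteger lockAcquisitions = new AtomicInteger();

    void insert(String key, String value) {
        lock.writeLock().lock();
        lockAcquisitions.incrementAndGet();
        try {
            index.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Point test: acquires and releases the read lock on every call.
    boolean contains(String key) {
        lock.readLock().lock();
        lockAcquisitions.incrementAndGet();
        try {
            return index.containsKey(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Batch: acquires the read lock once, then runs the procedure on the raw index.
    <T> T submitReadOnly(Function<Map<String, String>, T> proc) {
        lock.readLock().lock();
        lockAcquisitions.incrementAndGet();
        try {
            return proc.apply(index);
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        BatchedPointTests view = new BatchedPointTests();
        view.insert("a", "1");
        view.insert("b", "2");
        List<String> keys = List.of("a", "b", "c");

        view.lockAcquisitions.set(0);
        long hits = keys.stream().filter(view::contains).count();
        System.out.println("point tests: hits=" + hits
                + " acquisitions=" + view.lockAcquisitions.get());

        view.lockAcquisitions.set(0);
        long batchHits = view.submitReadOnly(
                ndx -> keys.stream().filter(ndx::containsKey).count());
        System.out.println("batch: hits=" + batchHits
                + " acquisitions=" + view.lockAcquisitions.get());
    }
}
```

In this sketch, three point tests take the lock three times while the batch takes it once for the same answers.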
ConcurrencyManager
and the
group commit protocol which it imposes on writers. It also facilitates the
reuse of the buffers backing the unisolated index, which can reduce IO
associated with index operations when compared to reading on a read-committed
view of the index with concurrent writes and interleaved commits on the
corresponding unisolated index.
Reading on the read-committed index view has greater possible concurrency, but requires that writes are committed before they become visible and must read the data from the disk since it does not have access to the buffers for the unisolated index. Requiring a commit in order for the writes to become visible imposes significant latency, especially when computing the fix point of a rule set which may take multiple rounds. Reading on the unisolated index should do better in terms of buffer reuse and does NOT require commits or checkpoints of the index for writes to become visible to readers but does require a means to correctly interleave access to the unisolated index, which is the purpose of this class.
While the lock manager could be modified to support Share vs Exclusive locks and to use Share locks for readers and Exclusive locks for writers, writers would still block until the next commit so the throughput (e.g., when computing the fix point of a rule set) is significantly lower.
Modifier and Type | Field and Description |
---|---|
protected static int |
DEFAULT_CAPACITY
The default capacity for iterator reads against the underlying index.
|
Constructor and Description |
---|
UnisolatedReadWriteIndex(BTree ndx)
Creates a view of an unisolated index that will enforce the concurrency
constraints of the
BTree class, but only among other instances of
this class for the same underlying index. |
UnisolatedReadWriteIndex(BTree ndx,
int defaultCapacity)
Creates a view of an unisolated index that will enforce the concurrency
constraints of the
BTree class, but only among other instances of
this class for the same underlying index. |
Modifier and Type | Method and Description |
---|---|
boolean |
contains(byte[] key)
Return
true iff there is a (non-deleted) index entry for
the key. |
boolean |
contains(Object key)
Return true iff there is an entry for the key.
|
ScanCostReport |
estimateCost(DiskCostModel diskCostModel,
long rangeCount)
Estimate the cost of a range scan.
|
ICounter |
getCounter()
This throws an exception.
|
CounterSet |
getCounters()
Return performance counters.
|
IndexMetadata |
getIndexMetadata()
The metadata for the index.
|
int |
getReadLockCount()
Return the #of read-locks held by the current thread for a mutable index
view.
|
IResourceMetadata[] |
getResourceMetadata()
The description of the resources comprising the index view.
|
IRawStore |
getStore()
Return the backing store for the index.
|
long |
indexOf(byte[] key)
Lookup the index position of the key.
|
byte[] |
insert(byte[] key,
byte[] value)
Insert or update a value under the key.
|
Object |
insert(Object key,
Object value)
Insert with auto-magic handling of keys and value objects.
|
boolean |
isReadOnly()
Return
true iff the data structure is read-only. |
byte[] |
keyAt(long index)
Return the key for the identified entry.
|
byte[] |
lookup(byte[] key)
Lookup a value for a key.
|
Object |
lookup(Object key)
Lookup a value for a key.
|
byte[] |
putIfAbsent(byte[] key,
byte[] value)
Insert or update a value under the key iff there is no entry for that key
in the index.
|
long |
rangeCount()
Return the #of tuples in the index.
|
long |
rangeCount(byte[] fromKey,
byte[] toKey)
Return the #of tuples in a half-open key range.
|
long |
rangeCountExact(byte[] fromKey,
byte[] toKey)
Return the exact #of tuples in a half-open key range.
|
long |
rangeCountExactWithDeleted(byte[] fromKey,
byte[] toKey)
Return the exact #of tuples in a half-open key range, including any
deleted tuples.
|
ITupleIterator |
rangeIterator()
Visits all tuples in key order.
|
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey)
Return an iterator that visits the entries in a half-open key range.
|
ITupleIterator |
rangeIterator(byte[] fromKey,
byte[] toKey,
int capacity,
int flags,
IFilter filter)
The iterator will read on the underlying index in chunks, buffering
tuples as it goes.
|
Lock |
readLock()
Return a
Lock that may be used to obtain a shared read lock which
is used (in the absence of other concurrency control mechanisms) to
permit concurrent readers on an unisolated index while serializing access
to that index when a writer must run. |
byte[] |
remove(byte[] key)
Remove the key and its associated value.
|
Object |
remove(Object key)
Remove the key and its associated value.
|
void |
submit(byte[] fromKey,
byte[] toKey,
IKeyRangeIndexProcedure proc,
IResultHandler handler)
The procedure will be transparently applied against each index partition
spanned by the given key range.
|
<T> T |
submit(byte[] key,
ISimpleIndexProcedure<T> proc)
Submits an index procedure that operates on a single key to the
appropriate index partition returning the result of that procedure.
|
void |
submit(int fromIndex,
int toIndex,
byte[][] keys,
byte[][] vals,
AbstractKeyArrayIndexProcedureConstructor ctor,
IResultHandler aggregator)
Runs a procedure against an index.
|
String |
toString() |
byte[] |
valueAt(long index)
Return the value for the identified entry.
|
Lock |
writeLock()
Return a
Lock that may be used to obtain an exclusive write lock
which is used (in the absence of other concurrency control mechanisms) to
serialize all processes accessing an unisolated index when a writer must
run. |
protected static final int DEFAULT_CAPACITY
The default capacity for iterator reads against the underlying index. The main purpose of the capacity is to reduce the contention for the ReadWriteLock.
public UnisolatedReadWriteIndex(BTree ndx)
Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.
ndx - The underlying unisolated index.
IllegalArgumentException - if the index is null.
public UnisolatedReadWriteIndex(BTree ndx, int defaultCapacity)
Creates a view of an unisolated index that will enforce the concurrency constraints of the BTree class, but only among other instances of this class for the same underlying index.
ndx - The underlying unisolated index.
defaultCapacity - The capacity for iterator reads against the underlying index. The main purpose of the capacity is to reduce the contention for the ReadWriteLock. Relatively small values should therefore be fine. See DEFAULT_CAPACITY.
IllegalArgumentException - if the index is null.
public Lock readLock()
IReadWriteLockManager
Lock
that may be used to obtain a shared read lock which
is used (in the absence of other concurrency control mechanisms) to
permit concurrent readers on an unisolated index while serializing access
to that index when a writer must run. This is exposed for processes which
need to obtain the read lock to coordinate external operations.
Note: If the persistence capable data structure is read-only then the
returned Lock
is a singleton that ignores all lock requests. This
is because our read-only persistence capable data structures are already
thread-safe for concurrent readers.
readLock
in interface IReadWriteLockManager
public Lock writeLock()
IReadWriteLockManager
Lock
that may be used to obtain an exclusive write lock
which is used (in the absence of other concurrency control mechanisms) to
serialize all processes accessing an unisolated index when a writer must
run. This is exposed for processes which need to obtain the write lock to
coordinate external operations.writeLock
in interface IReadWriteLockManager
public int getReadLockCount()
IReadWriteLockManager
getReadLockCount
in interface IReadWriteLockManager
public boolean isReadOnly()
IReadWriteLockManager
true
iff the data structure is read-only.isReadOnly
in interface IReadWriteLockManager
public IndexMetadata getIndexMetadata()
IIndex
Note: The same method is exposed by ICheckpointProtocol
. It is
also exposed here in order to provide access to the IndexMetadata
to remote clients in the scale-out architecture.
getIndexMetadata
in interface IIndex
ICheckpointProtocol.getIndexMetadata()
public IResourceMetadata[] getResourceMetadata()
IIndex
getResourceMetadata
in interface IIndex
public CounterSet getCounters()
IIndex
Interesting performance counters and other statistics about the index.
getCounters
in interface IIndex
getCounters
in interface ICounterSetAccess
public ICounter getCounter()
ICounter
for
the index partition, then write and submit an IIndexProcedure
.getCounter
in interface IIndexLocalCounter
UnsupportedOperationException
public boolean contains(Object key)
IAutoboxBTree
contains
in interface IAutoboxBTree
key
- The key is implicitly converted to an unsigned
byte[]
.public Object insert(Object key, Object value)
IAutoboxBTree
insert
in interface IAutoboxBTree
key
- The key is implicitly converted to an unsigned
byte[]
.value
- The value is implicitly converted to a byte[]
.null
if there was
no value stored under that key.public Object lookup(Object key)
IAutoboxBTree
lookup
in interface IAutoboxBTree
key
- The key is implicitly converted to an unsigned
byte[]
.null
if there is no
entry for that key.public Object remove(Object key)
IAutoboxBTree
remove
in interface IAutoboxBTree
key
- The key is implicitly converted to an unsigned
byte[]
.null
if the key was not found.public boolean contains(byte[] key)
ISimpleBTree
true
iff there is a (non-deleted) index entry for
the key. An index entry with a null
value will cause this
method to return true
. A deleted index entry will cause
this method to return false
.contains
in interface ISimpleBTree
key
- The key.true
if the index contains an (un-deleted) entry
for that key.public byte[] lookup(byte[] key)
ISimpleBTree
lookup
in interface ISimpleBTree
null
if there
is no entry for that key or if the entry under that key is marked
as deleted.public byte[] insert(byte[] key, byte[] value)
ISimpleBTree
insert
in interface ISimpleBTree
key
- The key.value
- The value (may be null).null
if the
key was not found or if the previous entry for that key was
marked as deleted.public byte[] putIfAbsent(byte[] key, byte[] value)
ISimpleBTree
Insert or update a value under the key iff there is no entry for that key in the index. This is equivalent to
if (!contains(key)) insert(key, value);
However, if the index allows null values to be stored under a key and the application in fact stores null values for some tuples, then the caller cannot decide from the return value of this method whether or not the mutation was applied. In these cases, a caller that needs to know whether the conditional mutation actually took place CAN use the pattern
if (!contains(key)) insert(key, value);
to obtain that information.
putIfAbsent
in interface ISimpleBTree
key
- The key.value
- The value (may be null).null
if the key
was not found or if the previous entry for that key was marked as
deleted. Note that the return value MAY be null
even
if there was an entry under the key. This is because the index is
capable of storing a null
value. In such cases the
conditional mutation WAS NOT applied.(putIfAbsent)
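The ambiguity of a null return value can be seen in a small sketch. `NullableValueIndex` is a hypothetical in-memory stand-in (a `HashMap` that permits null values), not the index implementation, and `insertIfAbsent` is an illustrative name for the contains-then-insert pattern.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an index that may store null values.
final class NullableValueIndex {
    private final Map<String, byte[]> entries = new HashMap<>();

    boolean contains(String key) {
        return entries.containsKey(key);
    }

    byte[] lookup(String key) {
        return entries.get(key);
    }

    byte[] insert(String key, byte[] value) {
        return entries.put(key, value);
    }

    // Returns the existing value (possibly null) when an entry exists;
    // otherwise inserts the value and returns null.
    byte[] putIfAbsent(String key, byte[] value) {
        if (entries.containsKey(key)) {
            return entries.get(key);
        }
        entries.put(key, value);
        return null;
    }

    // The contains-then-insert pattern: reports whether the mutation was applied.
    boolean insertIfAbsent(String key, byte[] value) {
        if (contains(key)) {
            return false;
        }
        insert(key, value);
        return true;
    }

    public static void main(String[] args) {
        NullableValueIndex ndx = new NullableValueIndex();
        ndx.insert("stored-null", null);

        // Both calls return null, but only the second one mutates the index.
        byte[] r1 = ndx.putIfAbsent("stored-null", new byte[] {1});
        byte[] r2 = ndx.putIfAbsent("fresh", new byte[] {1});
        System.out.println((r1 == null) + " " + (r2 == null));

        // The explicit pattern distinguishes the two cases.
        System.out.println(ndx.insertIfAbsent("stored-null", new byte[] {2}));
        System.out.println(ndx.insertIfAbsent("another", new byte[] {2}));
    }
}
```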
public byte[] remove(byte[] key)
ISimpleBTree
remove
in interface ISimpleBTree
key
- The key.null
if the key
was not found or if the previous entry under that key was marked
as deleted.public long rangeCount()
IRangeQuery
Note: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.
rangeCount
in interface IRangeQuery
ISimpleIndexAccess.rangeCount()
public long rangeCount(byte[] fromKey, byte[] toKey)
IRangeQuery
Note: If the index supports deletion markers then the range count will be an upper bound and may double count tuples which have been overwritten, including the special case where the overwrite is a delete.
rangeCount
in interface IRangeQuery
fromKey
- The lowest key that will be counted (inclusive). When
null
there is no lower bound.toKey
- The first key that will not be counted (exclusive). When
null
there is no upper bound.public long rangeCountExact(byte[] fromKey, byte[] toKey)
IRangeQuery
Note: If the index supports deletion markers then this operation will require a key-range scan.
rangeCountExact
in interface IRangeQuery
fromKey
- The lowest key that will be counted (inclusive). When
null
there is no lower bound.toKey
- The first key that will not be counted (exclusive). When
null
there is no upper bound.public long rangeCountExactWithDeleted(byte[] fromKey, byte[] toKey)
IRangeQuery
When the view is just an AbstractBTree
the result is the same as
for IRangeQuery.rangeCount(byte[], byte[])
, which already
reports all tuples regardless of whether or not they are deleted.
When the index is a view with multiple sources, this operation requires a key-range scan where both deleted and undeleted tuples are visited.
rangeCountExactWithDeleted
in interface IRangeQuery
fromKey
- The lowest key that will be counted (inclusive). When
null
there is no lower bound.toKey
- The first key that will not be counted (exclusive). When
null
there is no upper bound.IRangeQuery.rangeCountExact(byte[], byte[])
public final ITupleIterator rangeIterator()
IRangeQuery
rangeIterator(null, null)
rangeIterator
in interface IRangeQuery
public ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey)
IRangeQuery
rangeIterator
in interface IRangeQuery
fromKey
- The first key that will be visited (inclusive lower bound).
When null
there is no lower bound.toKey
- The first key that will NOT be visited (exclusive upper
bound). When null
there is no upper bound.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.
,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.
,
EntryFilter, which may be used to filter the entries visited by the
iterator.
public ITupleIterator rangeIterator(byte[] fromKey, byte[] toKey, int capacity, int flags, IFilter filter)
rangeIterator
in interface IRangeQuery
fromKey
- The first key that will be visited (inclusive lower bound).
When null
there is no lower bound.toKey
- The first key that will NOT be visited (exclusive upper
bound). When null
there is no upper bound.capacity
- The #of entries to buffer at a time. This is a hint and MAY be
zero (0) to use an implementation specific default
capacity. A non-zero value may be used if you know that you
want at most N results or if you want to override the default
#of results to be buffered before sending them across a
network interface. (Note that you can control the default
value using
IBigdataClient.Options#DEFAULT_CLIENT_RANGE_QUERY_CAPACITY
).flags
- A bitwise OR of IRangeQuery.KEYS
, IRangeQuery.VALS
, etc.filter
- An optional object used to construct a stacked iterator. When
IRangeQuery.CURSOR
is specified in flags, the base
iterator will implement ITupleCursor
and the first
filter in the stack can safely cast the source iterator to an
ITupleCursor
. If the outermost filter in the stack
does not implement ITupleIterator
, then it will be
wrapped as an ITupleIterator
.SuccessorUtil, which may be used to compute the successor of a value
before encoding it as a component of a key.
,
BytesUtil#successor(byte[]), which may be used to compute the
successor of an encoded key.
,
IFilterConstructor, which may be used to construct an iterator stack
performing filtering or other operations.
public <T> T submit(byte[] key, ISimpleIndexProcedure<T> proc)
IIndex
submit
in interface IIndex
key
- The key.proc
- The procedure.IIndexProcedure.apply(IIndex)
public void submit(byte[] fromKey, byte[] toKey, IKeyRangeIndexProcedure proc, IResultHandler handler)
IIndex
Note: Since this variant of submit() does not split keys the
fromIndex and toIndex in the Split
s reported to
the IResultHandler
will be zero (0).
submit
in interface IIndex
fromKey
- The lower bound (inclusive) -or- null
if there
is no lower bound.toKey
- The upper bound (exclusive) -or- null
if there
is no upper bound.proc
- The procedure. If the procedure implements the
IParallelizableIndexProcedure
marker interface then it
MAY be executed in parallel against the relevant index
partition(s).public void submit(int fromIndex, int toIndex, byte[][] keys, byte[][] vals, AbstractKeyArrayIndexProcedureConstructor ctor, IResultHandler aggregator)
IIndex
Note: This may be used to send custom logic together with the data to a
remote index or index partition. When the index is remote both the
procedure and the return value MUST be Serializable
.
Note: The scale-out indices add support for auto-split of the procedure such that it runs locally against each relevant index partition.
submit
in interface IIndex
fromIndex
- The index of the first key to be used (inclusive).toIndex
- The index of the last key to be used (exclusive).keys
- The keys (required).vals
- The values (optional depending on the procedure).ctor
- An object that can create instances of the procedure.aggregator
- When defined, results from each procedure application will be
reported to this object.
TODO In order to allow parallelization within a shard, we need to modify
this method signature to pass in an IResultHandler
constructor
object. That might be something which could be pushed down onto the ctor
argument. It would be used in scale-out to create a DS local result handler
so we can locally aggregate when parallelizing against each shard and then
return that aggregated result to the client which would extract the aggregate
result across the shards from the client's result handler. See BLZG-1537 (Schedule more IOs when loading data).
public ScanCostReport estimateCost(DiskCostModel diskCostModel, long rangeCount)
diskCostModel
- The disk cost model.rangeCount
- The #of tuples to be visited.public long indexOf(byte[] key)
ILinearList
Note that ILinearList.indexOf(byte[])
is the basis for implementing the
IRangeQuery
interface.
indexOf
in interface ILinearList
key
- The key.(-(insertion point) - 1)
. The insertion point is
defined as the point at which the key would be found if it were
inserted into the btree without intervening mutations. Note that
this guarantees that the return value will be >= 0 if and only if
the key is found. When found the index will be in [0:nentries).
Adding or removing entries in the tree may invalidate the index.
pos = -(pos+1)
will convert an insertion point to
the index at which the key would be found if it were
inserted - this is also the index of the predecessor of key
in the index.
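The `(-(insertion point) - 1)` convention is the same one used by `java.util.Arrays.binarySearch`, so the conversion can be tried against a plain sorted array. This is a sketch over a `byte[][]` ordered as unsigned bytes, not a call into the index itself; `InsertionPointDemo` and its helper names are hypothetical, and `Arrays.compareUnsigned` requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of the insertion point encoding over a sorted key array.
final class InsertionPointDemo {
    // Keys are compared as unsigned bytes in lexicographic order.
    static final Comparator<byte[]> UNSIGNED = (a, b) -> Arrays.compareUnsigned(a, b);

    // Returns the position of the key, or (-(insertion point) - 1) if absent.
    static int indexOf(byte[][] sortedKeys, byte[] key) {
        return Arrays.binarySearch(sortedKeys, key, UNSIGNED);
    }

    // Converts a negative result into the insertion point.
    static int toInsertionPoint(int pos) {
        return -(pos + 1);
    }

    public static void main(String[] args) {
        byte[][] keys = { {1}, {3}, {5} };

        int found = indexOf(keys, new byte[] {3});
        System.out.println("found at " + found);

        int missing = indexOf(keys, new byte[] {4});
        System.out.println("missing: pos=" + missing
                + " insertion point=" + toInsertionPoint(missing));

        // 0xFF is 255 when treated as unsigned, so it sorts after every key.
        int high = indexOf(keys, new byte[] {(byte) 0xFF});
        System.out.println("high: insertion point=" + toInsertionPoint(high));
    }
}
```

A non-negative result means the key was found; any negative result decodes to the slot where the key would be inserted.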
ILinearList.keyAt(long)
,
ILinearList.valueAt(long)
public byte[] keyAt(long index)
ILinearList
ISimpleBTree.lookup(byte[])
.keyAt
in interface ILinearList
index
- The index position of the entry (origin zero).ILinearList.indexOf(byte[])
,
ILinearList.valueAt(long)
public byte[] valueAt(long index)
ILinearList
ISimpleBTree.lookup(byte[])
.valueAt
in interface ILinearList
index
- The index position of the entry (origin zero).
Returns null if there is a deleted entry at that index position.
ILinearList.indexOf(byte[])
,
ILinearList.keyAt(long)
public IRawStore getStore()
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.