public abstract class AbstractJournal extends Object implements IJournal, IAllocationManager, IAllocationManagerStore
The journal is a persistence capable data structure supporting atomic commit,
named indices, and full transactions. The BufferMode.DiskRW
mode
provides an persistence scheme based on reusable allocation slots while the
BufferMode.DiskWORM
mode provides an append only persistence scheme.
Journals may be configured in highly available quorums.
This class is an abstract implementation of the IJournal
interface
that does not implement the IConcurrencyManager
,
IResourceManager
, or ITransactionService
interfaces. The
Journal
provides a concrete implementation that may be used for a
standalone database complete with concurrency control and transaction
management.
The IIndexStore
implementation on this class is NOT thread-safe. The
basic limitation is that the mutable BTree
is NOT thread-safe. The
getIndex(String)
method exposes this mutable BTree
. If you
use this method to access the mutable BTree
then YOU are responsible
for avoiding concurrent writes on the returned object.
See IConcurrencyManager.submit(AbstractTask)
for a thread-safe API
that provides suitable concurrency control for both isolated and unisolated
operations on named indices. Note that the use of the thread-safe API does
NOT protect against applications that directly access the mutable
BTree
using getIndex(String)
.
The IRawStore
interface on this class is thread-safe. However, this
is a low-level API that is not used by directly by most applications. The
BTree
class uses this low-level API to read and write its nodes and
leaves on the store. Applications generally use named indices rather than the
IRawStore
interface.
Note: transaction processing MAY occur be concurrent since the write set of a
each transaction is written on a distinct TemporaryStore
. However,
without additional concurrency controls, each transaction is NOT thread-safe
and MUST NOT be executed by more than one concurrent thread. Again, see
IConcurrencyManager.submit(AbstractTask)
for a high-concurrency API
for both isolated operations (transactions) and unisolated operations. Note
that the TemporaryStore
backing a transaction will spill
automatically from memory onto disk if the write set of the transaction grows
too large.
The journal maintains two root blocks. Commit updates the root blocks using
the Challis algorithm. (The root blocks are updated using an alternating
pattern and "timestamps" are recorded at the head and tail of each root block
to detect partial writes. See IRootBlockView
and
RootBlockView
.) When the journal is backed by a disk file, the data
are optionally flushed to disk on commit
. If
desired, the writes may be flushed before the root blocks are updated to
ensure that the writes are not reordered - see Options.DOUBLE_SYNC
.
Modifier and Type | Class and Description |
---|---|
protected class |
AbstractJournal.BasicHA
Implementation hooks into the various low-level operations required to
support HA for the journal.
|
static interface |
AbstractJournal.ISnapshotData |
static interface |
AbstractJournal.ISnapshotEntry |
static class |
AbstractJournal.SnapshotData |
Modifier and Type | Field and Description |
---|---|
static int |
DELETEBLOCK
The index of the root address of the delete blocks associated with
this transaction.
|
protected boolean |
deleteOnClose
Option set by the test suites causes the file backing the journal to be
deleted when the journal is closed.
|
protected boolean |
doubleSync
Option controls whether the journal forces application data to disk
before updating the root blocks.
|
protected ForceEnum |
forceOnCommit
Option controls how the journal behaves during a commit.
|
protected static org.apache.log4j.Logger |
haLog
Logger for HA events.
|
static int |
PREV_ROOTBLOCK
The index of the root address where the root block copy from the previous
commit is stored.
|
protected Properties |
properties
A clone of the properties used to initialize the
Journal . |
static int |
ROOT_ICUVERSION
The index of the root address containing the
ICUVersionRecord . |
static int |
ROOT_NAME2ADDR
|
File |
tmpDir
The directory that should be used for temporary files.
|
NULL
Modifier | Constructor and Description |
---|---|
protected |
AbstractJournal(Properties properties)
Create or re-open a journal.
|
protected |
AbstractJournal(Properties properties,
Quorum<HAGlue,QuorumService<HAGlue>> quorum)
Create or re-open a journal as part of a highly available
Quorum . |
Modifier and Type | Method and Description |
---|---|
protected void |
_close()
Core implementation of immediate shutdown handles event reporting.
|
protected Name2Addr |
_getName2Addr()
Return the "live" BTree mapping index names to the last metadata record
committed for the named index.
|
void |
abort()
Abandon the current write set (synchronous).
|
void |
abortContext(IAllocationContext context)
Indicates that the allocation context will no longer be used and that the
allocations made within the context should be discarded.
|
protected void |
assertBefore(UUID serviceId1,
UUID serviceId2,
long t1,
long t2)
Assert that
t1 LT t2 , where t1 and
t2 are timestamps obtain such that this relation will be
true if the clocks on the nodes are synchronized. |
protected void |
assertCanRead()
Assert that the journal is readable.
|
protected void |
assertCanWrite()
Assert that the journal is writable.
|
protected void |
assertCommitTimeAdvances(long commitTime)
Method verifies that the commit time strictly advances on the local store
by checking against the current root block.
|
void |
assertHAReady(long token)
Assert that the
getHAReady() token has the specified value. |
protected void |
assertOpen()
Assert that the store is open.
|
protected static void |
assertPriorCommitTimeAdvances(long currentCommitTime,
long priorCommitTime)
Method verifies that the commit time strictly advances on the local store
by checking against the current root block.
|
long |
awaitHAReady(long timeout,
TimeUnit units)
Await the service being ready to partitipate in an HA quorum.
|
protected void |
clearQuorumToken(long newValue)
This method will update the quorum token on the journal and the
associated values as if the service was not joined with a met
quorum.
|
void |
close()
Invokes
shutdownNow() . |
void |
closeForWrites(long closeTime)
Restart safe conversion of the store into a read-only store with the
specified closeTime.
|
long |
commit()
Atomic commit (synchronous).
|
protected long |
commitNow(long commitTime)
An atomic commit is performed by directing each registered
ICommitter to flush its state onto the store using
ICommitter.handleCommit(long) . |
void |
delete(long addr)
Delete the data (unisolated).
|
void |
delete(long addr,
IAllocationContext context)
Delete the data associated with the address within the allocation
context.
|
void |
deleteResources()
Deletes the backing file(s) (if any).
|
void |
destroy()
Closes the store immediately (if open) and deletes its persistent
resources.
|
void |
detachContext(IAllocationContext context)
Indicates that the allocation context will no longer be used, but that
the allocations made within the context should be preserved.
|
protected void |
discardCommitters()
This method is invoked by
abort() when the store must discard
any hard references that it may be holding to objects registered as
ICommitter s. |
void |
doLocalAbort()
Local commit protocol (HA).
|
void |
doLocalCommit(IRootBlockView rootBlock)
Local commit protocol (HA, offline).
|
protected void |
doLocalCommit(QuorumService<HAGlue> localService,
IRootBlockView rootBlock)
Local commit protocol (HA).
|
void |
dropIndex(String name)
Drops the named index.
|
long |
ensureMinFree(long minFree)
Make sure that the journal has at least the specified number of bytes of
unused capacity remaining in the user extent.
|
protected void |
finalize()
Closes out the journal iff it is still open.
|
void |
force(boolean metadata)
Force the data to stable storage.
|
IBufferStrategy |
getBufferStrategy()
Return the delegate that implements the
BufferMode . |
int |
getByteCount(long addr)
The length of the datum in bytes.
|
ICommitRecord |
getCommitRecord()
Returns a read-only view of the most recently committed
ICommitRecord containing the root addresses. |
ICommitRecord |
getCommitRecord(long commitTime)
Return the
ICommitRecord for the most recent committed state
whose commit timestamp is less than or equal to timestamp. |
protected CommitRecordIndex |
getCommitRecordIndex(long addr,
boolean readOnly)
Create or load and return the index that resolves timestamps to
ICommitRecord s. |
ICommitRecord |
getCommitRecordStrictlyGreaterThan(long commitTime)
Return the first commit record whose timestamp is strictly greater than
the given commitTime.
|
CounterSet |
getCounters()
Return counters reporting on various aspects of the journal.
|
protected ICommitRecord |
getEarliestVisibleCommitRecordForHA(long releaseTime)
Resolve the
ICommitRecord for the earliest visible commit point
based on the caller's releaseTime. |
abstract ExecutorService |
getExecutorService()
Service for running arbitrary tasks in support of
IResourceLocator . |
File |
getFile()
The backing file -or-
null if there is no backing file
for the store. |
long |
getHAPrepareTimeout()
The HA timeout in milliseconds for a 2-phase prepare.
|
long |
getHAReady()
Returns the current value of the
haReadyToken
(non-blocking). |
long |
getHAReleaseTimeConsensusTimeout()
The HA timeout in milliseconds for the release time consensus protocol.
|
HAStatusEnum |
getHAStatus()
A simplified summary of the HA status of the service.
|
protected int |
getHistoricalIndexCacheSize()
The size of the canonicalizing cache from addr to
IIndex . |
BTree |
getIndex(String name)
Return the mutable view of the named index (aka the "live" or
ITx.UNISOLATED index). |
IIndex |
getIndex(String name,
long commitTime)
Return a view of the named index as of the specified timestamp.
|
protected int |
getIndexCacheSize()
The size of the cache from (name,timestamp) to
IIndex . |
protected CounterSet |
getIndexCounters()
Return a
CounterSet reflecting the named indices that are open or
which have been recently opened. |
ICheckpointProtocol |
getIndexLocal(String name,
long commitTime)
Core implementation for access to historical index views.
|
ICheckpointProtocol |
getIndexWithCheckpointAddr(long checkpointAddr)
A canonicalizing mapping for historical (read-only) views of
persistence capable data structures (core impl).
|
ICheckpointProtocol |
getIndexWithCommitRecord(String name,
ICommitRecord commitRecord)
Returns a read-only named index loaded from a
ICommitRecord . |
InputStream |
getInputStream(long addr)
Return an input stream from which a previously written stream may be read
back.
|
long |
getLastCommitTime()
The database wide timestamp of the most recent commit on the store or 0L
iff there have been no commits.
|
protected long |
getMaximumClockSkewMillis()
The maximum error allowed (milliseconds) in the clocks.
|
long |
getMaximumExtent()
The maximum extent before a
commit() will #overflow() . |
int |
getMaxRecordSize()
The maximum length of a record that may be written on the store.
|
IIndex |
getName2Addr()
A read-only view of the
Name2Addr object mapping index names to
the most recent committed Name2Addr.Entry for the named index. |
IIndex |
getName2Addr(long commitTime)
Return a read-only view of the
Name2Addr object as of the
specified commit time. |
long |
getOffset(long addr)
The offset on the store at which the datum is stored.
|
int |
getOffsetBits() |
IPSOutputStream |
getOutputStream()
Return an output stream which can be used to write on the backing store.
|
IPSOutputStream |
getOutputStream(IAllocationContext context)
Return an output stream which can be used to write on the backing store
within the given allocation context.
|
long |
getPhysicalAddress(long addr)
Determine the unencoded physical address
|
Properties |
getProperties()
A copy of the properties used to initialize this journal.
|
protected static String |
getProperty(Properties properties,
String name,
String defaultValue)
Resolves the property value (static variant for ctor initialization).
|
protected String |
getProperty(String name,
String defaultValue)
Resolves the property value.
|
protected <E> E |
getProperty(String name,
String defaultValue,
IValidator<E> validator)
Resolves, parses, and validates the property value.
|
Quorum<HAGlue,QuorumService<HAGlue>> |
getQuorum()
The
Quorum for this service -or- null if the service
is not running with a quorum. |
protected long |
getQuorumToken() |
CommitRecordIndex |
getReadOnlyCommitRecordIndex()
Return a read-only view of the last committed state of the
CommitRecordIndex . |
IResourceMetadata |
getResourceMetadata()
A description of this store in support of the scale-out architecture.
|
long |
getRootAddr(int index)
The last address stored in the specified root slot as of the last
committed state of the store.
|
protected IRootBlockView[] |
getRootBlocks()
Return both root blocks (atomically - used by HA).
|
IRootBlockView |
getRootBlockView()
Return a read-only view of the current root block.
|
IRootBlockView |
getRootBlockViewWithLock()
Variant of
getRootBlockView() that takes the internal lock in
order to provide an appropriate synchronization barrier when installing
new root blocks onto an empty journal in HA. |
ICheckpointProtocol |
getUnisolatedIndex(String name)
Return the mutable view of the named persistence capable data structure
(aka the "live" or
ITx.UNISOLATED view). |
UUID |
getUUID()
|
Iterator<String> |
indexNameScan(String prefix,
long timestamp)
Iterator visits the names of all indices spanned by the given prefix.
|
protected void |
installRootBlocks(IRootBlockView rootBlock0,
IRootBlockView rootBlock1)
Install identical root blocks on the journal.
|
protected void |
invalidateCommitters()
This method is invoked to mark any persistence capable data structures
as invalid (in an error state).
|
boolean |
isChecked()
Return
true if the persistence store uses record level
checksums. |
boolean |
isDirty()
Return
true if the store has been modified since the last
IAtomicStore.commit() or IAtomicStore.abort() . |
boolean |
isDoubleSync() |
boolean |
isFullyBuffered()
True iff the store is fully buffered (all reads are against memory).
|
boolean |
isOpen()
true iff the store is open. |
boolean |
isReadOnly()
Return
true if the journal was opened in a read-only mode or
if closeForWrites(long) was used to seal the journal against
further writes. |
boolean |
isStable()
True iff backed by stable storage.
|
IAllocationContext |
newAllocationContext(boolean isolated)
Creates a context to be used to isolate updates to within the context until it
is released to the parent environment.
|
protected HAGlue |
newHAGlue(UUID serviceId)
Factory for the
HADelegate object for this
AbstractJournal . |
ByteBuffer |
read(long addr)
Read the data (unisolated).
|
ICheckpointProtocol |
register(String name,
IndexMetadata metadata)
Variant method creates and registered a named persistence capable data
structure but does not assume that the data structure will be a
BTree . |
void |
registerIndex(IndexMetadata metadata)
Registers a named index.
|
BTree |
registerIndex(String name,
BTree ndx)
Register a named index.
|
void |
registerIndex(String name,
HTree ndx) |
BTree |
registerIndex(String name,
IndexMetadata metadata)
Deprecated.
|
int |
removeCommitRecordEntries(byte[] fromKey,
byte[] toKey)
Remove all commit records between the two provided keys.
|
void |
rollback()
Deprecated.
Do not use this method. HA provides point in time restore. Use
that. Or you can open a journal using the alternate root block by specifying
Options.ALTERNATE_ROOT_BLOCK |
void |
setCommitter(int rootSlot,
ICommitter committer)
Set a persistence capable data structure for callback during the commit
protocol.
|
protected void |
setQuorumToken(long newValue) |
protected void |
setupCommitters()
Invoked when a journal is first created, re-opened, or when the
committers have been
discarded . |
void |
shutdown()
Shutdown the journal (running tasks will run to completion, but no new
tasks will start).
|
void |
shutdownNow()
Immediate shutdown (running tasks are canceled rather than being
permitted to complete).
|
long |
size()
The #of application data bytes written on the store (does not count any
headers or root blocks that may exist for the store).
|
AbstractJournal.ISnapshotData |
snapshotAllocationData(AtomicReference<IRootBlockView> rbv)
With lock held to ensure that there is no concurrent commit, copy
key data atomically to ensure recovered snapshot is consistent with
the commit state when the snapshot is taken.
|
long |
toAddr(int nbytes,
long offset)
Converts a byte count and offset into a long integer.
|
String |
toString(long addr)
A human readable representation of the address.
|
void |
truncate()
Truncate the backing buffer such that there is no remaining free space in
the journal.
|
protected void |
validateIndexMetadata(String name,
IndexMetadata metadata)
Provides an opportunity to validate some aspects of the
IndexMetadata for an index partition. |
long |
write(ByteBuffer data)
Write the data (unisolated).
|
long |
write(ByteBuffer data,
IAllocationContext context)
Write the data within the allocation context.
|
clone, equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
getLocalTransactionManager, isHAJournal
isGroupCommit
addScheduledTask, getCollectPlatformStatistics, getCollectQueueStatistics, getGlobalFileSystem, getGlobalRowStore, getGlobalRowStore, getHttpdPort, getResourceLocator, getResourceLockService, getTempStore
protected static final org.apache.log4j.Logger haLog
public static final transient int ROOT_NAME2ADDR
Name2Addr
mapping names to BTree
s registered for the
store.public static final transient int PREV_ROOTBLOCK
public static final transient int DELETEBLOCK
public static final transient int ROOT_ICUVERSION
ICUVersionRecord
.
That record specifies the ICU version metadata which was in force when
the journal was created.protected final Properties properties
Journal
.public final File tmpDir
protected final boolean doubleSync
protected final ForceEnum forceOnCommit
protected final boolean deleteOnClose
protected AbstractJournal(Properties properties)
properties
- The properties as defined by Options
.RuntimeException
- If there is a problem when creating, opening, or reading from
the journal file.Options
protected AbstractJournal(Properties properties, Quorum<HAGlue,QuorumService<HAGlue>> quorum)
Quorum
.properties
- The properties as defined by Options
.quorum
- The quorum with which the journal will join (HA mode only).RuntimeException
- If there is a problem when creating, opening, or reading from
the journal file.Options
protected Name2Addr _getName2Addr()
The "live" name2addr index is required for unisolated writers regardless whether they are adding an index, dropping an index, or just recovering the "live" version of an existing named index.
Operations that read on historical Name2Addr
objects can of
course be concurrent. Those objects are loaded from an
ICommitRecord
. See #getIndex(String, ICommitRecord)
.
Note: the "live" Name2Addr
index is a mutable BTree
. All
access to the object MUST be synchronized on that object.
public IIndex getName2Addr()
Name2Addr
object mapping index names to
the most recent committed Name2Addr.Entry
for the named index. The keys are
index names (unicode strings). The values are Name2Addr.Entry
s containing
the names, commitTime, and last known Checkpoint
address of the
named BTree
on the Journal
.public IIndex getName2Addr(long commitTime)
Name2Addr
object as of the
specified commit time.commitTime
- A commit time.null
if there is no commit
record for that commitTime.getName2Addr()
public final long getMaximumExtent()
commit()
will #overflow()
.
In practice, overflow tries to trigger before this point in order to
avoid extending the journal.Options.MAXIMUM_EXTENT
protected static String getProperty(Properties properties, String name, String defaultValue)
protected String getProperty(String name, String defaultValue)
protected <E> E getProperty(String name, String defaultValue, IValidator<E> validator)
name
- The property name.defaultValue
- The default value.public final Properties getProperties()
IJournal
getProperties
in interface IJournal
public IBufferStrategy getBufferStrategy()
BufferMode
.
Note: this method MUST NOT check to see whether the journal is open since
we need to use it if we want to invoke
IRawStore.deleteResources()
and we can only invoke that
method once the journal is closed.
public abstract ExecutorService getExecutorService()
IResourceLocator
. There is no concurrency control associated with
this service, but tasks run here may submit tasks to the
ConcurrencyManager
.getExecutorService
in interface IIndexStore
public void shutdown()
Note: You SHOULD use this method rather than close()
for normal
shutdown of the journal.
shutdown
in interface IJournal
shutdownNow()
public void shutdownNow()
shutdownNow
in interface IJournal
shutdown()
protected void finalize() throws Throwable
public CounterSet getCounters()
getCounters
in interface ICounterSetAccess
protected final CounterSet getIndexCounters()
CounterSet
reflecting the named indices that are open or
which have been recently opened.
Note: This method prefers the live view of the index since this gets us the most recent metadata for the index depth, #of nodes, #of leaves, etc. However, when the live view is not currently open, we prefer the most recent view of the index.
Note: This scans the live Name2Addr
s internal index cache for the
live indices and then scans the historicalIndexCache
for the
read-only index views. Only a single CounterSet
is reported for
any given named index.
CounterSet
reflecting the named indices that were
open as of the time that this method was invoked.public final File getFile()
IRawStore
null
if there is no backing file
for the store.protected void assertBefore(UUID serviceId1, UUID serviceId2, long t1, long t2) throws ClocksNotSynchronizedException
t1
LT t2
, where t1
and
t2
are timestamps obtain such that this relation will be
true
if the clocks on the nodes are synchronized.
Note: Clock synchronization errors can arise across nodes if the nodes are not using a common network time source.
Note: Synchronization errors can arise on a single node if the clock is changed on that node - specifically if the clock is move backwards to before the most recent commit timestamp. For example, if the timezone is changed.
serviceId1
- The service that reported the timestamp t1
.serviceId2
- The service that reported the timestamp t2
.t1
- A timestamp from one service.t2
- A timestamp from the another service.ClocksNotSynchronizedException
ClocksNotSynchronizedException
protected long getMaximumClockSkewMillis()
assertBefore(UUID, UUID, long, long)
to verify that the clocks
are within some acceptable skew of one another. It is also used by
nextCommitTimestamp()
where it specifies the maximum clock skew
that will be corrected without operator intervention.
Note: This is overridden by the HAJournal.
assertBefore(UUID, UUID, long, long)
public long getHAPrepareTimeout()
UnsupportedOperationException
- always.public long getHAReleaseTimeConsensusTimeout()
UnsupportedOperationException
- always.protected void _close()
public void deleteResources()
Note: This is the core implementation of delete and handles event reporting.
deleteResources
in interface IRawStore
IllegalStateException
- if the journal is open.public void truncate()
Note: The caller MUST have exclusive write access to the journal. When
the ConcurrencyManager
is used, that means that the caller MUST
have an exclusive lock on the WriteExecutorService
.
Note: The BufferMode.DiskRW
does NOT support this operation. This
is because it stores meta-allocation information at the end of the file,
which makes it impossible to shrink the file. Therefore this method will
return without causing the file on disk to be shrunk for the RWStore.
public long ensureMinFree(long minFree)
Note: You need an exclusive write lock on the journal to extend it.
minFree
- The minimum #of bytes free for the user extent.public void closeForWrites(long closeTime)
This implementation sets the "closeTime" on the root block such that the journal will no longer accept writes, flushes all buffered writes, and releases any write cache buffers since they will no longer be used. This method is normally used when one journal is being closed out for writes during synchronous overflow processing and new writes will be buffered on a new journal. This has advantages over closing the journal directly including that it does not disturb concurrent readers.
Note: The caller MUST have exclusive write access to the journal. When
the ConcurrencyManager
is used, that means that the caller MUST
have an exclusive lock on the WriteExecutorService
.
Note: This does NOT perform a commit - any uncommitted writes will be discarded.
IllegalStateException
- If there are no commits on the journal.public void close()
shutdownNow()
.public void destroy()
IRawStore
IllegalStateException
if the store
is already closed, but still deletes the backing resources.destroy
in interface IIndexStore
destroy
in interface IRawStore
IRawStore.deleteResources()
protected void assertOpen()
Note: You can see an IllegalStateException
thrown out of here if
there are tasks running during shutdown()
and one of the various
task services times out while awaiting termination. Such exceptions are
normal since the store was closed asynchronously while task(s) were still
running.
IllegalStateException
- if the store is closed.public final IResourceMetadata getResourceMetadata()
IRawStore
getResourceMetadata
in interface IRawStore
public boolean isOpen()
true
iff the store is open.
Note: This will report false
for a new highly available
journal until the quorum has met and #init()
has been invoked for
the Quorum
.
public boolean isReadOnly()
true
if the journal was opened in a read-only mode or
if closeForWrites(long)
was used to seal the journal against
further writes.isReadOnly
in interface IRawStore
protected void assertCanRead()
IllegalStateException
- if the journal is not writable at this time.protected void assertCanWrite()
IllegalStateException
- if the journal is not writable at this time.public boolean isStable()
IRawStore
public boolean isFullyBuffered()
IRawStore
Note: This does not guarantee that the OS will not swap the buffer onto disk.
isFullyBuffered
in interface IRawStore
public boolean isDoubleSync()
public boolean isChecked()
true
if the persistence store uses record level
checksums. When true
, the store will detect bad reads by
comparing the record as read from the disk against the checksum for that
record.public final IRootBlockView getRootBlockView()
Returns the current root block (immediate, non-blocking peek).
Note: The root block reference can be null
until the journal
has been initialized. Once it has been set, the root block will always be
non-null
. Since this method does not obtain the inner lock,
it is possible for another thread to change the root block reference
through a concurrent abort()
or commitNow(long)
. The
IRootBlockView
itself is an immutable data structure.
getRootBlockView
in interface IAtomicStore
public final IRootBlockView getRootBlockViewWithLock()
getRootBlockView()
that takes the internal lock in
order to provide an appropriate synchronization barrier when installing
new root blocks onto an empty journal in HA.public final long getLastCommitTime()
IIndexStore
This method is useful if you plan to issue a series of read-consistent requests against the most current commit point of the database.
getLastCommitTime
in interface IIndexStore
IRootBlockView.getLastCommitTime()
public final void setCommitter(int rootSlot, ICommitter committer)
Note: the committers must be reset after restart or whenever the committers are discarded (the committers are themselves transient objects).
setCommitter
in interface IAtomicStore
rootSlot
- The slot in the root block where the address of the
ICommitter
will be recorded.committer
- The commiter.public void abort()
IAtomicStore
abort
in interface IAtomicStore
public void rollback()
Options.ALTERNATE_ROOT_BLOCK
Note: You MUST have an exclusive write lock on the journal.
Note: To restore the last root block we copy the alternative root block over the current root block. That gives us two identical root blocks and restores us to the root block that was in effect before the last commit.
public boolean isDirty()
IAtomicStore
true
if the store has been modified since the last
IAtomicStore.commit()
or IAtomicStore.abort()
.isDirty
in interface IAtomicStore
IAtomicStore.commit()
or
IAtomicStore.abort()
.public long commit()
IAtomicStore
Note: if the commit fails (by throwing any kind of exception) then you
MUST invoke IAtomicStore.abort()
to abandon the current write set.
commit
in interface IAtomicStore
ICommitRecord
-or- 0L if
there were no data to commit.protected long commitNow(long commitTime)
ICommitter
to flush its state onto the store using
ICommitter.handleCommit(long)
. The address returned by that
method is the address from which the ICommitter
may be reloaded
(and its previous address if its state has not changed). That address is
saved in the ICommitRecord
under the index for which that
committer was registered
. We
then force the data to stable store, update the root block, and force the
root block and the file metadata to stable store.
Note: Each invocation of this method MUST use a distinct commitTime and the commitTimes MUST be monotonically increasing. These guarantees support both the database version history mechanisms and the High Availability mechanisms.
commitTime
- The commit time either of a transaction or of an unisolated
commit. Note that when mixing isolated and unisolated commits
you MUST use the same ITimestampService
for both
purposes.protected void assertCommitTimeAdvances(long commitTime)
commitTime
- The proposed commit time.IllegalArgumentException
- if the commitTime is LTE the value reported by
IRootBlockView.getLastCommitTime()
.protected static void assertPriorCommitTimeAdvances(long currentCommitTime, long priorCommitTime)
currentCommitTime
- priorCommitTime
- IllegalArgumentException
- if the commitTime is LTE the value reported by
IRootBlockView.getLastCommitTime()
.public void force(boolean metadata)
IRawStore
public long size()
IRawStore
public ByteBuffer read(long addr)
IRawStore
read
in interface IRawStore
addr
- A long integer that encodes both the offset from which the
data will be read and the #of bytes to be read. See
IAddressManager.toAddr(int, long)
.public long write(ByteBuffer data)
IRawStore
write
in interface IRawStore
data
- The data. The bytes from the current
Buffer.position()
to the
Buffer.limit()
will be written and the
Buffer.position()
will be advanced to the
Buffer.limit()
. The caller may subsequently
modify the contents of the buffer without changing the state
of the store (i.e., the data are copied into the store).IAddressManager
.public long write(ByteBuffer data, IAllocationContext context)
IAllocationManagerStore
write
in interface IAllocationManagerStore
data
- The data.context
- The allocation context.public IPSOutputStream getOutputStream()
IStreamStore
IPSOutputStream
.getOutputStream
in interface IStreamStore
public IPSOutputStream getOutputStream(IAllocationContext context)
IAllocationManagerStore
IPSOutputStream
.getOutputStream
in interface IAllocationManagerStore
context
- The context within which any allocations are made by the
returned IPSOutputStream
.public InputStream getInputStream(long addr)
IStreamStore
getInputStream
in interface IStreamStore
addr
- The address at which the stream was written.public void delete(long addr)
IRawStore
After this operation subsequent reads on the address MAY fail and the caller MUST NOT depend on the ability to read at that address.
public void delete(long addr, IAllocationContext context)
IAllocationManagerStore
delete
in interface IAllocationManagerStore
addr
- The address whose allocation is to be deleted.context
- The allocation context.public void detachContext(IAllocationContext context)
IAllocationManager
IStore
is the top-level parent of
allocation contexts. The allocators associated with the allocation
context are return to the global list of available allocators.detachContext
in interface IAllocationManager
context
- The application object which serves as the allocation context.public void abortContext(IAllocationContext context)
IAllocationManager
abortContext
in interface IAllocationManager
context
- The application object which serves as the allocation context.public final long getRootAddr(int index)
IAtomicStore
getRootAddr
in interface IAtomicStore
index
- The index of the root address to be retrieved.protected ICommitRecord getEarliestVisibleCommitRecordForHA(long releaseTime)
ICommitRecord
for the earliest visible commit point
based on the caller's releaseTime.
Note: This method is used for HA. The caller provides a releaseTime based
on the readsOnCommitTime of the earliestActiveTx and the minReleaseAge
rather than ITransactionService.getReleaseTime()
since the latter
is only updated by the release time consensus protocol during a 2-phase
commit.
public ICommitRecord getCommitRecord()
ICommitRecord
containing the root addresses.ICommitRecord
and never null
.protected void invalidateCommitters()
https://jira.blazegraph.com/browse/BLZG-1953
protected void discardCommitters()
abort()
when the store must discard
any hard references that it may be holding to objects registered as
ICommitter
s.
The default implementation discards the btree mapping names to named btrees.
Subclasses MAY extend this method to discard their own committers but MUST NOT override it completely.
protected void setupCommitters()
discarded
.
The basic implementation sets up the btree that is responsible for resolving named btrees.
Subclasses may extend this method to setup their own committers.
public CommitRecordIndex getReadOnlyCommitRecordIndex()
CommitRecordIndex
.CommitRecordIndex
.protected CommitRecordIndex getCommitRecordIndex(long addr, boolean readOnly)
ICommitRecord
s. This method is capable of returning either the
live CommitRecordIndex
or a read-only view of any committed
version of that index.
CAUTION: DO NOT EXPOSE THE LIVE COMMIT RECORD INDEX OUTSIDE OF
THIS CLASS. IT IS NOT POSSIBLE TO HAVE CORRECT SYNCHRONIZATION ON THAT
INDEX IN ANOTHER CLASS.addr
- The root address of the index -or- 0L if the index has not
been created yet. When addr is non-IAddressManager.NULL
, each
invocation will return a distinct CommitRecordIndex
object.readOnly
- When false
the returned is NOT cached.CommitRecordIndex
for that address or a new index if
0L
was specified as the address._commitRecordIndex
public ICommitRecord getCommitRecord(long commitTime)
ICommitRecord
for the most recent committed state
whose commit timestamp is less than or equal to timestamp. This
is used by a transaction
to locate the committed state that is
the basis for its operations.getCommitRecord
in interface IAtomicStore
commitTime
- The timestamp of interest.ICommitRecord
for the most recent committed state
whose commit timestamp is less than or equal to timestamp
-or- null
iff there are no ICommitRecord
s
that satisfy the probe.public ICommitRecord getCommitRecordStrictlyGreaterThan(long commitTime)
commitTime
- The commit time.null
if there is no commit
record whose timestamp is strictly greater than
commitTime.public IIndex getIndex(String name, long commitTime)
Note: Transactions should pass in the timestamp against which they are
reading rather than the transaction identifier (aka startTime). By
providing the timestamp of the commit point, the transaction will hit the
indexCache
. If the transaction passes the startTime instead,
then all startTimes will be different and the cache will be defeated.
getIndex
in interface IIndexManager
name
- The index name.commitTime
- A timestamp which represents either a possible commit time on
the store or a read-only transaction identifier.null
iff there is no index registered
with that name for that timestamp.UnsupportedOperationException
- If you pass in ITx.UNISOLATED
,
ITx.READ_COMMITTED
, or a timestamp that corresponds
to a read-write transaction since those are not "commit
times".indexCache
,
Add
cache for access to historical index views on the Journal by name
and commitTime.
FIXME GIST Reconcile API tension with {@link IIndex} and
{@link ICheckpointProtocol}, however this method is overridden by
{@link Journal} and is also implemented by
{@link IBigdataFederation}. The central remaining tensions are
{@link FusedView} and the local/remote aspect. {@link FusedView}
could probably be "fixed" by extending {@link AbstractBTree} rather
than having an inner delegate for the mutable view. The local/remote
issue is more complex.public final ICheckpointProtocol getIndexLocal(String name, long commitTime)
Note: Transactions should pass in the timestamp against which they are
reading rather than the transaction identifier (aka startTime). By
providing the timestamp of the commit point, the transaction will hit the
indexCache
. If the transaction passes the startTime instead,
then all startTimes will be different and the cache will be defeated.
getIndexLocal
in interface IGISTLocalManager
UnsupportedOperationException
- If you pass in ITx.UNISOLATED
,
ITx.READ_COMMITTED
, or a timestamp that corresponds
to a read-write transaction since those are not "commit
times".indexCache
,
getIndex(String, long)
,
Add
cache for access to historical index views on the Journal by name
and commitTime. protected int getIndexCacheSize()
IIndex
.protected int getHistoricalIndexCacheSize()
IIndex
.public final ICheckpointProtocol getIndexWithCommitRecord(String name, ICommitRecord commitRecord)
ICommitRecord
. The
index will be marked as read-only, it will NOT permit writes, and
ICheckpointProtocol#getLastCommitTime(long)
will report the value
associated with Name2Addr.Entry.commitTime
for the historical
Name2Addr
instance for that ICommitRecord
.
Note: This method should be preferred to
getIndexWithCheckpointAddr(long)
for read-historical indices
since it will explicitly mark the index as read-only and specifies the
lastCommitTime on the returned index based on
Name2Addr.Entry.commitTime
, which is the actual commit time for
the last update to the index.
null
iff the named index did
not exist as of that commit record.public final ICheckpointProtocol getIndexWithCheckpointAddr(long checkpointAddr)
Note: This method imposes a canonicalizing mapping and ensures that there will be at most one object providing a view of the historical data structure as of the specified timestamp. This guarentee is used to facilitate buffer management.
Note: The canonicalizing mapping for unisolated views of persistence
capable data structures is maintained by the ITx.UNISOLATED
Name2Addr
instance.
checkpointAddr
- The address of the Checkpoint
record.Checkpoint
.Options.HISTORICAL_INDEX_CACHE_CAPACITY
public final void registerIndex(IndexMetadata metadata)
Note: A named index must be registered outside of any transaction before it may be used inside of a transaction.
Note: You MUST commit()
before the registered index will be
either restart-safe or visible to new transactions.
registerIndex
in interface IGISTManager
metadata
- The metadata describing the index.IGISTLocalManager.getIndexLocal(String, long)
protected void validateIndexMetadata(String name, IndexMetadata metadata)
IndexMetadata
for an index partition.public final BTree registerIndex(String name, IndexMetadata metadata)
register(String, IndexMetadata)
This variant allows you to register an index under a name other than the
value returned by IndexMetadata.getName()
.
Note: This variant is generally by the IMetadataService
when it
needs to register a index partition of some scale-out index on a
IDataService
. In this case the name is the name of the
index partition while the value reported by
IndexMetadata.getName()
is the name of the scale-out index. In
nearly all other cases you can use IGISTManager.registerIndex(IndexMetadata)
instead. The same method signature is also declared by
IDataService.registerIndex(String, IndexMetadata)
in order to
support registration of index partitions.
Note: Due to the method signature, this method CAN NOT be used to create
and register persistence capable data structures other than an
IIndex
(aka B+Tree).
Once registered the index will participate in atomic commits.
Note: A named index must be registered outside of any transaction before it may be used inside of a transaction.
Note: You MUST commit()
before the registered index will be
either restart-safe or visible to new transactions.
registerIndex
in interface IBTreeManager
name
- The name that can be used to recover the index.metadata
- The metadata describing the index.IBTreeManager.getIndex(String)
.IIndexManager.getIndex(String, long)
,
FIXME GIST Due to the method signature, this method CAN NOT be used
to create and register persistence capable data structures other
than an {@link IIndex} (aka B+Tree). It is difficult to reconcile
this method with other method signatures since this method is
designed for scale-out and relies on {@link IIndex}. However, only
the B+Tree is an {@link IIndex}. Therefore, this method signature
can not be readily reconciled with the {@link HTree}. The only
interface which the {@link BTree} and {@link HTree} share is the
{@link ICheckpointProtocol} interface, but that is a purely local
(not remote) interface and is therefore not suitable to scale-out.
Also, it is only in scale-out where the returned object can be a
different type than the simple {@link BTree} class, e.g., a
{@link FusedView} or even an {@link IClientIndex}.
public ICheckpointProtocol register(String name, IndexMetadata metadata)
BTree
.register
in interface IGISTLocalManager
store
- The backing store.metadata
- The metadata that describes the data structure to be created.Checkpoint.create(IRawStore, IndexMetadata)
public final BTree registerIndex(String name, BTree ndx)
IBTreeManager
registerIndex
in interface IBTreeManager
name
- The name that can be used to recover the index.ndx
- The btree.IBTreeManager.getIndex(String)
.IGISTManager.registerIndex(IndexMetadata)
,
IIndexManager.getIndex(String, long)
,
IGISTLocalManager.getIndexLocal(String, long)
public void dropIndex(String name)
Note: Whether or not and when index resources are reclaimed is dependent on the store. For example, an immortal store will retain all historical states for all indices. Likewise, a store that uses index partitions may be able to delete index segments immediately.
Drops the named index. The index will no longer participate in atomic commits and will not be visible to new transactions. Storage will be reclaimed IFF the backing store support that functionality.
dropIndex
in interface IGISTManager
name
- The name of the index to be dropped.public Iterator<String> indexNameScan(String prefix, long timestamp)
IGISTManager
indexNameScan
in interface IGISTManager
prefix
- The prefix (optional). When given, this MUST include a
.
if you want to restrict the scan to only those
indices in a given namespace. Otherwise you can find indices
in kb2
if you provide the prefix kb
where both kb and kb2 are namespaces since the indices spanned
by kb
would include both kb.xyz
and
kb2.xyx
.timestamp
- A timestamp which represents either a possible commit time on
the store or a read-only transaction identifier.public final BTree getIndex(String name)
ITx.UNISOLATED
index). This object is NOT thread-safe. You MUST
NOT write on this index unless you KNOW that you are the only writer. See
ConcurrencyManager
, which handles exclusive locks for
ITx.UNISOLATED
indices.getIndex
in interface IBTreeManager
name
- The name of the index.#getLiveView(String, long)
public final ICheckpointProtocol getUnisolatedIndex(String name)
ITx.UNISOLATED
view).getUnisolatedIndex
in interface IGISTLocalManager
IIndexManager#getIndex(String)
,
GIST public final long getOffset(long addr)
IAddressManager
getOffset
in interface IAddressManager
addr
- The opaque identifier that is the within store locator for
some datum.public final long getPhysicalAddress(long addr)
IAddressManager
getPhysicalAddress
in interface IAddressManager
addr
- The encoded addresspublic final int getByteCount(long addr)
IAddressManager
IRawStore
.getByteCount
in interface IAddressManager
addr
- The opaque identifier that is the within store locator for
some datum.public final long toAddr(int nbytes, long offset)
IAddressManager
toAddr
in interface IAddressManager
nbytes
- The byte count.offset
- The byte offset.public final String toString(long addr)
IAddressManager
toString
in interface IAddressManager
addr
- The opaque identifier that is the within store locator for
some datum.public final int getOffsetBits()
public final int getMaxRecordSize()
protected long getQuorumToken()
protected void clearQuorumToken(long newValue)
newValue
- The new value.protected void setQuorumToken(long newValue)
public final long awaitHAReady(long timeout, TimeUnit units) throws InterruptedException, TimeoutException, AsynchronousQuorumCloseException
IJournal
#setQuorumToken(long)
commitCounter:=0
, then the root blocks from the leader have
been installed on the follower.awaitHAReady
in interface IJournal
timeout
- The timeout to await this condition.units
- The units for that timeout.InterruptedException
TimeoutException
AsynchronousQuorumCloseException
public final long getHAReady()
haReadyToken
(non-blocking).public final HAStatusEnum getHAStatus()
HAStatusEnum.Leader
, a
HAStatusEnum.Follower
, or HAStatusEnum.NotReady
. This is
exposed both here (an RMI interface) and by the REST API.HAStatusEnum
or null
if the store is not
associated with a Quorum
.HAGlue.getHAStatus()
public final void assertHAReady(long token) throws QuorumException
getHAReady()
token has the specified value.token
- The specified value.QuorumException
protected void installRootBlocks(IRootBlockView rootBlock0, IRootBlockView rootBlock1)
- the DirectBufferPool.INSTANCE has the same buffer capacity (so there will be room for the write cache data in the buffers on all nodes).
QuorumService#installRootBlocks(IRootBlockView)
public final void doLocalAbort()
public final void doLocalCommit(IRootBlockView rootBlock)
Note: This is used to support RESTORE by replay of HALog files when the HAJournalServer is offline. TODO This method should be protected. If we move the HARestore class into this package, then it can be changed from public to protected or package private.
protected void doLocalCommit(QuorumService<HAGlue> localService, IRootBlockView rootBlock)
localService
- For HA modes only. When non-null
, this is used to
identify whether the service is the leader. When the service
is not the leader, we need to do some additional work to
maintain the IRWStrategy
allocators in synch at each
commit point.rootBlock
- The new root block.public Quorum<HAGlue,QuorumService<HAGlue>> getQuorum()
IJournal
Quorum
for this service -or- null
if the service
is not running with a quorum.protected HAGlue newHAGlue(UUID serviceId)
HADelegate
object for this
AbstractJournal
. The object returned by this method will be made
available using QuorumMember.getService()
.UnsupportedOperationException
- always.protected IRootBlockView[] getRootBlocks()
Note: This takes a lock to ensure that the root blocks are consistent with a commit point on the backing store.
public AbstractJournal.ISnapshotData snapshotAllocationData(AtomicReference<IRootBlockView> rbv) throws IOException
If this is not done then it is possible for the allocation data - both metabits and fixed allocator commit bits - to be overwritten and inconsistent with the saved root blocks.
IOException
public int removeCommitRecordEntries(byte[] fromKey, byte[] toKey)
This is called from the RWStore when it checks for deferredFrees against the CommitRecordIndex where the CommitRecords reference the deleteBlocks that have been deferred.
Once processed the records for the effected range muct be removed as they reference invalid states.
fromKey
- toKey
- https://sourceforge.net/apps/trac/bigdata/ticket/440
,
IHistoryManager.checkDeferredFrees(AbstractJournal)
public IAllocationContext newAllocationContext(boolean isolated)
IAllocationManager
newAllocationContext
in interface IAllocationManager
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.