Package com.bigdata.journal

The journal is an append-only, persistence-capable data structure supporting atomic commit, named indices, and transactions.

Writes are logically appended to the journal to minimize disk head movement. The addressing scheme of the journal is configurable. The scale-up default allows an individual journal to address up to 4 terabytes and supports records up to 4 megabytes in length. The scale-out default allows records of up to 64 megabytes in length, but the maximum file size is correspondingly smaller. See the WormAddressManager for details.

The journal supports the concept of "overflow", which is triggered when the journal exceeds a threshold extent. An implementation that handles overflow will expunge B+Trees from the journal onto read-optimized index segments, thereby creating a database of range-partitioned indices. See the ResourceManager for overflow handling, which is part of the basic DataService.
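To make the address layout concrete, here is a minimal, self-contained sketch of the split-field arithmetic described above. It is not the WormAddressManager implementation, and the exact field layout is assumed for illustration only: with the scale-up default of 42 offset bits, offsets span 2^42 bytes (4 terabytes) and the remaining 22 bits bound record lengths at 2^22 bytes (4 megabytes).

    public class AddressSketch {

        private final int offsetBits; // high bits of the 64-bit address

        public AddressSketch(final int offsetBits) {
            this.offsetBits = offsetBits;
        }

        /** Pack a byte offset and a byte count into one long address. */
        public long toAddr(final long offset, final int nbytes) {
            return (offset << (64 - offsetBits)) | nbytes;
        }

        /** Recover the byte offset from an address. */
        public long getOffset(final long addr) {
            return addr >>> (64 - offsetBits);
        }

        /** Recover the record length from an address. */
        public int getByteCount(final long addr) {
            return (int) (addr & ((1L << (64 - offsetBits)) - 1));
        }

        public static void main(final String[] args) {
            // Scale-up default: 42 offset bits address 2^42 bytes (4TB),
            // leaving 22 bits for the byte count (records up to 2^22 = 4MB).
            final AddressSketch am = new AddressSketch(42);
            final long addr = am.toAddr(1024L, 512);
            System.out.println("offset=" + am.getOffset(addr)
                    + " nbytes=" + am.getByteCount(addr));
        }
    }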

The journal may be used as a Write-Once, Read-Many (WORM) store. An index is maintained over the historical commit points of the store, so a journal may also be used as an immortal store in which all historical consistent states remain accessible. A read-write architecture may be realized by limiting the journal to hundreds of megabytes in length. Both incrementally and when the journal overflows, key ranges of indices are expunged from the journal onto read-only index segments. When used in this manner, a consistent read on an index partition requires a fused view of the data on the journal and the data in the active index segments. Again, see the ResourceManager for details.
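The following is a hedged sketch of reading against a historical commit point. It assumes that Journal accepts a Properties object, that commit() returns the commit time, and that getIndex(String, long) resolves a read-only historical view; the index name "myIndex" is illustrative, so consult the Journal API for the exact signatures.

    import java.util.Properties;
    import java.util.UUID;

    import com.bigdata.btree.IIndex;
    import com.bigdata.btree.IndexMetadata;
    import com.bigdata.journal.BufferMode;
    import com.bigdata.journal.Journal;
    import com.bigdata.journal.Options;

    public class HistoricalReadSketch {

        public static void main(final String[] args) {
            final Properties p = new Properties();
            p.setProperty(Options.BUFFER_MODE, BufferMode.Transient.toString());
            final Journal journal = new Journal(p);
            try {
                // Register an index and commit to create a commit point.
                journal.registerIndex(new IndexMetadata("myIndex", UUID.randomUUID()));
                final long commitTime = journal.commit();
                // A read-only view of the index as of that commit point; all
                // such historical states remain accessible ("immortal" store).
                final IIndex view = journal.getIndex("myIndex", commitTime);
                System.out.println("historical view: " + view);
            } finally {
                journal.close();
            }
        }
    }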

The journal can be either wired into memory or accessed in place on disk. The advantage of wiring the journal into memory is that the disk head does not move for reads from the journal. Wiring the journal into memory can offer performance benefits when writes are absorbed in a persistent buffer and later migrated to read-optimized index segments. However, the B+Tree implementation already caches recently used nodes and leaves. In addition, wiring a journal into memory limits its maximum size to less than 2 gigabytes (it must fit into the JVM memory pool) and can cause contention for system RAM.
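The choice between an in-place disk journal and a fully buffered one is made when the journal is configured. The sketch below assumes the Options.FILE and Options.BUFFER_MODE property keys and the BufferMode enum from this package; the file name is illustrative.

    import java.util.Properties;

    import com.bigdata.journal.BufferMode;
    import com.bigdata.journal.Journal;
    import com.bigdata.journal.Options;

    public class BufferModeSketch {

        public static void main(final String[] args) {
            final Properties p = new Properties();
            p.setProperty(Options.FILE, "demo.jnl"); // illustrative file name
            // Disk: read and write the file in place, relying on the
            // B+Tree's own node/leaf caching rather than buffering the
            // whole journal in memory.
            p.setProperty(Options.BUFFER_MODE, BufferMode.Disk.toString());
            // To "wire" the journal into memory instead (fully buffered,
            // disk backed), a mode such as Direct would be chosen:
            // p.setProperty(Options.BUFFER_MODE, BufferMode.Direct.toString());
            final Journal journal = new Journal(p);
            journal.close();
        }
    }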

The journal relies on the BTree to provide a persistent mapping from keys to values. When used as an object store, the B+Tree naturally clusters serialized objects within its leaves based on their object identifiers, which provides IO efficiencies. (From the perspective of the journal, an object is a byte[] or byte stream.) If the database generalizes the concept of an object identifier to a variable-length byte[], then the application can take control over the clustering behavior simply by choosing how to code the primary key for its objects.
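As an illustration of application-controlled clustering, the sketch below codes a (lastName, firstName) primary key so that records for the same last name cluster together within the leaves. The registerIndex, IndexMetadata, and KeyBuilder usages shown are assumptions about this package's API; the index name and key layout are illustrative.

    import java.util.Properties;
    import java.util.UUID;

    import com.bigdata.btree.IIndex;
    import com.bigdata.btree.IndexMetadata;
    import com.bigdata.btree.keys.KeyBuilder;
    import com.bigdata.journal.BufferMode;
    import com.bigdata.journal.Journal;
    import com.bigdata.journal.Options;

    public class ClusteringSketch {

        public static void main(final String[] args) {
            final Properties p = new Properties();
            p.setProperty(Options.BUFFER_MODE, BufferMode.Transient.toString());
            final Journal journal = new Journal(p);
            try {
                journal.registerIndex(new IndexMetadata("people", UUID.randomUUID()));
                final IIndex ndx = journal.getIndex("people");
                // The application codes the primary key: sorting on
                // (lastName, firstName) clusters records with the same last
                // name together within the B+Tree leaves.
                final byte[] key = KeyBuilder.newInstance()
                        .appendASCII("Smith").appendASCII("Alice").getKey();
                ndx.insert(key, "serialized object".getBytes());
                journal.commit();
            } finally {
                journal.close();
            }
        }
    }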

When a journal is configured to overflow at a few hundred megabytes, a handful of large records can trigger overflow prematurely. In such cases the journal may be configured with a larger maximum extent, thereby deferring overflow. The scale-out file system is an example of such an application: by default it stores records of up to 64 megabytes. The file system index stores only a reference to the raw record on the journal. During overflow processing, the record is replicated onto an index segment and the index entry in the index segment is modified during the build so that it references the replicated record.
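A hedged sketch of the raw-record pattern described above follows: a block is written directly on the journal's backing store, and only the returned address would be stored as the value of the index entry. The write(ByteBuffer) and read(long) calls are the low-level raw store operations; the 1 megabyte block size is illustrative and stays within the default 4 megabyte record limit.

    import java.nio.ByteBuffer;
    import java.util.Properties;

    import com.bigdata.journal.BufferMode;
    import com.bigdata.journal.Journal;
    import com.bigdata.journal.Options;

    public class RawRecordSketch {

        public static void main(final String[] args) {
            final Properties p = new Properties();
            p.setProperty(Options.BUFFER_MODE, BufferMode.Transient.toString());
            final Journal journal = new Journal(p);
            try {
                final byte[] block = new byte[1024 * 1024]; // a 1MB file block
                // Write the block directly on the backing store. The returned
                // address is the "reference to the raw record" that the file
                // system index would store as its index entry value.
                final long addr = journal.write(ByteBuffer.wrap(block));
                // Later, the address resolves back to the record.
                final ByteBuffer rec = journal.read(addr);
                System.out.println("read back " + rec.remaining() + " bytes");
            } finally {
                journal.close();
            }
        }
    }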

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.