CustomByteArrayFrontCodedList (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- java.util.AbstractCollection<K>
- - it.unimi.dsi.fastutil.objects.AbstractObjectCollection<K>
 - - it.unimi.dsi.fastutil.objects.AbstractObjectList<byte[]>
 - - it.unimi.dsi.fastutil.bytes.custom.CustomByteArrayFrontCodedList

All Implemented Interfaces:

it.unimi.dsi.fastutil.objects.ObjectCollection<byte[]>, it.unimi.dsi.fastutil.objects.ObjectIterable<byte[]>, it.unimi.dsi.fastutil.objects.ObjectList<byte[]>, it.unimi.dsi.fastutil.Stack<byte[]>, Serializable, Cloneable, Comparable<List<? extends byte[]>>, Iterable<byte[]>, Collection<byte[]>, List<byte[]>
```
public class CustomByteArrayFrontCodedList
extends it.unimi.dsi.fastutil.objects.AbstractObjectList<byte[]>
implements Serializable, Cloneable
```
Compact storage of lists of arrays using front coding.
This class stores immutably a list of arrays in a single large array using front coding (of course, the compression will be reasonable only if the list is sorted lexicographically—see below). It implements an immutable type-specific list that returns the i-th array when calling get(i). The returned array may be freely modified.
Front coding is based on the idea that if the i-th and the (i+1)-th array have a common prefix, we might store the length of the common prefix, and then the rest of the second array.
This approach, of course, requires that once in a while an array is stored entirely. The ratio of a front-coded list defines how often this happens (once every ratio() arrays). A higher ratio means more compression, but means also a longer access time, as more arrays have to be probed to build the result. Note that we must build an array every time get(int) is called, but this class provides also methods that extract one of the stored arrays in a given array, reducing garbage collection. See the documentation of the family of get() methods.
By setting the ratio to 1 we actually disable front coding: however, we still have a data structure storing large list of arrays with a reduced overhead (just one integer per array, plus the space required for lengths).
Note that the typical usage of front-coded lists is under the form of serialized objects; usually, the data that has to be compacted is processed offline, and the resulting structure is stored permanently. Since the pointer array is not stored, the serialized format is very small.
Implementation Details

All arrays are stored in a large array. A separate array of pointers indexes arrays whose position is a multiple of the ratio: thus, a higher ratio means also less pointers.
More in detail, an array whose position is a multiple of the ratio is stored as the array length, followed by the elements of the array. The array length is coded by a simple variable-length list of k-1 bit blocks, where k is the number of bits of the underlying primitive type. All other arrays are stored as follows: let common the length of the maximum common prefix between the array and its predecessor. Then we store the array length decremented by common, followed by common, followed by the array elements whose index is greater than or equal to common. For instance, if we store foo, foobar, football and fool in a front-coded character-array list with ratio 3, the character array will contain
```
 3 f o o 3 3 b a r 5 3 t b a l l 4 f o o l
 
```
Limitations

All arrays are stored in a large array: thus, the compressed list must not exceed Integer.MAX_VALUE elements. Moreover, iterators are less efficient when they move back, as previous() cannot build incrementally the previous array (whereas ( next() can).
Modifications
This class was derived from it.unimi.dsi.fastutil.bytes.ArrayFrontCodedList, which is part of fastutils. The folowing changes were made:
- The name of the class has been changed to prevent classpath problems.
- The class has a new serialVersionUID and the serialization logic has been modified to allow serialization against DataOutput by defining getBackingBuffer() and a new constructors that operate on a byte[] slice.
- The test code from main() has been isolated in a junit test suite.
- The backing byte[] array has been replaced by an interface suitable for wrapping either a byte[] or a ByteBuffer. This was done in order to permit access to the front-coded representation without "de-serializing" the data and a suitable constructor was added for the ByteBuffer case.
- Make the Collection and Iterator ctors strongly typed.
See Also:
Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static interface CustomByteArrayFrontCodedList.BackingBuffer
Abstraction allowing different implementations of the backing buffer.
- Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
 it.unimi.dsi.fastutil.objects.AbstractObjectList.ObjectSubList<K>

Nested Classes
Modifier and Type	Class and Description
`static interface`	`CustomByteArrayFrontCodedList.BackingBuffer` Abstraction allowing different implementations of the backing buffer.

Field Summary

Fields
Modifier and Type	Field and Description
`protected int`	`n` The number of arrays in the list.
`protected int[]`	`p` The pointers to entire arrays in the list.
`protected int`	`ratio` The ratio of this front-coded list.

Constructor Summary

Constructors
Constructor and Description
`CustomByteArrayFrontCodedList(Collection<byte[]> c, int ratio)` Creates a new front-coded list containing the arrays in the given collection.
`CustomByteArrayFrontCodedList(Collection<byte[]> c, int ratio, boolean hasDups)` Creates a new front-coded list containing the arrays in the given collection.
`CustomByteArrayFrontCodedList(int n, int ratio, byte[] array)` Reconsitute an instance from just the coded byte[], the #of elements in the array, and the ratio.
`CustomByteArrayFrontCodedList(int n, int ratio, byte[] array, int off, int len, boolean hasDups)` Reconsitute an instance from a slice byte[] containing the coded data, the #of elements in the array, and the ratio.
`CustomByteArrayFrontCodedList(int n, int ratio, ByteBuffer b)` Reconsitute an instance from just a `ByteBuffer` view onto the coded byte[], the #of elements in the array, and the ratio.
`CustomByteArrayFrontCodedList(Iterator<byte[]> arrays, int ratio)` Creates a new front-coded list containing the arrays returned by the given iterator.
`CustomByteArrayFrontCodedList(Iterator<byte[]> arrays, int ratio, boolean hasDups)`

Method Summary

Methods
Modifier and Type	Method and Description
`int`	`arrayLength(int index)` Computes the length of the array at the given index.
`Object`	`clone()` Returns a copy of this list.
`byte[]`	`get(int index)`
`int`	`get(int index, byte[] a)` Stores in the given array an array stored in this front-coded list.
`int`	`get(int index, byte[] a, int offset, int length)` Stores in the given array elements from an array stored in this front-coded list.
`byte[]`	`getArray(int index)`
`CustomByteArrayFrontCodedList.BackingBuffer`	`getBackingBuffer()` Return the backing buffer.
`it.unimi.dsi.fastutil.objects.ObjectListIterator<byte[]>`	`listIterator(int start)`
`int`	`ratio()` Returns the ratio of this list.
`int`	`search(byte[] a)` Search for the index of the value having the same data.
`int`	`size()`
`String`	`toString()` Modified to dump internal record metadata and to show the byte[]s as unsigned values.
`static String`	`toString(byte[] key)` Formats a key as a series of comma delimited unsigned bytes.
`static String`	`toString(byte[] key, int off, int len)` Formats a key as a series of comma delimited unsigned bytes.
`int`	`writeOn(OutputStream os, int index)` Write the specified byte[] onto a stream.

Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList
add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top

Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectCollection
containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray

Methods inherited from class java.util.AbstractCollection
clear, remove

Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.util.List
clear, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray

Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection
objectIterator, toArray

Methods inherited from interface it.unimi.dsi.fastutil.Stack
isEmpty

- Field Detail
 - n
```
protected int n
```
 The number of arrays in the list.
 - ratio
```
protected int ratio
```
 The ratio of this front-coded list.
 - p
```
protected transient int[] p
```
 The pointers to entire arrays in the list.
- Constructor Detail
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(Iterator<byte[]> arrays,
 int ratio)
```
 Creates a new front-coded list containing the arrays returned by the given iterator.
 
 Parameters:
 arrays - an iterator returning arrays.
 ratio - the desired ratio.
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(Iterator<byte[]> arrays,
 int ratio,
 boolean hasDups)
```
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(Collection<byte[]> c,
 int ratio)
```
 Creates a new front-coded list containing the arrays in the given collection.
 
 Parameters:
 c - a collection containing arrays.
 ratio - the desired ratio.
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(Collection<byte[]> c,
 int ratio,
 boolean hasDups)
```
 Creates a new front-coded list containing the arrays in the given collection.
 
 Parameters:
 c - a collection containing arrays.
 ratio - the desired ratio.
 hasDups - true iff the list allows duplicate keys.
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(int n,
 int ratio,
 byte[] array)
```
 Reconsitute an instance from just the coded byte[], the #of elements in the array, and the ratio.
 
 Parameters:
 n - The #of elements in the array.
 ratio - The ratio of this front-coded list.
 array - The array containing the compressed arrays.
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(int n,
 int ratio,
 byte[] array,
 int off,
 int len,
 boolean hasDups)
```
 Reconsitute an instance from a slice byte[] containing the coded data, the #of elements in the array, and the ratio.
 
 Parameters:
 n - The #of elements in the array.
 ratio - The ratio of this front-coded list.
 array - The array containing the compressed arrays.
 hasDups - true iff the list allows duplicate keys.
 - CustomByteArrayFrontCodedList
```
public CustomByteArrayFrontCodedList(int n,
 int ratio,
 ByteBuffer b)
```
 Reconsitute an instance from just a ByteBuffer view onto the coded byte[], the #of elements in the array, and the ratio.
 
 Parameters:
 n - The #of elements in the array.
 ratio - The ratio of this front-coded list.
 b - The view onto the compressed arrays.
- Method Detail
 - ratio
```
public int ratio()
```
 Returns the ratio of this list.
 
 Returns:
 the ratio of this list.
 - arrayLength
```
public int arrayLength(int index)
```
 Computes the length of the array at the given index.
 
 Parameters:
 index - an index.
 
 Returns:
 the length of the index-th array.
 - get
```
public byte[] get(int index)
```
 Specified by:
 
 get in interface List<byte[]>
 - getArray
```
public byte[] getArray(int index)
```
 See Also:
 get(int)
 - writeOn
```
public int writeOn(OutputStream os,
 int index)
 throws IOException
```
 Write the specified byte[] onto a stream.
 
 Parameters:
 os - The stream.
 index - The index of the byte[].
 
 Returns:
 The #of bytes written on the stream.
 
 Throws:
 
 IOException
 - get
```
public int get(int index,
 byte[] a,
 int offset,
 int length)
```
 Stores in the given array elements from an array stored in this front-coded list.
 
 Parameters:
 index - an index.
 a - the array that will store the result.
 offset - an offset into a where elements will be store.
 length - a maximum number of elements to store in a.
 
 Returns:
 if a can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.
 - get
```
public int get(int index,
 byte[] a)
```
 Stores in the given array an array stored in this front-coded list.
 
 Parameters:
 index - an index.
 a - the array that will store the content of the result (we assume that it can hold the result).
 
 Returns:
 if a can hold the extracted elements, the number of extracted elements; otherwise, the number of remaining elements with the sign changed.
 - size
```
public int size()
```
 Specified by:
 
 size in interface Collection<byte[]>
 
 Specified by:
 
 size in interface List<byte[]>
 
 Specified by:
 
 size in class AbstractCollection<byte[]>
 - listIterator
```
public it.unimi.dsi.fastutil.objects.ObjectListIterator<byte[]> listIterator(int start)
```
 Specified by:
 
 listIterator in interface it.unimi.dsi.fastutil.objects.ObjectList<byte[]>
 
 Specified by:
 
 listIterator in interface List<byte[]>
 
 Overrides:
 
 listIterator in class it.unimi.dsi.fastutil.objects.AbstractObjectList<byte[]>
 - clone
```
public Object clone()
```
 Returns a copy of this list.
 
 Overrides:
 
 clone in class Object
 
 Returns:
 a copy of this list.
 - toString
```
public String toString()
```
 Modified to dump internal record metadata and to show the byte[]s as unsigned values.
 
 Overrides:
 
 toString in class it.unimi.dsi.fastutil.objects.AbstractObjectList<byte[]>
 - getBackingBuffer
```
public CustomByteArrayFrontCodedList.BackingBuffer getBackingBuffer()
```
 Return the backing buffer.
 - search
```
public int search(byte[] a)
```
 Search for the index of the value having the same data. The results are meaningless if the list is not ordered. A binary search is performed against each of the entries that is coded as a full length value. If there is a match, the index of that entry is returned directly. Otherwise a linear scan is performed starting with the insertion point as identified by the binary search. The combination of a binary search followed by a linear scan implies that search is fastest when ratio is small. However, the compression is highest when the ratio is equal to the #of entries in the list. Therefore, search performance is traded off against compression.
 Note: The full length entry is coded every [i/ratio] entries. However, the subsequent entries code their common length with respect to the previous front-coded entry NOT to the full length entry. This means that the length of the common prefix can increase or decrease as we scan a bucket and the #of already matched bytes can increase or decrease as well. Consider the following example, when coded with a ratio of 8.
```
 [121, 59, 18, 79, 99, 112, 24, 116], // #0 rlen=8, clen=0 (new bucket)
 [121, 59, 18, 79, 99, 112, 43, 68], // #1 rlen=2, clen=6
 [121, 59, 18, 79, 99, 112, 46, 78], // #2 rlen=2, clen=6
 [121, 59, 18, 79, 99, 112, 54, 48], // #3 rlen=2, clen=6
 [121, 59, 18, 79, 99, 112, 54, 108], // #4 rlen=1, clen=7 (***)
 [121, 59, 18, 79, 99, 112, 55, 81], // #5 rlen=2, clen=6
 [121, 59, 18, 79, 99, 112, 62, 85], // #6 rlen=2, clen=6
 [121, 59, 18, 79, 99, 112, 63, 110], // #7 rlen=8, clen=0 (new bucket)
 [121, 59, 18, 79, 99, 112, 71, 124], // #8 ...
 [121, 59, 18, 79, 99, 112, 73, 49] // #9 ...
 
```
 The common length grows for entry #4 because the [54] in the next to last byte in the array already appears in the same position in the previous front-coded entry. However, the common length decreases again for the next entry (#5).
 The following rules guide the linear search of the bucket identified by the binary search.
 1. If clen GT mlen, then skip to the next entry in the bucket as no match is possible (the common prefix was demonstrated to be longer than the matched prefix on a previous entry).
 2. Compare the remaining bytes in the search probe with the remainder for the current entry.
 3. If the search probe is EQ to the bucket entry, then halt. The probe key was found.
 4. If the search probe is LT the bucket entry, then halt. The probe key was not found.
 5. Otherwise, mlen += prefixLength, where prefixLength is the length of the matched prefix from step 2.
 Parameters:
 a - The search probe, which is interpreted as an unsigned byte[]
 
 Returns:
 index of the search key, if it is found; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
 toString
 
 public static final String toString(byte[] key)
 
 Formats a key as a series of comma delimited unsigned bytes.
 
 Parameters:
 key - The key.
 
 Returns:
 The string representation of the array as unsigned bytes.
 
 toString
 
 public static final String toString(byte[] key, int off, int len)
 
 Formats a key as a series of comma delimited unsigned bytes.
 
 Parameters:
 key - The key.
 off - The index of the first byte that will be visited.
 len - The #of bytes to visit.
 
 Returns:
 The string representation of the array as unsigned bytes.

Class CustomByteArrayFrontCodedList

Implementation Details

Limitations

Modifications

Nested Class Summary

Nested classes/interfaces inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

Field Summary

Constructor Summary

Method Summary

Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList

Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectCollection

Methods inherited from class java.util.AbstractCollection

Methods inherited from class java.lang.Object

Methods inherited from interface java.util.List

Methods inherited from interface it.unimi.dsi.fastutil.objects.ObjectCollection

Methods inherited from interface it.unimi.dsi.fastutil.Stack

Field Detail

n

ratio

p

Constructor Detail

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

CustomByteArrayFrontCodedList

Method Detail

ratio

arrayLength

get

getArray

writeOn

get

get

size

listIterator

clone

toString

getBackingBuffer

search

toString

toString