BytesUtil (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- com.bigdata.util.BytesUtil

```
public class BytesUtil
extends Object
```
Class supporting operations on variable length byte[] keys.
Comparison operations that accept a starting offset are used when the byte[]s are known to share a leading prefix that may be skipped during comparison.
Comparison operations that accept a starting offset and length are used when immutable keys are stored in a single byte[] and an index into starting positions in that byte[] is maintained.
JNI methods are provided for unsigned byte[] comparison. However, note that the JNI methods do not appear to be as fast as the pure Java methods - presumably because of the overhead of going from Java to C. In order to execute using the JNI methods you MUST define the optional boolean system property, e.g.,
```
 java -Dcom.bigdata.btree.BytesUtil.jni=true ...
 
```
See BytesUtil.c in this package for instructions on compiling the JNI methods.
See main(String[]) which provides a test for the JNI integration and some pointers on how to get this running on your platform.
Version:

$Id$

Author:

Bryan Thompson

Nested Class Summary

Nested Classes
Modifier and Type Class and Description

static class BytesUtil.UnsignedByteArrayComparator
Compares two unsigned byte[]s.

Nested Classes
Modifier and Type	Class and Description
`static class`	`BytesUtil.UnsignedByteArrayComparator` Compares two unsigned byte[]s.

Field Summary

Fields
Modifier and Type	Field and Description
`static byte[]`	`EMPTY` An empty `byte[]`.
`static byte[][]`	`EMPTY2` An empty `byte[][]`.
`static int`	`minlen` JNI routines are not invoked unless we will compare byte[]s with at least this many potential bytes to compare (the actual# may be much less of course since comparisons may fail fast).

Constructor Summary

Constructors
Constructor and Description

BytesUtil()

Constructors
Constructor and Description
`BytesUtil()`

Method Summary

Methods
Modifier and Type	Method and Description
`static int`	`altGetBits32(byte[] a, int off, int len)`
`static long`	`altGetBits64(byte[] a, int off, int len)`
`static int`	`binarySearch(byte[][] keys, int base, int nmem, byte[] key)` Binary search on an array whose members are variable length unsigned byte[]s.
`static int`	`bitFlagByteLength(int nbits)` Return the #of bytes required to bit code the specified #of bits.
`static String`	`byteArrToBinaryStr(byte[] zOrderByteArray)` Converts a byte array into a binary string.
`static int`	`byteIndexForBit(long bitIndex)` Return the index of the byte in which the bit with the given index is encoded.
`static boolean`	`bytesEqual(byte[] a, byte[] b)` True iff the two arrays compare as equal.
`static int`	`compareBytes(byte[] a, byte[] b)` Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes).
`static int`	`compareBytesWithLenAndOffset(int aoff, int alen, byte[] a, int boff, int blen, byte[] b)` Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes).
`static boolean`	`getBit(byte[] buf, long bitIndex)` Get the value of a bit.
`static int`	`getBits(byte[] a, int off, int len)` Return the n-bit integer corresponding to the inclusive bit range of the byte[].
`static int`	`getBits(int a, int off, int len)` Return the n-bit integer corresponding to the inclusive bit range of the byte[].
`static long`	`getBits64(byte[] a, int off, int len)`
`static long`	`getByteCount(String s)` Decode a string of the form `[0-9]+(k\|kb\|m\|mb\|g\|gb)?`, returning the number of bytes.
`static byte[]`	`getBytes(ByteBuffer buf)` Return the data in the buffer.
`static int`	`getMSBMask(int nbits)` Return a bit mask which reveals only the MSB (Most Significant Bits) N bits of an int32 value.
`static byte[]`	`getPrefix(byte[] a, byte[] b)` Return a new byte[] containing the leading bytes in common between two byte[]s.
`static int`	`getPrefixLength(byte[] a, byte[] b)` Return the #of leading bytes in common.
`static byte[]`	`getSeparatorKey(byte[] givenKey, byte[] priorKey)` The keys in the nodes of a btree are known as separator keys.
`static boolean`	`loadJNILibrary()` Attempt to load the JNI library.
`static void`	`main(String[] args)` This method forces the load of the JNI library and tries to execute the JNI methods.
`static int`	`maskOffLSB(int h, int nbits)` Mask off all but the LSB nbits of the hash value.
`static int`	`maskOffMSB(int h, int nbits)` Mask off all but the MSB nbits of the hash value and shift them down such that the masked bits appear at bits (nbits:0] of the returned value.
`static int`	`optGetBits(byte[] a, int off, int len)` Some benchmarks seem to indicate that altGetBits32 is faster than getBits for smaller byte counts.
`static void`	`printHexString(StringBuilder sb, String hexData)` Formats hex dta into 64 byte rows.
`static boolean`	`rangeCheck(byte[] key, byte[] fromKey, byte[] toKey)` Return `true` if the key lies inside of the optional half-open range constraint.
`static boolean`	`setBit(byte[] buf, long bitIndex, boolean value)` Set the value of a bit - this is NOT thread-safe (contention for the byte in the backing buffer can cause lost updates).
`static byte[]`	`successor(byte[] key)` Computes the successor of a variable length byte array by appending a unsigned zero(0) byte to the end of the array.
`static byte[]`	`toArray(ByteBuffer b)` Return a byte[] having the data in the `ByteBuffer` from the `Buffer.position()` to the `Buffer.limit()`.
`static byte[]`	`toArray(ByteBuffer b, boolean forceCopy, byte[] dst)` Return a byte[] having the data in the `ByteBuffer` from the `Buffer.position()` to the `Buffer.limit()`.
`static String`	`toBitString(byte[] b)` Return the binary representation of the unsigned byte[].
`static String`	`toHexString(byte[] buf)` Utility to convert a byte array to a hex string.
`static String`	`toHexString(byte[] buf, int n)` Utility to display byte array of maximum i bytes as hexString.
`static String`	`toHexString(int[] ibuf)` Utility to convert an int array to a hex string
`static String`	`toString(byte[] key)` Formats a key as a series of comma delimited unsigned bytes.
`static String`	`toString(byte[][] data)` Formats the data into a `String`.
`static String`	`toString(byte[] key, int off, int len)` Formats a key as a series of comma delimited unsigned bytes.
`static int`	`withinByteIndexForBit(long bitIndex)` Return the offset within the byte in which the bit is coded of the bit (this is just the remainder `bitIndex % 8`).

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - EMPTY
```
public static final byte[] EMPTY
```
    An empty byte[].
  - EMPTY2
```
public static final byte[][] EMPTY2
```
    An empty byte[][].
  - minlen
```
public static final int minlen
```
    JNI routines are not invoked unless we will compare byte[]s with at least this many potential bytes to compare (the actual# may be much less of course since comparisons may fail fast).
    
    See Also:
    Constant Field Values
- Constructor Detail
  - BytesUtil
```
public BytesUtil()
```
- Method Detail
  - loadJNILibrary
```
public static boolean loadJNILibrary()
```
    Attempt to load the JNI library.
    Note: this is done automatically if the optional boolean system property com.bigdata.btree.BytesUtil.jni=true is specified, e.g., using
```
    java -Dcom.bigdata.btree.BytesUtil.jni=true ...
 
```
    Returns:
    True iff the JNI library was successfully linked.
  - bytesEqual
```
public static final boolean bytesEqual(byte[] a,
                 byte[] b)
```
    True iff the two arrays compare as equal. This is somewhat optimized in that it tests the array lengths first, assumes that it is being used on sorted data and therefore compares the last bytes first, and does not convert the bytes to unsigned integers before testing for equality.
    
    Parameters:
    a - A byte[].
    b - Another byte[].
    
    Returns:
    If the two arrays have the same reference (including null) or if they have the same data.
  - compareBytes
```
public static final int compareBytes(byte[] a,
               byte[] b)
```
    Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes).
    
    Parameters:
    a - A byte[].
    b - A byte[].
    
    Returns:
    a negative integer, zero, or a positive integer if the first argument is less than, equal to, or greater than the second.
  - compareBytesWithLenAndOffset
```
public static final int compareBytesWithLenAndOffset(int aoff,
                               int alen,
                               byte[] a,
                               int boff,
                               int blen,
                               byte[] b)
```
    Byte-wise comparison of byte[]s (the arrays are treated as arrays of unsigned bytes).
    
    Parameters:
    aoff - The offset into a at which the comparison will begin.
    alen - The #of bytes in a to consider starting at aoff.
    a - A byte[].
    boff - The offset into b at which the comparison will begin.
    blen - The #of bytes in b to consider starting at boff.
    b - A byte[].
    
    Returns:
    a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.
  - getPrefixLength
```
public static final int getPrefixLength(byte[] a,
                  byte[] b)
```
    Return the #of leading bytes in common. This is used to compute the prefix for a node or leaf, which is formed by the leading bytes in common between the first and last key for a node or leaf.
    
    Parameters:
    a - A variable length unsigned byte array.
    b - A variable length unsigned byte array.
    
    Returns:
    The #of leading bytes in common (aka the index of the first byte in which the two arrays differ, although that index could lie beyond the end of one of the arrays).
  - getPrefix
```
public static final byte[] getPrefix(byte[] a,
               byte[] b)
```
    Return a new byte[] containing the leading bytes in common between two byte[]s. This is often used to compute the minimum length separator key.
    
    Parameters:
    a - A variable length unsigned byte array[].
    b - A variable length unsigned byte array[].
    
    Returns:
    A new byte[] containing the leading bytes in common between the two arrays.
  - successor
```
public static final byte[] successor(byte[] key)
```
    Computes the successor of a variable length byte array by appending a unsigned zero(0) byte to the end of the array.
    
    Parameters:
    key - A variable length unsigned byte array.
    
    Returns:
    A new unsigned byte[] that is the successor of the key.
  - getSeparatorKey
```
public static final byte[] getSeparatorKey(byte[] givenKey,
                     byte[] priorKey)
```
    The keys in the nodes of a btree are known as separator keys. The role of the separator keys is to direct search towards the leaf in which a key exists or would exist by always searching the first child having a separator key that is greater than or equal to the search key.
    
    Separator keys separate leaves and must be choosen with that purpose in mind. The simplest way to choose the separator key is to just take the first key of the leaf - this is always correct. However, shorter separator keys may be choosen by defining the separator key as the shortest key that is less than or equal to the first key of a leaf and greater than the last key of the left sibling of that leaf (that is, the key for the entry that immediately proceeds the first entry on the leaf).
    
    There are several advantages to always choosing the shortest separator key. The original rationale (in "Prefix B-Trees" by Bayer and Unterauer) was to increase the branching factors for fixed size pages. Since we use variable size serialized record, that is not an issue. However, using the shortest separator keys in this implementation provides both smaller serialized records for nodes and faster search since fewer bytes must be tested.
    
    Note that this trick can not be used at higher levels in the btree - separator keys are always formed based on the keys in the leaves and then propagated through the tree.
    
    The rules are simple enough:
    1. The separator contains all bytes in the shared prefix (if any) plus the 1st byte at which the given key differs from the prior key.
    2. If the separator key would equal the given key by value then return the reference to the given key.
    Parameters:
    givenKey - A key.
    priorKey - Another key that proceeds the givenKey.
    
    Returns:
    The shortest key that is less than or equal to givenKey and greater than priorKey. This will be a reference to the givenKey iff that is also the shortest separator.
    
    Throws:
    
    IllegalArgumentException - if either key is null.
    
    IllegalArgumentException - if both keys are the same reference.
    See Also:
    http://portal.acm.org/citation.cfm?doid=320521.320530
  - toString
```
public static final String toString(byte[] key)
```
    Formats a key as a series of comma delimited unsigned bytes.
    
    Parameters:
    key - The key.
    
    Returns:
    The string representation of the array as unsigned bytes.
  - toString
```
public static final String toString(byte[] key,
              int off,
              int len)
```
    Formats a key as a series of comma delimited unsigned bytes.
    
    Parameters:
    key - The key.
    off - The index of the first byte that will be visited.
    len - The #of bytes to visit.
    
    Returns:
    The string representation of the array as unsigned bytes.
  - toString
```
public static String toString(byte[][] data)
```
    Formats the data into a String.
    
    Parameters:
    data - An array of unsigned byte arrays.
  - binarySearch
```
public static final int binarySearch(byte[][] keys,
               int base,
               int nmem,
               byte[] key)
```
    Binary search on an array whose members are variable length unsigned byte[]s.
    
    Parameters:
    keys - The buffer.
    base - The offset of the base of the array within the buffer.
    nmem - The #of members in the array. When [nmem == 0], the array is empty.
    key - The key for the search.
    
    Returns:
    index of the search key, if it is contained in keys; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the array of keys. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
  - rangeCheck
```
public static final boolean rangeCheck(byte[] key,
                 byte[] fromKey,
                 byte[] toKey)
```
    Return true if the key lies inside of the optional half-open range constraint.
    
    Parameters:
    key - The key.
    fromKey - The inclusive lower bound -or- null if there is no lower bound.
    toKey - The exclusive upper bound -or- null if there is no upper bound.
    
    Returns:
    true unless the key is LT [fromKey] or GTE [toKey].
  - main
```
public static void main(String[] args)
```
    This method forces the load of the JNI library and tries to execute the JNI methods.
    In order to use the JNI library under Windows, you must specify the JNI library location using the PATH environment variable, e.g.,
```
   cd bigdata
   set PATH=%PATH%;lib
   java -cp bin com.bigdata.btree.BytesUtil
 
```
    In order to use the JNI library under un*x, you must specify the JNI library location
```
    java -Djava.library.path=lib com.bigdata.btree.BytesUtil
 
```
    Parameters:
    args -
    
    Throws:
    
    UnsatisfiedLinkError - if the JNI methods can not be resolved.
    
    AssertionError - if the JNI methods do not produce the expected answers.
  - bitFlagByteLength
```
public static final int bitFlagByteLength(int nbits)
```
    Return the #of bytes required to bit code the specified #of bits.
    
    Parameters:
    nbits - The #of bit flags.
    
    Returns:
    The #of bytes required. This will be zero iff nbits is zero.
  - byteIndexForBit
```
public static final int byteIndexForBit(long bitIndex)
```
    Return the index of the byte in which the bit with the given index is encoded.
    
    Parameters:
    bitIndex - The bit index.
    
    Returns:
    The byte index.
  - withinByteIndexForBit
```
public static final int withinByteIndexForBit(long bitIndex)
```
    Return the offset within the byte in which the bit is coded of the bit (this is just the remainder bitIndex % 8).
    Note, the computation of the bit offset is intentionally aligned with OutputBitStream and InputBitStream.
    
    Parameters:
    bitIndex - The bit index into the byte[].
    
    Returns:
    The offset of the bit in the appropriate byte.
  - getBit
```
public static final boolean getBit(byte[] buf,
             long bitIndex)
```
    Get the value of a bit.
    Note, the computation of the bit offset is intentionally aligned with OutputBitStream and InputBitStream.
    
    Parameters:
    bitIndex - The index of the bit.
    
    Returns:
    The value of the bit.
  - setBit
```
public static final boolean setBit(byte[] buf,
             long bitIndex,
             boolean value)
```
    Set the value of a bit - this is NOT thread-safe (contention for the byte in the backing buffer can cause lost updates).
    Note, the computation of the bit offset is intentionally aligned with OutputBitStream and InputBitStream.
    
    Parameters:
    bitIndex - The index of the bit.
    
    Returns:
    The old value of the bit.
  - getMSBMask
```
public static int getMSBMask(int nbits)
```
    Return a bit mask which reveals only the MSB (Most Significant Bits) N bits of an int32 value.
    
    Parameters:
    nbits - The #of bits to be revealed.
    
    Returns:
    The mask.
    
    Throws:
    
    IllegalArgumentException - if nbits is LT ZERO (0).
    
    IllegalArgumentException - if nbits is GT 32.
  - maskOffMSB
```
public static int maskOffMSB(int h,
             int nbits)
```
    Mask off all but the MSB nbits of the hash value and shift them down such that the masked bits appear at bits (nbits:0] of the returned value. This is used to index into a dictionary page based on the revealed bits.
    
    Parameters:
    h - The hash value.
    nbits - The #of bits already accounted for by the path from the root.
    
    Returns:
    The hash value considering only the MSB nbits and shifted down into an nbits integer.
  - maskOffLSB
```
public static int maskOffLSB(int h,
             int nbits)
```
    Mask off all but the LSB nbits of the hash value.
    
    Parameters:
    h - The hash value.
    nbits - The #of LSB bits to be retained.
    
    Returns:
    The LSB nbits. TODO unit test.
  - getBits
```
public static int getBits(byte[] a,
          int off,
          int len)
```
    Return the n-bit integer corresponding to the inclusive bit range of the byte[]. Bit ZERO (0) is the Most Significant Bit (MSB). Bit positions increase from zero up to a.length * 8 - 1. The return value is an int32 and the bit range must not be greater than 32 bits.
    For example, given the following data and the bit range (0,2)
```
 bit index: 01234567 
 ---------+---------- 
 bit value: 10110000
 
```
    TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was 10 (with leading zeros suppressed).
    Note: This method is design for use with the unsigned byte[] keys in a bigdata hash index. All keys in bigdata are internally represented as unsigned byte[]s, which is why this method accepts a byte[] rather than an long[] for the bits. Also, while the length of an unsigned byte[] key can vary, they are never huge and an int32 value is sufficient to index into the bits in the byte[]. Finally, the return value is an int because it will be used in hash table designs to index into a hash table based on those bits in a hash code key which are masked as relevant to that hash table. 32bits is far more than we will need to index into a hash table. For an 8k page, we might expect a fan out of at most 1024 which is only 10 bits.
    Parameters:
    a - A byte[].
    off - The index of the first bit to be included.
    len - The number of bits to be returned in [0:32]. However, a bit length of zero will always return zero.
    
    Returns:
    The integer extracted from the specified bit range.
  - altGetBits64
```
public static long altGetBits64(byte[] a,
                int off,
                int len)
```
  - altGetBits32
```
public static int altGetBits32(byte[] a,
               int off,
               int len)
```
  - getBits64
```
public static long getBits64(byte[] a,
             int off,
             int len)
```
  - optGetBits
```
public static int optGetBits(byte[] a,
             int off,
             int len)
```
    Some benchmarks seem to indicate that altGetBits32 is faster than getBits for smaller byte counts. OTOH the cost of the redirection may outweigh any benefit.
  - getBits
```
public static int getBits(int a,
          int off,
          int len)
```
    Return the n-bit integer corresponding to the inclusive bit range of the byte[]. Bit ZERO (0) is the Most Significant Bit (MSB). Bit positions increase from zero up to 31. The return value is an int32 and the bit range must not be greater than 32 bits.
    For example, given the following data and the bit range (0,2)
```
 bit index: 01234567 
 ---------+---------- 
 bit value: 10110000
 
```
    TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was 10 (with leading zeros suppressed).
    Note: This method is design for use in a bigdata hash index having native int32 keys rather than unsigned byte[] keys.
    Parameters:
    a - An integer.
    off - The index of the first bit to be included.
    len - The number of bits to be returned in [0:32]. However, a bit length of zero will always return zero.
    
    Returns:
    The integer extracted from the specified bit range.
  - toBitString
```
public static String toBitString(byte[] b)
```
    Return the binary representation of the unsigned byte[].
    
    Parameters:
    b - The unsigned byte[].
    
    Returns:
    The representation of the bits in that unsigned byte[].
    
    Throws:
    
    IllegalArgumentException - if the argument is null.
  - getByteCount
```
public static long getByteCount(String s)
```
    Decode a string of the form [0-9]+(k|kb|m|mb|g|gb)?, returning the number of bytes. When a suffix indicates kilobytes, megabytes, or gigabytes then the returned value is scaled accordingly. The suffix is NOT case sensitive.
    
    Parameters:
    s - The string value.
    
    Returns:
    The byte count.
    
    Throws:
    
    IllegalArgumentException - if there is a problem with the argument (null, ill-formed, etc).
  - toArray
```
public static byte[] toArray(ByteBuffer b)
```
    Return a byte[] having the data in the ByteBuffer from the Buffer.position() to the Buffer.limit(). The position, limit, and mark are not affected by this operation. When the ByteBuffer has a backing array, the array offset is ZERO (0), and the Buffer.limit() is equal to the Buffer.capacity() then the backing array is returned. Otherwise, a new byte[] is allocated and the data are copied into that byte[], which is then returned.
    
    Parameters:
    b - The ByteBuffer.
    
    Returns:
    The byte[].
  - toArray
```
public static byte[] toArray(ByteBuffer b,
             boolean forceCopy,
             byte[] dst)
```
    Return a byte[] having the data in the ByteBuffer from the Buffer.position() to the Buffer.limit(). The position, limit, and mark are not affected by this operation.
    Under certain circumstances it is possible and may be desirable to return the backing ByteBuffer.array(). This behavior is enabled by forceCopy := false.
    It is possible to return the backing byte[] when the ByteBuffer has a backing array, the array offset is ZERO (0), and the Buffer.limit() is equal to the Buffer.capacity() then the backing array is returned. Otherwise, a new byte[] must be allocated, and the data are copied into that byte[], which may then be returned.
    
    Parameters:
    b - The ByteBuffer.
    forceCopy - When false, the backing array will be returned if possible.
    dst - A byte[] provided by the caller (optional). When non- null and having a length GTE Buffer.remaining(), this array will be preferred to a newly allocated array.
    
    Returns:
    The byte[] having the data. When dst is non- null this MAY be the caller's array. When it is the caller's array, it MAY be larger than the #of bytes actually read.
  - toHexString
```
public static String toHexString(int[] ibuf)
```
    Utility to convert an int array to a hex string
    
    Parameters:
    buf - The data.
    
    Returns:
    The hex string.
  - toHexString
```
public static String toHexString(byte[] buf)
```
    Utility to convert a byte array to a hex string.
    
    Parameters:
    buf - The data.
    
    Returns:
    The hex string.
  - toHexString
```
public static String toHexString(byte[] buf,
                 int n)
```
    Utility to display byte array of maximum i bytes as hexString.
    
    Parameters:
    buf - The data.
    n - The #of bytes to convert.
    
    Returns:
    The hex string.
  - printHexString
```
public static void printHexString(StringBuilder sb,
                  String hexData)
```
    Formats hex dta into 64 byte rows.
    
    Parameters:
    sb - Where to format the data.
    hexData - The data.
  - getBytes
```
public static byte[] getBytes(ByteBuffer buf)
```
    Return the data in the buffer. When possible, the backing array is returned. Otherwise, a new byte[] is allocated, the data are copied into the array, and the new array is returned.
  - byteArrToBinaryStr
```
public static String byteArrToBinaryStr(byte[] zOrderByteArray)
```
    Converts a byte array into a binary string. Useful for debugging.
    
    Parameters:
    zOrderByteArray -
    
    Returns:

Class BytesUtil

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

EMPTY

EMPTY2

minlen

Constructor Detail

BytesUtil

Method Detail

loadJNILibrary

bytesEqual

compareBytes

compareBytesWithLenAndOffset

getPrefixLength

getPrefix

successor

getSeparatorKey

toString

toString

toString

binarySearch

rangeCheck

main

bitFlagByteLength

byteIndexForBit

withinByteIndexForBit

getBit

setBit

getMSBMask

maskOffMSB

maskOffLSB

getBits

altGetBits64

altGetBits32

getBits64

optGetBits

getBits

toBitString

getByteCount

toArray

toArray

toHexString

toHexString

toHexString

printHexString

getBytes

byteArrToBinaryStr