public class BytesUtil extends Object
Comparison operations that accept a starting offset are used when the byte[]s are known to share a leading prefix that may be skipped during comparison.
Comparison operations that accept a starting offset and length are used when immutable keys are stored in a single byte[] and an index into starting positions in that byte[] is maintained.
JNI methods are provided for unsigned byte[] comparison. However, note that the JNI methods do not appear to be as fast as the pure Java methods - presumably because of the overhead of going from Java to C. In order to execute using the JNI methods you MUST define the optional boolean system property, e.g.,
java -Dcom.bigdata.btree.BytesUtil.jni=true ...See BytesUtil.c in this package for instructions on compiling the JNI methods. See
main(String[])
which provides a test for the JNI integration and
some pointers on how to get this running on your platform.Modifier and Type | Class and Description |
---|---|
static class |
BytesUtil.UnsignedByteArrayComparator
Compares two unsigned byte[]s.
|
Modifier and Type | Field and Description |
---|---|
static byte[] |
EMPTY
An empty
byte[] . |
static byte[][] |
EMPTY2
An empty
byte[][] . |
static int |
minlen
JNI routines are not invoked unless we will compare byte[]s with at least
this many potential bytes to compare (the actual# may be much less of
course since comparisons may fail fast).
|
Constructor and Description |
---|
BytesUtil() |
Modifier and Type | Method and Description |
---|---|
static int |
altGetBits32(byte[] a,
int off,
int len) |
static long |
altGetBits64(byte[] a,
int off,
int len) |
static int |
binarySearch(byte[][] keys,
int base,
int nmem,
byte[] key)
Binary search on an array whose members are variable length unsigned
byte[]s.
|
static int |
bitFlagByteLength(int nbits)
Return the #of bytes required to bit code the specified #of bits.
|
static String |
byteArrToBinaryStr(byte[] zOrderByteArray)
Converts a byte array into a binary string.
|
static int |
byteIndexForBit(long bitIndex)
Return the index of the byte in which the bit with the given index is
encoded.
|
static boolean |
bytesEqual(byte[] a,
byte[] b)
True iff the two arrays compare as equal.
|
static int |
compareBytes(byte[] a,
byte[] b)
Byte-wise comparison of byte[]s (the arrays are treated as arrays of
unsigned bytes).
|
static int |
compareBytesWithLenAndOffset(int aoff,
int alen,
byte[] a,
int boff,
int blen,
byte[] b)
Byte-wise comparison of byte[]s (the arrays are treated as arrays of
unsigned bytes).
|
static boolean |
getBit(byte[] buf,
long bitIndex)
Get the value of a bit.
|
static int |
getBits(byte[] a,
int off,
int len)
Return the n-bit integer corresponding to the inclusive bit range of the
byte[].
|
static int |
getBits(int a,
int off,
int len)
Return the n-bit integer corresponding to the inclusive bit range of the
byte[].
|
static long |
getBits64(byte[] a,
int off,
int len) |
static long |
getByteCount(String s)
Decode a string of the form
[0-9]+(k|kb|m|mb|g|gb)? ,
returning the number of bytes. |
static byte[] |
getBytes(ByteBuffer buf)
Return the data in the buffer.
|
static int |
getMSBMask(int nbits)
Return a bit mask which reveals only the MSB (Most Significant Bits) N
bits of an int32 value.
|
static byte[] |
getPrefix(byte[] a,
byte[] b)
Return a new byte[] containing the leading bytes in common between two
byte[]s.
|
static int |
getPrefixLength(byte[] a,
byte[] b)
Return the #of leading bytes in common.
|
static byte[] |
getSeparatorKey(byte[] givenKey,
byte[] priorKey)
The keys in the nodes of a btree are known as separator keys.
|
static boolean |
loadJNILibrary()
Attempt to load the JNI library.
|
static void |
main(String[] args)
This method forces the load of the JNI library and tries to execute the
JNI methods.
|
static int |
maskOffLSB(int h,
int nbits)
Mask off all but the LSB nbits of the hash value.
|
static int |
maskOffMSB(int h,
int nbits)
Mask off all but the MSB nbits of the hash value and shift them
down such that the masked bits appear at bits (nbits:0] of the returned
value.
|
static int |
optGetBits(byte[] a,
int off,
int len)
Some benchmarks seem to indicate that altGetBits32 is faster than getBits
for smaller byte counts.
|
static void |
printHexString(StringBuilder sb,
String hexData)
Formats hex dta into 64 byte rows.
|
static boolean |
rangeCheck(byte[] key,
byte[] fromKey,
byte[] toKey)
Return
true if the key lies inside of the optional
half-open range constraint. |
static boolean |
setBit(byte[] buf,
long bitIndex,
boolean value)
Set the value of a bit - this is NOT thread-safe (contention for the byte
in the backing buffer can cause lost updates).
|
static byte[] |
successor(byte[] key)
Computes the successor of a variable length byte array by appending a
unsigned zero(0) byte to the end of the array.
|
static byte[] |
toArray(ByteBuffer b)
|
static byte[] |
toArray(ByteBuffer b,
boolean forceCopy,
byte[] dst)
|
static String |
toBitString(byte[] b)
Return the binary representation of the unsigned byte[].
|
static String |
toHexString(byte[] buf)
Utility to convert a byte array to a hex string.
|
static String |
toHexString(byte[] buf,
int n)
Utility to display byte array of maximum i bytes as hexString.
|
static String |
toHexString(int[] ibuf)
Utility to convert an int array to a hex string
|
static String |
toString(byte[] key)
Formats a key as a series of comma delimited unsigned bytes.
|
static String |
toString(byte[][] data)
Formats the data into a
String . |
static String |
toString(byte[] key,
int off,
int len)
Formats a key as a series of comma delimited unsigned bytes.
|
static int |
withinByteIndexForBit(long bitIndex)
Return the offset within the byte in which the bit is coded of the bit
(this is just the remainder
bitIndex % 8 ). |
public static final byte[] EMPTY
byte[]
.public static final byte[][] EMPTY2
byte[][]
.public static final int minlen
public static boolean loadJNILibrary()
Note: this is done automatically if the optional boolean system property
com.bigdata.btree.BytesUtil.jni=true
is specified, e.g.,
using
java -Dcom.bigdata.btree.BytesUtil.jni=true ...
public static final boolean bytesEqual(byte[] a, byte[] b)
a
- A byte[].b
- Another byte[].null
) or if they have the same data.public static final int compareBytes(byte[] a, byte[] b)
a
- A byte[].b
- A byte[].public static final int compareBytesWithLenAndOffset(int aoff, int alen, byte[] a, int boff, int blen, byte[] b)
aoff
- The offset into a at which the comparison will begin.alen
- The #of bytes in a to consider starting at aoff.a
- A byte[].boff
- The offset into b at which the comparison will begin.blen
- The #of bytes in b to consider starting at boff.b
- A byte[].public static final int getPrefixLength(byte[] a, byte[] b)
a
- A variable length unsigned byte array.b
- A variable length unsigned byte array.public static final byte[] getPrefix(byte[] a, byte[] b)
a
- A variable length unsigned byte array[].b
- A variable length unsigned byte array[].public static final byte[] successor(byte[] key)
key
- A variable length unsigned byte array.public static final byte[] getSeparatorKey(byte[] givenKey, byte[] priorKey)
The keys in the nodes of a btree are known as separator keys. The role of the separator keys is to direct search towards the leaf in which a key exists or would exist by always searching the first child having a separator key that is greater than or equal to the search key.
Separator keys separate leaves and must be choosen with that purpose in mind. The simplest way to choose the separator key is to just take the first key of the leaf - this is always correct. However, shorter separator keys may be choosen by defining the separator key as the shortest key that is less than or equal to the first key of a leaf and greater than the last key of the left sibling of that leaf (that is, the key for the entry that immediately proceeds the first entry on the leaf).
There are several advantages to always choosing the shortest separator key. The original rationale (in "Prefix B-Trees" by Bayer and Unterauer) was to increase the branching factors for fixed size pages. Since we use variable size serialized record, that is not an issue. However, using the shortest separator keys in this implementation provides both smaller serialized records for nodes and faster search since fewer bytes must be tested.
Note that this trick can not be used at higher levels in the btree - separator keys are always formed based on the keys in the leaves and then propagated through the tree.
The rules are simple enough:
givenKey
- A key.priorKey
- Another key that proceeds the givenKey.IllegalArgumentException
- if either key is null
.IllegalArgumentException
- if both keys are the same reference.http://portal.acm.org/citation.cfm?doid=320521.320530
public static final String toString(byte[] key)
key
- The key.public static final String toString(byte[] key, int off, int len)
key
- The key.off
- The index of the first byte that will be visited.len
- The #of bytes to visit.public static String toString(byte[][] data)
String
.data
- An array of unsigned byte arrays.public static final int binarySearch(byte[][] keys, int base, int nmem, byte[] key)
keys
- The buffer.base
- The offset of the base of the array within the buffer.nmem
- The #of members in the array. When [nmem == 0], the array is
empty.key
- The key for the search.(-(insertion point) - 1)
. The insertion
point is defined as the point at which the key would be inserted
into the array of keys. Note that this guarantees that the return
value will be >= 0 if and only if the key is found.public static final boolean rangeCheck(byte[] key, byte[] fromKey, byte[] toKey)
true
if the key lies inside of the optional
half-open range constraint.key
- The key.fromKey
- The inclusive lower bound -or- null
if there is
no lower bound.toKey
- The exclusive upper bound -or- null
if there is
no upper bound.true
unless the key is LT [fromKey] or GTE
[toKey].public static void main(String[] args)
In order to use the JNI library under Windows, you must specify the JNI library location using the PATH environment variable, e.g.,
cd bigdata set PATH=%PATH%;lib java -cp bin com.bigdata.btree.BytesUtil
In order to use the JNI library under un*x, you must specify the JNI library location
java -Djava.library.path=lib com.bigdata.btree.BytesUtil
args
- UnsatisfiedLinkError
- if the JNI methods can not be resolved.AssertionError
- if the JNI methods do not produce the expected answers.public static final int bitFlagByteLength(int nbits)
nbits
- The #of bit flags.public static final int byteIndexForBit(long bitIndex)
bitIndex
- The bit index.public static final int withinByteIndexForBit(long bitIndex)
bitIndex % 8
).
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream
and InputBitStream
.
bitIndex
- The bit index into the byte[].public static final boolean getBit(byte[] buf, long bitIndex)
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream
and InputBitStream
.
bitIndex
- The index of the bit.public static final boolean setBit(byte[] buf, long bitIndex, boolean value)
Note, the computation of the bit offset is intentionally aligned with
OutputBitStream
and InputBitStream
.
bitIndex
- The index of the bit.public static int getMSBMask(int nbits)
nbits
- The #of bits to be revealed.IllegalArgumentException
- if nbits is LT ZERO (0).IllegalArgumentException
- if nbits is GT 32.public static int maskOffMSB(int h, int nbits)
h
- The hash value.nbits
- The #of bits already accounted for by the path from the root.public static int maskOffLSB(int h, int nbits)
h
- The hash value.nbits
- The #of LSB bits to be retained.public static int getBits(byte[] a, int off, int len)
a.length * 8 - 1
. The return value
is an int32 and the bit range must not be greater than 32 bits.
For example, given the following data and the bit range (0,2)
bit index: 01234567 ---------+---------- bit value: 10110000TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was
10
(with leading
zeros suppressed).
Note: This method is design for use with the unsigned byte[] keys in a bigdata hash index. All keys in bigdata are internally represented as unsigned byte[]s, which is why this method accepts a byte[] rather than an long[] for the bits. Also, while the length of an unsigned byte[] key can vary, they are never huge and an int32 value is sufficient to index into the bits in the byte[]. Finally, the return value is an int because it will be used in hash table designs to index into a hash table based on those bits in a hash code key which are masked as relevant to that hash table. 32bits is far more than we will need to index into a hash table. For an 8k page, we might expect a fan out of at most 1024 which is only 10 bits.
a
- A byte[].off
- The index of the first bit to be included.len
- The number of bits to be returned in [0:32]. However, a bit
length of zero will always return zero.public static long altGetBits64(byte[] a, int off, int len)
public static int altGetBits32(byte[] a, int off, int len)
public static long getBits64(byte[] a, int off, int len)
public static int optGetBits(byte[] a, int off, int len)
public static int getBits(int a, int off, int len)
31
. The return value is an int32
and the bit range must not be greater than 32 bits.
For example, given the following data and the bit range (0,2)
bit index: 01234567 ---------+---------- bit value: 10110000TWO (2) bits starting at bit offset ZERO (0) would be extracted and returned as a 2-bit integer. For those data, the return value would be an int32 value whose binary representation was
10
(with leading
zeros suppressed).
Note: This method is design for use in a bigdata hash index having native int32 keys rather than unsigned byte[] keys.
a
- An integer.off
- The index of the first bit to be included.len
- The number of bits to be returned in [0:32]. However, a bit
length of zero will always return zero.public static String toBitString(byte[] b)
b
- The unsigned byte[].IllegalArgumentException
- if the argument is null
.public static long getByteCount(String s)
[0-9]+(k|kb|m|mb|g|gb)?
,
returning the number of bytes. When a suffix indicates kilobytes,
megabytes, or gigabytes then the returned value is scaled accordingly.
The suffix is NOT case sensitive.s
- The string value.IllegalArgumentException
- if there is a problem with the argument (null
,
ill-formed, etc).public static byte[] toArray(ByteBuffer b)
ByteBuffer
from the
Buffer.position()
to the Buffer.limit()
. The
position, limit, and mark are not affected by this operation. When the
ByteBuffer
has a backing array, the array offset is ZERO (0), and
the Buffer.limit()
is equal to the
Buffer.capacity()
then the backing array is returned.
Otherwise, a new byte[] is allocated and the data are copied into that
byte[], which is then returned.b
- The ByteBuffer
.public static byte[] toArray(ByteBuffer b, boolean forceCopy, byte[] dst)
ByteBuffer
from the
Buffer.position()
to the Buffer.limit()
. The
position, limit, and mark are not affected by this operation.
Under certain circumstances it is possible and may be desirable to return
the backing ByteBuffer.array()
. This behavior is enabled by
forceCopy := false
.
It is possible to return the backing byte[] when the ByteBuffer
has a backing array, the array offset is ZERO (0), and the
Buffer.limit()
is equal to the Buffer.capacity()
then the backing array is returned. Otherwise, a new byte[] must be
allocated, and the data are copied into that byte[], which may then be
returned.
b
- The ByteBuffer
.forceCopy
- When false
, the backing array will be returned if
possible.dst
- A byte[] provided by the caller (optional). When non-
null
and having a length GTE
Buffer.remaining()
, this array will be preferred
to a newly allocated array.null
this MAY be the caller's array. When it is the
caller's array, it MAY be larger than the #of bytes actually
read.public static String toHexString(int[] ibuf)
buf
- The data.public static String toHexString(byte[] buf)
buf
- The data.public static String toHexString(byte[] buf, int n)
buf
- The data.n
- The #of bytes to convert.public static void printHexString(StringBuilder sb, String hexData)
sb
- Where to format the data.hexData
- The data.public static byte[] getBytes(ByteBuffer buf)
public static String byteArrToBinaryStr(byte[] zOrderByteArray)
zOrderByteArray
- Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.