public interface IKeyBuilder extends ISortKeyBuilder<Object>, IManagedByteArray
Interface for building up variable length unsigned byte[] keys from one or more primitive data type values and/or Unicode strings. An instance of this interface may be reset() and reused to encode a series of keys.
A sort key is an unsigned byte[] that preserves the total order of the original data. Sort keys may be formed from multiple fields, but field markers do not appear within the resulting sort key. The original values of fixed length fields (such as int, long, float, or double) can be extracted from a sort key, but the original values of Unicode variable length fields cannot: the collation ordering for a Unicode string depends on the Locale, the collation strength, and the decomposition mode, and forming it is a non-reversible operation.
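To make "unsigned byte[] that preserves the total order" concrete, the following is a minimal sketch of an unsigned lexicographic comparison. The class and method names are illustrative only and are not part of this API (the library's own comparator is referenced elsewhere on this page as BytesUtil.compareBytes(byte[], byte[])).

```java
class UnsignedCompareSketch {

    /** Lexicographic comparison that treats every byte as unsigned (0..255). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff; // mask to undo Java's signed interpretation of byte
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        // All shared bytes are equal: the shorter key orders first.
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] k1 = { 0x01, (byte) 0x7f };
        byte[] k2 = { 0x01, (byte) 0x80 };
        // As a signed byte 0x80 is -128, but as an unsigned byte it is 128 > 127.
        System.out.println(compareUnsigned(k1, k2) < 0);
    }
}
```

A signed comparison would order k2 before k1, which is why every encoding on this page is defined in terms of unsigned bytes.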
Factory methods are defined by KeyBuilder for obtaining instances of this interface that optionally support Unicode. Instances may be created for a given Locale, collation strength, decomposition mode, etc.
The ICU library supports generation of compressed Unicode sort keys and is used by default when available. The JDK java.text package also supports the generation of Unicode sort keys, but it does NOT produce compressed sort keys. The resulting sort keys are therefore (a) incompatible with those produced by the ICU library and (b) much larger than those produced by the ICU library.
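For orientation, this is roughly what generating an (uncompressed) sort key with the JDK java.text package looks like. It is a sketch using only JDK classes; it does not show how KeyBuilder itself invokes the collator, and the class name, locale and strength chosen here are assumptions made for the example.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class JdkSortKeySketch {

    /** Returns the JDK sort key for the string under the given locale. */
    static byte[] sortKey(String s, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.TERTIARY);
        collator.setDecomposition(Collator.NO_DECOMPOSITION);
        CollationKey key = collator.getCollationKey(s);
        return key.toByteArray(); // most significant byte first
    }

    public static void main(String[] args) {
        byte[] apple = sortKey("apple", Locale.US);
        byte[] banana = sortKey("Banana", Locale.US);
        // Raw char order puts "Banana" first ('B' < 'a'); collation order does not.
        System.out.println("Banana".compareTo("apple") < 0);
        System.out.println(Arrays.compareUnsigned(apple, banana) < 0);
    }
}
```

The byte[]s returned here cannot be compared with ICU sort keys, which is the incompatibility described above.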
Support for Unicode MAY be disabled using KeyBuilder.Options.COLLATOR, by using KeyBuilder.newInstance() or another factory method that does not enable Unicode support, or by using one of the KeyBuilder constructors that does not support Unicode.
Multi-field keys in which variable length fields are embedded within the key present a special problem. Any run of fixed length fields can be compared as unsigned byte[]s. Likewise, any key with a fixed length prefix (including a zero length prefix) but a variable length field in its tail can also be compared directly as unsigned byte[]s. However, the introduction of a variable length field into any non-terminal position in a multi-field key must be handled specially since simple concatenation of the field keys will NOT produce the correct total ordering. (This is why SQL requires that text fields compare as if they were padded out with ASCII blanks (0x20) to some maximum length for the field.) A utility method exists specifically for this purpose: see appendText(String, boolean, boolean).
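The problem with simple concatenation can be seen with a two-field key (ASCII text, then one byte). The sketch below contrasts naive concatenation with SQL-style blank padding to a fixed width. It is an illustration only: the class, the helper names and the width of 4 are hypothetical, and appendText uses a different, more compact representation.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PaddedFieldSketch {

    /** Naive key: ASCII text bytes immediately followed by a one-byte second field. */
    static byte[] concat(String text, byte next) {
        byte[] t = text.getBytes(StandardCharsets.US_ASCII);
        byte[] key = Arrays.copyOf(t, t.length + 1);
        key[t.length] = next;
        return key;
    }

    /** Padded key: the text is blank-padded (0x20) to a fixed width before the second field. */
    static byte[] padded(String text, int width, byte next) {
        byte[] t = text.getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[width + 1];
        Arrays.fill(key, 0, width, (byte) 0x20);
        System.arraycopy(t, 0, key, 0, Math.min(t.length, width));
        key[width] = next;
        return key;
    }

    public static void main(String[] args) {
        // ("ab", 0x7a) should order before ("abc", 0x01) because "ab" < "abc".
        // Naive concatenation compares 0x7a against 'c' (0x63) and gets it wrong.
        System.out.println(Arrays.compareUnsigned(
                concat("ab", (byte) 0x7a), concat("abc", (byte) 0x01)) < 0);
        // With padding, the blank (0x20) is compared against 'c' and the order is right.
        System.out.println(Arrays.compareUnsigned(
                padded("ab", 4, (byte) 0x7a), padded("abc", 4, (byte) 0x01)) < 0);
    }
}
```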
See Also: KeyBuilder.newInstance(), KeyBuilder.newUnicodeInstance(), KeyBuilder.newUnicodeInstance(Properties), SuccessorUtil
Modifier and Type | Field and Description |
---|---|
static int | maxlen: The maximum length of a variable length text field is 65535 (pow(2,16)-1). |
Modifier and Type | Method and Description |
---|---|
IKeyBuilder | append(BigDecimal d): Encode a BigDecimal into an unsigned byte[] and append it into the key buffer. |
IKeyBuilder | append(BigInteger i): Encode a BigInteger into an unsigned byte[] and append it into the key buffer. |
IKeyBuilder | append(byte b): Appends a byte - the byte is treated as an unsigned value. |
IKeyBuilder | append(byte[] a): Appends an array of bytes - the bytes are treated as unsigned values. |
IKeyBuilder | append(byte[] a, int off, int len): Append len bytes starting at off in a to the key buffer - the bytes are treated as unsigned values. |
IKeyBuilder | append(double d): Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a twos-complement number and then appending the bytes in big-endian order into the key buffer. |
IKeyBuilder | append(float f): Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a twos-complement number and then appending the bytes in big-endian order into the key buffer. |
IKeyBuilder | append(int v): Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order. |
IKeyBuilder | append(long v): Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order. |
IKeyBuilder | append(Object val): Append the value to the buffer, encoding it as appropriate based on the class of the object. |
IKeyBuilder | append(short v): Appends a signed short integer to the key by first converting it to a twos-complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order. |
IKeyBuilder | append(String s): Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte). |
IKeyBuilder | append(UUID uuid): Appends the UUID to the key using the MSB and then the LSB (this preserves the natural order imposed by UUID.compareTo(UUID)). |
IKeyBuilder | appendASCII(String s): Encodes a Unicode string by assuming that its contents are ASCII characters. |
IKeyBuilder | appendNul(): Append an unsigned zero byte to the key. |
IKeyBuilder | appendSigned(byte v): Converts the signed byte to an unsigned byte and appends it to the key. |
IKeyBuilder | appendText(String text, boolean unicode, boolean successor): Encodes a variable length text field into the buffer. |
byte[] | array(): The backing byte[] WILL be transparently replaced if the buffer capacity is extended. |
long[] | fromZOrder(int numDimensions): Inverts toZOrder(int) in the sense that it interprets the buffer as a zOrderString and returns an array of long values of size numDimensions, reflecting the individual components of the z-order string. |
byte[] | getKey(): Return the encoded key. |
boolean | isUnicodeSupported(): Return true iff Unicode is supported by this object (returns false if only ASCII support is configured). |
int | len(): The length of the slice is the number of bytes written onto the backing byte[]. |
int | off(): The offset of the slice into the backing byte[] is always zero. |
IKeyBuilder | reset(): Reset the key length to zero before building another key. |
byte[] | toByteArray(): An alias for getKey(). |
byte[] | toZOrder(int numDimensions): Converts the key into a z-order byte array, assuming numDimensions components of type Long (i.e., 64bit each). |
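Methods inherited from interface ISortKeyBuilder: getSortKey
Methods inherited from interface IManagedByteArray: capacity, ensureCapacity, ensureFree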
static final int maxlen
The maximum length of a variable length text field is 65535 (pow(2,16)-1).
Note: This restriction only applies to multi-field keys where the text field appears in a non-terminal position within the key, that is, as encoded by appendText(String, boolean, boolean). When a text field appears in such a non-terminal position, trailing pad characters are used to maintain lexicographic ordering over the multi-field key.
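byte[] array()
The backing byte[] WILL be transparently replaced if the buffer capacity is extended.
Specified by: array in interface IByteArraySlice

int off()
The offset of the slice into the backing byte[] (see IByteArraySlice.array()) is always zero.
Specified by: off in interface IByteArraySlice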

int len()
The length of the slice in the IByteArraySlice.array() is the number of bytes written onto the backing byte[]. It returns to zero on reset().
Note: IByteArraySlice.len() has different semantics for some concrete implementations. ByteArrayBuffer.len() always returns the capacity of the backing byte[] while ByteArrayBuffer.pos() returns the #of bytes written onto the backing buffer. In contrast, KeyBuilder.len() is always the #of bytes written onto the backing buffer.
Specified by: len in interface IByteArraySlice

byte[] getKey()
Return the encoded key.
Note that keys are donated to the btree, so it is important to allocate new keys when running in the same process space. When using a network API, the API provides the necessary decoupling.
See Also: BytesUtil.compareBytes(byte[], byte[])
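The reset / append / getKey cycle, and the reason a key must be a fresh allocation, can be illustrated with a toy buffer. This is a sketch under stated assumptions: ReusableKeyBufferSketch is a hypothetical stand-in and not the KeyBuilder implementation. It only mirrors the contract described here: the backing array may be replaced on growth, the offset is zero, the length counts bytes written, and each returned key is a copy.

```java
import java.util.Arrays;

class ReusableKeyBufferSketch {

    private byte[] buf = new byte[2]; // deliberately tiny so that growth is exercised
    private int len = 0;

    /** Appends one byte, replacing the backing array with a larger one if it is full. */
    ReusableKeyBufferSketch append(byte b) {
        if (len == buf.length) {
            buf = Arrays.copyOf(buf, buf.length * 2);
        }
        buf[len++] = b;
        return this;
    }

    /** Resets the key length to zero; the backing array is retained for reuse. */
    ReusableKeyBufferSketch reset() {
        len = 0;
        return this;
    }

    /** Returns a copy, so the caller may hand the key off while the buffer is reused. */
    byte[] getKey() {
        return Arrays.copyOf(buf, len);
    }

    public static void main(String[] args) {
        ReusableKeyBufferSketch kb = new ReusableKeyBufferSketch();
        byte[] k1 = kb.reset().append((byte) 1).append((byte) 2).append((byte) 3).getKey();
        byte[] k2 = kb.reset().append((byte) 9).getKey();
        // k1 is unaffected by building k2 because getKey() copied the bytes.
        System.out.println(Arrays.toString(k1) + " " + Arrays.toString(k2));
    }
}
```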

byte[] toByteArray()
An alias for getKey(). Return a copy of the data in the slice.
Specified by: toByteArray in interface IByteArraySlice

IKeyBuilder reset()
Reset the key length to zero before building another key.
Specified by: reset in interface IManagedByteArray

IKeyBuilder append(String s)
Encodes a Unicode string using the configured KeyBuilder.Options.COLLATOR and appends the resulting sort key to the buffer (without a trailing nul byte).
Note: The SuccessorUtil.successor(String) of a string is formed by appending a trailing nul character. However, since the IDENTICAL collation strength appears to be required to differentiate between a string and its successor (with the trailing nul character), you MUST form the sort key first and then its successor (by appending a trailing nul). Failure to follow this pattern will lead to the successor of the key comparing as EQUAL to the key. For example,
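    IKeyBuilder keyBuilder = ...;
    String s = "foo";
    byte[] fromKey = keyBuilder.reset().append( s ).getKey();
    // right: form the sort key first, then append the trailing nul.
    byte[] toKey = keyBuilder.reset().append( s ).appendNul().getKey();
    // wrong! the nul is appended to the string before the sort key is formed.
    byte[] badToKey = keyBuilder.reset().append( s+"\0" ).getKey();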
Parameters:
s - A string.
Throws:
UnsupportedOperationException - if Unicode is not supported.
See Also:
SuccessorUtil.successor(String), SuccessorUtil.successor(byte[])
FIXME update the javadoc further to speak to handling of multi-field keys.
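At the byte level, "form the sort key first and then its successor" amounts to appending a single zero byte to an already-encoded key. The sketch below shows only that step on a plain byte[]. The class is hypothetical, and the three bytes stand in for whatever sort key the collator would actually produce.

```java
import java.util.Arrays;

class ByteSuccessorSketch {

    /** Forms the successor of an encoded key by appending one 0x00 byte. */
    static byte[] successor(byte[] key) {
        return Arrays.copyOf(key, key.length + 1); // copyOf zero-fills the new slot
    }

    public static void main(String[] args) {
        byte[] fromKey = { 0x66, 0x6f, 0x6f }; // stand-in for an encoded sort key
        byte[] toKey = successor(fromKey);
        // fromKey is a proper prefix of toKey, so it orders strictly before it.
        System.out.println(Arrays.compareUnsigned(fromKey, toKey) < 0);
    }
}
```

No key can order strictly between fromKey and toKey, which is what makes the pair usable as a half-open range for exactly one key.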

IKeyBuilder appendText(String text, boolean unicode, boolean successor)
Encodes a variable length text field into the buffer. The text is truncated to maxlen characters. The sort keys for strings that differ after truncation solely in the #of trailing pad characters will be identical (trailing pad characters are implicit out to maxlen characters).
Note: Trailing pad characters are normalized to a representation as a single pad character (1 byte) followed by the #of actual or implied trailing pad characters represented as an unsigned short integer (2 bytes). This technique serves to keep multi-field keys with embedded variable length text fields aligned such that the field following a variable length text field does not bleed into the lexicographic ordering of the variable length text field.
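The layout described in this note (text bytes, one pad byte, then a 2 byte count) might be sketched as follows for the ASCII case. Several details are assumptions made for the example and are not taken from the implementation: the pad byte value of 0x00, the requirement that the text contains no nul characters, and all class and method names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class PadRunSketch {

    static final int MAXLEN = 65535; // mirrors maxlen
    static final byte PAD = 0x00;    // hypothetical pad byte; text is assumed nul-free

    /** Encodes ASCII text as: text bytes, one pad byte, unsigned 16-bit pad count (big-endian). */
    static byte[] encodeText(String text) {
        byte[] t = text.getBytes(StandardCharsets.US_ASCII);
        int n = Math.min(t.length, MAXLEN);      // truncate to MAXLEN characters
        while (n > 0 && t[n - 1] == PAD) {
            n--;                                  // trailing pads are implicit
        }
        int padCount = MAXLEN - n;                // actual or implied trailing pads
        ByteArrayOutputStream out = new ByteArrayOutputStream(n + 3);
        out.write(t, 0, n);
        out.write(PAD);
        out.write(padCount >>> 8);                // high byte of the unsigned short
        out.write(padCount);                      // low byte (write(int) keeps the low 8 bits)
        return out.toByteArray();
    }

    /** Two-field key: the encoded text field followed by a one-byte field. */
    static byte[] key(String text, byte next) {
        byte[] f = encodeText(text);
        byte[] k = Arrays.copyOf(f, f.length + 1);
        k[f.length] = next;
        return k;
    }

    public static void main(String[] args) {
        // The following field (0x7a vs 0x01) no longer influences the text ordering.
        System.out.println(Arrays.compareUnsigned(key("ab", (byte) 0x7a), key("abc", (byte) 0x01)) < 0);
    }
}
```

Because two equal texts always produce fields of equal length, the byte that follows the field sits at the same offset in both keys and is only compared when the text fields tie.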
Note: While the ASCII encoding happens to use one byte for each character that is NOT true of the Unicode encoding. The space requirements for the Unicode encoding depend on the text, the Locale, the collator strength, and the collator decomposition mode.
Note: The successor option is designed to encapsulate some trickiness around forming the successor of a variable length text field embedded in a multi-field key. In particular, simply appending a nul byte will NOT work (it works fine when the text field is the last field in the key or when it is the only component in the key). This approach breaks encapsulation of the field boundaries such that the resulting "successor" is actually ordered before the original key. This happens because you introduce a 0x0 byte right on the boundary of the next field, effectively causing the next field to have a smaller value. Consider the following example (in hex) where "|" represents the end of the "text" field:
    ab cd | 12
If you compute the successor by appending a nul byte to the text field you get
    ab cd | 00 12
which is ordered before the original key!
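The hex example can be checked directly with an unsigned comparison. The sketch below uses only the JDK; the class and field names are illustrative.

```java
import java.util.Arrays;

class EmbeddedSuccessorSketch {

    static final byte[] ORIGINAL = { (byte) 0xab, (byte) 0xcd, 0x12 };    // ab cd | 12
    static final byte[] NAIVE = { (byte) 0xab, (byte) 0xcd, 0x00, 0x12 }; // ab cd | 00 12

    /** True iff a orders strictly before b as unsigned byte[]s. */
    static boolean ordersBefore(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b) < 0;
    }

    public static void main(String[] args) {
        // The third byte decides: 0x00 < 0x12, so the naive "successor" sorts first.
        System.out.println(ordersBefore(NAIVE, ORIGINAL));
    }
}
```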
Parameters:
text - The text.
unicode - When true the text is interpreted as Unicode according to the KeyBuilder.Options.COLLATOR option. Otherwise it is interpreted as ASCII.
successor - When true, the successor of the text will be encoded. Otherwise the text will be encoded.
Returns:
The IKeyBuilder.
See Also:
http://www.unicode.org/reports/tr10/tr10-10.html#Interleaved_Levels

boolean isUnicodeSupported()
Return true iff Unicode is supported by this object (returns false if only ASCII support is configured).

IKeyBuilder appendASCII(String s)
Encodes a Unicode string by assuming that its contents are ASCII characters.
Note: This method is potentially much faster than the Unicode aware append(String). However, this method is NOT Unicode aware and non-ASCII characters will not be encoded correctly. This method MUST NOT be mixed with keys whose corresponding component is encoded by the Unicode aware methods, e.g., append(String).
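As a rough illustration of what "assuming that its contents are ASCII" implies, the sketch below uses the JDK's US-ASCII encoder: one byte per character, with anything outside US-ASCII replaced by '?'. This shows JDK behavior only. How appendASCII itself treats non-ASCII characters is not stated here, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiEncodeSketch {

    /** One byte per character; unmappable characters become '?' (0x3f) in the JDK encoder. */
    static byte[] encodeAscii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encodeAscii("abc")));
        // The accented character is lost: it is not representable in US-ASCII.
        System.out.println(Arrays.toString(encodeAscii("\u00e9")));
    }
}
```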
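Parameters:
s - A String containing US-ASCII characters.

IKeyBuilder append(byte b)
Appends a byte - the byte is treated as an unsigned value.
Specified by: append in interface IManagedByteArray
Parameters:
b - The byte.

IKeyBuilder append(byte[] a)
Appends an array of bytes - the bytes are treated as unsigned values.
Specified by: append in interface IManagedByteArray
Parameters:
a - The array of bytes.

IKeyBuilder append(byte[] a, int off, int len)
Append len bytes starting at off in a to the key buffer - the bytes are treated as unsigned values.
Specified by: append in interface IManagedByteArray
Parameters:
off - The offset.
len - The #of bytes to append.
a - The array containing the bytes to append.

IKeyBuilder append(double d)
Appends a double precision floating point value by first converting it into a signed long integer using Double.doubleToLongBits(double), converting that value into a twos-complement number and then appending the bytes in big-endian order into the key buffer.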
Note: this converts -0d and +0d to the same key.
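One way to realize the steps just described is sketched below: take the IEEE 754 bits, turn the sign-magnitude value into twos-complement, flip the sign bit so that unsigned comparison works, and write 8 big-endian bytes. This is an illustration of the technique, not a copy of the implementation, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class DoubleKeySketch {

    /** Encodes a double as 8 bytes whose unsigned order matches numeric order. */
    static byte[] encode(double d) {
        long v = Double.doubleToLongBits(d);
        if (v < 0) {
            // Negative doubles are sign-magnitude: map magnitude m to -m.
            // For -0.0 the magnitude is 0, so it collapses onto +0.0.
            v = Long.MIN_VALUE - v;
        }
        v ^= Long.MIN_VALUE; // flip the sign bit: signed order becomes unsigned order
        return ByteBuffer.allocate(8).putLong(v).array(); // ByteBuffer is big-endian by default
    }

    public static void main(String[] args) {
        System.out.println(Arrays.compareUnsigned(encode(-1.5), encode(-0.5)) < 0);
        System.out.println(Arrays.compareUnsigned(encode(-0.5), encode(2.0)) < 0);
        System.out.println(Arrays.equals(encode(-0.0), encode(0.0)));
    }
}
```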
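Parameters:
d - The double-precision floating point value.

IKeyBuilder append(float f)
Appends a single precision floating point value by first converting it into a signed integer using Float.floatToIntBits(float), converting that value into a twos-complement number and then appending the bytes in big-endian order into the key buffer.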
Note: this converts -0f and +0f to the same key.
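Parameters:
f - The single-precision floating point value.

IKeyBuilder append(UUID uuid)
Appends the UUID to the key using the MSB and then the LSB (this preserves the natural order imposed by UUID.compareTo(UUID)).
Parameters:
uuid - The UUID.

IKeyBuilder append(long v)
Appends a signed long integer to the key by first converting it to a lexicographic ordering as an unsigned long integer and then appending it into the buffer as 8 bytes using a big-endian order.

IKeyBuilder append(int v)
Appends a signed integer to the key by first converting it to a lexicographic ordering as an unsigned integer and then appending it into the buffer as 4 bytes using a big-endian order.

IKeyBuilder append(short v)
Appends a signed short integer to the key by first converting it to a twos-complement representation supporting unsigned byte[] comparison and then appending it into the buffer as 2 bytes using a big-endian order.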
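A common way to give signed integers a lexicographic ordering as unsigned bytes, as append(long), append(int) and append(short) require, is to flip the sign bit and write the result big-endian. The sketch below assumes that approach; it is not confirmed to be the exact arithmetic used by the implementation, and the names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class SignedIntKeySketch {

    /** 4 big-endian bytes; flipping the sign bit maps MIN_VALUE..MAX_VALUE onto 0..2^32-1. */
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ 0x80000000).array();
    }

    /** 8 big-endian bytes; the same sign-bit flip applied to a long. */
    static byte[] encodeLong(long v) {
        return ByteBuffer.allocate(8).putLong(v ^ 0x8000000000000000L).array();
    }

    public static void main(String[] args) {
        // Without the flip, -1 (ff ff ff ff) would sort after 0 (00 00 00 00).
        System.out.println(Arrays.compareUnsigned(encodeInt(-1), encodeInt(0)) < 0);
        System.out.println(Arrays.compareUnsigned(encodeLong(Long.MIN_VALUE), encodeLong(-1L)) < 0);
    }
}
```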
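
IKeyBuilder appendSigned(byte v)
Converts the signed byte to an unsigned byte and appends it to the key.
Parameters:
v - The signed byte.

IKeyBuilder appendNul()
Append an unsigned zero byte to the key.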

IKeyBuilder append(BigInteger i)
Encode a BigInteger into an unsigned byte[] and append it into the key buffer.
The encoding is a 2 byte run length whose leading bit is set iff the BigInteger is negative followed by the byte[] as returned by BigInteger.toByteArray().
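Parameters:
i - The BigInteger value.

IKeyBuilder append(BigDecimal d)
Encode a BigDecimal into an unsigned byte[] and append it into the key buffer.
Parameters:
d - The BigDecimal value.

IKeyBuilder append(Object val)
Append the value to the buffer, encoding it as appropriate based on the class of the object. Supported classes include UUID and Unicode Strings.
Parameters:
val - The value.
Throws:
IllegalArgumentException - if val is null.
UnsupportedOperationException - if val is an instance of an unsupported class.

byte[] toZOrder(int numDimensions)
Converts the key into a z-order byte array, assuming numDimensions components of type Long (i.e., 64bit each).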
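Parameters:
baseSize -
numDimensions -

long[] fromZOrder(int numDimensions)
Inverts toZOrder(int) in the sense that it interprets the buffer as a zOrderString and returns an array of long values of size numDimensions, reflecting the individual components of the z-order string.
Parameters:
size -
numDimensions -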
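For readers unfamiliar with z-order, the sketch below interleaves the bits of several 64-bit components into one byte[] and then inverts the operation. It is an assumption-laden illustration: the real methods operate on the key buffer rather than on a long[] argument, and the bit order (most significant bit first, dimension 0 first) is a choice made for this example rather than something stated on this page.

```java
import java.util.Arrays;

class ZOrderSketch {

    /** Interleaves the bits of each 64-bit component, most significant bit first. */
    static byte[] toZOrder(long[] dims) {
        int n = dims.length;
        byte[] out = new byte[n * 8];
        int outBit = 0; // next bit to write, counted from the MSB of out[0]
        for (int bit = 63; bit >= 0; bit--) {
            for (int d = 0; d < n; d++) {
                if (((dims[d] >>> bit) & 1L) != 0) {
                    out[outBit >>> 3] |= (byte) (0x80 >>> (outBit & 7));
                }
                outBit++;
            }
        }
        return out;
    }

    /** Inverse of toZOrder: redistributes the interleaved bits to their components. */
    static long[] fromZOrder(byte[] z, int numDimensions) {
        long[] dims = new long[numDimensions];
        int inBit = 0;
        for (int bit = 63; bit >= 0; bit--) {
            for (int d = 0; d < numDimensions; d++) {
                int b = (z[inBit >>> 3] >>> (7 - (inBit & 7))) & 1;
                if (b != 0) {
                    dims[d] |= 1L << bit;
                }
                inBit++;
            }
        }
        return dims;
    }

    public static void main(String[] args) {
        long[] in = { 5L, -3L, Long.MAX_VALUE };
        byte[] z = toZOrder(in);
        System.out.println(Arrays.equals(fromZOrder(z, in.length), in));
    }
}
```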
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.