public class IVUnicode extends Object
IVs having inline Unicode data.
IVs must be able to report their correct mutual order. This means that the Java String must be given the same order as the encoded Unicode representation. Since we must include the number of bytes in the IV representation, we wind up with a length prefix followed by some representation of the character data. This cannot be consistent with the code point ordering imposed by String.compareTo(String). Therefore, the IVUnicode.IVUnicodeComparator is used to make the ordering over the String data consistent with the encoded representation of that data.
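The mismatch described above can be illustrated with a minimal sketch. The encoding and the comparator below are assumptions made for illustration only: they are not the actual IVUnicode wire format or the actual IVUnicodeComparator logic, and the names LengthPrefixOrderSketch, encodeSketch, LENGTH_FIRST, and compareEncoded are hypothetical. The sketch assumes a 4-byte big-endian character count followed by each UTF-16 code unit as two big-endian bytes, and shows that a length-first comparator over the Strings agrees with an unsigned byte-wise comparison of that encoding, while String.compareTo(String) does not.

```java
import java.nio.ByteBuffer;
import java.util.Comparator;

public class LengthPrefixOrderSketch {

    // Hypothetical encoding (NOT the real IVUnicode format): a 4-byte
    // big-endian character count, then each UTF-16 code unit as 2 bytes.
    static byte[] encodeSketch(String s) {
        ByteBuffer buf = ByteBuffer.allocate(4 + 2 * s.length());
        buf.putInt(s.length());
        for (int i = 0; i < s.length(); i++) {
            buf.putChar(s.charAt(i));
        }
        return buf.array();
    }

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = Integer.compare(x[i] & 0xff, y[i] & 0xff);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(x.length, y.length);
    }

    // Order two Strings by their (hypothetical) encoded representation.
    static int compareEncoded(String a, String b) {
        return compareUnsigned(encodeSketch(a), encodeSketch(b));
    }

    // Hypothetical comparator: shorter strings first, ties broken by
    // String.compareTo. This matches the ordering of encodeSketch above.
    static final Comparator<String> LENGTH_FIRST = (a, b) -> {
        int c = Integer.compare(a.length(), b.length());
        return c != 0 ? c : a.compareTo(b);
    };

    public static void main(String[] args) {
        String a = "b";
        String b = "ab";
        // Plain String ordering puts "b" after "ab".
        System.out.println("compareTo:      " + Integer.signum(a.compareTo(b)));
        // The length prefix puts "b" (length 1) before "ab" (length 2).
        System.out.println("encoded bytes:  " + Integer.signum(compareEncoded(a, b)));
        // The length-first comparator agrees with the encoded ordering.
        System.out.println("length-first:   " + Integer.signum(LENGTH_FIRST.compare(a, b)));
    }
}
```

Under these assumptions, comparing the Strings with the length-first comparator avoids materializing the byte[] form for every comparison while still producing the same order as the encoded keys.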
Note: This is not the only way to solve the problem. We could also have generated the encoded representation from any IV having inline Unicode data each time we need to compare two IVs, but that could turn into a lot of overhead.
Note: This does not attempt to make the Unicode representation "tight" and is not intended to handle very large Unicode strings. Large Unicode data in the statement indices causes them to bloat and has a negative impact on the overall system performance. The use case for inline Unicode data is when the data are small enough that they are worth inserting into the statement indices rather than indirecting through the TERM2ID/ID2TERM indices. Large RDF Values should always be inserted into the BLOBS index which is designed for that purpose.
Modifier and Type | Class and Description |
---|---|
static class | IVUnicode.IVUnicodeComparator |
Constructor and Description |
---|
IVUnicode() |
Modifier and Type | Method and Description |
---|---|
static int | byteLengthUnicode(String s): Return the byte length of the serialized representation of a Unicode string. |
static byte[] | encode1(String s): Encode a Unicode string. |
public static byte[] encode1(String s)
Parameters: s - The string.
public static int byteLengthUnicode(String s)
Parameters: s - The string.
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.