public class FrontCodedStringList extends it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString> implements Serializable
This class stores a list of strings using front-coding compression. Compression
is effective only if the list is sorted, but instances of this class can
also serve simply as a handy way to manage a large number of strings.
It implements an immutable ObjectList that returns the i-th
string (as a MutableString) when the get(int) method is
called with argument i. The returned mutable string may be freely
modified.
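To make the role of sorting concrete, here is a minimal sketch of front coding in plain Java. It is an illustration only, not the library's implementation: the class `FrontCodingSketch`, its `Entry` type, and all method names are hypothetical, and the sketch assumes that `ratio` means one string in every `ratio` is stored whole while the others store only the part that differs from their predecessor.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of front coding; not the library's code. */
public class FrontCodingSketch {

    /** One coded string: how many leading chars it shares with its predecessor, plus the rest. */
    public static final class Entry {
        public final int shared;
        public final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Length of the longest common prefix of a and b. */
    public static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }

    /** Assumption: every ratio-th string is kept whole; the others keep only their differing suffix. */
    public static List<Entry> encode(List<String> words, int ratio) {
        List<Entry> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            int shared = (i % ratio == 0) ? 0 : commonPrefix(words.get(i - 1), w);
            out.add(new Entry(shared, w.substring(shared)));
        }
        return out;
    }

    /** Rebuilds the string at index by replaying entries from the nearest whole string before it. */
    public static String get(List<Entry> coded, int ratio, int index) {
        int start = index - index % ratio;
        String cur = coded.get(start).suffix;
        for (int i = start + 1; i <= index; i++) {
            Entry e = coded.get(i);
            cur = cur.substring(0, e.shared) + e.suffix;
        }
        return cur;
    }

    /** Total number of characters actually kept in the coded form. */
    public static int storedChars(List<Entry> coded) {
        int total = 0;
        for (Entry e : coded) total += e.suffix.length();
        return total;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("compress", "compression", "compressor", "compute", "computer");
        List<Entry> coded = encode(sorted, 4);
        System.out.println(get(coded, 4, 3));
        System.out.println(storedChars(coded));
    }
}
```

With the five sorted words in `main` and a ratio of 4, the sketch keeps 24 of the original 44 characters. Feeding the same words in an order where neighbours share shorter prefixes leaves more characters in each suffix, which is why sorting matters.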
As a convenience, this class provides a main method that reads a sequence of newline-separated words from standard input and writes a corresponding serialized front-coded string list.
To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList or a CharArrayFrontCodedList, depending on
the value of the utf8 parameter at creation time. In the first case, if the
strings are ASCII-oriented the resulting array will be much smaller, but
access times will increase manifold, as each string must be UTF-8 decoded
before being returned.
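The size side of this trade-off can be checked with the standard library alone. The sketch below compares the bytes needed for a UTF-8 byte array against those needed for a char array, and shows the decoding step that a byte-backed store has to perform on each access. The class `Utf8TradeoffSketch` and its methods are hypothetical names used for illustration; nothing here calls the front-coded list classes themselves.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration of the byte-array versus char-array storage trade-off. */
public class Utf8TradeoffSketch {

    /** Bytes needed to hold the string as a UTF-8 byte array. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to hold the string as a char array (two bytes per UTF-16 code unit). */
    public static int charBytes(String s) {
        return s.length() * Character.BYTES;
    }

    /** The extra work a UTF-8 backed store does on access: turning bytes back into characters. */
    public static String decode(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // An ASCII word, a Latin word with one accented letter, and three CJK characters.
        String[] samples = { "compression", "na\u00efve", "\u65e5\u672c\u8a9e" };
        for (String s : samples) {
            System.out.println(s.length() + " chars: utf8=" + utf8Bytes(s)
                    + " bytes, char array=" + charBytes(s) + " bytes");
        }
    }
}
```

For the ASCII sample the UTF-8 form takes 11 bytes against 22 for the char array, while for the three CJK characters it takes 9 bytes against 6, so the saving depends on the strings being ASCII-oriented.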
| Modifier and Type | Field and Description |
|---|---|
| protected it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList | byteFrontCodedList: The underlying ByteArrayFrontCodedList, or null. |
| protected it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList | charFrontCodedList: The underlying CharArrayFrontCodedList, or null. |
| static long | serialVersionUID |
| protected boolean | utf8: Whether this front-coded list is UTF-8 encoded. |
| Constructor and Description |
|---|
| FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences contained in the given collection. |
| FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences returned by the given iterator. |
| Modifier and Type | Method and Description |
|---|---|
| protected static char[] | byte2Char(byte[] a, char[] s) |
| protected static int | countUTF8Chars(byte[] a) |
| MutableString | get(int index): Returns the element at the specified position in this front-coded list as a mutable string. |
| void | get(int index, MutableString s): Returns the element at the specified position in this front-coded list by storing it in a mutable string. |
| it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> | listIterator(int k) |
| static void | main(String[] arg) |
| int | ratio(): Returns the ratio of the underlying front-coded list. |
| int | size() |
| boolean | utf8(): Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes. |
Inherited methods:
add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top, toString
containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray
clear, remove
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
clear, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray

public static final long serialVersionUID

protected final it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList byteFrontCodedList
The underlying ByteArrayFrontCodedList, or null.

protected final it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList charFrontCodedList
The underlying CharArrayFrontCodedList, or null.

protected final boolean utf8
Whether this front-coded list is UTF-8 encoded.
public FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8)
Parameters:
words - an iterator returning character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.

public FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8)
Parameters:
c - a collection containing character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.

public boolean utf8()

public int ratio()

public MutableString get(int index)
Specified by: get in interface List<MutableString>
Parameters:
index - an index in the list.
Returns: a MutableString that will contain the string at the specified position. The string may be freely modified.

public void get(int index, MutableString s)
Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.

protected static int countUTF8Chars(byte[] a)

protected static char[] byte2Char(byte[] a, char[] s)
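This page gives no description for countUTF8Chars or byte2Char. Going only by their names and signatures, one plausible reading is that the first counts the characters a UTF-8 byte array decodes to, and the second decodes the array into a char buffer. The sketch below is a guess at that kind of helper, written with hypothetical names (`Utf8CountSketch`, `countChars`, `decodeInto`); it is not the library's code, and the buffer-reuse behaviour in particular is an assumption. It expects well-formed UTF-8 input.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helpers; a guess at what name-alike methods could do, not the library's code. */
public class Utf8CountSketch {

    /** Counts the UTF-16 chars that a well-formed UTF-8 array decodes to, without decoding it. */
    public static int countChars(byte[] a) {
        int count = 0;
        for (byte b : a) {
            int v = b & 0xFF;
            // Every byte that is not a continuation byte (10xxxxxx) starts a code point.
            if ((v & 0xC0) != 0x80) count++;
            // A four-byte lead (11110xxx) encodes a supplementary code point: two UTF-16 chars.
            if ((v & 0xF8) == 0xF0) count++;
        }
        return count;
    }

    /** Assumption: reuse s when it is large enough, otherwise allocate a buffer of the exact size. */
    public static char[] decodeInto(byte[] a, char[] s) {
        int n = countChars(a);
        char[] out = (s != null && s.length >= n) ? s : new char[n];
        String decoded = new String(a, StandardCharsets.UTF_8);
        decoded.getChars(0, decoded.length(), out, 0);
        return out;
    }

    public static void main(String[] args) {
        byte[] bytes = "na\u00efve".getBytes(StandardCharsets.UTF_8);
        char[] buffer = new char[16];
        char[] result = decodeInto(bytes, buffer);
        System.out.println(new String(result, 0, countChars(bytes)));
    }
}
```

Counting before decoding lets the caller size or reuse a buffer up front, which is the kind of per-access work that makes the UTF-8 variant slower than the char-array one.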
public it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> listIterator(int k)
Specified by: listIterator in interface it.unimi.dsi.fastutil.objects.ObjectList<MutableString>
Specified by: listIterator in interface List<MutableString>
Overrides: listIterator in class it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>

public int size()
Specified by: size in interface Collection<MutableString>
Specified by: size in interface List<MutableString>
Specified by: size in class AbstractCollection<MutableString>

public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, NoSuchMethodException
Throws:
IOException
com.martiansoftware.jsap.JSAPException
NoSuchMethodException

Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.