public class FrontCodedStringList extends it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString> implements Serializable
This class stores a list of strings using front-coding compression. Compression is effective only if the list is sorted, but instances of this class are also a handy way to manage a large number of strings. It implements an immutable ObjectList that returns the i-th string (as a MutableString) when the get(int) method is called with argument i. The returned mutable string may be freely modified.
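
To make the idea of front coding concrete, here is a minimal sketch in plain Java. It is not the library's implementation: the class name `FrontCodingSketch` and its methods are hypothetical, it stores `String` suffixes rather than packed byte or char arrays, and it assumes that the ratio means every ratio-th string is stored whole, as in fastutil's front-coded lists.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a tiny front-coded list over java.lang.String. */
public class FrontCodingSketch {
    private final int ratio;
    private final int[] prefixLen;  // chars shared with the previous string (0 for whole entries)
    private final String[] suffix;  // the chars that are not shared

    public FrontCodingSketch(List<String> words, int ratio) {
        if (ratio < 1) throw new IllegalArgumentException("ratio must be positive");
        this.ratio = ratio;
        prefixLen = new int[words.size()];
        suffix = new String[words.size()];
        String prev = "";
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            int common = 0;
            if (i % ratio != 0) { // assumption: every ratio-th string is stored whole
                int max = Math.min(prev.length(), w.length());
                while (common < max && prev.charAt(common) == w.charAt(common)) common++;
            }
            prefixLen[i] = common;
            suffix[i] = w.substring(common);
            prev = w;
        }
    }

    public int size() {
        return suffix.length;
    }

    /** Rebuilds the string at the given index, starting from the nearest whole entry. */
    public String get(int index) {
        int start = index - index % ratio;
        StringBuilder sb = new StringBuilder(suffix[start]);
        for (int i = start + 1; i <= index; i++) {
            sb.setLength(prefixLen[i]); // keep only the shared prefix
            sb.append(suffix[i]);
        }
        return sb.toString();
    }

    /** Total number of characters actually stored. */
    public int storedChars() {
        int total = 0;
        for (String s : suffix) total += s.length();
        return total;
    }

    public static void main(String[] args) {
        List<String> sorted = Arrays.asList("car", "card", "care", "careful", "cat");
        FrontCodingSketch list = new FrontCodingSketch(sorted, 4);
        int plain = 0;
        for (int i = 0; i < list.size(); i++) {
            System.out.println(list.get(i));
            plain += sorted.get(i).length();
        }
        System.out.println(list.storedChars() + " characters stored instead of " + plain);
    }
}
```

In this sketch a larger ratio saves more space on sorted input but makes `get` walk further from the nearest whole entry, which is the trade-off a ratio parameter controls.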
As a convenience, this class provides a main method that reads a sequence of newline-separated words from standard input and writes a corresponding serialized front-coded string list.
To store the list of strings, we use either a UTF-8 coded ByteArrayFrontCodedList or a CharArrayFrontCodedList, depending on the value of the utf8 parameter at creation time. In the first case, if the strings are ASCII-oriented the resulting array will be much smaller, but access will be considerably slower, as each string must be UTF-8 decoded before being returned.
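
The size side of this trade-off can be illustrated with the standard library alone. The sketch below is not part of this class. `StorageSizeSketch`, `utf8Bytes` and `charBytes` are hypothetical names, and the figures ignore any per-string overhead of the underlying front-coded lists.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: raw payload size of a string as UTF-8 bytes versus 16-bit chars. */
public class StorageSizeSketch {
    /** Bytes needed to hold the string as UTF-8. */
    public static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Bytes needed to hold the string as an array of 16-bit chars. */
    public static int charBytes(String s) {
        return 2 * s.length();
    }

    public static void main(String[] args) {
        String[] samples = {
            "front-coded",           // pure ASCII: one byte per character in UTF-8
            "citt\u00e0",            // one accented letter: two bytes in UTF-8
            "\u65e5\u672c\u8a9e"     // CJK: three bytes per character in UTF-8
        };
        for (String s : samples) {
            System.out.println(s.length() + " chars: UTF-8 = " + utf8Bytes(s)
                    + " bytes, char[] = " + charBytes(s) + " bytes");
        }
        // The access cost: UTF-8 storage must be decoded each time a string is returned.
        byte[] stored = samples[0].getBytes(StandardCharsets.UTF_8);
        String decoded = new String(stored, StandardCharsets.UTF_8);
        System.out.println(decoded.equals(samples[0]));
    }
}
```

For the ASCII sample UTF-8 halves the payload, while for the CJK sample it is larger than the char representation, which is why the saving is described as applying to ASCII-oriented strings.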
Modifier and Type | Field and Description |
---|---|
protected it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList | byteFrontCodedList: The underlying ByteArrayFrontCodedList, or null. |
protected it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList | charFrontCodedList: The underlying CharArrayFrontCodedList, or null. |
static long | serialVersionUID |
protected boolean | utf8: Whether this front-coded list is UTF-8 encoded. |
Constructor and Description |
---|
FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences contained in the given collection. |
FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8): Creates a new front-coded string list containing the character sequences returned by the given iterator. |
Modifier and Type | Method and Description |
---|---|
protected static char[] | byte2Char(byte[] a, char[] s) |
protected static int | countUTF8Chars(byte[] a) |
MutableString | get(int index): Returns the element at the specified position in this front-coded list as a mutable string. |
void | get(int index, MutableString s): Returns the element at the specified position in this front-coded list by storing it in a mutable string. |
it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> | listIterator(int k) |
static void | main(String[] arg) |
int | ratio(): Returns the ratio of the underlying front-coded list. |
int | size() |
boolean | utf8(): Returns whether this front-coded string list is storing its strings as UTF-8 encoded bytes. |
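Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectList: add, add, addAll, addAll, addElements, addElements, compareTo, contains, ensureIndex, ensureRestrictedIndex, equals, getElements, hashCode, indexOf, iterator, lastIndexOf, listIterator, objectListIterator, objectListIterator, objectSubList, peek, pop, push, remove, removeElements, set, size, subList, top, toString
Methods inherited from class it.unimi.dsi.fastutil.objects.AbstractObjectCollection: containsAll, isEmpty, objectIterator, removeAll, retainAll, toArray, toArray
Methods inherited from class java.util.AbstractCollection: clear, remove
Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.util.List: clear, containsAll, isEmpty, remove, removeAll, retainAll, toArray, toArray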
public static final long serialVersionUID
protected final it.unimi.dsi.fastutil.bytes.ByteArrayFrontCodedList byteFrontCodedList
The underlying ByteArrayFrontCodedList, or null.
protected final it.unimi.dsi.fastutil.chars.CharArrayFrontCodedList charFrontCodedList
The underlying CharArrayFrontCodedList, or null.
protected final boolean utf8
public FrontCodedStringList(Iterator<? extends CharSequence> words, int ratio, boolean utf8)
Parameters:
words - an iterator returning character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.
public FrontCodedStringList(Collection<? extends CharSequence> c, int ratio, boolean utf8)
Parameters:
c - a collection containing character sequences.
ratio - the desired ratio.
utf8 - if true, the strings will be stored as UTF-8 byte arrays.
public boolean utf8()
public int ratio()
public MutableString get(int index)
Specified by: get in interface List<MutableString>
Parameters:
index - an index in the list.
Returns: a MutableString that will contain the string at the specified position. The string may be freely modified.
public void get(int index, MutableString s)
Parameters:
index - an index in the list.
s - a mutable string that will contain the string at the specified position.
protected static int countUTF8Chars(byte[] a)
protected static char[] byte2Char(byte[] a, char[] s)
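
This page gives no description for the two protected helpers above. Judging only by their names and signatures, they plausibly count the characters in a UTF-8 byte array and decode it into a char array. The sketch below shows what such helpers could look like. It is an assumption, not the library's code: the names `countChars` and `decode` are hypothetical, input is assumed to be well-formed, and only sequences of one to three bytes (the Basic Multilingual Plane) are handled.

```java
import java.nio.charset.StandardCharsets;

/** Illustrative only: hand-written UTF-8 counting and decoding for 1- to 3-byte sequences. */
public class Utf8DecodeSketch {
    /** Counts characters by counting the bytes that are not continuation bytes (10xxxxxx). */
    public static int countChars(byte[] a) {
        int n = 0;
        for (byte b : a) {
            if ((b & 0xC0) != 0x80) n++;
        }
        return n;
    }

    /** Decodes a into s, allocating a new array if s is null or too short; returns the array used. */
    public static char[] decode(byte[] a, char[] s) {
        int n = countChars(a);
        if (s == null || s.length < n) s = new char[n];
        int i = 0, j = 0;
        while (i < a.length) {
            int b = a[i++] & 0xFF;
            if (b < 0x80) {
                s[j++] = (char) b;                                   // 0xxxxxxx
            } else if ((b & 0xE0) == 0xC0) {
                s[j++] = (char) (((b & 0x1F) << 6) | (a[i++] & 0x3F)); // 110xxxxx 10xxxxxx
            } else if ((b & 0xF0) == 0xE0) {
                int hi = (b & 0x0F) << 12;                           // 1110xxxx 10xxxxxx 10xxxxxx
                int mid = (a[i++] & 0x3F) << 6;
                int lo = a[i++] & 0x3F;
                s[j++] = (char) (hi | mid | lo);
            } else {
                throw new IllegalArgumentException("unsupported or malformed UTF-8 sequence");
            }
        }
        return s;
    }

    public static void main(String[] args) {
        String original = "citt\u00e0 \u65e5";
        byte[] bytes = original.getBytes(StandardCharsets.UTF_8);
        int n = countChars(bytes);
        String decoded = new String(decode(bytes, null), 0, n);
        System.out.println(n + " chars, round trip ok: " + decoded.equals(original));
    }
}
```

Passing a sufficiently large array as the second argument lets a caller reuse one buffer across many lookups, which would fit the allocation-free get(int, MutableString) variant.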
public it.unimi.dsi.fastutil.objects.ObjectListIterator<MutableString> listIterator(int k)
Specified by: listIterator in interface it.unimi.dsi.fastutil.objects.ObjectList<MutableString>
Specified by: listIterator in interface List<MutableString>
Overrides: listIterator in class it.unimi.dsi.fastutil.objects.AbstractObjectList<MutableString>
public int size()
Specified by: size in interface Collection<MutableString>
Specified by: size in interface List<MutableString>
Specified by: size in class AbstractCollection<MutableString>
public static void main(String[] arg) throws IOException, com.martiansoftware.jsap.JSAPException, NoSuchMethodException
Throws: IOException, com.martiansoftware.jsap.JSAPException, NoSuchMethodException
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.