public static class SampleIndex.BitVectorOffsetSampler extends Object implements SampleIndex.IOffsetSampler
This approach is based on a bit vector with one bit per tuple in the key
range. Each randomly chosen offset maps to a bit. If that bit is already
marked, then the offset has been used and we scan until we find the next
free offset. This requires [rangeCount] bits, so it works well when the
rangeCount of the key range is small. For example, a range count of 32k
requires a 4 KB bit vector (32k bits at 8 bits per byte), which is quite
manageable.
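The bit-vector approach can be sketched as follows. This is a minimal, hypothetical illustration, not the Blazegraph implementation: it uses `java.util.BitSet` in place of `LongArrayBitVector`, and the class name `BitVectorOffsetSketch`, the wrap-around at the end of the range, and the ascending output order are all assumptions of the sketch.

```java
import java.util.BitSet;
import java.util.Random;

// Hypothetical sketch of bit-vector offset sampling. Uses java.util.BitSet
// instead of LongArrayBitVector; names are illustrative, not Blazegraph API.
final class BitVectorOffsetSketch {

    static long[] sample(Random rnd, int limit, long fromIndex, long toIndex) {
        final long rangeCount = toIndex - fromIndex;
        if (rangeCount < 0) {
            throw new IllegalArgumentException("toIndex < fromIndex");
        }
        if (rangeCount > Integer.MAX_VALUE) {
            // One bit per tuple: refuse ranges too large to index with an int.
            throw new UnsupportedOperationException("rangeCount=" + rangeCount);
        }
        final int n = (int) rangeCount;
        // Never more offsets than tuples in the half-open range [fromIndex, toIndex).
        final int count = Math.max(0, Math.min(limit, n));
        final BitSet used = new BitSet(n); // rangeCount bits
        for (int i = 0; i < count; i++) {
            final int bit = rnd.nextInt(n);
            // If the bit is already marked, scan forward to the next free bit,
            // wrapping to the start of the range (assumption) if we run off the end.
            int free = used.nextClearBit(bit);
            if (free >= n) {
                free = used.nextClearBit(0);
            }
            used.set(free);
        }
        // Report the marked bits as absolute tuple indices, in ascending order.
        final long[] offsets = new long[count];
        int j = 0;
        for (int b = used.nextSetBit(0); b >= 0; b = used.nextSetBit(b + 1)) {
            offsets[j++] = fromIndex + b;
        }
        return offsets;
    }

    public static void main(String[] args) {
        final int rangeCount = 32 * 1024;
        final long[] offsets = sample(new Random(42L), 10, 1000L, 1000L + rangeCount);
        System.out.println(offsets.length); // 10
        System.out.println(rangeCount / 8); // 4096 bytes of bit vector
    }
}
```

Reading the result back with `nextSetBit` yields sorted offsets for free, which is one reason a bit vector is convenient here; the cost is that memory grows linearly with the range count.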
FIXME: There is something broken in this class, probably an assumption I
have about how LongArrayBitVector works. If you enable it in the stress
test, it will fail.
Constructor and Description |
---|
SampleIndex.BitVectorOffsetSampler() |
Modifier and Type | Method and Description |
---|---|
long[] | getOffsets(long seed, int limit, long fromIndex, long toIndex) Return an array of tuple indices which may be used to sample a key range of some index. |
public long[] getOffsets(long seed, int limit, long fromIndex, long toIndex)
Note: The caller must stop when it runs out of offsets, not when the limit is satisfied, because fewer offsets are returned when the half-open range is smaller than the limit.
Note: The utility of this class is limited to smaller range counts (32k is fine, and 2x or 4x that is also OK), so it will reject anything with a very large range count.
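The first note is about the calling side. A minimal sketch, assuming a hypothetical `OffsetSampler` interface that mirrors the `getOffsets` signature and a toy `FIRST_N` stand-in (deterministic, not random, and not Blazegraph code): the caller iterates over whatever array comes back instead of counting up to `limit`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical caller-side sketch; OffsetSampler and FIRST_N are illustrative.
final class OffsetCallerSketch {

    /** Hypothetical stand-in mirroring the getOffsets(...) signature. */
    interface OffsetSampler {
        long[] getOffsets(long seed, int limit, long fromIndex, long toIndex);
    }

    /** Toy sampler: returns the first min(limit, rangeCount) offsets, no randomness. */
    static final OffsetSampler FIRST_N = (seed, limit, fromIndex, toIndex) -> {
        final long rangeCount = toIndex - fromIndex;
        if (rangeCount > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("rangeCount=" + rangeCount);
        }
        final int count = (int) Math.max(0, Math.min(limit, rangeCount));
        final long[] out = new long[count];
        for (int i = 0; i < count; i++) {
            out[i] = fromIndex + i;
        }
        return out;
    };

    /** Caller pattern: loop over what was returned, never up to 'limit'. */
    static List<Long> visit(OffsetSampler sampler, int limit, long fromIndex, long toIndex) {
        final long[] offsets = sampler.getOffsets(0L, limit, fromIndex, toIndex);
        final List<Long> visited = new ArrayList<>();
        for (long offset : offsets) {
            visited.add(offset); // e.g. look up the tuple at this index
        }
        return visited;
    }

    public static void main(String[] args) {
        // A range of 5 tuples with a limit of 100: only 5 offsets come back.
        System.out.println(visit(FIRST_N, 100, 10L, 15L).size()); // 5
    }
}
```

Looping with `for (int i = 0; i < limit; i++)` over the returned array would instead throw `ArrayIndexOutOfBoundsException` whenever the range is smaller than the limit.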
Specified by:
getOffsets in interface SampleIndex.IOffsetSampler
Parameters:
seed - The seed for the random number generator, or ZERO (0L) for a random seed. A non-zero value may be used to create a repeatable sample.
limit - The maximum number of tuples to sample.
fromIndex - The inclusive lower bound.
toIndex - The exclusive upper bound.
Throws:
UnsupportedOperationException - if the rangeCount is greater than Integer.MAX_VALUE.
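The parameter contract can be illustrated with a short sketch. It is hypothetical (the class and method names are not Blazegraph API) and only shows three points from the list above: `0L` selects a random seed while any other value gives a repeatable stream, `toIndex` is exclusive, and an over-large range count is rejected. Duplicates are not resolved here; that is the bit vector's job.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch of the getOffsets parameter contract; names are illustrative.
final class SeedContractSketch {

    /** seed == 0L asks for a random seed; any other value gives a repeatable stream. */
    static Random randomFor(long seed) {
        return seed == 0L ? new Random() : new Random(seed);
    }

    /** Draws 'limit' raw positions in [fromIndex, toIndex); duplicates are possible. */
    static long[] draw(long seed, int limit, long fromIndex, long toIndex) {
        final long rangeCount = toIndex - fromIndex;
        if (rangeCount > Integer.MAX_VALUE) {
            throw new UnsupportedOperationException("rangeCount=" + rangeCount);
        }
        if (rangeCount <= 0 || limit <= 0) {
            return new long[0];
        }
        final Random rnd = randomFor(seed);
        final long[] out = new long[limit];
        for (int i = 0; i < limit; i++) {
            // nextInt(n) is in [0, n), so toIndex itself is never produced.
            out[i] = fromIndex + rnd.nextInt((int) rangeCount);
        }
        return out;
    }

    public static void main(String[] args) {
        final long[] a = draw(42L, 5, 0L, 100L);
        final long[] b = draw(42L, 5, 0L, 100L);
        System.out.println(Arrays.equals(a, b)); // true: same non-zero seed
    }
}
```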
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.