public class BloomFilter extends Object implements IBloomFilter, Externalizable
Modifier and Type | Class and Description |
---|---|
static class |
BloomFilter.BloomFilterCounters
Counters for bloom filter access and notification of false positives.
|
Modifier and Type | Field and Description |
---|---|
BloomFilter.BloomFilterCounters |
counters
Counters are not persistent.
|
Constructor and Description |
---|
BloomFilter()
De-serialization ctor.
|
BloomFilter(int n,
double p)
Ctor specifies
maxN := n * 2 . |
BloomFilter(int n,
double p,
int maxN) |
Modifier and Type | Method and Description |
---|---|
boolean |
add(byte[] key)
Adds the key to the filter.
|
boolean |
contains(byte[] key)
Test the filter for the key a
true return DOES NOT
guarantee that the key has been added to the filter while a
false return guarantees that the key HAS NOT been added to
the filter. |
long |
disable()
Disables the bloom filter associated with the index.
|
void |
falsePos()
Notify the bloom filter that a false positive was observed for a key that
for which
IBloomFilter.add(byte[]) reported true (the key was
in fact not in the index). |
long |
getAddr()
Address that can be used to read this object from the store.
|
long |
getBitLength()
The bit length of the filter.
|
static long |
getBitLength(int hashFunctionCount,
int nentries)
Return the bit length required to provision a filter having the specified
#of hash functions and the target capacity.
|
static int |
getEntryCountForErrorRate(int k,
long m,
double p)
This returns the #of index entries at which the bloom filter will have
the specified expected error rate.
|
double |
getErrorRate()
The false positive error rate estimated as
2^-d , where
d is the #of hash functions. |
int |
getHashFunctionCount()
The #of hash functions used by the filter.
|
static int |
getHashFunctionCount(double errorRate)
Return the #of hash functions required to achieve the specified error
rate.
|
int |
getMaxN()
The #of index entries at which the filter will have reached its maximum
error rate (from the ctor).
|
int |
getN()
The expected #of index entries (from the ctor).
|
double |
getP()
The target error rate when there are
getN() index entries (from
the ctor). |
boolean |
isDirty()
Return
true iff the state of the filter has been modified
but not yet written onto the store. |
boolean |
isEnabled()
Return
true unless the bloom filter has been disabled. |
static BloomFilter |
read(IRawStore store,
long addr)
Read a bloom filter record from the store.
|
void |
readExternal(ObjectInput in)
|
String |
toString() |
long |
write(IRawStore store)
Writes the bloom filter on the store and clears the
isDirty()
flag. |
void |
writeExternal(ObjectOutput out) |
public transient BloomFilter.BloomFilterCounters counters
public BloomFilter()
public BloomFilter(int n, double p)
maxN := n * 2
.n
- The expected #of index entries.p
- The target error rate.public BloomFilter(int n, double p, int maxN)
n
- The expected #of index entries.p
- The target error rate.maxN
- The #of index entries at which the filter will have reached
its maximum error rate. (The value is normally calculated by
the BloomFilterFactory
.)IllegalArgumentException
- if n is non-positive.IllegalArgumentException
- unless p lies in (0:1).IllegalArgumentException
- if maxN is LT n.public final int getN()
public final double getP()
getN()
index entries (from
the ctor).
Note: This class does not know the actual false positive error rate.
However, that is tracked by the AbstractBTree.btreeCounters
.
public double getErrorRate()
2^-d
, where
d is the #of hash functions. This will be close to but typically
not exactly the same as the value of getP()
specified to the
ctor.
Note: This class does not know the actual false positive error rate.
However, that is tracked by the AbstractBTree.btreeCounters
.
public final int getMaxN()
public final int getHashFunctionCount()
public final long getBitLength()
public static int getHashFunctionCount(double errorRate)
p
is the probability of a false positive and
d
is the #of hash functions, then
p = pow(2, -d)and
d = ceil(-ln(p) / ln(2))
public static long getBitLength(int hashFunctionCount, int nentries)
bitLength = ceil(nentries * (hashFunctionCount / ln(2)))
hashFunctionCount
- The #of hash functions.nentries
- The target capacity.public static int getEntryCountForErrorRate(int k, long m, double p)
p = (1 - (1 - 1 / m) ˆ kn) ˆ kwhere m is the bit length of the filter, k is the #of hash functions, and n is the #of items that have been inserted into the filter. or approximately
p = (1 - e ˆ -kn / m) ˆ ksolving for
n
we obtain
n = -m ln( 1 - pˆ(1/k) ) / k
k
- The #of hash functions.m
- The bit length of the filter.p
- The expected error rate.http://en.wikipedia.org/wiki/Bloom_filter
public boolean add(byte[] key)
IBloomFilter
add
in interface IBloomFilter
key
- The key.true
iff the filter state was unchanged by the
incorporation of this key into the filter.IllegalStateException
- if the filter has been disable()
dpublic boolean contains(byte[] key)
IBloomFilter
true
return DOES NOT
guarantee that the key has been added to the filter while a
false
return guarantees that the key HAS NOT been added to
the filter.contains
in interface IBloomFilter
key
- The key.true
if the filter has either that key or some key
that is hash equivalent to that key using the hashing function
imposed by the filter; false
iff the filter can
guarantee that the key has not been added to the filter.IllegalStateException
- if the filter has been disable()
dpublic final long getAddr()
Note: This is not a persistent property. However the value is set when the record is read from, or written on, the store.
public static BloomFilter read(IRawStore store, long addr)
store
- the store.addr
- the address of the bloom filter record.public final boolean isDirty()
true
iff the state of the filter has been modified
but not yet written onto the store. The filter is presumed clean when
created or when it is read from the store. The filter will remain clean
until add(byte[])
returns true
, indicating that
the state of the filter has been changed.public long write(IRawStore store)
isDirty()
flag.
Note: This also sets the address on addr
as a side-effect, but
the address is NOT written into the store since it is not available until
after the record has been serialized.
Note: This method DOES NOT test isDirty()
.
store
- The store.IllegalStateException
- if the filter is not dirty.IllegalStateException
- if the filter is not enabled.public final long disable()
Note: This method is invoked by AbstractBTree.insert(byte[], byte[])
when
the #of index entries exceeds the maximum allowed for the bloom filter.
At that point the BTree
is dirty. Checkpoint
will notice
that the bloom filter is disabled will write its address as 0L so the
bloom filter is no longer reachable from the post-checkpoint record.
public final boolean isEnabled()
true
unless the bloom filter has been disabled.
Note: A bloom filter may be disabled is the #of index entries has exceeded the maximum desired error rate for the bloom filter.
public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException
addr
is set to 0L
, the
dirty
flag is cleared, and enabled
flag is set. It's
necessary to override serialization otherwise java will default the
enabled
flag to false
when an object is
de-serialized - whoops!readExternal
in interface Externalizable
in
- IOException
ClassNotFoundException
public void writeExternal(ObjectOutput out) throws IOException
writeExternal
in interface Externalizable
out
- IOException
public void falsePos()
IBloomFilter
IBloomFilter.add(byte[])
reported true
(the key was
in fact not in the index). This method exists solely for reporting and
tracking the actual error rate of the bloom filter.falsePos
in interface IBloomFilter
Copyright © 2006–2019 SYSTAP, LLC DBA Blazegraph. All rights reserved.