BloomFilterFactory (Blazegraph Database Platform 2.1.5 API)

java.lang.Object
- com.bigdata.btree.BloomFilterFactory

All Implemented Interfaces:

Serializable
```
public class BloomFilterFactory
extends Object
implements Serializable
```
An interface that is used to generate a bloom filter for an AbstractBTree and which allows the caller to specify the expected number of index entries, the desired error rate for the filter at that #of index entries, and the maximum error rate before the bloom filter will be disabled.
While the bloom filter is always a "perfect" fit for an IndexSegment since the #of index entries is known before we build the IndexSegment, that is not true of a BTree. For a BTree, the performance of the bloom filter will degrade as you exceed the #of index entries for which it was provisioned (at a desired error rate). Therefore the factory also specifies the error rate above which the bloom filter will be disabled for a BTree. The factory works backwards from the actual bit length of the bloom filter and the maximum allowed error rate to derive the #of index entries which would achieve that error rate. The bloom filter is disabled when inserts into the BTree reach that threshold #of index entries.
Once a bloom filter has been disabled for a BTree it will not be re-enabled for that BTree. However, in the scale-out architecture the a new mutable BTree is created each time the journal overflows and the data from the historical view is migrated into IndexSegments. Both the new mutable BTree and IndexSegments will have bloom filters configured according to this factory. The bloom filter on the new BTree will operate until the #of index entries in that BTree (not in the index or the index partition) exceeds the threashold at which the bloom filter would be expected to operate with the specified maximum error rate, at which point it will be disabled.

Version:

$Id$

Author:

Bryan Thompson

See Also:
Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`static BloomFilterFactory`	`DEFAULT` The recommenced default factory configuration.
`static double`	`DEFAULT_ERROR_RATE` The default target error rate 0.02.
`static double`	`DEFAULT_MAX_ERROR_RATE` The default maximum error rate 0.15.
`static int`	`DEFAULT_N` The default expected #of index entries 1000000.
`int`	`maxN` The maximum #of index entries before the expected performance will be worse than the specified maximum error rate.
`double`	`maxP` The maximum error rate for the bloom filter (it will be disabled for a `BTree` once the bloom filter can be expected to realize this error rate).
`int`	`n` The expected #of index entries.
`double`	`p` The desired error rate for the bloom filter at that #of index entries.

Constructor Summary

Constructors
Constructor and Description
`BloomFilterFactory(int n)` Configuration with the caller specified #of index entries and having a target error rate of 0.02 and a maximum error rate of 0.15.
`BloomFilterFactory(int n, double p, double maxP)` Core impl.

Method Summary

Methods
Modifier and Type Method and Description

BloomFilter newBloomFilter()
Create and return a new (empty) bloom filter for a BTree or IndexSegment.

String toString()
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods
Modifier and Type	Method and Description
`BloomFilter`	`newBloomFilter()` Create and return a new (empty) bloom filter for a `BTree` or `IndexSegment`.
`String`	`toString()`

- Field Detail
  - n
```
public final int n
```
    The expected #of index entries.
  - p
```
public final double p
```
    The desired error rate for the bloom filter at that #of index entries.
  - maxP
```
public final double maxP
```
    The maximum error rate for the bloom filter (it will be disabled for a BTree once the bloom filter can be expected to realize this error rate).
  - maxN
```
public final int maxN
```
    The maximum #of index entries before the expected performance will be worse than the specified maximum error rate. A BTree will automatically disable its bloom filter once this many elements have been inserted. In practice, this is only a constraint on scale-up indices. For scale-out indices the maximum applies only the mutable BTree absorbing writes destined for a specific index partition and the IndexSegments will have "perfect fit" bloom filters.
  - DEFAULT_N
```
public static final transient int DEFAULT_N
```
    The default expected #of index entries 1000000.
    
    See Also:
    Constant Field Values
  - DEFAULT_ERROR_RATE
```
public static final transient double DEFAULT_ERROR_RATE
```
    The default target error rate 0.02.
    
    See Also:
    Constant Field Values
  - DEFAULT_MAX_ERROR_RATE
```
public static final transient double DEFAULT_MAX_ERROR_RATE
```
    The default maximum error rate 0.15.
    
    See Also:
    Constant Field Values
  - DEFAULT
```
public static final transient BloomFilterFactory DEFAULT
```
    The recommenced default factory configuration. This configuration is designed to provide a bloom filter with good performance up to ~2M index entries and then shut off automatically. This works for both the scale-up case (good for small indices and turns off for large indices) and for scale-out (just about right sized for scale-out, depending on the split handler).
    The space requirements of a BloomFilter are approximately one byte per index entry for a reasonable false positive rate (aka error rate). Use DEFAULT for reasonable defaults. With these values, the bloom filter will be limited to an index with no more than ~2M entries. After than the bloom filter begins to have diminishing results due to an increasing false positive rate and will be automatically disabled. If you increase the expected #of index entries, then you can handle larger indices with the same error rate, but this swiftly gets into diminishing returns again because the latency required to materialize and persist the bloom filter becomes noticeable. Therefore the default configuration for the bloom filter is highly accurate up to 1M index entries (0.02 error rate) and then begins to loose precision. If the index grows large enough then the expected error rate of the bloom filter will degrade to the point where it will be automatically disabled.
    While this places an effective upper bound on the size of a scale-up index that can make effective use of a bloom filter, scale-out indices DO NOT share the same limitation. Each time a scale-out index is partitioned, it is broken into a mutable BTree for absorbing writes for an index partition and zero or more IndexSegments. Each time an overflow occurs, index entries are migrated to the IndexSegments, so the #of index entries in the BTree is never that large. Finally, #of index entries in an IndexSegment is always known when the IndexSegment is built, so the BloomFilter for an IndexSegment is always a perfect fit.
- Constructor Detail
  - BloomFilterFactory
```
public BloomFilterFactory(int n)
```
    Configuration with the caller specified #of index entries and having a target error rate of 0.02 and a maximum error rate of 0.15.
    
    Parameters:
    n - The expected #of index entries for a BTree (this value is ignored for IndexSegments).
    
    Throws:
    
    IllegalArgumentException - if n is non-positive.
  - BloomFilterFactory
```
public BloomFilterFactory(int n,
                  double p,
                  double maxP)
```
    Core impl.
    
    Parameters:
    n - The expected #of index entries (this value is ignored for IndexSegments).
    p - The desired error rate for the bloom filter at that #of index entries (or at the actual #of index entries for an IndexSegment).
    maxP - The maximum error rate for the bloom filter for a BTree (it will be disabled for a BTree once the bloom filter can be expected to realize this error rate).
    
    Throws:
    
    IllegalArgumentException - if n is non-positive.
    
    IllegalArgumentException - unless p lies in (0:1].
    
    IllegalArgumentException
    
    IllegalArgumentException - unless maxP lies in (p:1].
- Method Detail
  - newBloomFilter
```
public BloomFilter newBloomFilter()
```
    Create and return a new (empty) bloom filter for a BTree or IndexSegment.
    The bloom filter can be provisioned with reference to /architecture/bloomfilter.xls. Let p be the probability of a false positive (aka the error rate) and n be the #of index entries. The values p=.02 and n=1M result in a space requirement of 8656171 bits or approximately 1mb and uses ~ 8.6 bits per element. In order to achieve the same error rate with n=10M, the size requirements of the bloom filter will be approximately 10mb since the filter will still use ~ 8.6 bits per element for that error rate, or roughly one byte per index entry.
    The maximum record length for the backing store can easily be exceeded by a large bloom filter, large bloom filters will require significant time to read/write during checkpoints, and large bloom filters will be written each time a dirty index is involved in a commit.
    While the scale-out architecture uses group commits and hence can be expected to perform more commits during a bulk data load, it also uses one bloom filter per AbstractBTree so the #of index entries is bounded by the configured ISimpleSplitHandler in an application dependent manner. On the other hand, the bloom filter performance will degrade as a scale-up index grows in size since the bloom filter can not be made very large for a scale-up store (the maximum record size is reduced in order to permit more records) and large indices will therefore experience increasing false positive rates as they grow.
    Whether or not a bloom filter is useful depends on the application. The bloom filter will ONLY be used for point tests such as contains(), lookup(), or an IAccessPath that is fully bound and therefore can benefit from testing a bloom filter.
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object

Class BloomFilterFactory

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Detail

n

p

maxP

maxN

DEFAULT_N

DEFAULT_ERROR_RATE

DEFAULT_MAX_ERROR_RATE

DEFAULT

Constructor Detail

BloomFilterFactory

BloomFilterFactory

Method Detail

newBloomFilter

toString