bloom

package module

v0.0.0-...-6e045c4 Latest Latest Go to latest Published: May 9, 2025 License: BSD-2-Clause, MIT Imports: 9 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/ericvolp12/atomic-bloom

Links

Open Source Insights

README ¶

Atomic Bloom Filters

This library is a fork of Bits and Blooms that uses an alternative backing bitset based on Go's sync/atomic.Int64 rather than a bare slice of integers. This allows for concurrent addition and testing of filters without creating memory safety issues or race conditions by leveraging hardware support for atomic Load and Or operations on Int64s.

A Bloom filter is a concise/compressed representation of a set, where the main requirement is to make membership queries; i.e., whether an item is a member of a set. A Bloom filter will always correctly report the presence of an element in the set when the element is indeed present. A Bloom filter can use much less storage than the original set, but it allows for some 'false positives': it may sometimes report that an element is in the set whereas it is not.

When you construct, you need to know how many elements you have (the desired capacity), and what is the desired false positive rate you are willing to tolerate. A common false-positive rate is 1%. The lower the false-positive rate, the more memory you are going to require. Similarly, the higher the capacity, the more memory you will use. You may construct the Bloom filter capable of receiving 1 million elements with a false-positive rate of 1% in the following manner.

    filter := bloom.NewWithEstimates(1000000, 0.01)

You should call NewWithEstimates conservatively: if you specify a number of elements that it is too small, the false-positive bound might be exceeded. A Bloom filter is not a dynamic data structure: you must know ahead of time what your desired capacity is.

Our implementation accepts keys for setting and testing as []byte. Thus, to add a string item, "Love":

    filter.Add([]byte("Love"))

Similarly, to test if "Love" is in bloom:

    if filter.Test([]byte("Love"))

For numerical data, we recommend that you look into the encoding/binary library. But, for example, to add a uint32 to the filter:

    i := uint32(100)
    n1 := make([]byte, 4)
    binary.BigEndian.PutUint32(n1, i)
    filter.Add(n1)

Godoc documentation: https://pkg.go.dev/github.com/ericvolp12/atomic-bloom

Installation

go get -u github.com/ericvolp12/atomic-bloom

Running all tests

Before committing the code, please check if it passes all tests using (note: this will install some dependencies):

make deps
make qa

Design

A Bloom filter has two parameters: m, the number of bits used in storage, and k, the number of hashing functions on elements of the set. (The actual hashing functions are important, too, but this is not a parameter for this implementation). A Bloom filter is backed by an Atomic BitSet; a key is represented in the filter by setting the bits at each value of the hashing functions (modulo m). Set membership is done by testing whether the bits at each value of the hashing functions (again, modulo m) are set. If so, the item is in the set. If the item is actually in the set, a Bloom filter will never fail (the true positive rate is 1.0); but it is susceptible to false positives. The art is to choose k and m correctly.

In this implementation, the hashing functions used is murmurhash, a non-cryptographic hashing function.

Given the particular hashing scheme, it's best to be empirical about this. Note that estimating the FP rate will clear the Bloom filter.

Goroutine safety

This implementation of Bloom Filters is safe to call from concurrent goroutines so long as your CPU supports Atomic Int64 (Most mainstream x86_64 and ARM systems do).

You can Test from and Add to the same filter to your heart's content.

Documentation ¶

Overview ¶

Package bloom provides data structures and methods for creating Bloom filters.

A Bloom filter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.

This implementation uses an atomic bitset for thread-safety.

Index ¶

func EstimateFalsePositiveRate(m, k, n uint) (fpRate float64)
func EstimateParameters(n uint, p float64) (m uint, k uint)
func Locations(data []byte, k uint) []uint64
type BloomFilter
type Digest128
- func (d *Digest128) Sum128(pad_tail bool, length uint, tail []byte) (h1, h2 uint64)
- func (d *Digest128) Sum256(data []byte) (hash1, hash2, hash3, hash4 uint64)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func EstimateFalsePositiveRate ¶

func EstimateFalsePositiveRate(m, k, n uint) (fpRate float64)

EstimateFalsePositiveRate estimates the empirical false positive rate. Uses a temporary filter.

func EstimateParameters ¶

func EstimateParameters(n uint, p float64) (m uint, k uint)

EstimateParameters estimates requirements for m and k.

func Locations ¶

func Locations(data []byte, k uint) []uint64

Locations returns a list of hash locations representing a data item. This function remains independent of the bitset implementation.

Types ¶

type BloomFilter ¶

type BloomFilter struct {
	// contains filtered or unexported fields
}

A BloomFilter is a representation of a set of _n_ items, where the main requirement is to make membership queries; _i.e._, whether an item is a member of a set.

func From ¶

func From(data []int64, k uint) *BloomFilter

From creates a new Bloom filter with len(_data_) * 64 bits and _k_ hashing functions, initialized with the provided data.

func FromWithM ¶

func FromWithM(data []int64, m, k uint) *BloomFilter

FromWithM creates a new Bloom filter with _m_ length, _k_ hashing functions, initialized with the provided data.

func New ¶

func New(m uint, k uint) *BloomFilter

New creates a new Bloom filter with _m_ bits and _k_ hashing functions We force _m_ and _k_ to be at least one to avoid panics.

func NewWithEstimates ¶

func NewWithEstimates(n uint, fp float64) *BloomFilter

NewWithEstimates creates a new Bloom filter for about n items with fp false positive rate

func (*BloomFilter) Add ¶

func (f *BloomFilter) Add(data []byte) *BloomFilter

Add data to the Bloom Filter. Returns the filter (allows chaining)

func (*BloomFilter) AddHash ¶

func (f *BloomFilter) AddHash(h [4]uint64) *BloomFilter

Add precomputed hash values to the Bloom Filter. Returns the filter (allows chaining)

func (*BloomFilter) AddString ¶

func (f *BloomFilter) AddString(data string) *BloomFilter

AddString adds a string to the Bloom Filter.

func (*BloomFilter) ApproximatedSize ¶

func (f *BloomFilter) ApproximatedSize() int64

ApproximatedSize estimates the number of items added to the filter.

func (*BloomFilter) BitSet ¶

func (f *BloomFilter) BitSet() *atomicBitSet

BitSet returns the underlying atomic bitset for this filter.

func (*BloomFilter) Cap ¶

func (f *BloomFilter) Cap() uint

Cap returns the capacity, _m_, of a Bloom filter

func (*BloomFilter) ClearAll ¶

func (f *BloomFilter) ClearAll() *BloomFilter

ClearAll clears all the data in a Bloom filter.

func (*BloomFilter) Copy ¶

func (f *BloomFilter) Copy() *BloomFilter

Copy creates a copy of a Bloom filter.

func (*BloomFilter) Equal ¶

func (f *BloomFilter) Equal(g *BloomFilter) bool

Equal tests for the equality of two Bloom filters

func (*BloomFilter) GobDecode ¶

func (f *BloomFilter) GobDecode(data []byte) error

GobDecode implements gob.GobDecoder interface.

func (*BloomFilter) GobEncode ¶

func (f *BloomFilter) GobEncode() ([]byte, error)

GobEncode implements gob.GobEncoder interface.

func (*BloomFilter) K ¶

func (f *BloomFilter) K() uint

K returns the number of hash functions used in the BloomFilter

func (*BloomFilter) MarshalBinary ¶

func (f *BloomFilter) MarshalBinary() ([]byte, error)

MarshalBinary implements encoding.BinaryMarshaler interface.

func (BloomFilter) MarshalJSON ¶

func (f BloomFilter) MarshalJSON() ([]byte, error)

MarshalJSON implements json.Marshaler interface.

func (*BloomFilter) Merge ¶

func (f *BloomFilter) Merge(g *BloomFilter) error

Merge the data from another Bloom Filter. Returns error if parameters don't match.

func (*BloomFilter) ReadFrom ¶

func (f *BloomFilter) ReadFrom(stream io.Reader) (int64, error)

ReadFrom reads a binary representation of the BloomFilter from an i/o stream.

func (*BloomFilter) Test ¶

func (f *BloomFilter) Test(data []byte) bool

Test returns true if the data is *probably* in the BloomFilter, false otherwise.

func (*BloomFilter) TestAndAdd ¶

func (f *BloomFilter) TestAndAdd(data []byte) bool

TestAndAdd checks membership and adds the data unconditionally. Returns true if the element was *probably* present before adding.

func (*BloomFilter) TestAndAddString ¶

func (f *BloomFilter) TestAndAddString(data string) bool

TestAndAddString is the string version of TestAndAdd.

func (*BloomFilter) TestHash ¶

func (f *BloomFilter) TestHash(h [4]uint64) bool

TestHash returns true if the hash is *probably* in the BloomFilter.

func (*BloomFilter) TestLocations ¶

func (f *BloomFilter) TestLocations(locs []uint64) bool

TestLocations returns true if all locations are set in the BloomFilter.

func (*BloomFilter) TestOrAdd ¶

func (f *BloomFilter) TestOrAdd(data []byte) bool

TestOrAdd checks membership and adds the data only if not present. Returns true if the element was *probably* present before adding. Note: Due to the nature of atomics, this isn't truly conditional on *all* bits being present beforehand if run concurrently. It ensures each bit is set if it wasn't already.

func (*BloomFilter) TestOrAddString ¶

func (f *BloomFilter) TestOrAddString(data string) bool

TestOrAddString is the string version of TestOrAdd.

func (*BloomFilter) TestString ¶

func (f *BloomFilter) TestString(data string) bool

TestString returns true if the string is *probably* in the BloomFilter.

func (*BloomFilter) UnmarshalBinary ¶

func (f *BloomFilter) UnmarshalBinary(data []byte) error

UnmarshalBinary implements encoding.BinaryUnmarshaler interface.

func (*BloomFilter) UnmarshalJSON ¶

func (f *BloomFilter) UnmarshalJSON(data []byte) error

UnmarshalJSON implements json.Unmarshaler interface.

func (*BloomFilter) WriteTo ¶

func (f *BloomFilter) WriteTo(stream io.Writer) (int64, error)

WriteTo writes a binary representation of the BloomFilter to an i/o stream.

type Digest128 ¶

type Digest128 struct {
	// contains filtered or unexported fields
}

Digest128 represents a partial evaluation of a 128 bites hash.

func (*Digest128) Sum128 ¶

func (d *Digest128) Sum128(pad_tail bool, length uint, tail []byte) (h1, h2 uint64)

Sum128 computers two 64-bit hash value. It is assumed that bmix was first called on the data to process complete blocks of 16 bytes. The 'tail' is a slice representing the 'tail' (leftover elements, fewer than 16). If pad_tail is true, we make it seem like there is an extra element with value 1 appended to the tail. The length parameter represents the full length of the data (including the blocks of 16 bytes, and, if pad_tail is true, an extra byte).

func (*Digest128) Sum256 ¶

func (d *Digest128) Sum256(data []byte) (hash1, hash2, hash3, hash4 uint64)

Sum256 will compute 4 64-bit hash values from the input. It is designed to never allocate memory on the heap. So it works without any byte buffer whatsoever. It is designed to be strictly equivalent to

			a1 := []byte{1}
         hasher := murmur3.New128()
         hasher.Write(data) // #nosec
         v1, v2 := hasher.Sum128()
         hasher.Write(a1) // #nosec
         v3, v4 := hasher.Sum128()

See TestHashRandom.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL