bbhash

package module
v0.0.0-...-7358f69 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Mar 31, 2025 License: MIT Imports: 14 Imported by: 0

README

BBHash

BBHash is a fast Go implementation of a minimal perfect hash function for large key sets.

Installing the module for use in your own project

% go get github.com/relab/bbhash

How to use the package

import (
	"fmt"

	"github.com/relab/bbhash"
)

func ExampleBBHash_Find() {
	keys := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	bb, err := bbhash.New(keys, bbhash.Gamma(1.5))
	if err != nil {
		panic(err)
	}
	for _, key := range keys {
		hashIndex := bb.Find(key)
		fmt.Printf("%d, ", hashIndex)
	}
	fmt.Println()
	// Output:
	// 2, 6, 8, 3, 5, 7, 1, 9, 10, 4,
}

Advanced usage

The bbhash.New function takes a slice of keys as its first argument. The keys must be unique and of type uint64. New also takes zero or more bbhash.Option arguments. These are the available options:

Option Description
Gamma(float64) Set the gamma parameter of the BBHash algorithm. Default is 2.0.
InitialLevels(int) Set the initial number of levels in the BBHash algorithm. Default is 32.
Partitions(int) Set the number of partitions to split the keys into and compute parallel.
WithReverseMap() Create a reverse map that allows you to retrieve the key from the hash index.
Parallel() Use parallelism in the BBHash algorithm. Prefer the Partitions option instead.

The options can be combined like this:

bb, err := bbhash.New(keys)
bb, err := bbhash.New(keys, bbhash.Gamma(1.5), bbhash.InitialLevels(64))
bb, err := bbhash.New(keys, bbhash.InitialLevels(20))
bb, err := bbhash.New(keys, bbhash.Gamma(1.5), bbhash.Partitions(4))
bb, err := bbhash.New(keys, bbhash.Gamma(1.5), bbhash.Partitions(4), bbhash.WithReverseMap())

But the following combinations are not supported:

bb, err := bbhash.New(keys, bbhash.Parallel(), bbhash.Partitions(4))
bb, err := bbhash.New(keys, bbhash.Parallel(), bbhash.WithReverseMap())

Credits

Implemented by Hein Meling.

The implementation is mainly inspired by Sudhi Herle's Go implementation. Damian Gryski also has a Go implementation.

The algorithm is described in the paper: Fast and Scalable Minimal Perfect Hashing for Massive Key Sets Antoine Limasset, Guillaume Rizk, Rayan Chikhi, and Pierre Peterlongo. Their C++ implementation is available here.

Documentation

Overview

Package bbhash implements the BBHash algorithm for minimal perfect hash functions.

Index

Examples

Constants

This section is empty.

Variables

View Source
var FastHashFunc = func(buf []byte) uint64 {
	return fast.Hash64(123, buf)
}
View Source
var SHA256HashFunc = func(buf []byte) uint64 {
	h := sha256.New()
	h.Write(buf)
	return binary.LittleEndian.Uint64(h.Sum(nil))
}

Functions

func Keys

func Keys(hashFunc func([]byte) uint64, chunks iter.Seq[[]byte]) []uint64

Keys returns the hashes of the chunks using the provided hash function

func ReadChunks

func ReadChunks(readerInfo io.Reader, bufSz int) iter.Seq[[]byte]

Find the chunks from slow memory Chunks returns smaller chunks of data given a reader with some data already being read and with the buffer size.

Types

type BBHash

type BBHash struct {
	// contains filtered or unexported fields
}

BBHash represents a minimal perfect hash for a set of keys.

func (BBHash) AppendBinary

func (bb BBHash) AppendBinary(buf []byte) (_ []byte, err error)

AppendBinary implements the encoding.BinaryAppender interface.

func (BBHash) BitVectors

func (bb BBHash) BitVectors(varName string) string

BitVectors returns a Go slice for BBHash's per-level bit vectors. This is intended for testing and debugging; no guarantees are made about the format.

func (BBHash) BitsPerKey

func (bb BBHash) BitsPerKey() float64

BitsPerKey returns the number of bits per key in the minimal perfect hash.

func (BBHash) Find

func (bb BBHash) Find(key uint64) uint64

Find returns a unique index representing the key in the minimal hash set.

The return value is meaningful ONLY for keys in the original key set (provided at the time of construction of the minimal hash set).

If the key is in the original key set, the return value is guaranteed to be in the range [1, len(keys)].

If the key is not in the original key set, two things can happen: 1. The return value is 0, representing that the key was not in the original key set. 2. The return value is in the expected range [1, len(keys)], but is a false positive.

Example
package main

import (
	"fmt"

	"github.com/relab/bbhash"
)

func main() {
	keys := []uint64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	bb, err := bbhash.New(keys, bbhash.Gamma(1.5))
	if err != nil {
		panic(err)
	}
	for _, key := range keys {
		hashIndex := bb.Find(key)
		fmt.Printf("%d, ", hashIndex)
	}
	fmt.Println()
}
Output:

2, 6, 1, 4, 9, 3, 8, 5, 10, 7,

func (BBHash) Key

func (bb BBHash) Key(index uint64) uint64

Key returns the key for the given index. The index must be in the range [1, len(keys)], otherwise 0 is returned.

func (BBHash) LevelVectors

func (bb BBHash) LevelVectors() [][]uint64

LevelVectors returns a slice representation of the BBHash's per-level bit vectors.

func (BBHash) Levels

func (bb BBHash) Levels() int

Levels returns the number of Levels in the minimal perfect hash.

func (BBHash) MarshalBinary

func (bb BBHash) MarshalBinary() ([]byte, error)

MarshalBinary implements the encoding.BinaryMarshaler interface.

func (BBHash) String

func (bb BBHash) String() string

String returns a string representation of BBHash with overall and per-level statistics.

func (*BBHash) UnmarshalBinary

func (bb *BBHash) UnmarshalBinary(data []byte) error

UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.

type BBHash2

type BBHash2 struct {
	// contains filtered or unexported fields
}

BBHash2 represents a minimal perfect hash for a set of keys.

func New

func New(keys []uint64, opts ...Options) (*BBHash2, error)

New creates a new BBHash2 for the given keys. The keys must be unique. Creation is configured using the provided options. The default options are used if none are provided. Available options include: Gamma, InitialLevels, Partitions, Parallel, and WithReverseMap. With fewer than 1000 keys, the sequential version is always used.

func (BBHash2) AppendBinary

func (b2 BBHash2) AppendBinary(buf []byte) (_ []byte, err error)

AppendBinary implements the encoding.BinaryAppender interface.

func (BBHash2) BitVectors

func (bb BBHash2) BitVectors(varName string) string

BitVectors returns a Go slice for BBHash2's per-partition, per-level bit vectors. This is intended for testing and debugging; no guarantees are made about the format.

func (BBHash2) BitsPerKey

func (bb BBHash2) BitsPerKey() float64

BitsPerKey returns the number of bits per key in the minimal perfect hash.

func (BBHash2) Find

func (bb BBHash2) Find(key uint64) uint64

Find returns a unique index representing the key in the minimal hash set.

The return value is meaningful ONLY for keys in the original key set (provided at the time of construction of the minimal hash set).

If the key is in the original key set, the return value is guaranteed to be in the range [1, len(keys)].

If the key is not in the original key set, two things can happen: 1. The return value is 0, representing that the key was not in the original key set. 2. The return value is in the expected range [1, len(keys)], but is a false positive.

func (BBHash2) Key

func (bb BBHash2) Key(index uint64) uint64

Key returns the key for the given index. The index must be in the range [1, len(keys)], otherwise 0 is returned.

func (BBHash2) LevelVectors

func (bb BBHash2) LevelVectors() [][][]uint64

LevelVectors returns a slice representation of BBHash2's per-partition, per-level bit vectors.

func (BBHash2) MarshalBinary

func (b2 BBHash2) MarshalBinary() ([]byte, error)

MarshalBinary implements the encoding.BinaryMarshaler interface.

func (BBHash2) MaxMinLevels

func (bb BBHash2) MaxMinLevels() (max, min int)

MaxMinLevels returns the maximum and minimum number of levels across all partitions.

func (BBHash2) Partitions

func (bb BBHash2) Partitions() int

Partitions returns the number of partitions in the BBHash2. This is mainly useful for testing and may be removed in the future.

func (BBHash2) SinglePartition

func (bb BBHash2) SinglePartition() *BBHash

SinglePartition returns the underlying BBHash if it contains a single partition. If there are multiple partitions, it returns nil.

func (BBHash2) String

func (bb BBHash2) String() string

func (*BBHash2) UnmarshalBinary

func (b2 *BBHash2) UnmarshalBinary(data []byte) error

UnmarshalBinary implements the encoding.BinaryUnmarshaler interface.

type Options

type Options func(*options)

func Gamma

func Gamma(gamma float64) Options

Gamma sets the gamma parameter for creating a BBHash. The gamma parameter is the expansion factor for the bit vector; the paper recommends a value of 2.0. The larger the value the more space will be consumed by the BBHash.

func InitialLevels

func InitialLevels(levels int) Options

InitialLevels sets the initial number of levels to use when creating a BBHash.

func Parallel

func Parallel() Options

Parallel creates a BBHash by sharding the keys across multiple goroutines. This option is not compatible with the Partitions option.

func Partitions

func Partitions(partitions int) Options

Partitions sets the number of partitions to use when creating a BBHash2. The keys are partitioned into the given the number partitions. Setting partitions to less than 2 results in a single BBHash, wrapped in a BBHash2.

func WithReverseMap

func WithReverseMap() Options

WithReverseMap creates a reverse map when creating a BBHash.

Directories

Path Synopsis
cmd
bbhashbench command
falsepositive command
internal

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL