fastcdc

package
v0.0.2
Published: Jan 29, 2026 License: Apache-2.0 Imports: 3 Imported by: 0

Documentation

Overview

Package fastcdc implements the FastCDC 2020 content-defined chunking algorithm. See https://ieeexplore.ieee.org/document/9055082 by Wen Xia, et al.

This implementation uses the 2-byte rolling optimization described in section 3.7 of the paper for improved performance.

Types

type Chunk

type Chunk struct {
	Offset      int    // Byte position in the stream where this chunk starts.
	Length      int    // Size of the chunk in bytes.
	Data        []byte // Raw chunk bytes. Only valid until the next call to Next.
	Fingerprint uint64 // Final gear hash value at the chunk boundary.
}

Chunk holds the result of a single content-defined chunk.

type Chunker

type Chunker struct {
	// contains filtered or unexported fields
}

Chunker splits a byte stream into variable-sized chunks using FastCDC 2020.

func NewChunker

func NewChunker(rd io.Reader, averageSize int, opts ...Option) (*Chunker, error)

NewChunker creates a new Chunker that reads from rd and targets the given average chunk size. averageSize must be a power of 2 in the range 64 B to 1 GiB. Higher normalization levels narrow the range of allowed values for averageSize. All other options have sensible defaults.
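The averageSize constraint can be checked before calling NewChunker. The helper below is a minimal sketch, not part of the package: validAverageSize and its constants are hypothetical names, the bounds are assumed to be inclusive, and the extra narrowing applied at high normalization levels is not modeled.

```go
package main

import (
	"fmt"
	"math/bits"
)

const (
	minAverage = 64      // 64 B, assumed inclusive
	maxAverage = 1 << 30 // 1 GiB, assumed inclusive
)

// validAverageSize is a HYPOTHETICAL pre-check mirroring the documented
// rule: a power of 2 in the range 64 B to 1 GiB.
func validAverageSize(n int) bool {
	return n >= minAverage && n <= maxAverage && bits.OnesCount(uint(n)) == 1
}

func main() {
	for _, n := range []int{32, 64, 8192, 10000, 1 << 30} {
		fmt.Printf("%d: %v\n", n, validAverageSize(n))
	}
}
```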

func (*Chunker) Next

func (c *Chunker) Next() (Chunk, error)

Next returns the next chunk, or io.EOF when the stream is exhausted. The chunk's Data slice is only valid until the next call to Next.

func (*Chunker) Reset

func (c *Chunker) Reset(rd io.Reader)

Reset reinitializes the chunker with a new reader.

type Option

type Option func(*options)

func WithBufferSize

func WithBufferSize(size int) Option

WithBufferSize sets the read buffer size (defaults to maxSize * 2). Larger buffers reduce the number of read syscalls. The size must exceed maxSize.

func WithMaxSize

func WithMaxSize(size int) Option

WithMaxSize overrides the maximum chunk size (defaults to averageSize * 4).

func WithMinSize

func WithMinSize(size int) Option

WithMinSize overrides the minimum chunk size (defaults to averageSize / 4).
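The documented defaults for WithMinSize, WithMaxSize, and WithBufferSize all follow from averageSize. A small worked sketch of that arithmetic, using a hypothetical sizes struct and defaultSizes helper that are not part of the package:

```go
package main

import "fmt"

// sizes is a HYPOTHETICAL holder for the derived defaults.
type sizes struct {
	Min, Max, Buffer int
}

// defaultSizes applies the documented defaults:
// min = averageSize / 4, max = averageSize * 4, buffer = max * 2.
func defaultSizes(averageSize int) sizes {
	maxSize := averageSize * 4
	return sizes{Min: averageSize / 4, Max: maxSize, Buffer: maxSize * 2}
}

func main() {
	s := defaultSizes(8192)
	fmt.Printf("min=%d max=%d buffer=%d\n", s.Min, s.Max, s.Buffer)
	// The documented rule for custom buffer sizes: buffer must exceed max.
	fmt.Println("buffer exceeds max:", s.Buffer > s.Max)
}
```

For an 8 KiB average this gives chunks between 2 KiB and 32 KiB and a 64 KiB read buffer.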

func WithNormalization

func WithNormalization(level int) Option

WithNormalization sets the normalization level in the range 0 to 3 (defaults to 2).

Higher normalization levels produce chunks closer to the average size by making it harder to cut at small sizes and easier to cut at large sizes. This could reduce de-duplication, but makes chunk sizes more predictable.

0: Normalization disabled
1: Fewer chunks outside desired range
2: Most chunks match desired size (recommended)
3: Nearly all chunks are the desired size
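In the FastCDC paper, normalization works by using two masks: a stricter one (more bits that must be zero) before the average size is reached and a looser one (fewer bits) after it. The sketch below computes the bit counts for each level under that scheme. The helper name maskBits is hypothetical, and how this package lays out its masks internally is not documented here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// maskBits returns how many hash bits must be zero for a cut before the
// average size is reached (small) and after it (large), following the
// paper's scheme of log2(averageSize) plus or minus the level.
// It is an illustrative helper, not part of the package.
func maskBits(averageSize, level int) (small, large int) {
	b := bits.TrailingZeros(uint(averageSize)) // log2 for a power of 2
	return b + level, b - level
}

func main() {
	for level := 0; level <= 3; level++ {
		s, l := maskBits(8192, level)
		fmt.Printf("level %d: cut chance 1/2^%d per byte below average, 1/2^%d above\n", level, s, l)
	}
}
```

At level 0 both masks have the same number of bits, which corresponds to normalization being disabled.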

func WithSeed

func WithSeed(seed uint64) Option

WithSeed applies an XOR mask to the global gear tables to prevent fingerprinting attacks that infer content from chunk sizes.
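The effect of an XOR mask on a gear table can be sketched as follows. This is an assumption-laden illustration: demoTable and seedTable are hypothetical names, the table values are made up, and the package may apply the seed differently in detail.

```go
package main

import "fmt"

// demoTable builds a HYPOTHETICAL gear table; the real one is not shown
// in the package documentation.
func demoTable() [256]uint64 {
	var t [256]uint64
	for i := range t {
		t[i] = uint64(i+1) * 0x9E3779B97F4A7C15
	}
	return t
}

// seedTable returns a copy of base with every entry XORed with seed.
// Arrays are passed by value in Go, so the caller's table is untouched.
func seedTable(base [256]uint64, seed uint64) [256]uint64 {
	for i := range base {
		base[i] ^= seed
	}
	return base
}

func main() {
	base := demoTable()
	seeded := seedTable(base, 0xDEADBEEF)
	fmt.Printf("entry 0: %#x -> %#x\n", base[0], seeded[0])
}
```

Two parties using different seeds hash the same bytes to different values, so their chunk boundaries for identical content generally differ.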
