Documentation
Overview
Package fastcdc implements the FastCDC 2020 content-defined chunking algorithm. See https://ieeexplore.ieee.org/document/9055082 by Wen Xia, et al.
This implementation uses the 2-byte rolling optimization described in section 3.7 of the paper for improved performance.
Index
Constants
This section is empty.
Variables
This section is empty.
Functions
This section is empty.
Types
type Chunk
type Chunk struct {
	Offset      int    // Byte position in the stream where this chunk starts.
	Length      int    // Size of the chunk in bytes.
	Data        []byte // Raw chunk bytes. Only valid until the next call to Next.
	Fingerprint uint64 // Final gear hash value at the chunk boundary.
}
Chunk holds the result of a single content-defined chunk.
type Chunker
type Chunker struct {
	// contains filtered or unexported fields
}
Chunker splits a byte stream into variable-sized chunks using FastCDC 2020.
func NewChunker
NewChunker creates a new Chunker with the given average chunk size. The averageSize must be a power of 2 in the range 64 B to 1 GiB. Higher normalization levels narrow the range of allowed average sizes. Other options have sensible defaults.
type Option ¶
type Option func(*options)
func WithBufferSize
WithBufferSize sets the read buffer size (defaults to maxSize * 2). Larger buffers reduce the number of read syscalls. The buffer size must exceed the maximum chunk size.
func WithMaxSize
WithMaxSize overrides the maximum chunk size (defaults to averageSize * 4).
func WithMinSize
WithMinSize overrides the minimum chunk size (defaults to averageSize / 4).
func WithNormalization
WithNormalization sets the normalization level from 0-3 (defaults to 2).
Higher normalization levels produce chunks closer to the average size by making a cut point less likely before the average size and more likely after it. This can reduce deduplication, but makes chunk sizes more predictable.
0: Normalization disabled
1: Fewer chunks outside desired range
2: Most chunks match desired size (recommended)
3: Nearly all chunks are the desired size