go-local-search

Version: v0.0.0-...-e803270
Published: Dec 19, 2025 License: MIT


A fast, lightweight local full-text search engine for indexing and searching through your files (Markdown, text, and code). Built entirely in Go with inverted index data structures and persistent storage using BoltDB.

Features

  • 🚀 Fast Indexing: Indexes files into an inverted index data structure
  • 🔍 Instant Search: Sub-second search across thousands of files
  • 🎯 TF-IDF Ranking: Relevance scoring based on Term Frequency-Inverse Document Frequency
  • 🔀 Fuzzy Matching: Finds results despite typos, using Levenshtein distance
  • 📊 Incremental Indexing: Only re-indexes modified files
  • 💾 Persistent Storage: Indexes are stored on disk using BoltDB
  • 🖥️ Dual Interface: CLI and HTTP API server
  • 📁 Multiple File Types: Supports Markdown, text, and code files (.md, .txt, .go, .py, .js, .ts, .java, .c, .cpp, .rs, etc.)
  • 🎨 Colored CLI Output: Colorized terminal output

Installation

Prerequisites
  • Go 1.18 or higher
Build from Source
git clone https://github.com/BaseMax/go-local-search.git
cd go-local-search
go build -o bin/search ./cmd/search
Install Globally
go install github.com/BaseMax/go-local-search/cmd/search@latest

Usage

CLI Commands
Index Files

Index all files in a directory:

search index /path/to/directory

Examples:

search index ~/Documents
search index ~/projects

Search Files

Basic search:

search search "your query"

Fuzzy search (tolerates typos):

search search "your query" --fuzzy

Fuzzy search with custom edit distance:

search search "your query" --fuzzy --distance 1

Examples:

# Standard (non-fuzzy) search
search search "golang programming"

# Fuzzy search (finds "python" even if you type "pythn")
search search "pythn" --fuzzy

# Search code with function names
search search "handleRequest" --fuzzy --distance 2
Start HTTP Server

Start the HTTP API server:

search server [address]

Examples:

# Start on default port (localhost:8080)
search server

# Start on custom port
search server localhost:3000
View Statistics

Show index statistics:

search stats

Example output:

=== Index Statistics ===
Documents:    1234
Terms:        5678
Files:        1234
Total Size:   45.67 MB
HTTP API
Endpoints

GET /search - Search indexed files

Parameters:

  • q (required) - Search query
  • fuzzy (optional) - Enable fuzzy matching (true/false)
  • distance (optional) - Max edit distance for fuzzy search (default: 2)

Example:

curl "http://localhost:8080/search?q=golang&fuzzy=true"

Response (abbreviated; only the first of the 2 results is shown):

{
  "query": "golang",
  "count": 2,
  "results": [
    {
      "path": "/path/to/file.md",
      "score": 1.386,
      "match_count": 1,
      "snippet": "Go is a programming language..."
    }
  ]
}

POST /index - Index a directory

Body:

{
  "path": "/path/to/directory"
}

Example:

curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/documents"}'

Response:

{
  "success": true,
  "files_indexed": 42
}

GET /stats - Get index statistics

Example:

curl http://localhost:8080/stats

Response:

{
  "document_count": 1234,
  "term_count": 5678,
  "files_indexed": 1234,
  "total_size": 47890123
}

Architecture

Components
  1. Tokenizer (internal/tokenizer):

    • Text tokenization and normalization
    • Stop word filtering
    • Basic stemming
    • Levenshtein distance calculation for fuzzy matching
  2. Inverted Index (internal/index):

    • Efficient inverted index data structure
    • TF-IDF scoring for relevance ranking
    • Positional information tracking
    • Thread-safe operations
  3. Storage (internal/storage):

    • BoltDB integration for persistent storage
    • Index serialization/deserialization
    • Metadata storage
  4. Indexer (internal/indexer):

    • Recursive directory scanning
    • File type detection
    • Incremental indexing (detects file changes)
    • SHA-256 hashing for change detection
  5. Search Engine (internal/search):

    • Main search engine orchestration
    • Query processing
    • Result ranking
    • Fuzzy search implementation
  6. HTTP Server (internal/server):

    • RESTful API endpoints
    • JSON request/response handling
    • Web-based interface
How It Works
  1. Indexing Phase:

    • Files are scanned recursively
    • Content is tokenized into terms
    • Terms are normalized (lowercase, stemming)
    • Inverted index is built: term → list of (document, frequency, positions)
    • Index is persisted to BoltDB
  2. Search Phase:

    • Query is tokenized and normalized
    • Relevant documents are retrieved from inverted index
    • TF-IDF scoring calculates relevance
    • Results are ranked by score and number of matching terms
    • For fuzzy search, similar terms are found using Levenshtein distance
  3. Incremental Indexing:

    • File modification times and hashes are tracked
    • Only changed files are re-indexed
    • Removed files are automatically cleaned from index
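A simplified, in-memory version of the indexing phase, building the term → (document, frequency, positions) mapping described above. It is an illustration under stated assumptions, not the `internal/index` implementation: the tokenizer only lowercases and splits on non-alphanumeric characters, the stop-word list is a tiny hypothetical one, there is no stemming, and nothing is persisted to BoltDB.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Posting records one document's occurrences of a term.
type Posting struct {
	DocID     string
	Freq      int
	Positions []int
}

// InvertedIndex maps term -> docID -> posting.
type InvertedIndex struct {
	postings map[string]map[string]*Posting
	docs     map[string]bool
}

func NewInvertedIndex() *InvertedIndex {
	return &InvertedIndex{postings: map[string]map[string]*Posting{}, docs: map[string]bool{}}
}

// stopWords is a small illustrative list, not the project's actual list.
var stopWords = map[string]bool{"a": true, "an": true, "the": true, "is": true, "and": true, "of": true}

// tokenize lowercases text, splits on non-letter/non-digit runes and drops
// stop words. Positions are therefore counted after stop-word removal.
func tokenize(text string) []string {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	terms := fields[:0]
	for _, f := range fields {
		if !stopWords[f] {
			terms = append(terms, f)
		}
	}
	return terms
}

// AddDocument indexes content under docID. Call it once per document;
// re-indexing a changed file would first remove its old postings.
func (ix *InvertedIndex) AddDocument(docID, content string) {
	ix.docs[docID] = true
	for pos, term := range tokenize(content) {
		byDoc, ok := ix.postings[term]
		if !ok {
			byDoc = map[string]*Posting{}
			ix.postings[term] = byDoc
		}
		p, ok := byDoc[docID]
		if !ok {
			p = &Posting{DocID: docID}
			byDoc[docID] = p
		}
		p.Freq++
		p.Positions = append(p.Positions, pos)
	}
}

// Lookup returns the postings for a single query term (nil if absent).
func (ix *InvertedIndex) Lookup(term string) map[string]*Posting {
	return ix.postings[strings.ToLower(term)]
}

// DocCount is the total number of indexed documents (the N used by IDF).
func (ix *InvertedIndex) DocCount() int { return len(ix.docs) }

func main() {
	ix := NewInvertedIndex()
	ix.AddDocument("a.md", "Go is a programming language. Go is fast.")
	ix.AddDocument("b.md", "Python is a programming language too.")
	for doc, p := range ix.Lookup("go") {
		fmt.Println(doc, p.Freq, p.Positions) // a.md 2 [0 3]
	}
}
```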

Technical Details

Supported File Types
  • Markdown: .md
  • Text: .txt
  • Go: .go
  • Python: .py
  • JavaScript/TypeScript: .js, .ts
  • Java: .java
  • C/C++: .c, .cpp, .h
  • Rust: .rs
  • Ruby: .rb
  • PHP: .php
  • Shell: .sh
  • YAML: .yml, .yaml
  • JSON: .json
  • XML: .xml
  • HTML: .html
  • CSS: .css
  • SQL: .sql
  • README files (no extension)
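Extension-based detection for the types listed above might look like this. The `isSupported` helper is hypothetical; the extension set is copied from the list, extensions are compared case-insensitively, and treating an extensionless file as supported only when its name is `README` (in any letter case) is an assumption about how the indexer handles such files.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedExts mirrors the "Supported File Types" list.
var supportedExts = map[string]bool{
	".md": true, ".txt": true, ".go": true, ".py": true, ".js": true, ".ts": true,
	".java": true, ".c": true, ".cpp": true, ".h": true, ".rs": true, ".rb": true,
	".php": true, ".sh": true, ".yml": true, ".yaml": true, ".json": true,
	".xml": true, ".html": true, ".css": true, ".sql": true,
}

// isSupported reports whether the file at path should be indexed.
// Only the last extension counts, so "archive.tar.gz" is checked as ".gz".
func isSupported(path string) bool {
	base := filepath.Base(path)
	ext := strings.ToLower(filepath.Ext(base))
	if ext == "" {
		// Assumption: extensionless files are indexed only if named README.
		return strings.EqualFold(base, "README")
	}
	return supportedExts[ext]
}

func main() {
	for _, p := range []string{"notes/todo.md", "src/main.GO", "README", "Makefile", "photo.png"} {
		fmt.Printf("%-14s %v\n", p, isSupported(p))
	}
}
```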
TF-IDF Scoring

The search engine uses TF-IDF (Term Frequency-Inverse Document Frequency) for ranking:

  • TF (Term Frequency): Number of times a term appears in a document
  • IDF (Inverse Document Frequency): log(total_documents / documents_containing_term)
  • Score: TF × IDF

Documents with higher scores are more relevant to the query.

Fuzzy Matching

Fuzzy search uses Levenshtein distance to find similar terms:

  • Default maximum edit distance: 2
  • Finds terms within the specified edit distance
  • Useful for handling typos and variations

Configuration

Configuration is stored in ~/.go-local-search/config.json:

{
  "storage_path": "/home/user/.go-local-search/index.db",
  "index_paths": [],
  "server_addr": "localhost:8080",
  "fuzzy_search": false,
  "max_distance": 2
}

Performance

  • Indexing speed: ~1000 files/second (depends on file size and disk speed)
  • Search speed: Sub-millisecond for most queries
  • Memory usage: Efficient with lazy loading from BoltDB
  • Disk usage: Index size is typically 10-20% of original file size

Examples

Index your projects
search index ~/projects
search index ~/Documents
search index ~/notes
Search examples
# Find Go tutorials
search search "golang tutorial"

# Find function definitions (with fuzzy matching)
search search "handleRequest" --fuzzy

# Search for algorithms
search search "binary search algorithm"

# Find Python code
search search "python class definition"
Using the HTTP API
# Start the server
search server localhost:8080

# Search via API
curl "http://localhost:8080/search?q=golang&fuzzy=true" | jq

# Index new directory via API
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/new-project"}' | jq

# Get statistics
curl http://localhost:8080/stats | jq

Dependencies

  • BoltDB - Embedded key/value database for persistent storage

Project Structure

.
├── cmd/
│   └── search/          # Main CLI application
├── internal/
│   ├── tokenizer/       # Text tokenization and normalization
│   ├── index/           # Inverted index implementation
│   ├── storage/         # BoltDB storage layer
│   ├── indexer/         # File indexing logic
│   ├── search/          # Search engine
│   └── server/          # HTTP API server
├── pkg/
│   └── config/          # Configuration management
└── bin/                 # Compiled binaries

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.

Author

Acknowledgments

  • Built with Go
  • Uses BoltDB for efficient storage
  • Inspired by modern search engines and information retrieval techniques
