Go Local Search
A fast, lightweight local full-text search engine for indexing and searching through your files (Markdown, text, and code). Built entirely in Go with inverted index data structures and persistent storage using BoltDB.
Features
- Fast Indexing: Efficiently indexes files using inverted index data structures
- Instant Search: Sub-second search across thousands of files
- TF-IDF Ranking: Intelligent relevance scoring using Term Frequency-Inverse Document Frequency
- Fuzzy Matching: Find results even with typos using Levenshtein distance
- Incremental Indexing: Only re-indexes modified files
- Persistent Storage: Indexes stored on disk using BoltDB
- Dual Interface: CLI and HTTP API server
- Multiple File Types: Supports Markdown, text, and code files (.md, .txt, .go, .py, .js, .ts, .java, .c, .cpp, .rs, etc.)
- Colored CLI Output: Colorized terminal output
Installation
Prerequisites
Build from Source
git clone https://github.com/BaseMax/go-local-search.git
cd go-local-search
go build -o bin/search ./cmd/search
Install Globally
go install github.com/BaseMax/go-local-search/cmd/search@latest
Usage
CLI Commands
Index Files
Index all files in a directory:
search index /path/to/directory
Example:
search index ~/Documents
search index ~/projects
Search
Basic search:
search search "your query"
Fuzzy search (tolerates typos):
search search "your query" --fuzzy
Fuzzy search with custom edit distance:
search search "your query" --fuzzy --distance 1
Examples:
# Search for exact matches
search search "golang programming"
# Fuzzy search (finds "python" even if you type "pythn")
search search "pythn" --fuzzy
# Search code with function names
search search "handleRequest" --fuzzy --distance 2
Start HTTP Server
Start the HTTP API server:
search server [address]
Examples:
# Start on default port (localhost:8080)
search server
# Start on custom port
search server localhost:3000
View Statistics
Show index statistics:
search stats
Output:
=== Index Statistics ===
Documents: 1234
Terms: 5678
Files: 1234
Total Size: 45.67 MB
HTTP API
Endpoints
GET /search - Search indexed files
Parameters:
q (required) - Search query
fuzzy (optional) - Enable fuzzy matching (true/false)
distance (optional) - Max edit distance for fuzzy search (default: 2)
Example:
curl "http://localhost:8080/search?q=golang&fuzzy=true"
Response:
{
  "query": "golang",
  "count": 1,
  "results": [
    {
      "path": "/path/to/file.md",
      "score": 1.386,
      "match_count": 1,
      "snippet": "Go is a programming language..."
    }
  ]
}
POST /index - Index a directory
Body:
{
  "path": "/path/to/directory"
}
Example:
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/documents"}'
Response:
{
  "success": true,
  "files_indexed": 42
}
GET /stats - Get index statistics
Example:
curl http://localhost:8080/stats
Response:
{
  "document_count": 1234,
  "term_count": 5678,
  "files_indexed": 1234,
  "total_size": 47890123
}
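As an illustration of calling the API from Go, the following sketch builds a GET /search URL and decodes a response of the shape shown above. The type and function names (SearchResult, SearchResponse, searchURL, decodeSearch) are hypothetical and not part of the project; only the parameter names and JSON keys come from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"strconv"
)

// SearchResult and SearchResponse mirror the JSON keys of the example
// response; the Go names themselves are assumptions for this sketch.
type SearchResult struct {
	Path       string  `json:"path"`
	Score      float64 `json:"score"`
	MatchCount int     `json:"match_count"`
	Snippet    string  `json:"snippet"`
}

type SearchResponse struct {
	Query   string         `json:"query"`
	Count   int            `json:"count"`
	Results []SearchResult `json:"results"`
}

// searchURL builds a GET /search URL with escaped query parameters.
// The distance parameter is only sent when fuzzy matching is enabled.
func searchURL(base, query string, fuzzy bool, distance int) string {
	v := url.Values{}
	v.Set("q", query)
	if fuzzy {
		v.Set("fuzzy", "true")
		v.Set("distance", strconv.Itoa(distance))
	}
	return base + "/search?" + v.Encode()
}

// decodeSearch parses a /search response body.
func decodeSearch(body []byte) (SearchResponse, error) {
	var resp SearchResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	// Prints: http://localhost:8080/search?distance=2&fuzzy=true&q=golang+tutorial
	fmt.Println(searchURL("http://localhost:8080", "golang tutorial", true, 2))

	body := []byte(`{"query":"golang","count":1,"results":[{"path":"/path/to/file.md","score":1.386,"match_count":1,"snippet":"Go is a programming language..."}]}`)
	resp, err := decodeSearch(body)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for _, r := range resp.Results {
		fmt.Printf("%s (score %.3f)\n", r.Path, r.Score)
	}
}
```

Against a running server, the body passed to decodeSearch would come from an http.Get call on the URL returned by searchURL.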
Architecture
Components
- Tokenizer (internal/tokenizer):
  - Text tokenization and normalization
  - Stop word filtering
  - Basic stemming
  - Levenshtein distance calculation for fuzzy matching
- Inverted Index (internal/index):
  - Efficient inverted index data structure
  - TF-IDF scoring for relevance ranking
  - Positional information tracking
  - Thread-safe operations
- Storage (internal/storage):
  - BoltDB integration for persistent storage
  - Index serialization/deserialization
  - Metadata storage
- Indexer (internal/indexer):
  - Recursive directory scanning
  - File type detection
  - Incremental indexing (detects file changes)
  - SHA-256 hashing for change detection
- Search Engine (internal/search):
  - Main search engine orchestration
  - Query processing
  - Result ranking
  - Fuzzy search implementation
- HTTP Server (internal/server):
  - RESTful API endpoints
  - JSON request/response handling
  - Web-based interface
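To make the Inverted Index component more concrete, here is a minimal sketch of a positional inverted index that maps each term to the documents containing it, along with frequency and token positions. It is an illustration only: the Posting and Index types and the tokenize helper are hypothetical, and the sketch leaves out stop word filtering, stemming, locking, and BoltDB persistence.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Posting records how a term occurs in one document (hypothetical type).
type Posting struct {
	Frequency int   // number of occurrences in the document
	Positions []int // token offsets of each occurrence
}

// Index maps term -> document path -> posting (hypothetical type).
type Index map[string]map[string]*Posting

// tokenize lowercases text and splits it on anything that is not a
// letter or digit. Stop words and stemming are not handled here.
func tokenize(text string) []string {
	return strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
}

// Add tokenizes a document and records every term occurrence.
func (idx Index) Add(path, text string) {
	for pos, term := range tokenize(text) {
		docs, ok := idx[term]
		if !ok {
			docs = make(map[string]*Posting)
			idx[term] = docs
		}
		p, ok := docs[path]
		if !ok {
			p = &Posting{}
			docs[path] = p
		}
		p.Frequency++
		p.Positions = append(p.Positions, pos)
	}
}

func main() {
	idx := Index{}
	idx.Add("a.md", "Go is fast. Go is simple.")
	idx.Add("b.md", "Python is simple.")

	p := idx["go"]["a.md"]
	fmt.Println(p.Frequency, p.Positions) // 2 [0 3]
	fmt.Println(len(idx["simple"]))       // 2 (both documents contain "simple")
}
```

Looking up a query term is then a single map access, which is what makes retrieval fast regardless of how many files were indexed.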
How It Works
- Indexing Phase:
  - Files are scanned recursively
  - Content is tokenized into terms
  - Terms are normalized (lowercase, stemming)
  - Inverted index is built: term → list of (document, frequency, positions)
  - Index is persisted to BoltDB
- Search Phase:
  - Query is tokenized and normalized
  - Relevant documents are retrieved from inverted index
  - TF-IDF scoring calculates relevance
  - Results are ranked by score and number of matching terms
  - For fuzzy search, similar terms are found using Levenshtein distance
- Incremental Indexing:
  - File modification times and hashes are tracked
  - Only changed files are re-indexed
  - Removed files are automatically cleaned from index
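The incremental indexing steps above can be sketched with SHA-256 content hashes. This is a simplified illustration under stated assumptions: contentHash, needsReindex, and removedPaths are hypothetical helpers, the previous state is modeled as a plain map from path to hash, and modification times are ignored.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// contentHash returns the hex-encoded SHA-256 digest of file contents.
func contentHash(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// needsReindex reports whether a file is new or has changed since the
// hashes in known (path -> hash from the previous run) were recorded.
func needsReindex(known map[string]string, path string, data []byte) bool {
	prev, ok := known[path]
	return !ok || prev != contentHash(data)
}

// removedPaths lists previously indexed paths that no longer exist,
// in sorted order, so they can be cleaned from the index.
func removedPaths(known map[string]string, current map[string]bool) []string {
	var gone []string
	for path := range known {
		if !current[path] {
			gone = append(gone, path)
		}
	}
	sort.Strings(gone)
	return gone
}

func main() {
	known := map[string]string{
		"notes.md": contentHash([]byte("hello")),
		"old.txt":  contentHash([]byte("stale")),
	}
	fmt.Println(needsReindex(known, "notes.md", []byte("hello")))       // false: unchanged
	fmt.Println(needsReindex(known, "notes.md", []byte("hello world"))) // true: content changed
	fmt.Println(needsReindex(known, "new.md", []byte("hi")))            // true: never indexed
	fmt.Println(removedPaths(known, map[string]bool{"notes.md": true})) // [old.txt]
}
```

Checking the modification time first and hashing only when it differs would avoid reading unchanged files, at the cost of trusting file system timestamps.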
Technical Details
Supported File Types
- Markdown: .md
- Text: .txt
- Go: .go
- Python: .py
- JavaScript/TypeScript: .js, .ts
- Java: .java
- C/C++: .c, .cpp, .h
- Rust: .rs
- Ruby: .rb
- PHP: .php
- Shell: .sh
- YAML: .yml, .yaml
- JSON: .json
- XML: .xml
- HTML: .html
- CSS: .css
- SQL: .sql
- README files (no extension)
TF-IDF Scoring
The search engine uses TF-IDF (Term Frequency-Inverse Document Frequency) for ranking:
- TF (Term Frequency): Number of times a term appears in a document
- IDF (Inverse Document Frequency): log(total_documents / documents_containing_term)
- Score: TF × IDF
Documents with higher scores are more relevant to the query.
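The formulas above translate directly into a few lines of Go. This sketch assumes a natural logarithm and a raw term count for TF, since the document does not state the log base or any normalization; the tfidf function name is hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// tfidf scores one term for one document:
//   tf  = number of times the term appears in the document
//   idf = ln(totalDocs / docsWithTerm)   (natural log is an assumption)
// It returns 0 when the term is absent to avoid dividing by zero.
func tfidf(tf, totalDocs, docsWithTerm int) float64 {
	if tf == 0 || docsWithTerm == 0 {
		return 0
	}
	idf := math.Log(float64(totalDocs) / float64(docsWithTerm))
	return float64(tf) * idf
}

func main() {
	// A term that appears once, in 1 of 4 documents: 1 * ln(4).
	fmt.Printf("%.3f\n", tfidf(1, 4, 1)) // 1.386
	// A term that appears in every document carries no weight: 3 * ln(1).
	fmt.Printf("%.3f\n", tfidf(3, 4, 4)) // 0.000
}
```

For a multi-term query, one common approach is to sum the per-term scores for each document; whether this project does exactly that is not stated here.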
Fuzzy Matching
Fuzzy search uses Levenshtein distance to find similar terms:
- Default maximum edit distance: 2
- Finds terms within the specified edit distance
- Useful for handling typos and variations
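As a rough illustration of the matching step, the sketch below computes Levenshtein distance with the standard two-row dynamic programming method and filters a vocabulary by maximum edit distance. The function names (levenshtein, fuzzyMatches, min3) are hypothetical, and a linear scan over all terms is an assumption; the project may locate candidate terms differently.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// levenshtein returns the edit distance between a and b, where each
// insertion, deletion, or substitution costs 1. Strings are compared
// rune by rune so multi-byte characters count as one edit.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return prev[len(rb)]
}

// fuzzyMatches returns the terms within maxDist edits of query,
// preserving the order of the input slice.
func fuzzyMatches(query string, terms []string, maxDist int) []string {
	var out []string
	for _, t := range terms {
		if levenshtein(query, t) <= maxDist {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	fmt.Println(levenshtein("pythn", "python")) // 1
	vocab := []string{"python", "golang", "pylon"}
	fmt.Println(fuzzyMatches("pythn", vocab, 2)) // [python pylon]
	fmt.Println(fuzzyMatches("pythn", vocab, 1)) // [python]
}
```

The last two lines show why a smaller --distance value can be useful: a distance of 2 also admits unrelated short words, while 1 keeps only the near-exact match.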
Configuration
Configuration is stored in ~/.go-local-search/config.json:
{
  "storage_path": "/home/user/.go-local-search/index.db",
  "index_paths": [],
  "server_addr": "localhost:8080",
  "fuzzy_search": false,
  "max_distance": 2
}
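For readers who want to load this file from their own Go code, the following sketch decodes it into a struct. The JSON keys are taken from the example above; the Config type, its field names, and parseConfig are assumptions and may not match the project's pkg/config package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the JSON keys of config.json. The Go field names are
// assumptions made for this sketch.
type Config struct {
	StoragePath string   `json:"storage_path"`
	IndexPaths  []string `json:"index_paths"`
	ServerAddr  string   `json:"server_addr"`
	FuzzySearch bool     `json:"fuzzy_search"`
	MaxDistance int      `json:"max_distance"`
}

// parseConfig decodes the raw contents of a config file.
func parseConfig(data []byte) (Config, error) {
	var cfg Config
	err := json.Unmarshal(data, &cfg)
	return cfg, err
}

func main() {
	raw := []byte(`{
		"storage_path": "/home/user/.go-local-search/index.db",
		"index_paths": [],
		"server_addr": "localhost:8080",
		"fuzzy_search": false,
		"max_distance": 2
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	fmt.Println(cfg.ServerAddr, cfg.MaxDistance) // localhost:8080 2
}
```

In practice the bytes would come from os.ReadFile on the path given above rather than from an inline literal.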
Performance
- Indexing speed: ~1000 files/second (depends on file size and disk speed)
- Search speed: Sub-millisecond for most queries
- Memory usage: Efficient with lazy loading from BoltDB
- Disk usage: Index size is typically 10-20% of original file size
Examples
Index your projects
search index ~/projects
search index ~/Documents
search index ~/notes
Search examples
# Find Go tutorials
search search "golang tutorial"
# Find function definitions (with fuzzy matching)
search search "handleRequest" --fuzzy
# Search for algorithms
search search "binary search algorithm"
# Find Python code
search search "python class definition"
Using the HTTP API
# Start the server
search server localhost:8080
# Search via API
curl "http://localhost:8080/search?q=golang&fuzzy=true" | jq
# Index new directory via API
curl -X POST http://localhost:8080/index \
  -H "Content-Type: application/json" \
  -d '{"path": "/home/user/new-project"}' | jq
# Get statistics
curl http://localhost:8080/stats | jq
Dependencies
- BoltDB - Embedded key/value database for persistent storage
Project Structure
.
├── cmd/
│   └── search/          # Main CLI application
├── internal/
│   ├── tokenizer/       # Text tokenization and normalization
│   ├── index/           # Inverted index implementation
│   ├── storage/         # BoltDB storage layer
│   ├── indexer/         # File indexing logic
│   ├── search/          # Search engine
│   └── server/          # HTTP API server
├── pkg/
│   └── config/          # Configuration management
└── bin/                 # Compiled binaries
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Author
Acknowledgments
- Built with Go
- Uses BoltDB for efficient storage
- Inspired by modern search engines and information retrieval techniques