Your code, semantically searchable.
Local-first RAG engine that indexes your codebase AND documentation, searchable through MCP.
No cloud. No GPU. No data leaves your machine.
Everything your AI workflow needs
From code parsing to semantic search, Abyss handles the full pipeline locally.
Semantic Search
Search by meaning, not keywords. Ask "how is authentication handled?" and find the exact functions -- even if the word "auth" never appears.
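To make "search by meaning" concrete: semantic search compares embedding vectors rather than strings. The sketch below ranks chunks by cosine similarity over hand-made toy vectors. The vectors, chunk names, and the helpers `cosine_similarity` and `rank_chunks` are illustrative assumptions, not Abyss APIs; real vectors would come from the embedding model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank_chunks(query_vec, chunks):
    """Return chunk names sorted from most to least similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in chunks.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored]

# Toy 3-dimensional "embeddings" (hypothetical values, for illustration only).
chunks = {
    "verify_token": [0.9, 0.1, 0.0],
    "render_chart": [0.0, 0.2, 0.9],
    "check_session": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for "how is authentication handled?"
ranking = rank_chunks(query, chunks)
```

Neither of the top two chunk names contains the word "auth"; they rank first only because their vectors point in nearly the same direction as the query vector.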
AST-Aware Code Parsing
Tree-sitter splits code at syntactic boundaries -- methods, classes, functions -- not arbitrary line counts. Supports C#, Python, Java, TypeScript, and more.
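As a rough illustration of splitting at syntactic boundaries, the sketch below uses Python's built-in `ast` module as a stand-in for Tree-sitter, so it only handles Python source. The chunk fields (`symbol`, `kind`, `start_line`, `end_line`, `text`) and the function name are assumptions made for this example, not the format Abyss produces.

```python
import ast

def split_at_syntactic_boundaries(source):
    """Return one chunk per top-level function or class, plus one per method."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []

    def add(node, kind, parent=None):
        name = f"{parent}.{node.name}" if parent else node.name
        chunks.append({
            "symbol": name,
            "kind": kind,
            "start_line": node.lineno,      # 1-based, inclusive
            "end_line": node.end_lineno,    # 1-based, inclusive
            "text": "\n".join(lines[node.lineno - 1:node.end_lineno]),
        })

    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            add(node, "function")
        elif isinstance(node, ast.ClassDef):
            add(node, "class")
            for child in node.body:
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    add(child, "method", parent=node.name)
    return chunks

SAMPLE = '''\
def login(user):
    return user.is_active

class Session:
    def refresh(self):
        pass
'''
code_chunks = split_at_syntactic_boundaries(SAMPLE)
```

Each chunk starts and ends where a definition does, so a function body is never cut in half the way a fixed line-count window could cut it.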
MCP Native
8 tools + 3 resources accessible from VS Code, Claude Desktop, and any MCP-compatible client. Zero configuration friction.
Universal Document Ingestion
PDF, DOCX, PPTX, images (OCR), Markdown, Jupyter notebooks, CSV, JSON, XML -- all converted and indexed in one unified vector database.
SCIP Call Graph Enrichment
Optional SCIP indexing adds caller/callee relationships, symbol kinds, and documentation to every code chunk. Ask "who calls ProcessPayment()?" and get real answers.
Privacy by Design
Everything runs on your machine. The embedding model downloads once (~90MB) and is cached locally. No cloud APIs, no telemetry, no data exfiltration.
Advanced Query Filters
Filter by language, symbol kind, file path, line range, chunk type, or full-text substring. Precision search across your entire codebase.
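One way such filters could be expressed is as a metadata filter in the style of ChromaDB's `where` and `where_document` query arguments. The sketch below only builds the filter dictionaries. The metadata field names (`language`, `kind`, `start_line`, `end_line`) and both helper functions are assumptions for illustration; the fields Abyss actually stores may differ.

```python
def build_where(language=None, kind=None, min_line=None, max_line=None):
    """Combine optional metadata conditions into one ChromaDB-style filter."""
    conditions = []
    if language is not None:
        conditions.append({"language": {"$eq": language}})
    if kind is not None:
        conditions.append({"kind": {"$eq": kind}})
    if min_line is not None:
        conditions.append({"start_line": {"$gte": min_line}})
    if max_line is not None:
        conditions.append({"end_line": {"$lte": max_line}})
    if not conditions:
        return None            # no metadata filter at all
    if len(conditions) == 1:
        return conditions[0]   # "$and" needs at least two operands
    return {"$and": conditions}

def build_where_document(substring=None):
    """Full-text substring filter applied to the chunk text itself."""
    return {"$contains": substring} if substring else None

where = build_where(language="python", kind="function", min_line=100)
where_document = build_where_document("ProcessPayment")
```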
Persistent ChromaDB Storage
Indexed data survives restarts in a local SQLite-backed ChromaDB database. No external server needed. Index once, query forever.
Debug HTML Reports
Per-file HTML debug reports showing every chunk, its metadata, and the exact text sent to the embedding model. Diagnose chunking quality visually.
Five-stage ingestion pipeline
Files flow through discovery, parsing, enrichment, header injection, and embedding and storage -- fully automated.
File Discovery
Recursive glob traversal with size limits (10MB max), directory exclusions, extension filtering, and automatic file type classification (code, document, structured, unknown).
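A minimal sketch of this discovery stage, using only the standard library. The excluded directory names, the extension-to-type table, and the reading of "10MB" as 10 * 1024 * 1024 bytes are all assumptions for this example; the real defaults are not stated here.

```python
import os
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024                              # assumed meaning of "10MB max"
EXCLUDED_DIRS = {".git", "node_modules", "__pycache__"}   # assumed exclusions
TYPE_BY_EXTENSION = {                                     # assumed, partial mapping
    ".py": "code", ".cs": "code", ".java": "code", ".ts": "code",
    ".md": "document", ".pdf": "document", ".docx": "document",
    ".json": "structured", ".xml": "structured",
}

def classify(path):
    """Map a file to code, document, structured, or unknown by its extension."""
    return TYPE_BY_EXTENSION.get(path.suffix.lower(), "unknown")

def discover(root, allowed_extensions=None, max_bytes=MAX_BYTES):
    """Yield (path, file_type) for every file that passes the filters."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in sorted(filenames):
            path = Path(dirpath) / filename
            if allowed_extensions is not None and path.suffix.lower() not in allowed_extensions:
                continue
            if path.stat().st_size > max_bytes:
                continue
            yield path, classify(path)
```

Pruning `dirnames` in place skips whole excluded subtrees without visiting them, which matters for directories such as `node_modules`.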
Smart Parsing
Four specialized parsers dispatch by file type. CodeParser uses Tree-sitter AST grammars. DocParser leverages MarkItDown + header-based sectioning. JsonParser and XmlParser handle structured data hierarchically.
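Header-based sectioning can be pictured as follows: once a document has been converted to Markdown, it is cut at every heading line. This is a simplified sketch under that assumption. The function name and the section fields are hypothetical, and it does not handle `#` characters inside fenced code blocks.

```python
import re

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")

def split_by_headers(markdown):
    """Split Markdown text into sections, one per heading."""
    sections = []
    current = {"title": None, "level": 0, "lines": []}

    def flush(section):
        text = "\n".join(section["lines"]).strip()
        # Keep titled sections, and untitled leading text if it is non-empty.
        if section["title"] is not None or text:
            sections.append({"title": section["title"],
                             "level": section["level"],
                             "text": text})

    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            flush(current)
            current = {"title": match.group(2).strip(),
                       "level": len(match.group(1)),
                       "lines": []}
        else:
            current["lines"].append(line)
    flush(current)
    return sections

DOC = "Intro text\n\n# Setup\nInstall it.\n\n## Options\nUse flags.\n"
doc_sections = split_by_headers(DOC)
```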
SCIP Enrichment (optional)
Matches each code chunk to a SCIP index by file + line range. Injects symbol, kind, callers[], callees[], and documentation -- enabling call-graph-aware search.
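The matching step might look like the sketch below, which picks the symbol in the same file whose line range overlaps the chunk the most. The `SymbolInfo` record, the largest-overlap rule, and the dictionary-shaped chunk are assumptions for illustration; this does not read a real SCIP index file.

```python
from dataclasses import dataclass, field

@dataclass
class SymbolInfo:
    """Hypothetical, already-decoded entry from a SCIP index."""
    file: str
    start_line: int
    end_line: int
    symbol: str
    kind: str
    callers: list = field(default_factory=list)
    callees: list = field(default_factory=list)
    documentation: str = ""

def overlap(a_start, a_end, b_start, b_end):
    """Number of lines shared by two inclusive line ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def enrich(chunk, symbols):
    """Return a copy of chunk with metadata from the best-overlapping symbol."""
    best, best_overlap = None, 0
    for info in symbols:
        if info.file != chunk["file"]:
            continue
        shared = overlap(chunk["start_line"], chunk["end_line"],
                         info.start_line, info.end_line)
        if shared > best_overlap:
            best, best_overlap = info, shared
    if best is None:
        return dict(chunk)  # enrichment is optional: leave the chunk unchanged
    return {**chunk,
            "symbol": best.symbol,
            "kind": best.kind,
            "callers": list(best.callers),
            "callees": list(best.callees),
            "documentation": best.documentation}
```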
Semantic Header Injection
EmbedBuilder prepends a structured header to each chunk's text -- file path, language, symbol name, kind, callers, and callees. The embedding model then sees where a chunk lives and how it connects to other code, not just its body, which improves search relevance.
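A header of this kind could be assembled as in the sketch below. The exact header layout, the `---` separator, and the function name `build_embed_text` are hypothetical; only the list of fields comes from the description above.

```python
def build_embed_text(chunk):
    """Prepend a plain-text metadata header to the chunk body."""
    header = [f"File: {chunk['file']}", f"Language: {chunk['language']}"]
    if chunk.get("symbol"):
        header.append(f"Symbol: {chunk['symbol']}")
    if chunk.get("kind"):
        header.append(f"Kind: {chunk['kind']}")
    if chunk.get("callers"):
        header.append("Callers: " + ", ".join(chunk["callers"]))
    if chunk.get("callees"):
        header.append("Callees: " + ", ".join(chunk["callees"]))
    # The separator line marks where metadata ends and source text begins.
    return "\n".join(header) + "\n---\n" + chunk["text"]

example = build_embed_text({
    "file": "billing/pay.py",
    "language": "python",
    "symbol": "ProcessPayment",
    "kind": "function",
    "callers": ["Checkout"],
    "callees": ["Charge", "SendReceipt"],
    "text": "def ProcessPayment(order):\n    ...",
})
```

Fields that are absent, for example when SCIP enrichment is turned off, are simply left out of the header.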
Embedding & Storage
Batch-encodes enriched text with all-MiniLM-L6-v2, then upserts into ChromaDB with cosine similarity. Files are tracked in a document registry with hash, size, and timestamp for incremental re-indexing.
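The registry is what makes incremental re-indexing possible: a file only needs new embeddings when its content has changed. The sketch below shows that check with a plain dictionary as the registry. The choice of SHA-256, the entry layout, and the helper names are assumptions for this example.

```python
import hashlib

def fingerprint(data, timestamp):
    """Registry entry for one file: content hash, size in bytes, timestamp."""
    return {
        "hash": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "timestamp": timestamp,
    }

def needs_reindex(registry, path, data):
    """True if the file is new or its content differs from the recorded entry."""
    entry = registry.get(path)
    if entry is None:
        return True
    if entry["size"] != len(data):
        return True  # cheap check first; avoids hashing when sizes differ
    return entry["hash"] != hashlib.sha256(data).hexdigest()

registry = {}
original = b"def login(user):\n    return True\n"
registry["auth.py"] = fingerprint(original, timestamp=1700000000.0)

unchanged = needs_reindex(registry, "auth.py", original)
modified = needs_reindex(registry, "auth.py", original + b"# edited\n")
unseen = needs_reindex(registry, "new.py", b"")
```

Comparing hashes rather than timestamps means that touching a file without changing its content does not trigger re-embedding.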
Semantically link your docs and codebase
Text search finds strings. Abyss finds meaning.
Battle-tested technology stack
Index everything
Source code with AST-aware parsing, documents with intelligent sectioning, structured data with hierarchical decomposition.
Source Code
Tree-sitter
Documents
MarkItDown
Structured
Dedicated
Plug into your AI workflow
MCP server with 8 tools and 3 resources. Works with any MCP-compatible client.
VS Code
GitHub Copilot + MCP integration
Claude Desktop
Native MCP client support
Any MCP Client
Standard protocol, zero lock-in
Available MCP Tools
index_directory
Recursively index a directory with include/exclude filters
query
Semantic search with filters: language, kind, file path, line range, text
list_documents
List all indexed files with metadata: name, date, size, chunk count
list_sources
Unique metadata values: file paths, languages, kinds, chunk types
replace_document
Re-index a single file after modification
remove_document
Remove a file and all its chunks from the database
list_filterable_fields
Describe all filterable metadata fields with types and operators
clear_database
Erase all chunks and document registry (requires confirmation)
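An MCP client invokes tools like these with JSON-RPC 2.0 `tools/call` requests. The sketch below builds two such requests by hand to show their shape. The tool names come from the list above, but every argument name (`path`, `include`, `exclude`, `query`, `language`) is an assumption; the real parameter names are defined by each tool's input schema.

```python
import json

def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# Hypothetical arguments, for illustration only.
index_request = make_tool_call(1, "index_directory", {
    "path": "./src",
    "include": ["*.py"],
    "exclude": ["tests/*"],
})
query_request = make_tool_call(2, "query", {
    "query": "how is authentication handled?",
    "language": "python",
})
payload = json.dumps(query_request)
```

In practice the AI assistant's MCP client generates these messages, so they rarely need to be written by hand.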
Up and running in minutes
From clone to semantic search in four steps.
Clone & install
Configure MCP in VS Code
Index your codebase
Use the index_directory MCP tool from your AI assistant:
Search with natural language
Example of results with Abyss
A single structured prompt to an AI agent -- backed by Abyss MCP queries -- enables detailed analysis and insights.
The prompt
query and list_sources calls against the Abyss index to build the full picture before writing a single line of the report.
Ready to search your code by meaning?
Open source. Privacy-first. No cloud required.