Stop Losing Relationships in Your RAG! EdgeQuake Fixes This
Your RAG system is lying to you. Every single day, it confidently retrieves chunks that seem relevant while completely missing the hidden connections that actually answer complex questions. You ask "How does our pricing strategy relate to customer churn in the enterprise segment?" and get back three paragraphs about pricing models—zero insight into the causal chain. The brutal truth? Traditional RAG was never built for reasoning. It was built for lookup. And in 2025, lookup is the bare minimum.
Here's the dirty secret that vector database vendors don't want you to know: semantic similarity is not understanding. Two concepts can be mathematically close in embedding space while having zero functional relationship. "Neural networks" and "neural plasticity" sit side by side in vector land. Try asking your RAG how they interact. Crickets.
This is why the smartest engineering teams are quietly abandoning chunk-and-dump architectures. They're migrating to Graph-RAG—systems that extract entities, map relationships, and traverse knowledge structures at query time. But here's the catch: most Graph-RAG implementations are slow, resource-hungry Python monoliths that collapse under production load.
Enter EdgeQuake. A high-performance Graph-RAG framework written in Rust, inspired by the groundbreaking LightRAG algorithm, and engineered for the brutal realities of production deployment. It doesn't just retrieve documents—it understands them. And it does so at speeds that make traditional RAG look like it's running on a Raspberry Pi.
Ready to see what your RAG system has been missing? Let's dive into the architecture that's making vector-only retrieval obsolete.
What is EdgeQuake?
EdgeQuake is an open-source Graph-RAG framework that transforms unstructured documents into intelligent knowledge graphs for superior retrieval and generation. Created by Raphaël MANSUY, a Hong Kong-based systems architect, EdgeQuake implements the LightRAG algorithm in Rust—delivering production-grade performance that Python-based alternatives simply cannot match.
The project's name evokes its core mission: detecting the "seismic" relationships buried in document collections that conventional RAG systems miss entirely. Where traditional approaches treat documents as flat bags of chunks, EdgeQuake decomposes them into entities (people, organizations, concepts, technologies) and relationships (enables, contradicts, precedes, influences)—then stores both in queryable graph structures.
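To make the "entities plus relationships" idea concrete, here is a minimal in-memory sketch. The class and method names are hypothetical and exist only for illustration; EdgeQuake's actual storage layer is not shown in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str  # entity name
    kind: str    # relationship type, e.g. "INFLUENCES"
    target: str  # entity name


class KnowledgeGraph:
    """Toy graph for illustration only (hypothetical, not EdgeQuake's API)."""

    def __init__(self):
        self.entities = {}                 # name -> entity type
        self.outgoing = defaultdict(list)  # name -> [Relationship, ...]

    def add_entity(self, name, entity_type):
        self.entities[name] = entity_type

    def add_relationship(self, source, kind, target):
        # Both endpoints must be known entities before they can be linked.
        if source not in self.entities or target not in self.entities:
            raise KeyError("unknown entity in relationship")
        self.outgoing[source].append(Relationship(source, kind, target))

    def neighbors(self, name):
        return [(r.kind, r.target) for r in self.outgoing[name]]


graph = KnowledgeGraph()
graph.add_entity("LIGHTRAG", "CONCEPT")
graph.add_entity("EDGEQUAKE", "TECHNOLOGY")
graph.add_relationship("LIGHTRAG", "INFLUENCES", "EDGEQUAKE")
print(graph.neighbors("LIGHTRAG"))
```

A flat bag of chunks has no equivalent of `neighbors`: that lookup is what lets a query step from one concept to the next.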
Why it's trending now: The framework has exploded in popularity because it solves three critical pain points simultaneously:
- The reasoning gap — Vector RAG fails on multi-hop questions requiring connection traversal
- The performance cliff — Python Graph-RAG implementations choke on concurrent workloads
- The deployment complexity — Most alternatives require weeks of infrastructure setup
EdgeQuake ships with a one-command Docker deployment, embedded PDF processing with LLM vision capabilities, and a React 19 frontend with interactive graph visualization. Version 0.11.3 (current at time of writing) adds Mistral La Plateforme as a first-class citizen—chat, vision PDF ingestion, and embeddings work with a single API key and make dev.
The repository has earned its Trendshift badge and growing star history by delivering what papers promise but implementations rarely achieve: algorithmic sophistication with industrial-strength engineering.
Key Features That Crush Traditional RAG
🚀 Rust-Powered Performance Architecture
EdgeQuake leverages Rust's zero-cost abstractions and ownership model to eliminate the garbage collection pauses and GIL contention that plague Python alternatives. The Tokio-based async runtime handles thousands of concurrent requests without breaking a sweat. Zero-copy operations mean data flows through the pipeline without redundant allocations—critical when processing million-token document collections.
SQL pre-filtering with GIN + B-tree indexes pushes metadata filters (tenant, workspace, document) to the database layer, eliminating up to 90% of wasted vector scans at scale. This isn't theoretical—it's the difference between sub-200ms hybrid queries and multi-second timeouts.
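The pre-filtering idea is easy to demonstrate at toy scale. The sketch below uses SQLite's built-in B-tree index as a stand-in for PostgreSQL's GIN + B-tree indexes, with a made-up `chunks` table and 2-D vectors. All of it is assumption for illustration, not EdgeQuake's schema.

```python
import math
import sqlite3


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT, tenant TEXT, workspace TEXT, x REAL, y REAL)")
conn.execute("CREATE INDEX idx_scope ON chunks (tenant, workspace)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?, ?)",
    [
        ("c1", "acme", "legal", 0.9, 0.1),
        ("c2", "acme", "legal", 0.2, 0.8),
        ("c3", "acme", "finance", 0.9, 0.2),
        ("c4", "globex", "legal", 1.0, 0.0),
    ],
)


def search(query_vec, tenant, workspace, top_k=2):
    # Step 1: the database narrows candidates using indexed metadata columns.
    candidates = conn.execute(
        "SELECT id, x, y FROM chunks WHERE tenant = ? AND workspace = ?",
        (tenant, workspace),
    ).fetchall()
    # Step 2: only the surviving rows are scored against the query vector.
    scored = [(cid, cosine(query_vec, (x, y))) for cid, x, y in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k], len(candidates)


results, scanned = search((1.0, 0.0), "acme", "legal")
print([cid for cid, _ in results], "rows scored:", scanned)
```

Only two of the four rows are ever scored here. The other tenant's chunk `c4` would have been the best vector match, yet it never enters the candidate set.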
💉 Knowledge Injection (v0.8.0+)
Domain glossaries transform EdgeQuake from generic to specialized. Inject acronym definitions, synonym mappings, and invisible citations that automatically expand queries before vector search. A manufacturing glossary mapping "OEE" to "Overall Equipment Effectiveness" ensures technical queries retrieve relevant content even when terminology varies.
The injection system operates invisibly—enriching the knowledge graph without polluting source citations. Full CRUD API with background processing status polling means enterprise teams can manage domain knowledge programmatically.
🏷️ Custom Entity Configuration (v0.9.0+)
Six curated domain presets—General, Manufacturing, Healthcare, Legal, Research, Finance—with up to 50 custom entity types per workspace. Beyond presets, define any UPPERCASE_UNDERSCORED type: BEARING_TYPE, VIBRATION_ANOMALY, REGULATORY_CLAUSE. Auto-normalization handles messy input, and backward compatibility ensures existing workspaces never break.
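What might auto-normalization look like? A rough sketch, assuming the rule is simply "collapse anything non-alphanumeric into underscores and uppercase the result". The function names and the exact rule are guesses for illustration; only the UPPERCASE_UNDERSCORED shape and the 50-type limit come from the text above.

```python
import re

MAX_CUSTOM_TYPES = 50  # per-workspace limit quoted above


def normalize_entity_type(raw):
    """Turn messy input like ' bearing-type ' into 'BEARING_TYPE'."""
    # Replace every run of non-alphanumeric characters with one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", raw.strip())
    cleaned = cleaned.strip("_").upper()
    if not cleaned:
        raise ValueError(f"no usable characters in {raw!r}")
    return cleaned


def build_type_list(raw_types):
    # Deduplicate after normalization while keeping first-seen order.
    seen = []
    for raw in raw_types:
        name = normalize_entity_type(raw)
        if name not in seen:
            seen.append(name)
    if len(seen) > MAX_CUSTOM_TYPES:
        raise ValueError("too many custom entity types for one workspace")
    return seen


print(build_type_list(["bearing type", "Vibration-Anomaly", "BEARING_TYPE"]))
```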
📄 Production-Ready PDF Processing
The embedded pdfium pipeline requires zero external configuration. Text-mode extraction handles standard PDFs; enable use_vision_llm = true to route pages through GPT-4o, Claude 3.5+, or Gemini 2.5 for scanned documents and complex layouts. Automatic fallback, safe large-PDF guardrails with adaptive DPI limits, and restart-safe recovery make this genuinely production-hardened.
🔍 Six Query Modes for Every Question Type
| Mode | Latency | Best For |
|---|---|---|
| Naive | ~100-300ms | Keyword-like lookups |
| Local | ~200-500ms | Specific entity relationships |
| Global | ~300-800ms | Thematic/high-level questions |
| Hybrid (default) | ~400-1000ms | Balanced comprehensive results |
| Mix | Variable | Weighted vector + graph blend |
| Bypass | ~500-1500ms | General questions without retrieval |
🌐 Enterprise-Grade API
OpenAPI 3.0 with Swagger UI, Server-Sent Events for real-time streaming, Kubernetes-ready health checks (/health, /ready, /live), and fail-closed multi-tenant workspace isolation for query and delete flows. Protected dashboard routes fail closed when authentication is enabled—no silent security degradation.
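If you consume the streaming endpoint without a ready-made client, the Server-Sent Events wire format is simple enough to parse by hand. This sketch handles only the generic `data:` framing. What EdgeQuake actually puts inside each event is not documented in this article, so the sample payloads below are invented.

```python
def parse_sse(lines):
    """Yield the data payload of each event from an iterable of text lines."""
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield "\n".join(data_parts)
                data_parts = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.


# Invented sample stream: two events separated by a keep-alive comment.
stream = ["data: The main\n", "\n", ": ping\n", "data: concepts are\n", "\n"]
print(list(parse_sse(stream)))
```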
Real-World Use Cases Where EdgeQuake Dominates
1. Multi-Hop Legal Discovery
A litigation team needs to answer: "Which prior contracts contain termination clauses that reference force majeure events, and which of those were invoked during supply chain disruptions in 2020-2021?"
Traditional RAG retrieves chunks about "termination clauses" and "force majeure" separately. EdgeQuake's Local query mode traverses CONTRACT → CONTAINS → TERMINATION_CLAUSE → REFERENCES → FORCE_MAJEURE_EVENT → INVOKED_DURING → SUPPLY_CHAIN_DISRUPTION relationships, returning the precise contract-document-event chain with source citations.
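The traversal above boils down to following a fixed sequence of relationship types and discarding every path that cannot take the next hop. A toy version, using a shortened three-hop chain and invented contract and event names (this is not EdgeQuake's query engine):

```python
from collections import defaultdict

edges = defaultdict(list)  # (source, relationship type) -> [targets]


def add_edge(source, kind, target):
    edges[(source, kind)].append(target)


def follow_chain(start, kinds):
    """Return every path from `start` that matches the relationship sequence."""
    paths = [[start]]
    for kind in kinds:
        extended = []
        for path in paths:
            for target in edges.get((path[-1], kind), []):
                extended.append(path + [target])
        paths = extended  # paths that could not take this hop are dropped
    return paths


# Invented sample data for illustration.
add_edge("contract-17", "CONTAINS", "clause-4")
add_edge("clause-4", "REFERENCES", "force-majeure-2020")
add_edge("force-majeure-2020", "INVOKED_DURING", "port-closure")
add_edge("contract-22", "CONTAINS", "clause-9")  # dead end: no force majeure link

chain = ["CONTAINS", "REFERENCES", "INVOKED_DURING"]
print(follow_chain("contract-17", chain))
print(follow_chain("contract-22", chain))
```

The second contract mentions a termination clause too, so a chunk retriever would happily surface it. The chain match drops it because the later hops do not exist.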
2. Manufacturing Root Cause Analysis
Equipment failure triggers investigation: "What maintenance procedures, operator training records, and supplier quality reports relate to bearing failures in Line 3 over the past quarter?"
With custom entity types (BEARING_TYPE, VIBRATION_ANOMALY, MAINTENANCE_PROCEDURE), EdgeQuake's Hybrid mode combines vector similarity on failure descriptions with graph traversal across equipment-supplier-personnel relationships. The Knowledge Injection feature expands "bearing" to include "roller bearing," "ball bearing," "thrust bearing" automatically.
3. Pharmaceutical Regulatory Intelligence
Regulatory affairs monitors: "How do FDA guidance documents on adaptive trial designs interact with EMA qualification opinions for biomarker-based enrichment strategies?"
Global query mode leverages Louvain community detection to identify thematic clusters of regulatory concepts, then retrieves community summaries that surface cross-jurisdictional relationships invisible to chunk-based retrieval.
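Louvain works by searching for the partition that maximizes modularity, a score that rewards communities with many internal edges relative to their total degree. The sketch below only computes that score for a given partition of an unweighted, undirected graph. It does not implement the Louvain search itself, and the node names are placeholders.

```python
from collections import defaultdict


def modularity(edge_list, community_of):
    """Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2."""
    m = len(edge_list)
    internal = defaultdict(int)    # community -> edges with both ends inside
    degree_sum = defaultdict(int)  # community -> total degree of its nodes
    for u, v in edge_list:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (c-d).
edge_list = [("a", "b"), ("b", "c"), ("c", "a"),
             ("d", "e"), ("e", "f"), ("f", "d"),
             ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

print(round(modularity(edge_list, split), 4))   # two communities
print(round(modularity(edge_list, lumped), 4))  # everything in one community
```

The two-triangle split scores higher than lumping everything together, which is exactly the signal a community detector follows when it groups related regulatory concepts into themes.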
4. Financial Due Diligence Automation
Analysts evaluate: "Which portfolio companies have supply chain exposure to geopolitical risk through tier-2 suppliers in contested regions?"
EdgeQuake extracts PORTFOLIO_COMPANY → DEPENDS_ON → TIER_1_SUPPLIER → SOURCES_FROM → TIER_2_SUPPLIER → LOCATED_IN → GEOPOLITICAL_RISK_REGION relationships from earnings calls, SEC filings, and supplier disclosures. Mix mode weights graph traversal against vector similarity for comprehensive coverage.
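How Mix mode combines its two signals is not spelled out here. One plausible reading is a linear blend, sketched below under that assumption. The `graph_weight` parameter, the function name, and the sample scores are all hypothetical.

```python
def mix_scores(vector_scores, graph_scores, graph_weight=0.5):
    """Blend two score maps; items missing from one side count as 0.0 there."""
    if not 0.0 <= graph_weight <= 1.0:
        raise ValueError("graph_weight must be between 0 and 1")
    blended = {}
    for key in set(vector_scores) | set(graph_scores):
        v = vector_scores.get(key, 0.0)
        g = graph_scores.get(key, 0.0)
        blended[key] = (1.0 - graph_weight) * v + graph_weight * g
    # Highest blended score first; ties broken by key for stable output.
    return sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))


# Invented scores: chunk-3 is only reachable through the graph.
vector_scores = {"chunk-1": 0.9, "chunk-2": 0.4}
graph_scores = {"chunk-2": 1.0, "chunk-3": 0.6}

print([key for key, _ in mix_scores(vector_scores, graph_scores, 0.5)])
print([key for key, _ in mix_scores(vector_scores, graph_scores, 0.75)])
```

Raising the graph weight pushes graph-only hits such as a tier-2 supplier link above chunks that merely look similar to the question.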
Step-by-Step Installation & Setup Guide
⚡ Production Deployment: One Command (30 Seconds)
The fastest path to running EdgeQuake requires only Docker—no Rust toolchain, no Node.js, no compilation:
# Download and execute the interactive setup wizard
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/quickstart.sh | sh
The wizard handles provider selection (OpenAI or Ollama—never guessed from environment), model selection from curated priced menus, API key validation, stack startup with health polling up to 90 seconds, and re-run detection with safe update or fresh start options.
Alternative direct compose deployment:
# Pipe directly to docker compose
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
| docker compose -f - up -d
# Or download first for inspection
curl -fsSL https://raw.githubusercontent.com/raphaelmansuy/edgequake/edgequake-main/docker-compose.quickstart.yml \
-o docker-compose.quickstart.yml
docker compose -f docker-compose.quickstart.yml up -d
Access your deployment:
| Service | URL |
|---|---|
| Web UI | http://localhost:3000 |
| API | http://localhost:8080 |
| Swagger | http://localhost:8080/swagger-ui |
| Health | http://localhost:8080/health |
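For scripted deployments you may want your own wait loop rather than relying on the wizard. Here is a small sketch that polls a probe function with the same 90-second budget. The helper names are made up, and it assumes the health endpoint returns HTTP 200 once the stack is ready, which the article does not state explicitly.

```python
import time
import urllib.request


def http_probe(url):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as response:
            return response.status == 200
    except OSError:  # covers URLError, timeouts, connection refused
        return False


def wait_until_healthy(probe, timeout_secs=90, interval_secs=2, sleep=time.sleep):
    """Call probe() until it succeeds or the timeout budget is spent."""
    waited = 0
    while True:
        if probe():
            return True
        if waited >= timeout_secs:
            return False
        sleep(interval_secs)
        waited += interval_secs


# Demonstration with a fake probe that succeeds on the third attempt.
attempts = iter([False, False, True])
print(wait_until_healthy(lambda: next(attempts), sleep=lambda _: None))
# Real usage: wait_until_healthy(lambda: http_probe("http://localhost:8080/health"))
```

Passing the probe and the sleep function as arguments keeps the loop testable without a running stack.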
Headless/CI installation (no interactive terminal):
# OpenAI deployment
EDGEQUAKE_LLM_PROVIDER=openai \
OPENAI_API_KEY=sk-... \
docker compose -f docker-compose.quickstart.yml up -d
# Mistral La Plateforme (v0.11.0+)
MISTRAL_API_KEY=... \
docker compose -f docker-compose.quickstart.yml up -d
Management commands:
docker compose -f docker-compose.quickstart.yml logs -f # Stream logs
docker compose -f docker-compose.quickstart.yml ps # Check status
docker compose -f docker-compose.quickstart.yml down # Stop services
Pin versions for reproducibility:
EDGEQUAKE_VERSION=0.10.8 sh quickstart.sh
🛠️ Development Setup (5 Minutes)
For contributors or those needing source-level debugging:
Prerequisites:
- Rust 1.95+ (rustup.rs)
- Node.js 18+ or Bun 1.0+ (nodejs.org)
- Docker (for PostgreSQL)
- Ollama (optional, for local LLMs)
# 1. Clone repository
git clone https://github.com/raphaelmansuy/edgequake.git
cd edgequake
# 2. Install all dependencies
make install
# 3. Configure frontend environment
cp edgequake_webui/.env.local.example edgequake_webui/.env.local
# 4. Start full development stack
make dev # No authentication (default)
# make dev-auth # With authentication enabled
The development stack automatically handles port conflicts—using 3000 for the UI by default, auto-selecting the next free port if occupied.
Environment Configuration
Create edgequake/docker/.env from the example file:
cd edgequake/docker
cp .env.example .env
Critical variables for production tuning:
| Variable | Purpose | Example |
|---|---|---|
| DATABASE_URL | PostgreSQL connection | postgres://user:pass@host:5432/edgequake |
| EDGEQUAKE_LLM_PROVIDER | Backend LLM | openai, mistral, ollama |
| EDGEQUAKE_CHUNK_TIMEOUT_SECS | Per-chunk LLM timeout | 600 for slow local models |
| EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS | Parallelism control | 4 for GPU-constrained environments |
For large documents on slow local LLMs, increase timeouts to prevent failures:
export EDGEQUAKE_CHUNK_TIMEOUT_SECS=600
export EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS=4
export EDGEQUAKE_LLM_TIMEOUT_SECS=3600
REAL Code Examples from EdgeQuake
Example 1: First Document Upload via REST API
EdgeQuake's document ingestion pipeline automatically handles chunking, entity extraction, relationship mapping, and graph storage. Here's the simplest possible upload:
# Upload any supported file (PDF, TXT, MD, DOCX, etc.)
curl -X POST http://localhost:8080/api/v1/documents/upload \
-F "file=@your-document.pdf"
Typical response showing pipeline completion:
{
"id": "doc-123",
"status": "completed",
"chunk_count": 15,
"entity_count": 12,
"relationship_count": 8,
"processing_time_ms": 2500
}
The chunk_count reflects the ~1200-token segmentation with 100-token overlap. The entity_count and relationship_count reveal the knowledge graph density—higher numbers indicate richer structural understanding. The 2.5-second processing time for a typical document demonstrates Rust's efficiency; Python equivalents often take 6-10 seconds for comparable workloads.
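The overlap arithmetic is worth seeing once. This sketch windows a token list with the 1200/100 figures quoted above. It treats list items as tokens, whereas a real pipeline would use the model's tokenizer, so treat the counts as illustrative only.

```python
def chunk_tokens(tokens, chunk_size=1200, overlap=100):
    """Split a token list into windows that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end
        start += step
    return chunks


tokens = [f"t{i}" for i in range(3000)]  # stand-in for a tokenized document
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

Each window starts 1100 tokens after the previous one, so the last 100 tokens of one chunk reappear at the start of the next. That keeps an entity mention near a boundary from being cut off from its context.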
Key insight: The id field (doc-123) becomes your handle for subsequent operations—queries, updates, deletions, and relationship exploration all reference this identifier.
Example 2: Hybrid Query with Full Response Structure
The default Hybrid mode combines local entity context with global community context for balanced, comprehensive answers:
# Query the knowledge graph with structured response
curl -X POST http://localhost:8080/api/v1/query \
-H "Content-Type: application/json" \
-d '{
"query": "What are the main concepts?",
"mode": "hybrid"
}'
Response demonstrating graph-aware retrieval:
{
"answer": "The main concepts are: knowledge graphs, entity extraction, and hybrid retrieval...",
"sources": [
{ "chunk_id": "chunk-1", "similarity": 0.92 },
{ "chunk_id": "chunk-5", "similarity": 0.87 }
],
"entities": ["KNOWLEDGE_GRAPH", "ENTITY_EXTRACTION"],
"relationships": [
{
"source": "KNOWLEDGE_GRAPH",
"target": "ENTITY_EXTRACTION",
"type": "ENABLES"
}
]
}
Critical architecture insight: Notice how the response contains three distinct information layers:
- answer: The LLM-generated response grounded in retrieved context
- sources: Vector similarity scores indicating chunk relevance (a score of 0.92 signals a close semantic match)
- entities and relationships: The graph structure that enabled multi-hop reasoning
The ENABLES relationship type between KNOWLEDGE_GRAPH and ENTITY_EXTRACTION wasn't explicitly in the source text—it was inferred by the LLM during extraction and stored in Apache AGE. This inferred structure is what enables questions like "What capabilities does knowledge graph technology enable?" to succeed where vector-only RAG fails.
Production tip: Parse the relationships array to build interactive exploration UIs—click any entity to traverse its neighborhood, exactly as the React 19 frontend does with Sigma.js.
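As a starting point for that kind of UI, the sketch below turns the response shown above into a per-entity neighborhood map. It relies only on the fields visible in that sample response. The function name and output shape are my own choices.

```python
import json

response_text = """
{
  "answer": "The main concepts are: knowledge graphs, entity extraction, and hybrid retrieval...",
  "sources": [
    {"chunk_id": "chunk-1", "similarity": 0.92},
    {"chunk_id": "chunk-5", "similarity": 0.87}
  ],
  "entities": ["KNOWLEDGE_GRAPH", "ENTITY_EXTRACTION"],
  "relationships": [
    {"source": "KNOWLEDGE_GRAPH", "target": "ENTITY_EXTRACTION", "type": "ENABLES"}
  ]
}
"""


def build_neighborhoods(payload):
    """Map each entity to its outgoing and incoming edges."""
    hoods = {name: {"out": [], "in": []} for name in payload.get("entities", [])}
    for rel in payload.get("relationships", []):
        src, dst, kind = rel["source"], rel["target"], rel["type"]
        # Relationships may mention entities missing from the entities list.
        hoods.setdefault(src, {"out": [], "in": []})["out"].append((kind, dst))
        hoods.setdefault(dst, {"out": [], "in": []})["in"].append((kind, src))
    return hoods


hoods = build_neighborhoods(json.loads(response_text))
print(hoods["KNOWLEDGE_GRAPH"]["out"])
print(hoods["ENTITY_EXTRACTION"]["in"])
```

A click handler then only needs to look up `hoods[entity]` to decide which nodes to expand.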
Example 3: Docker Deployment with Custom Configuration
For production deployments requiring specific API versions and timeout tuning:
# Pull specific version with multi-arch support
docker pull ghcr.io/raphaelmansuy/edgequake:0.10.8
# Run with full configuration for slow local LLMs
docker run -d \
--name edgequake \
-p 8080:8080 \
-e DATABASE_URL="postgres://user:password@db-host:5432/edgequake" \
-e EDGEQUAKE_LLM_PROVIDER=openai \
-e OPENAI_API_KEY="sk-..." \
-e EDGEQUAKE_CHUNK_TIMEOUT_SECS=600 \
-e EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS=4 \
-e RUST_LOG=info \
ghcr.io/raphaelmansuy/edgequake:0.10.8
# Verify health endpoint
curl http://localhost:8080/health
Architecture note: The RUST_LOG=info environment variable controls tracing output. For debugging extraction failures, use RUST_LOG=debug, but expect voluminous output. The EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS=4 setting is crucial for local GPU deployments: the default of 16 can overwhelm consumer cards and cause cascading timeouts.
Multi-arch magic: The same ghcr.io/raphaelmansuy/edgequake:0.10.8 tag works on x86 servers, Apple Silicon Macs, and AWS Graviton instances—Docker automatically selects the correct architecture without QEMU emulation.
Example 4: Make-Based Development Workflow
EdgeQuake's unified Makefile eliminates environment inconsistencies:
# Full stack with PostgreSQL, backend, and frontend
make dev
# Background mode for CI/automation
make dev-bg
# In-memory storage for rapid testing (no Docker required)
make dev-memory
# Service management
make status # Check all services
make stop # Graceful shutdown
# Backend-only operations
make backend-test # Run test suite
cargo clippy # Lint check
cargo fmt # Format code
# Frontend operations
make frontend-build # Production build
Critical for contributors: The make dev-memory target uses in-memory adapters instead of PostgreSQL—perfect for unit tests and rapid iteration. However, graph visualization features require the full PostgreSQL + Apache AGE stack; the in-memory adapter stores graph structures but doesn't expose AGE's Cypher query interface.
Advanced Usage & Best Practices
Query Mode Selection Strategy
Naive mode isn't useless—it's your fastest path for exact-match lookups. Use it when users search for specific phrases, error codes, or named entities where relationship context adds noise.
Local mode excels at "How does X relate to Y?" questions. It performs vector search on entities, then traverses their 1-2 hop neighborhood. The sweet spot: specific technical relationships in dense knowledge domains.
Global mode transforms thematic questions. "What are our major risks?" requires community detection via Louvain modularity optimization—identifying clusters of tightly-connected entities that form coherent themes.
Hybrid (default) costs 2-3x latency but eliminates mode-selection guesswork. For user-facing applications with unpredictable question types, it's the safe default.
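If your application already classifies incoming questions, the guidance above fits in a lookup table. A sketch follows, with two caveats: the question-kind labels are invented, and only the "hybrid" mode string appears in the API example earlier, so the lowercase spellings of the other modes are assumptions.

```python
import json

# Question kinds are hypothetical labels your own router would assign.
MODE_BY_QUESTION_KIND = {
    "exact_lookup": "naive",         # phrases, error codes, named entities
    "entity_relationship": "local",  # "How does X relate to Y?"
    "thematic": "global",            # "What are our major risks?"
}


def pick_mode(question_kind=None):
    # Anything unclassified falls back to the default mode.
    return MODE_BY_QUESTION_KIND.get(question_kind, "hybrid")


def build_query_payload(question, question_kind=None):
    """Build the JSON body for POST /api/v1/query."""
    return {"query": question, "mode": pick_mode(question_kind)}


print(json.dumps(build_query_payload("What are our major risks?", "thematic")))
print(json.dumps(build_query_payload("Anything odd in last week's reports?")))
```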
PDF Processing Optimization
Enable vision mode selectively:
# Per-request header for one-off vision processing
curl -X POST http://localhost:8080/api/v1/documents/upload \
-H "X-Use-Vision: true" \
-F "file=@scanned-contract.pdf"
Scanned documents, complex tables, and multi-column layouts justify the 3-5x processing cost. For standard text PDFs, text mode (default) is faster and more token-efficient.
Knowledge Injection for Domain Specialization
Pre-load glossaries before document ingestion:
# Replace :id with your workspace ID before running
curl -X POST http://localhost:8080/api/v1/workspaces/:id/injection/upload \
-F "file=@manufacturing-glossary.txt"
Format: simple ACRONYM = Full Definition or TERM = Synonym1, Synonym2 per line. The system automatically expands queries—searching "OEE" retrieves content tagged with "Overall Equipment Effectiveness" without explicit mention.
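To see how that line format could drive query expansion, here is a small sketch. It parses the `TERM = value, value` lines and appends matches to the query text. The real system expands queries internally, so the parsing rules and output format below are guesses for illustration.

```python
import re


def parse_glossary(text):
    """Parse 'TERM = Expansion1, Expansion2' lines into a dict of lists."""
    glossary = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blanks and malformed lines
        term, _, expansions = line.partition("=")
        # Note: a definition that itself contains commas would be split here.
        values = [v.strip() for v in expansions.split(",") if v.strip()]
        if term.strip() and values:
            glossary[term.strip().lower()] = values
    return glossary


def expand_query(query, glossary):
    """Append glossary expansions for every term found as a whole word."""
    extras = []
    for term, values in glossary.items():
        if re.search(rf"\b{re.escape(term)}\b", query, flags=re.IGNORECASE):
            extras.extend(values)
    return query if not extras else f"{query} ({'; '.join(extras)})"


glossary = parse_glossary(
    "OEE = Overall Equipment Effectiveness\n"
    "bearing = roller bearing, ball bearing, thrust bearing\n"
)
print(expand_query("Why did OEE drop on Line 3?", glossary))
```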
Memory and Concurrency Tuning
| Document Volume | EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS | EDGEQUAKE_CHUNK_TIMEOUT_SECS |
|---|---|---|
| < 100 pages, fast API | 16 (default) | 180 (default) |
| 100-500 pages, fast API | 12 | 300 |
| > 500 pages or local LLM | 4-8 | 600-1200 |
| GPU-constrained (8GB VRAM) | 2-4 | 600 |
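For automated deployments, the table can be encoded as a small helper that emits environment values. The thresholds come straight from the rows above. The function names, and the choice to pick the conservative end of each range, are mine.

```python
def suggest_settings(pages, local_llm=False, gpu_constrained=False):
    """Return (concurrency_range, timeout_range) following the tuning table."""
    if gpu_constrained:
        return (2, 4), (600, 600)
    if local_llm or pages > 500:
        return (4, 8), (600, 1200)
    if pages >= 100:
        return (12, 12), (300, 300)
    return (16, 16), (180, 180)


def to_env(pages, **flags):
    # Pick the conservative end: lowest concurrency, highest timeout.
    concurrency, timeout = suggest_settings(pages, **flags)
    return {
        "EDGEQUAKE_MAX_CONCURRENT_EXTRACTIONS": str(concurrency[0]),
        "EDGEQUAKE_CHUNK_TIMEOUT_SECS": str(timeout[1]),
    }


print(to_env(50))
print(to_env(800))
print(to_env(50, gpu_constrained=True))
```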
Comparison with Alternatives
| Capability | EdgeQuake | LightRAG (Python) | Microsoft's GraphRAG | Traditional RAG |
|---|---|---|---|---|
| Language | Rust | Python | Python | Any |
| Query Latency (hybrid) | < 200ms | ~2000ms | ~3000ms | ~1000ms |
| Concurrent Users | 1000+ | ~50 | ~30 | ~100 |
| Memory per Document | 2MB | ~15MB | ~20MB | ~8MB |
| One-Command Deploy | ✅ Yes | ❌ Manual setup | ❌ Complex config | ✅ Usually |
| PDF Vision Processing | ✅ Built-in | ❌ External tools | ❌ External tools | ❌ External tools |
| Multi-Tenant Isolation | ✅ Fail-closed | ❌ Single tenant | ❌ Single tenant | Varies |
| MCP Agent Integration | ✅ Native | ❌ None | ❌ None | ❌ None |
| Streaming Responses | ✅ SSE | ❌ Polling | ❌ Polling | Varies |
| Custom Entity Types | ✅ 50/workspace | ❌ Fixed | ❌ Fixed | ❌ N/A |
| Production Auth | ✅ Runtime config | ❌ None | ❌ None | Varies |
The verdict: LightRAG (Python) pioneered the algorithm but cannot handle production load. Microsoft's GraphRAG offers sophisticated community detection at extreme infrastructure cost. Traditional RAG is fast but structurally blind. EdgeQuake uniquely delivers algorithmic sophistication with industrial performance—the Rust implementation isn't optimization, it's transformation.
FAQ: EdgeQuake for Serious Engineers
What makes EdgeQuake different from vector-only RAG systems?
Vector RAG retrieves chunks based on semantic similarity. EdgeQuake extracts entities and relationships during indexing, then traverses graph structures at query time. This enables multi-hop reasoning—answering "How does A relate to B through C?"—that vector similarity alone cannot solve.
Do I need Rust expertise to deploy EdgeQuake?
Absolutely not. The Docker deployment requires zero Rust knowledge. The quickstart.sh script handles everything. Development contributions require Rust 1.95+, but operational deployment is entirely containerized.
Which LLM providers work out of the box?
OpenAI, Anthropic (Claude), Mistral La Plateforme, Google (Gemini, Vertex AI), MiniMax, Azure OpenAI, xAI, Ollama, and LM Studio. Mistral gained first-class support in v0.11.0 with dedicated chat, vision PDF, and embedding models.
Can EdgeQuake handle scanned PDFs and complex layouts?
Yes. Enable vision mode via use_vision_llm = true or the X-Use-Vision: true header. GPT-4o, Claude 3.5+, and Gemini 2.5 process pages as images, recovering tables and multi-column layouts that text extraction mangles. Automatic fallback to text mode prevents total failure.
How does workspace isolation work?
EdgeQuake implements fail-closed multi-tenant isolation. Invalid or missing workspace selectors in query/delete operations are rejected rather than silently remapped to defaults. This prevents cross-tenant data leakage—a critical requirement for SaaS deployments.
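The difference between fail-closed and fail-open fits in a few lines. This is a conceptual sketch with hypothetical names, not EdgeQuake's actual code:

```python
class WorkspaceError(Exception):
    """Raised when a request cannot be tied to a permitted workspace."""


def resolve_fail_closed(requested_id, allowed_ids):
    # Missing or unknown selectors are rejected outright.
    if not requested_id:
        raise WorkspaceError("workspace selector is required")
    if requested_id not in allowed_ids:
        raise WorkspaceError("workspace not permitted for this tenant")
    return requested_id


def resolve_fail_open(requested_id, allowed_ids, default_id="default"):
    # Anti-pattern shown for contrast: bad input silently maps to a default.
    return requested_id if requested_id in allowed_ids else default_id


allowed = {"ws-legal", "ws-finance"}
print(resolve_fail_closed("ws-legal", allowed))
print(resolve_fail_open("ws-typo", allowed))  # quietly lands in "default"
try:
    resolve_fail_closed("ws-typo", allowed)
except WorkspaceError as err:
    print("rejected:", err)
```

With the fail-open variant, a typo in a delete request would hit whatever lives in the default workspace. The fail-closed variant turns the same typo into an error the caller has to fix.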
What's the MCP integration for?
The Model Context Protocol exposes EdgeQuake capabilities to AI agents (Claude, Cursor, etc.). Agents can programmatically query knowledge graphs, upload documents, and explore relationships—enabling autonomous research workflows that combine EdgeQuake's retrieval with agent reasoning.
Is there a managed cloud version?
Currently self-hosted only. The Docker deployment supports horizontal scaling via PostgreSQL connection pooling and stateless API instances. Cloud-managed offerings are on the roadmap; star the repository for updates.
Conclusion: The Graph-RAG Shift Is Here
The evidence is overwhelming: vector-only RAG has hit its ceiling. It works for lookup, fails for reasoning, and collapses under complexity. The engineering teams winning in 2025 aren't tweaking embedding models—they're restructuring how knowledge gets represented.
EdgeQuake represents the vanguard of this shift. By implementing LightRAG's graph extraction in Rust's zero-cost concurrency model, it delivers what research papers promise and Python prototypes cannot: sub-200ms hybrid queries, 1000+ concurrent users, and genuine multi-hop reasoning over document collections.
The one-command Docker deployment removes infrastructure excuses. The embedded PDF vision pipeline eliminates preprocessing nightmares. The MCP integration future-proofs your architecture for agentic AI.
But here's what matters most: EdgeQuake makes your documents actually understandable. Not chunk-retrievable. Not semantically-similar. Understandable—connected, contextualized, traversable.
Your move. The repository is waiting. The quickstart.sh is 30 seconds away. And your competitors are already graphing.
⭐ Star EdgeQuake on GitHub | 🚀 Deploy Now | 📚 Read the Docs
Built with 🦀 Rust and 🔥 conviction by Raphaël MANSUY. Licensed under Apache 2.0.