Every knowledge base looks simple at first. You write notes, store documents, search with keywords. It works — until it doesn't. The moment you have five thousand documents in two languages, keyword search stops finding things you know are there. Worse — it finds things that match lexically but miss the point entirely.
This is the problem Scrutator solves.
What Scrutator Is
The name comes from Latin: scrutator — "one who thoroughly investigates." Scrutator is the foundational Knowledge Retrieval & Meaning Engine for the Arcanada ecosystem. It provides unified search, retrieval, and meaning extraction across all knowledge sources: wiki, project docs, agent memories, conversations.
It is not a wrapper around a vector database. It is a complete retrieval pipeline: chunking, embedding, indexing, hybrid search, ranking, and a Dreaming module that periodically reorganizes and strengthens connections in the knowledge base.
Scrutator is open source, MIT-licensed. No secrets, no hidden sauce — just solid engineering that anyone can study, fork, and improve.
Architecture
The system has five core layers, all live in production:
- Chunking Engine — Adaptive semantic document splitting with four strategies: markdown headers, code boundaries, sliding window, and single-document mode. The chunker respects document structure instead of blindly cutting at token limits. Parent-child hierarchy preserves context. Content limit: 1 MB per document, zero external dependencies.
- Embedding Server — BAAI/bge-m3 model producing three types of vectors simultaneously: dense (1024-dim), sparse (lexical weights), and ColBERT (multi-vector token-level). Three workers, fp16 quantization, running on our own hardware with no external API dependencies.
- Hybrid Search — Three-way retrieval combining dense vector similarity, sparse lexical matching, and PostgreSQL full-text search. Results fused via Reciprocal Rank Fusion (RRF, k=60). Three signals catch what any single method would miss.
- Storage — PostgreSQL 16 with pgvector 0.8.2 (HNSW indexes, m=16, ef_construction=64) for vectors, tsvector with dual-language generated columns for full-text search (Russian + English). Six tables, 22 indexes. One database, no external dependencies.
- Dreaming — A periodic process that reorganizes the knowledge base: builds cross-references, strengthens semantic links, identifies contradictions, and removes redundancy. Integrated with Agent Dreamer for autonomous knowledge maintenance cycles.
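To make the Storage layer above more concrete, here is a minimal sketch of what such a schema could look like, written as SQL strings held in Python. Only the HNSW parameters (m=16, ef_construction=64), the 1024-dimensional dense vector, the dual-language tsvector generated columns, and the (source_path, chunk_index) key come from this post. The table name `chunks`, the column names, and the index names are assumptions for illustration, and sparse and ColBERT storage is left out because the post does not describe it.

```python
# Hypothetical DDL: table, column, and index names are illustrative only.
CHUNKS_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id          uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    source_path text NOT NULL,
    chunk_index integer NOT NULL,
    content     text NOT NULL,
    embedding   vector(1024),
    fts_en      tsvector GENERATED ALWAYS AS (to_tsvector('english', content)) STORED,
    fts_ru      tsvector GENERATED ALWAYS AS (to_tsvector('russian', content)) STORED,
    UNIQUE (source_path, chunk_index)
);
"""

INDEX_DDL = """
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
    ON chunks USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
CREATE INDEX IF NOT EXISTS chunks_fts_en_gin ON chunks USING gin (fts_en);
CREATE INDEX IF NOT EXISTS chunks_fts_ru_gin ON chunks USING gin (fts_ru);
"""


def split_statements(sql):
    """Split a DDL script into individual statements for one-by-one execution."""
    return [stmt.strip() for stmt in sql.split(";") if stmt.strip()]
```

Generated columns keep both full-text vectors in sync with the content without any application code, and the unique constraint is what an upsert keyed on (source_path, chunk_index) would rely on.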
What Is Working in Production
All five layers are deployed and running on our infrastructure (Tailscale-only access).
Embedding Server (v2.1)
Uses BAAI/bge-m3 via FlagEmbedding's BGEM3FlagModel. Five API endpoints:
- POST /v1/embeddings: dense vectors (OpenAI-compatible)
- POST /v1/embeddings/sparse: lexical sparse weights
- POST /v1/embeddings/colbert: ColBERT multi-vectors
- POST /v1/embeddings/hybrid: all three in one call
- GET /health: server status with RAM usage and Prometheus metrics
Three workers, fp16 quantization, CPU-only (no GPU required). Cross-lingual similarity: 0.887 between Russian and English translations of the same text — 45% higher than the nearest competitor we benchmarked.
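As a rough illustration of what calling the dense endpoint might look like, the sketch below builds a request and parses a response using only the standard library. It assumes the usual OpenAI embeddings shape (a `model` and `input` field in the request, a `data` list with `index` and `embedding` in the response), since the post only says the endpoint is OpenAI-compatible. The base URL and the helper names are hypothetical.

```python
import json
import urllib.request

# Hypothetical address; the real server is reachable over Tailscale only.
BASE_URL = "http://embeddings.internal:8000"


def build_dense_request(texts, base_url=BASE_URL, model="BAAI/bge-m3"):
    """Build a POST request for /v1/embeddings (OpenAI-style body, assumed)."""
    payload = {"model": model, "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/v1/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_dense_response(body):
    """Return embeddings ordered by input position (OpenAI-style body, assumed)."""
    items = json.loads(body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` and handing the response bytes to `parse_dense_response`.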
Chunking Engine
Four splitting strategies: markdown_headers for structured docs, code_boundaries for source files, sliding_window for flat text, and single for short documents. Language auto-detection for Russian and English. Deduplication on ingest via ON CONFLICT (source_path, chunk_index) DO UPDATE.
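The markdown_headers strategy and the parent-child hierarchy can be illustrated with a small sketch. This is not Scrutator's chunker, just one plausible way to split on headers while tracking which section encloses which. The `Chunk` fields and the function name are assumptions, and the sketch deliberately ignores edge cases such as `#` lines inside fenced code blocks.

```python
import re
from dataclasses import dataclass
from typing import Optional

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    index: int
    level: int              # header depth: 1 for "#", 2 for "##", 0 for preamble
    title: str
    text: str
    parent: Optional[int]   # index of the enclosing section's chunk, if any


def split_markdown_headers(doc):
    """Split a markdown document into one chunk per header section."""
    chunks = []
    stack = []      # indexes of currently open ancestor sections
    current = None
    for line in doc.splitlines():
        match = HEADER_RE.match(line)
        if match:
            level = len(match.group(1))
            # Close every open section at the same or deeper level.
            while stack and chunks[stack[-1]].level >= level:
                stack.pop()
            parent = stack[-1] if stack else None
            current = Chunk(len(chunks), level, match.group(2).strip(), "", parent)
            chunks.append(current)
            stack.append(current.index)
            continue
        if current is None:
            if not line.strip():
                continue  # skip blank lines before any content
            # Text before the first header becomes a level-0 root chunk.
            current = Chunk(0, 0, "", "", None)
            chunks.append(current)
            stack.append(0)
        current.text += line + "\n"
    return chunks
```

Keeping the parent index on each chunk lets a retriever pull in the enclosing section when a small child chunk matches but lacks context on its own.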
Hybrid Search Pipeline
The retrieval core. Three-way search: dense cosine similarity over pgvector HNSW indexes, sparse lexical matching, and PostgreSQL full-text search with dual-language tsvector columns. Results fused through RRF (k=60).
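Reciprocal Rank Fusion itself is small enough to sketch in a few lines. Each result list contributes 1 / (k + rank) to a chunk's score, so a chunk ranked well by several signals rises above one that only a single signal found. The function name and the sample IDs below are illustrative; only k=60 comes from the pipeline described here.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked ID lists: each list adds 1 / (k + rank) to an ID's score."""
    scores = defaultdict(float)
    for ranked_ids in rankings:
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Illustrative result lists from the three retrieval signals.
dense = ["c1", "c2", "c3"]
sparse = ["c2", "c1", "c4"]
fts = ["c2", "c5"]

fused = rrf_fuse([dense, sparse, fts])
```

Because RRF works on ranks rather than raw scores, cosine similarities, sparse weights, and full-text ranks never need to be normalized onto a common scale.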
Dreaming Module
Semantic analysis of the entire knowledge base: 1,148 chunks indexed, 20 semantic duplicates detected, 50 cross-references built, 50 orphan chunks identified. Analysis completes in 5.5 seconds. The module integrates with Agent Dreamer for autonomous dream cycles — periodic knowledge maintenance that runs without human intervention.
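The three outputs listed above (duplicates, cross-references, orphans) can be illustrated with a naive pairwise pass over chunk embeddings. This is a sketch of the general idea, not the Dreaming module's actual algorithm: the two thresholds, the function names, and the rule that an orphan is a chunk with no pair above either threshold are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def analyze(embeddings, dup_threshold=0.95, link_threshold=0.75):
    """Classify chunk pairs by similarity; thresholds are hypothetical."""
    ids = sorted(embeddings)
    duplicates, cross_refs, linked = [], [], set()
    for i, left in enumerate(ids):
        for right in ids[i + 1:]:
            sim = cosine(embeddings[left], embeddings[right])
            if sim >= dup_threshold:
                duplicates.append((left, right))
                linked.update((left, right))
            elif sim >= link_threshold:
                cross_refs.append((left, right))
                linked.update((left, right))
    # Chunks that never cleared either threshold have no connections.
    orphans = [chunk_id for chunk_id in ids if chunk_id not in linked]
    return duplicates, cross_refs, orphans
```

A full pairwise scan is quadratic: for 1,148 chunks that is about 658,000 comparisons, which is why a production implementation would more likely ask the vector index for nearest neighbours instead.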
Memory Layer
LTM (Long Term Memory) integration provides AI agents with persistent memory backed by Scrutator's retrieval. Chunk-to-page mapping enables edge write-back from dream analysis directly to source documents.
Benchmarks
Live production measurements on arcana-db (20 iterations each; p50 is the median):
| Metric | Value |
|---|---|
| 2-way search (dense + FTS), p50 | 383 ms |
| 2-way search, p95 | 399 ms |
| 3-way search (dense + sparse + FTS), p50 | 749 ms |
| 3-way search, p95 | 768 ms |
| Embedding API round-trip | ~350 ms |
| DB query | <50 ms |
| API warmup | 238 ms |
| Dream analysis (1,148 chunks) | 5.5 s |
The 3-way search adds ~366 ms over 2-way due to the additional sparse embedding round-trip. The dominant cost is always the embedding API call, not the database query.
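For readers who want to reproduce this kind of measurement, here is a minimal sketch of a latency harness. It assumes a nearest-rank definition of p95, which may differ from how the numbers in the table were computed, and the function names are illustrative. The last line simply redoes the overhead arithmetic from the table's p50 values.

```python
import statistics
import time


def measure_ms(fn, iterations=20):
    """Call fn repeatedly and return wall-clock latencies in milliseconds."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(samples_ms):
    """Return the median (p50) and nearest-rank p95 of the samples."""
    ordered = sorted(samples_ms)
    # ceil(0.95 * n) in integer arithmetic, to avoid float rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return {"p50": statistics.median(ordered), "p95": ordered[rank - 1]}


# Overhead of the third signal, from the p50 rows of the table.
sparse_overhead_ms = 749 - 383
```

With only 20 samples, the nearest-rank p95 is the second-slowest run, so a single outlier does not move it.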
Testing and Problems We Solved
The project has 174 automated tests across all components — unit, integration, API, and real-file tests. Zero regressions across all build stages. Every component was tested against real documents (Python source, wiki pages, Datarim workflow files), not just synthetic data.
Problems We Hit
RAM surprise. The original plan predicted 450 MB for the Embedding Server with fp16 quantization. Actual usage: about 2,400 MB per worker, roughly 6.9 GB in total across three workers. BGE-M3 loads additional components (sparse_linear, colbert_linear) that aren't accounted for in the base dense model footprint. We documented it, adjusted server specs, and moved on.
Transformers 5.x breakage. A routine dependency update to transformers 5.x broke the embedding pipeline — the function is_torch_fx_available was removed upstream. Fix: pin transformers>=4.45,<5.0 until the ecosystem catches up.
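Besides the pin in the dependency list, a startup check can make this failure mode loud instead of surprising. The sketch below is one way to do it with the standard library only; the function names are hypothetical and the check mirrors the transformers>=4.45,<5.0 range from the fix.

```python
import re
from importlib import metadata


def is_supported(version):
    """True if version falls inside the pinned range >=4.45,<5.0."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if not match:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return major == 4 and minor >= 45


def check_transformers():
    """Raise at startup if the installed transformers is outside the pin."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        raise RuntimeError("transformers is not installed") from None
    if not is_supported(installed):
        raise RuntimeError(
            f"transformers {installed} is unsupported; need >=4.45,<5.0"
        )
```

Calling `check_transformers()` before loading the model turns an obscure import error deep in a dependency into a one-line message that names the constraint.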
Deploy vs. install confusion. First production deploy failed with ModuleNotFoundError: No module named 'scrutator'. The deployment plan used pip install -r requirements.txt, which installs the dependencies but not the scrutator package itself, because the project is defined in pyproject.toml. Fix: run pip install -e . instead. A 30-second fix after 10 minutes of confusion.
Database permissions. Schema tables were created by the postgres superuser instead of the application user scrutator_app. The API worked in development but failed silently in production. Fix: ALTER TABLE ... OWNER TO scrutator_app for all six tables.
Edge write-back architecture. The Dreaming module initially attempted to write edges using page paths, but the database stores chunk UUIDs. This caused 100–200 extra HTTP round-trips per dream cycle. Solution: a dedicated batch lookup endpoint with server-side path-to-UUID resolution — one API call instead of hundreds.
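The shape of the batch fix can be sketched from the client's point of view: collect every distinct path first, resolve them all in one call, then translate the edges locally. The `batch_lookup` callable stands in for the dedicated endpoint, whose real name and payload are not given here, and the sketch assumes one UUID per path for simplicity.

```python
def resolve_edges(path_edges, batch_lookup):
    """Translate (src_path, dst_path) edges into UUID edges with one lookup.

    batch_lookup is a hypothetical callable that takes a list of paths and
    returns a dict mapping each known path to its UUID.
    """
    # Deduplicate so each path is sent once, however many edges use it.
    paths = sorted({path for edge in path_edges for path in edge})
    path_to_uuid = batch_lookup(paths)  # a single round-trip

    resolved, unresolved = [], []
    for src, dst in path_edges:
        if src in path_to_uuid and dst in path_to_uuid:
            resolved.append((path_to_uuid[src], path_to_uuid[dst]))
        else:
            unresolved.append((src, dst))  # keep for logging, do not drop silently
    return resolved, unresolved
```

Returning the unresolved edges separately keeps a missing page from failing the whole dream cycle while still leaving a trace of what could not be written back.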
Input length limit. Documents exceeding 32K characters hit the BGE-M3 token limit silently. We added a conservative cap at 24,000 characters with clear error messages.
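A guard like this is only a few lines. The sketch below shows one way to enforce the 24,000-character cap with an explicit error; the exception class and function name are assumptions, not Scrutator's actual API.

```python
MAX_INPUT_CHARS = 24_000  # conservative cap described above


class InputTooLongError(ValueError):
    """Raised when a text exceeds the embedding input cap."""


def validate_input(text):
    """Return text unchanged, or raise with a message that states the limit."""
    if len(text) > MAX_INPUT_CHARS:
        raise InputTooLongError(
            f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}. "
            "Split the document into smaller chunks before embedding."
        )
    return text
```

Failing fast at the API boundary is the point: the caller learns about the limit immediately instead of getting an embedding that quietly covers only part of the text.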
How It Connects
Scrutator is the retrieval backend for the entire Arcanada ecosystem:
- Long Term Memory — Scrutator is the search layer. When an agent needs to remember something from past conversations or documents, it queries Scrutator.
- Agent Dreamer — The Dreaming module plugs into Dreamer's autonomous pipeline. Knowledge maintenance runs on a schedule — not just passive storage, but active reorganization.
- Model Connector — LLM integration for semantic query understanding. We're working on using Model Connector as the LLM backend for Scrutator's analysis, with Cursor as the primary connector and Claude as fallback.
Every AI agent in the ecosystem gets access to a unified, multilingual, hybrid search over the entire knowledge base. Not just keyword matching — semantic understanding.
Why Open Source
Knowledge retrieval is infrastructure. Like databases and web servers — it should be transparent, auditable, and improvable by the community. We publish everything: architecture decisions, benchmark results, even our mistakes (the RAM prediction being off by 5x is documented in the repo). Check the GitHub repository — it's MIT-licensed.
What Comes Next
The core engine is built and running. The next phase is about making it smarter:
- LLM-powered analysis — Using Model Connector to give Scrutator access to language models for deeper semantic analysis during dream cycles.
- Long Term Memory benchmarks — Running production-scale benchmarks to measure retrieval quality and memory persistence across agent sessions.
- Self-hosted embeddings for external consumers — Making the embedding API available to other projects in the ecosystem without external API dependencies.
The Series
This post covers the full picture after all core components shipped. Technical deep-dives are planned:
- Embedding Server: BGE-M3 sparse + ColBERT + fp16 — how we migrated from SentenceTransformer to BGEM3FlagModel, the RAM surprise, and what we learned.
- Chunking Engine: How to Split Knowledge into Meanings — four strategies, real-file testing, and why structure-aware splitting matters.
- Hybrid Search: Dense + Sparse + FTS + RRF — why three signals beat one, with production benchmarks.
- Dreaming: When Knowledge Starts Thinking — periodic reorganization, the Agent Dreamer integration, and edge write-back architecture.
Follow the blog or star the GitHub repo to stay updated.