Arcanada — Scrutator

A foundational retrieval engine for the Arcanada ecosystem: BGE-M3 trinity embeddings (dense + sparse + ColBERT), PostgreSQL with pgvector, hybrid search via Reciprocal Rank Fusion, adaptive semantic chunking, and a Dreaming module that self-organizes the index over time. MIT-licensed and public; LIVE on arcana-db with 1148+ chunks indexed. Embedding-service availability is the gap that blocks L3.

Capabilities

BGE-M3 trinity embeddings (dense + sparse + ColBERT) on port 8300
Hybrid search with Reciprocal Rank Fusion (port 8310)
Adaptive semantic chunking (no naive fixed-size cuts)
Dreaming module for index self-organization
PostgreSQL + pgvector storage
MIT-licensed, public repo (Arcanada-one/scrutator)
1148+ production chunks across the ecosystem knowledge base
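
For readers unfamiliar with Reciprocal Rank Fusion, the idea can be sketched in a few lines. This is a minimal illustration, not Scrutator's actual code: the function name, the chunk ids, and the constant k=60 (a commonly cited default) are all assumptions. Each result list contributes 1 / (k + rank) to a chunk's score, so chunks ranked well by several retrievers rise to the top.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is an assumed default, not a value
    taken from Scrutator.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score descending; break ties by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["c7", "c2", "c9"]   # hypothetical dense-vector ranking
sparse_hits = ["c2", "c4", "c7"]  # hypothetical sparse ranking
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
```

Because only ranks are used, the dense and sparse scores never need to be normalized onto a common scale before fusing.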

Current autonomy level


Weakest link

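BGE-M3 worker health is not externally monitored. If the embedding service is down, search returns an empty result set and consumers receive no error, so an outage cannot be told apart from a query with no matches. Drift between the vectors stored in PostgreSQL and the source documents is not detected automatically.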
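
One way to make the outage visible is to raise a classified error instead of returning nothing, optionally after trying a lighter fallback embedder. The sketch below is hypothetical: `EmbeddingUnavailable`, `embed_with_fallback`, and the two stand-in embedders are illustrative names, and the embedders are plain callables rather than real service clients.

```python
class EmbeddingUnavailable(Exception):
    """Classified error: no embedding service could be reached."""
    status_code = 503


def embed_with_fallback(text, primary, fallback=None):
    """Try the primary embedder, then an optional lighter fallback.

    Raises EmbeddingUnavailable instead of returning nothing, so callers
    can tell an outage apart from a query that has no matches.
    """
    errors = []
    for name, embedder in (("primary", primary), ("fallback", fallback)):
        if embedder is None:
            continue
        try:
            return name, embedder(text)
        except (ConnectionError, TimeoutError) as exc:
            errors.append(f"{name}: {exc}")
    raise EmbeddingUnavailable("; ".join(errors))


def down(_text):
    # Stand-in for an unreachable embedding worker.
    raise ConnectionError("worker unreachable")


def light(text):
    # Stand-in for a lighter model; returns a toy one-dimensional vector.
    return [float(len(text))]


source, vector = embed_with_fallback("hello", down, light)
```

Returning which embedder produced the vector matters in practice, since vectors from different models are generally not comparable and the caller may need to search a separate index.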

Roadmap to L3

L3 lift — /health endpoint per worker, fallback to a lighter embedding model on outage, structured pino traces.
L3 polish — drift detection job comparing PG vector count vs source manifest; classified errors propagated to consumers.
Verification gate — kill embedding worker mid-query and assert classified 503 + Ops Bot fatal event within 5 s.
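
The drift detection item could look roughly like the following. This is a sketch under stated assumptions: it compares counts per source document rather than a single total, and both inputs are plain dictionaries. In a real job the indexed counts would presumably come from a grouped count query against PostgreSQL and the expected counts from the source manifest; the file paths and the function name here are made up for illustration.

```python
def detect_drift(manifest_counts, indexed_counts):
    """Compare expected chunk counts per source with counts in the index.

    Both arguments map a source document id to a chunk count. Returns a
    dict of source id -> (expected, indexed) for every mismatch, including
    sources present on only one side.
    """
    drift = {}
    for source_id in sorted(set(manifest_counts) | set(indexed_counts)):
        expected = manifest_counts.get(source_id, 0)
        indexed = indexed_counts.get(source_id, 0)
        if expected != indexed:
            drift[source_id] = (expected, indexed)
    return drift


# Hypothetical data: one source is short a chunk, one is missing from the
# index, and one is indexed but no longer listed in the manifest.
manifest = {"docs/a.md": 12, "docs/b.md": 4, "docs/c.md": 7}
indexed = {"docs/a.md": 12, "docs/b.md": 3, "docs/old.md": 5}
report = detect_drift(manifest, indexed)
```

An empty report means the counts agree; it does not prove the content is identical, since an edited document can keep the same chunk count.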

Links

GitHub

Next step

Explore the available product and its current limits

LIVE means the project is available now; the weakest link and roadmap above state the current version’s limits.
