ARCANADA
LIVE L1 · target L3

Scrutator

Open-source hybrid search and meaning-extraction engine.

A foundational retrieval engine for the Arcanada ecosystem: BGE-M3 trinity embeddings (dense + sparse + ColBERT), PostgreSQL with pgvector, hybrid search via Reciprocal Rank Fusion, adaptive semantic chunking, and a Dreaming module that self-organizes the index over time. Scrutator is MIT-licensed and public, and runs LIVE on arcana-db with 1148+ chunks indexed. Embedding-service availability is the gap that blocks the lift to L3.
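To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion over three ranked lists, one per BGE-M3 output. It is not taken from the Scrutator codebase: the function name, the chunk ids, and the smoothing constant k=60 (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. k=60 is an assumed default, not a value
    confirmed for Scrutator.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical chunk ids returned by each retrieval channel.
dense = ["c7", "c2", "c9"]
sparse = ["c2", "c7", "c4"]
colbert = ["c2", "c9", "c7"]

fused = reciprocal_rank_fusion([dense, sparse, colbert])
fused_ids = [doc_id for doc_id, _ in fused]
```

Because only ranks are used, the three channels do not need comparable score scales, which is the usual reason to prefer this fusion over a weighted sum of raw scores.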

Capabilities

  • BGE-M3 trinity embeddings (dense + sparse + ColBERT) on port 8300
  • Hybrid search with Reciprocal Rank Fusion (port 8310)
  • Adaptive semantic chunking (no naive fixed-size cuts)
  • Dreaming module for index self-organization
  • PostgreSQL + pgvector storage
  • MIT-licensed, public repo (Arcanada-one/scrutator)
  • 1148+ production chunks across the ecosystem knowledge base
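As a rough illustration of what semantic chunking means in contrast to fixed-size cuts, the sketch below starts a new chunk wherever adjacent sentences stop resembling each other. Everything in it is an assumption: the threshold, the sentence splitter, and especially the bag-of-words stand-in for embeddings. A real pipeline would compare BGE-M3 dense vectors instead, and Scrutator's actual chunking logic may differ entirely.

```python
import math
import re
from collections import Counter


def _toy_embed(sentence):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_chunks(text, threshold=0.2, embed=_toy_embed):
    """Split text into chunks at points where topic similarity drops.

    A boundary is placed between two adjacent sentences when their
    cosine similarity falls below `threshold` (a hypothetical value).
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if _cosine(embed(prev), embed(cur)) < threshold:
            chunks.append([cur])  # similarity dropped: start a new chunk
        else:
            chunks[-1].append(cur)
    return [" ".join(chunk) for chunk in chunks]


sample = (
    "Cats chase mice. Cats chase birds. "
    "Postgres stores vectors. Postgres stores rows."
)
chunks = semantic_chunks(sample)
```

The chunk length here follows the content rather than a character budget, so a topic change always ends a chunk even when the chunk is short.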

Current autonomy level

L1

Weakest link

BGE-M3 worker health is not externally monitored: if the embedding service is down, search returns an empty result set and consumers see no error, so an outage is indistinguishable from a query with no matches. Index drift between the PG vectors and the source documents is not detected automatically.
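The failure mode described above can be contrasted with a classified error in a short sketch. The names here (`EmbeddingUnavailable`, `SearchResult`, the `degraded` flag, and the injected callables) are hypothetical and do not come from the Scrutator codebase. The sketch assumes embedding backends signal an outage by raising `ConnectionError`.

```python
from dataclasses import dataclass


class EmbeddingUnavailable(Exception):
    """Classified error: no embedding backend could serve the query."""

    status_code = 503
    error_class = "embedding_unavailable"


@dataclass
class SearchResult:
    hits: list
    degraded: bool  # True when the fallback embedder produced the query vector


def search(query, primary_embed, fallback_embed, run_query):
    """Embed the query, falling back to a lighter model on outage.

    Raises EmbeddingUnavailable instead of returning an empty result
    when both backends are down, so callers can tell an outage apart
    from a genuine "no matches" answer.
    """
    try:
        vector = primary_embed(query)
        degraded = False
    except ConnectionError:
        try:
            vector = fallback_embed(query)
            degraded = True
        except ConnectionError as exc:
            raise EmbeddingUnavailable("no embedding backend reachable") from exc
    return SearchResult(hits=run_query(vector), degraded=degraded)
```

One design caveat: vectors from different embedding models generally live in different spaces, so a fallback model is only useful if the index also holds vectors produced by that model, or if the fallback path is limited to a channel that does not depend on them.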

Roadmap to L3

  1. L3 lift — /health endpoint per worker, fallback to a lighter embedding model on outage, structured pino traces.
  2. L3 polish — drift detection job comparing PG vector count vs source manifest; classified errors propagated to consumers.
  3. Verification gate — kill embedding worker mid-query and assert classified 503 + Ops Bot fatal event within 5 s.
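The drift detection job in step 2 could take roughly the following shape. This is a sketch under stated assumptions: the roadmap only mentions comparing the PG vector count against a source manifest, so the per-document granularity, the function name, and the report layout are all hypothetical.

```python
def detect_drift(manifest_counts, indexed_counts):
    """Compare expected chunk counts per source document with the index.

    manifest_counts: {document id: expected number of chunks}
    indexed_counts:  {document id: number of vectors stored}
    The indexed side would come from a grouped count query against the
    chunk table; that table and its columns are not specified in the
    source, so they are left out here.
    """
    expected = set(manifest_counts)
    stored = set(indexed_counts)
    mismatched = {
        doc: (manifest_counts[doc], indexed_counts[doc])
        for doc in sorted(expected & stored)
        if manifest_counts[doc] != indexed_counts[doc]
    }
    return {
        "missing": sorted(expected - stored),   # in manifest, not indexed
        "orphaned": sorted(stored - expected),  # indexed, not in manifest
        "mismatched": mismatched,               # (expected, stored) pairs
    }


# Hypothetical document ids and counts.
report = detect_drift(
    {"guide.md": 12, "faq.md": 5, "api.md": 30},
    {"guide.md": 12, "faq.md": 4, "old.md": 7},
)
has_drift = any(report.values())
```

A per-document comparison catches cases that a single total would hide, such as one document losing chunks while an orphaned document contributes the same number.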
