Upload audio or video and get accurate transcripts with speaker diarization, timestamps, and editable segments. The backend uses Whisper hosted on Groq (the free tier covers ecosystem volume); a self-hosted Silero stack is planned for high-volume customers.
Capabilities
- Whisper-large-v3 transcription via Groq
- Speaker diarization and timeline export
- Multi-format input (mp3, wav, mp4, webm)
- BullMQ job queue with retry classification
- Token-based billing with usage caps
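As a rough illustration of what retry classification in the job queue could look like, the sketch below sorts provider failures into "retry" and "fail". The failure shape, the category names, and the list of transient codes are assumptions for illustration, not the service's actual rules.

```typescript
// Hypothetical decision values; the real worker may use different categories.
type RetryDecision = "retry" | "fail";

// Hypothetical failure shape: an HTTP status from the transcription
// provider and/or a Node-style network error code.
interface ProviderFailure {
  status?: number;
  code?: string;
}

// Network errors that usually clear up on a later attempt (assumed list).
const TRANSIENT_CODES = new Set(["ECONNRESET", "ETIMEDOUT", "EAI_AGAIN"]);

function classifyFailure(failure: ProviderFailure): RetryDecision {
  // Transient network errors are worth another attempt.
  if (failure.code !== undefined && TRANSIENT_CODES.has(failure.code)) {
    return "retry";
  }
  const status = failure.status;
  // With no status and no known code, this sketch chooses not to retry.
  if (status === undefined) {
    return "fail";
  }
  // Timeouts, rate limits, and server errors are treated as transient.
  if (status === 408 || status === 429 || status >= 500) {
    return "retry";
  }
  // Remaining statuses (for example 400 or 401) will not succeed on retry.
  return "fail";
}
```

In a BullMQ worker, one way to act on a "fail" decision is to throw BullMQ's UnrecoverableError from the processor, which skips the remaining attempts; a "retry" decision can rethrow the original error so the job's configured attempts and backoff apply.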
Current autonomy level
Weakest link
DB schema drift not detected at runtime; a mock provider in tests has masked production outages in the past.
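One possible guard against the mock-provider problem is to refuse to start when a mock is selected in production. The sketch below assumes the provider is chosen through an environment variable; the variable name TRANSCRIPTION_PROVIDER and the function selectProvider are hypothetical, and this guard is a stopgap rather than a substitute for removing the mock from production code.

```typescript
type ProviderName = "groq" | "mock";

// Picks the transcription provider from an env-like object.
// TRANSCRIPTION_PROVIDER is a hypothetical variable name; NODE_ENV follows
// the usual Node.js convention.
function selectProvider(env: Record<string, string | undefined>): ProviderName {
  const requested = env.TRANSCRIPTION_PROVIDER ?? "groq";

  if (requested === "mock") {
    // Fail loudly at startup instead of silently serving mock output.
    if (env.NODE_ENV === "production") {
      throw new Error("Mock transcription provider is not allowed in production");
    }
    return "mock";
  }
  if (requested === "groq") {
    return "groq";
  }
  throw new Error(`Unknown transcription provider: ${requested}`);
}
```

Calling selectProvider(process.env) once at startup surfaces a misconfiguration immediately rather than as an outage that tests never see.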
Roadmap to L4
- L3 lift: Prisma migration status check at startup; remove the mock provider from the production code path; structured pino traces.
- L4 lift: fall back from Groq to the self-hosted Silero stack during outages; hard per-tenant cost circuit breaker; recovery audit log; output schema validation.
- L5 candidate: model routing based on file length and tier (once volume exceeds 1,000 transcriptions/day).
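To make the per-tenant cost cap from the L4 item concrete, here is a minimal in-memory sketch of a hard circuit breaker. The class name, the integer cost unit, and the absence of any reset window are assumptions; a real deployment would likely persist spend and reset it per billing period.

```typescript
// Hypothetical hard cost breaker: once a tenant's recorded spend would
// exceed the cap, further charges are rejected with no override.
class TenantCostBreaker {
  private readonly capPerTenant: number;
  private readonly spent = new Map<string, number>();

  // capPerTenant is in integer cost units (for example tokens or cents)
  // to avoid floating-point drift.
  constructor(capPerTenant: number) {
    this.capPerTenant = capPerTenant;
  }

  // Records the cost and returns true if the tenant stays within the cap;
  // returns false and records nothing otherwise.
  tryCharge(tenantId: string, cost: number): boolean {
    const current = this.spent.get(tenantId) ?? 0;
    if (current + cost > this.capPerTenant) {
      return false;
    }
    this.spent.set(tenantId, current + cost);
    return true;
  }

  // True when the tenant has used up the whole cap.
  isOpen(tenantId: string): boolean {
    return (this.spent.get(tenantId) ?? 0) >= this.capPerTenant;
  }
}
```

A worker could call tryCharge with the estimated cost before submitting a job to the provider and reject the job when it returns false, so a single tenant cannot run up spend past its cap.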