Arcanada — Datarim: Do Agents Dream of Electric Sheep?

Philip K. Dick asked the question about androids. I'm asking it about AI agents. Because Datarim has a command called /dr-dream, and the agent genuinely "falls asleep." A librarian agent sifts through the project knowledge base, finds contradictions, builds cross-references, throws out garbage. Wakes up with a clear head.

Sounds like science fiction. It's working code. MIT license. On GitHub. Right now.

Here's what I'm arguing: a framework is not a tool. It's an operating system for AI development. Not "help me write a function," but "here's a backlog of 40 tasks, pick the next one, take it from requirements to archival, learn from your mistakes, suggest process improvements."

AI agents can write code. That's old news. But they can't run projects. Hand Claude Code a task with no structure and it'll write code, maybe even working code. But it won't ask about requirements. Won't assess complexity. Won't write tests before code. Won't do a review. Won't document what it learned. Won't pick the next task from the queue.

Datarim gives the agent that structure: backlog → task → complexity assessment → requirements → plan → implementation → verification → reflection → archival → next task.

I've closed over a thousand tasks in one year. Solo. On a $200/month subscription. No dev team. No project manager. No QA engineer. The agent does all of it, following Datarim's rules. I'm not claiming "this replaces a team." I'm saying one person with the right framework operates at the capacity of a small studio.

And no, I'm not a genius or a workaholic pulling 16-hour days. I just stopped re-explaining the same things to the agent every session.

Where It Came From

It started small. In March 2025, vanzan01 published cursor-memory-bank on GitHub — a simple memory bank for the Cursor IDE. The idea was basic: the agent remembers project context between sessions. Structure, stack, architecture decisions. No magic, just markdown files loaded at the start of each conversation.

I liked the idea but wanted more. I forked it and built cursor-memory-bank-angry. "Angry" from Angry Robot Deals, my company at the time. I added workflow management commands. Made TDD mandatory. Created a backlog. Bolted on an SDLC framework with task decomposition and automatic stack selection.

It worked better, but it was still tied to Cursor and to code. I wanted a system that manages any project, not just software. Research. Articles. Legal documents. DevOps infrastructure. SEO audits.

That's how Datarim v1.3.0 was born — a full platform. 15 agents, 18 skills, 19 commands, 5 templates, a 9-stage pipeline. Built for Claude Code, though the architecture isn't locked to any specific provider.

Links, if you want to trace the evolution:

Original: https://github.com/vanzan01/cursor-memory-bank
Intermediate: https://github.com/Angry-Robot-Deals/cursor-memory-bank-angry
Datarim: https://github.com/Arcanada-one/datarim

How It Works

A 9-stage pipeline:

init → prd → plan → design → do → qa → compliance → reflect → archive

Not every task goes through all 9. Running a bug fix through an architectural design review would be absurd. So there's complexity routing:

Level	Scope	Pipeline
L1 Quick Fix	1 file, <50 LOC	init → do → reflect → archive
L2 Enhancement	2-5 files, <200 LOC	init → [prd] → plan → do → [qa] → reflect → archive
L3 Feature	5-15 files, 200-1000 LOC	init → prd → plan → design → do → qa → [compliance] → reflect → archive
L4 Major Feature	15+ files, >1000 LOC	init → prd → plan → design → phased-do → qa → compliance → reflect → archive

Brackets mean optional at that level. An L1 task closes in 4 steps. L4 goes through all 9.
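
The routing table is simple enough to sketch in Python. This is an illustration, not Datarim's actual code: the names assess_level and pipeline_for are made up for the example. Since the table's ranges touch at 5 and 15 files, the sketch assumes a boundary value goes to the higher level, and that the larger of the two signals (files or LOC) wins.

```python
# Pipelines per level, copied from the table. Brackets mark optional stages.
PIPELINES = {
    1: ["init", "do", "reflect", "archive"],
    2: ["init", "[prd]", "plan", "do", "[qa]", "reflect", "archive"],
    3: ["init", "prd", "plan", "design", "do", "qa", "[compliance]",
        "reflect", "archive"],
    4: ["init", "prd", "plan", "design", "phased-do", "qa", "compliance",
        "reflect", "archive"],
}


def level_by_files(files: int) -> int:
    """Level suggested by the number of touched files (assumed tie-breaks)."""
    if files <= 1:
        return 1
    if files < 5:
        return 2
    if files < 15:
        return 3
    return 4


def level_by_loc(loc: int) -> int:
    """Level suggested by the number of changed lines of code."""
    if loc < 50:
        return 1
    if loc < 200:
        return 2
    if loc <= 1000:
        return 3
    return 4


def assess_level(files: int, loc: int) -> int:
    """Assumption: the more demanding of the two signals decides."""
    return max(level_by_files(files), level_by_loc(loc))


def pipeline_for(level: int, include_optional: bool = False) -> list[str]:
    """Return the stage names for a level, with or without optional stages."""
    stages = []
    for stage in PIPELINES[level]:
        if stage.startswith("[") and not include_optional:
            continue
        stages.append(stage.strip("[]"))
    return stages
```

With these assumptions, a 3-file, 120-line change is an L2 and runs init → plan → do → reflect → archive unless the optional stages are switched on.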

15 Agents

Each agent is a specialized role. Not the same Claude wearing different hats, but a different set of skills, rules, and context for each role:

Agent	Role	When it works
planner	Lead Project Manager	init, plan, archive
architect	Chief Architect	prd, design
developer	Senior Developer (TDD)	do
reviewer	QA & Security Lead	qa, reflect
compliance	Compliance Runner	compliance
code-simplifier	Code Simplification	compliance
strategist	Strategic Advisor	plan (L3-4)
devops	DevOps Engineer	plan, do, compliance
writer	Content Writer	write, reflect, archive, prd
editor	Content Editor	edit, qa (content)
skill-creator	Skill/Agent/Command Creator	addskill
optimizer	Framework Optimizer	optimize, reflect
librarian	Knowledge Base Librarian	dream
security	Security Analyst	design, qa, compliance
sre	Site Reliability Engineer	design, qa, reflect

Not all agents load for every task. An L1 fix loads developer and maybe reviewer. An L4 project pulls in planner, architect, developer, reviewer, strategist, and whoever else is needed.
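
One way to read the table is as a lookup from stage to candidate agents. The sketch below covers only the software-pipeline agents from the table (content and framework-maintenance agents are left out), and the names AGENT_STAGES, MIN_LEVEL, and candidates_for are hypothetical. It returns who is eligible for a stage, not who actually loads: that decision is made per task.

```python
# Stages where each agent can be active, taken from the table above.
# Subset only: writer, editor, skill-creator, optimizer, librarian omitted.
AGENT_STAGES = {
    "planner": {"init", "plan", "archive"},
    "architect": {"prd", "design"},
    "developer": {"do"},
    "reviewer": {"qa", "reflect"},
    "compliance": {"compliance"},
    "code-simplifier": {"compliance"},
    "strategist": {"plan"},
    "devops": {"plan", "do", "compliance"},
    "security": {"design", "qa", "compliance"},
    "sre": {"design", "qa", "reflect"},
}

# The table limits the strategist to L3-4 tasks.
MIN_LEVEL = {"strategist": 3}


def candidates_for(stage: str, level: int) -> list[str]:
    """Agents eligible for a stage at a given complexity level."""
    return [
        agent
        for agent, stages in AGENT_STAGES.items()
        if stage in stages and level >= MIN_LEVEL.get(agent, 1)
    ]
```

For the plan stage, an L2 task gets planner and devops as candidates, while an L3 task also gets the strategist.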

18 Skills

Skills are knowledge modules. Not code — rules and patterns the agent loads on demand:

datarim-system — core, always loaded
ai-quality — TDD, decomposition, cognitive load management
compliance — 7-step hardening workflow
security — auth, validation, data protection
testing — testing pyramid, mocking rules
performance — optimization patterns
tech-stack — stack selection by project type
utilities — shell recipes for common operations
consilium — multi-agent panel discussions
discovery — requirements gathering interview
evolution — framework self-update rules
writing — content creation and editorial workflow
dream — knowledge base maintenance rules
seo-launch — SEO, analytics, launch checklists
marketing — ad campaigns, conversion tracking, landing pages
factcheck — fact verification for publications
humanize — AI text pattern removal
visual-maps — Mermaid workflow diagrams

Autonomous Work in Practice

Picture this: the backlog has task BACKLOG-0042 "Add JWT authentication to API." The agent picks it up:

/dr-init — loads the task, assesses complexity as L3
/dr-prd — architect defines requirements: token format, expiration, refresh flow, protected routes
/dr-plan — planner + strategist break it into phases: middleware, token service, login endpoint, tests
/dr-design — consilium: architect + security discuss JWT vs session tokens, make a decision
/dr-do — developer writes code via TDD: tests first, then implementation, one method per iteration
/dr-qa — reviewer verifies: PRD alignment, security review, test coverage, OWASP checks
/dr-reflect — reviewer notes: "refresh token rotation was underestimated during planning"
/dr-archive — planner archives the task, updates backlog, picks up the next one

The full cycle. Without me hovering over every step. I set the direction and review the results.

A Framework That Thinks About Itself

You could write a pipeline in a day. What you can't build in a day are three mechanisms I haven't seen in any other framework: reflection, dream, and optimize.

Reflection

/dr-reflect runs after every task. The reviewer agent analyzes what went well, what went wrong, and proposes specific improvements. Not vague stuff like "we should plan better," but "the security skill doesn't cover rate limiting — add a section." Or "the developer agent spends too many tokens re-reading files — add a caching rule to ai-quality.md."

Proposals get recorded. A human approves or rejects. The framework updates. The next task runs better.
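
The record, approve, update loop can be pictured as a tiny data model. This is a sketch under my own assumptions, not how Datarim stores proposals: the Proposal fields, the Status values, and both function names are invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Proposal:
    task_id: str   # task whose reflection produced the proposal
    target: str    # file the change would touch (hypothetical path)
    change: str    # the concrete improvement, in plain words
    status: Status = Status.PENDING


def review(proposal: Proposal, approve: bool) -> Proposal:
    """The human decision: nothing is applied without it."""
    proposal.status = Status.APPROVED if approve else Status.REJECTED
    return proposal


def approved_changes(proposals: list[Proposal]) -> list[tuple[str, str]]:
    """Only approved proposals flow back into the framework."""
    return [(p.target, p.change) for p in proposals
            if p.status is Status.APPROVED]
```

The point of the design is the gate in the middle: the agent can propose as much as it wants, but a proposal that stays pending or gets rejected never reaches the framework files.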

Over a year, Datarim has gone through dozens of these cycles. The skills that exist today weren't planned from the start. factcheck appeared after I published an article with inaccurate numbers. humanize — after a LinkedIn post read like typical ChatGPT output. seo-launch — after manually launching a third website in a row.

Dream

/dr-dream is the librarian. An agent that doesn't write code or create content. It organizes the project knowledge base: the datarim/ directory, all state files, archives, reflections.

Finds contradictions between documents (techContext.md says "Express" but the code uses Fastify)
Builds cross-references
Removes outdated information
Reorganizes misplaced files
Updates indexes

I run it every week or two. After a big sprint or before starting a new project phase. Clean knowledge base = fewer agent hallucinations.
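
The Express vs Fastify case gives a feel for what a contradiction check looks like. In Datarim the librarian is an agent reading documents, not a script, so treat this purely as an illustration of the idea. The framework list and both function names are assumptions.

```python
import re

# Illustrative list; a real check would need a much broader vocabulary.
KNOWN_FRAMEWORKS = {"express", "fastify", "koa"}


def mentioned_frameworks(doc_text: str) -> set[str]:
    """Framework names that appear as whole words in a document."""
    words = set(re.findall(r"[a-z0-9\-]+", doc_text.lower()))
    return words & KNOWN_FRAMEWORKS


def find_contradictions(doc_text: str, dependencies: list[str]) -> dict:
    """Compare what the docs claim with what the project depends on."""
    documented = mentioned_frameworks(doc_text)
    installed = {d.lower() for d in dependencies} & KNOWN_FRAMEWORKS
    return {
        "documented_but_missing": sorted(documented - installed),
        "installed_but_undocumented": sorted(installed - documented),
    }
```

Fed a techContext.md that mentions Express and a dependency list that contains fastify, it reports express as documented but missing and fastify as installed but undocumented.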

Optimize

/dr-optimize audits the framework itself. The optimizer agent checks all 18 skills, 15 agents, 19 commands. It looks for:

Unused skills (loaded but their rules aren't applied)
Duplicates (two skills describing the same thing in different words)
Broken references between files
Drift between documentation and reality

When a framework grows organically through reflection, entropy accumulates. Optimize fights that.
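
Of the four checks, broken references are the easiest to show in code. The sketch below assumes references are plain relative markdown links such as [text](path.md); I'm not claiming that's how Datarim files point at each other, and broken_references is a name made up for the example.

```python
import re
from pathlib import Path

# Captures the link target and ignores an optional #fragment.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)(?:#[^)]*)?\)")


def broken_references(root) -> list[tuple[str, str]]:
    """Return (file, target) pairs where a relative link points nowhere."""
    root = Path(root)
    broken = []
    for md in sorted(root.rglob("*.md")):
        text = md.read_text(encoding="utf-8")
        for target in LINK.findall(text):
            # External links are out of scope for this check.
            if "://" in target or target.startswith("mailto:"):
                continue
            if not (md.parent / target).exists():
                broken.append((md.relative_to(root).as_posix(), target))
    return broken
```

Run against a directory of skills and agents, it lists every markdown file that links to something that no longer exists, which is exactly the kind of drift that piles up when files get renamed during reflection cycles.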

Consilium

For critical decisions (L3-L4), Datarim assembles a panel of agents. Architect proposes architecture, Security looks for vulnerabilities, DevOps calculates operational costs, SRE thinks about reliability. They debate. The outcome is a decision that weighs all four perspectives.

This isn't synthetic crowd wisdom. Each agent loads its own skill set and sees the task through its own lens. Architect optimizes for scalability, Security for protection, SRE for observability. The tension between them is the value.

What Datarim Is Not

Three misconceptions that come up most often.

A Memory Layer (mem0.ai)

Datarim is not mem0.ai or anything like it. Mem0 solves a different problem: persistent agent memory. Who you are, what you like, what decisions you made before. That's useful, but it's a different layer.

Datarim is project execution structure. What to do. In what order. With what quality checks at each step. How to learn from results. Mem0 remembers that you like TypeScript. Datarim makes sure you wrote tests before code, did a review, documented decisions, and updated the backlog.

Different layers. Compatible. You can use both.

A Multi-Agent Orchestrator (AutoGPT, CrewAI)

Datarim is not a multi-agent orchestrator running agents in parallel. There's no complex real-time message routing between agents. One agent per stage. Clear pipeline. The next agent gets the previous one's output through files in datarim/.

This is simpler than CrewAI. And more reliable. Multi-agent systems with parallel execution look great in diagrams but produce coordination problems that are painful to debug. Datarim chooses sequence and predictability.
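
Sequential handoff through files is easy to picture in code. A minimal sketch, assuming each stage is a function that takes the previous stage's text and returns its own: the run_pipeline name and the 01-init.md style file names are hypothetical, not Datarim's real layout.

```python
from pathlib import Path
from typing import Callable

# A stage handler receives the previous output and returns its own.
StageHandler = Callable[[str], str]


def run_pipeline(stages: list[tuple[str, StageHandler]],
                 workdir, task: str) -> list[Path]:
    """Run stages one at a time, passing state only through files."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    previous = task
    written = []
    for index, (name, handler) in enumerate(stages, start=1):
        output = handler(previous)
        path = workdir / f"{index:02d}-{name}.md"
        path.write_text(output, encoding="utf-8")
        # The next stage sees only what was written to disk.
        previous = path.read_text(encoding="utf-8")
        written.append(path)
    return written
```

Because every intermediate result is a file, a failed run can be inspected stage by stage, and there is no shared in-memory state for two agents to fight over.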

A Code-Only Tool

This is probably the most common misconception. Yes, Datarim grew out of software development. But the pipeline — requirements → plan → execution → verification → reflection — is universal.

Real use cases from the documentation:

Legal documents (SaaS Terms of Service)
Academic research (literature reviews, methodology)
Technical documentation (API docs, architecture decisions)
Project management (backlogs, iterations, retrospectives)
Content (articles, posts, with fact-checking)
DevOps (CI/CD, Docker, deployments)
SRE (observability, SLOs, incident response)
SEO (audits, analytics, Search Console)
Ad campaigns (Google Ads, structure, tracking)
App Store publishing (metadata, screenshots, privacy policy)
Website launches (pre-launch checklist, SSL, OG tags)
UI/UX (landing pages, components, responsiveness)

This article you're reading right now was written through Datarim. Task CONTENT-0001. Pipeline: init → plan → write → edit → archive.

Who It's For

Datarim works well if you're:

A solo developer who wants to get the most out of Claude Code
A startup where one or two people do the work of ten
A researcher running a large project with many tasks
A content team that needs an editorial process with fact-checking
Anyone who works with AI agents and is tired of chaotic freestyle

Datarim is not for you if you:

Don't use AI agents (the framework without an agent is just markdown files)
Write throwaway scripts (the overhead isn't worth it)

If you already have Jira, Asana, CI/CD, and code review — great. Datarim doesn't replace your SDLC, it works inside it. I pull a task from Asana, run /dr-init, and the agent takes it through the pipeline. The project backlog lives in your tracker, the task-level execution lives in Datarim. Different layers.

A note on models. Datarim is built with limited context in mind. At each stage, the agent loads the minimum necessary information: only the relevant skill, only the current task state, only the files referenced in the plan. Not the whole project. This means the framework works on models with smaller context windows too. Not perfectly, but it works.
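
Here is roughly what "load the minimum" means, as a sketch. The only detail taken from the framework is that datarim-system is always loaded; the stage-to-skill mapping, the build_context name, and the shape of the result are assumptions for illustration.

```python
# Core skill that is always loaded.
ALWAYS_LOADED = ["datarim-system"]

# Hypothetical mapping from stage to the skills it needs.
STAGE_SKILLS = {
    "do": ["ai-quality", "testing"],
    "qa": ["security", "testing"],
    "compliance": ["compliance"],
}


def build_context(stage: str, task_state: str, plan_files: list[str],
                  project_files: dict[str, str]) -> dict:
    """Assemble only what the current stage needs, not the whole project."""
    skills = ALWAYS_LOADED + STAGE_SKILLS.get(stage, [])
    files = {
        path: project_files[path]
        for path in plan_files
        if path in project_files
    }
    return {"skills": skills, "task": task_state, "files": files}
```

If the plan references one file out of three hundred, the agent gets that one file plus the skills for the current stage, and the context stays small no matter how large the project grows.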

Quick Start

Five minutes from zero to your first task.

git clone https://github.com/Arcanada-one/datarim.git
cd datarim && ./install.sh

The script places agents, skills, and commands in the right directories. Then:

cp CLAUDE.md /path/to/your/project/
cd /path/to/your/project/
claude

In Claude Code:

/dr-help          # See all commands with descriptions
/dr-init <task>   # Start working

First task for beginners: try an L1. Describe a simple bug fix or minor improvement, for example /dr-init "Fix typo in README". The agent assesses it as L1 and runs init → do → reflect → archive. You'll see the full cycle in 5 minutes. Then try an L2, which adds a plan. Then an L3, which adds design and consilium.

MIT license. Fork it, adapt it, contribute.

GitHub: https://github.com/Arcanada-one/datarim

What's Next

I won't sugarcoat this with vague optimism. Here's what I know.

Datarim is a framework I use every day for real work. A thousand tasks in a year — that's not a marketing number, it's my backlog archive. Every task left a reflection, and every reflection made the framework a little better.

If you work with AI agents and feel like the freestyle approach is hitting a ceiling — give it a shot. Clone it, run /dr-init, close a few tasks. If it doesn't click, you've spent thirty minutes. If it does, you'll save hundreds of hours.

The repo is open. Contributors welcome. Issues too.