
CellState Roadmap

Last updated: March 12, 2026 — v0.5.5

CellState is a Rust runtime and API server that gives AI agents persistent, hierarchical memory with forensic event tracking. PostgreSQL 18 is the brain. Every state change flows through a deterministic mutation pipeline and produces a tamper-evident cryptographic receipt.

This roadmap describes where the project is, where it’s going, and what each stage means. For the versioning philosophy behind these stages, see VERSIONING.md.


Where We Are: v0.5.5

v0.5.5 shipped March 12, 2026, with full-fidelity CLI coverage (25 entity commands), core type extraction, Convex SDK hardening, and embedding dimension fixes.

What’s live right now:

  • 7-crate Rust workspace, compiling to a single binary
  • 4-stage mutation pipeline (Assemble → Gates → Commit → Receipt) with 11 gate checks
  • Immutable Event DAG with Blake3 hash chains and UUIDv7 causal ordering
  • Full entity hierarchy: Tenant → Agent → Trajectory → Scope → Turn, plus Artifacts and Notes
  • 57 PostgreSQL tables, 85+ REST endpoints
  • Basic MCP server
  • WebSocket event broadcast and SSE streaming
  • LMDB cache layer for sub-millisecond reads
  • 18 background jobs
  • 2,001 passing tests
  • 48 Prometheus metrics, OTLP observability
  • Live in production on bare metal (Linode Newark + Cloudflare Tunnel)
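The pipeline and hash-chain bullets can be made concrete with a minimal Rust sketch. This is not CellState's code. The Mutation, Receipt, Gate, and Pipeline types and the two example gates are hypothetical stand-ins for the real 11 gate checks. The sketch also uses std's DefaultHasher in place of Blake3 so it needs no dependencies. DefaultHasher is not cryptographic, so the sketch illustrates only the chaining idea.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical mutation produced by the Assemble stage.
#[derive(Debug, Clone, Hash)]
pub struct Mutation {
    pub entity: String,
    pub payload: String,
}

/// Hypothetical receipt: each one commits to the previous receipt's hash.
#[derive(Debug, Clone)]
pub struct Receipt {
    pub seq: u64,
    pub prev_hash: u64,
    pub hash: u64,
}

/// A gate inspects a mutation and may reject it with a reason.
pub type Gate = fn(&Mutation) -> Result<(), String>;

pub fn non_empty_payload(m: &Mutation) -> Result<(), String> {
    if m.payload.is_empty() {
        Err(format!("empty payload for {}", m.entity))
    } else {
        Ok(())
    }
}

pub fn known_entity(m: &Mutation) -> Result<(), String> {
    match m.entity.as_str() {
        "Tenant" | "Agent" | "Trajectory" | "Scope" | "Turn" | "Artifact" | "Note" => Ok(()),
        other => Err(format!("unknown entity type {other}")),
    }
}

/// Stand-in for Blake3: hashes the previous link together with the mutation.
fn chain_hash(prev_hash: u64, m: &Mutation) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    m.hash(&mut h);
    h.finish()
}

#[derive(Default)]
pub struct Pipeline {
    pub log: Vec<(Mutation, Receipt)>,
}

impl Pipeline {
    /// Assemble -> Gates -> Commit -> Receipt. Nothing is written if any gate fails.
    pub fn apply(&mut self, entity: &str, payload: &str, gates: &[Gate]) -> Result<Receipt, String> {
        // Assemble
        let m = Mutation { entity: entity.to_string(), payload: payload.to_string() };
        // Gates: the first failing gate rejects the mutation.
        for gate in gates {
            gate(&m)?;
        }
        // Commit + Receipt: link to the previous receipt (0 for the first entry).
        let prev_hash = self.log.last().map_or(0, |(_, r)| r.hash);
        let receipt = Receipt { seq: self.log.len() as u64, prev_hash, hash: chain_hash(prev_hash, &m) };
        self.log.push((m, receipt.clone()));
        Ok(receipt)
    }

    /// Recompute every link; editing any committed mutation breaks the chain.
    pub fn verify(&self) -> bool {
        let mut prev = 0;
        for (m, r) in &self.log {
            if r.prev_hash != prev || r.hash != chain_hash(prev, m) {
                return false;
            }
            prev = r.hash;
        }
        true
    }
}
```

A rejected mutation never reaches Commit. Because each receipt folds in the previous receipt's hash, editing any committed entry makes verify() fail.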

v0.5.5 is a buffer release for ad-hoc optimization, sanity checks, and foundation work before v0.6. No fixed scope — it ships when the foundations feel solid.

What was built in v0.5.3–v0.5.5 (in progress)

18 PRs wiring up every typed-but-unimplemented behavior in the codebase:

  • MCP: Prompts capability, logging capability, completion/autocomplete, vector-backed note search
  • A2A: Full task state machine (submit → working → completed/failed/canceled), real SSE streaming with single-fetch polling
  • Conflict resolution: ContradictionGate dispatches 4 strategies (LastWriteWins, HighestConfidence, Escalate, None)
  • Artifact promotion: Child trajectory artifacts auto-promote to parent scope on outcome report; delegation completion triggers promotion
  • Summarization chains: Auto-triggering of L0→L1→L2 policies with threshold evaluation and pipeline routing
  • Protocol surfaces: AG-UI SSE endpoint, A2UI mutations + subscriptions with tenant isolation and bootstrap snapshots
  • Module isolation: The protocol layer is enforced not to import from routes
  • HNSW indexes: Uncommented and wired in via the V57 migration
  • OpenAPI: 143 of 196 endpoints annotated and registered
  • Wire contract tests: 8 JSON fixtures, TypeScript + Python SDK test expansion
  • Benchmarks: Criterion context assembly benchmarks, k6 load test skeleton
  • Lock contention tests: Concurrent acquire, TTL expiry, 10-way contention
  • MCP integration test: End-to-end initialize → tools/list → tools/call
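The A2A task lifecycle (submit → working → completed/failed/canceled) can be modeled as a small transition function. This sketch makes two assumptions: cancellation is allowed only from Submitted or Working, and the three end states are terminal. The roadmap names the states but not the exact transition rules. The A2A specification also defines additional states that are not modeled here. The TaskState and TaskEvent names are hypothetical.

```rust
/// Task states named in the roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState {
    Submitted,
    Working,
    Completed,
    Failed,
    Canceled,
}

/// Hypothetical events that drive the state machine.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskEvent {
    Start,
    Succeed,
    Fail,
    Cancel,
}

impl TaskState {
    pub fn is_terminal(self) -> bool {
        matches!(self, TaskState::Completed | TaskState::Failed | TaskState::Canceled)
    }

    /// Returns the next state, or an error for any transition not listed.
    pub fn next(self, event: TaskEvent) -> Result<TaskState, String> {
        use TaskEvent::*;
        use TaskState::*;
        match (self, event) {
            (Submitted, Start) => Ok(Working),
            (Working, Succeed) => Ok(Completed),
            (Working, Fail) => Ok(Failed),
            // Assumption: a task can be canceled until it reaches an end state.
            (Submitted | Working, Cancel) => Ok(Canceled),
            (state, event) => Err(format!("illegal transition: {state:?} on {event:?}")),
        }
    }
}
```

Rejecting every unlisted pair, including any event on a terminal state, keeps late or duplicated events from reopening a finished task.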

v0.6.0 — “It Works”

Theme: Every typed behavior is wired. The API does what it claims.

v0.6.0 ships the v0.5.3–v0.5.5 work as a tagged, verified release. Philosophy: “use what we have before we bolt shit on.”

When this ships:

  • Full MCP protocol coverage (prompts, logging, completion, vector search)
  • A2A task lifecycle functional end-to-end
  • All 4 conflict resolution strategies operational
  • AG-UI and A2UI protocol surfaces functional server-side
  • Summarization chains auto-trigger across abstraction levels
  • Artifact promotion works across trajectory hierarchy
  • OpenAPI spec covers core endpoints
  • TypeScript and Python SDK wire contracts validated
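The four ContradictionGate strategies can be pictured as a dispatch over an enum. This is an illustrative sketch, not the gate's actual logic. The Claim and Resolution types, the tie-breaking rules, and the reading of None as "record the conflict but keep both claims" are assumptions, because the roadmap lists only the strategy names.

```rust
/// Hypothetical claim: a value plus the metadata the strategies compare.
#[derive(Debug, Clone, PartialEq)]
pub struct Claim {
    pub value: String,
    pub written_at_ms: u64,
    pub confidence: f32,
}

/// Strategy names as listed in the roadmap.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConflictStrategy {
    LastWriteWins,
    HighestConfidence,
    Escalate,
    None,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Resolution {
    Winner(Claim),
    Escalated { existing: Claim, incoming: Claim },
    Unresolved { existing: Claim, incoming: Claim },
}

pub fn resolve(strategy: ConflictStrategy, existing: Claim, incoming: Claim) -> Resolution {
    match strategy {
        ConflictStrategy::LastWriteWins => {
            // Assumption: ties go to the incoming write.
            if incoming.written_at_ms >= existing.written_at_ms {
                Resolution::Winner(incoming)
            } else {
                Resolution::Winner(existing)
            }
        }
        ConflictStrategy::HighestConfidence => {
            // Assumption: ties keep the existing value.
            if incoming.confidence > existing.confidence {
                Resolution::Winner(incoming)
            } else {
                Resolution::Winner(existing)
            }
        }
        // Hand both claims to a human or supervising agent.
        ConflictStrategy::Escalate => Resolution::Escalated { existing, incoming },
        // Assumption: no automatic resolution; both claims stay in place.
        ConflictStrategy::None => Resolution::Unresolved { existing, incoming },
    }
}
```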

What v0.6 does NOT ship: test coverage for the full DB layer, production error quality, documentation, API freeze.


v0.7.0 — “Prove It” (Test Coverage)

Theme: Build the regression net. You can’t safely refactor 641 error-handling sites without tests catching regressions.

A codebase audit revealed: 78% of DB modules have zero unit tests, 65% of route handlers have zero or trivial tests, and 13 of 22 background jobs have minimal coverage. The happy paths work; the error paths are unproven.

What ships:

  • Auth + tenant DB tests — api_key authenticate/rotate/revoke, tenant member lifecycle (security-critical)
  • A2A + coordination lifecycle tests — full state machine with real DB, lock contention, delegation lifecycle
  • Agent + BDI persistence tests — belief/goal/plan storage, checkpoint save/load
  • Infrastructure DB tests — working set, pack config, summarization, deployment, tool execution (75 functions)
  • Route handler tests — search, config, models, event DAG, summarization CRUD (15 zero-coverage endpoints)
  • Job error path tests — OAuth refresh failure, tenant lifecycle, MCP error scenarios
  • Load test baselines — k6 authenticated CRUD + context assembly at 50 virtual users (VUs) for 5 minutes, committed baselines, CI gate (p95 regression > 20% = fail)
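The CI gate reduces to comparing a run's p95 latency against the committed baseline. k6 computes percentiles itself, so this sketch covers only the comparison step. It assumes latencies in milliseconds and uses the nearest-rank percentile definition. The percentile and p95_gate functions are hypothetical, not part of k6 or CellState.

```rust
/// Nearest-rank percentile (p in (0, 100]) over latency samples in milliseconds.
pub fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Nearest rank: ceil(p / 100 * n), converted to a 0-based index.
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.max(1) - 1])
}

/// Fails when the current p95 exceeds the baseline by more than
/// `max_regression` (0.20 means 20%). Returns the current p95 on success.
pub fn p95_gate(baseline_p95_ms: f64, current_samples: &[f64], max_regression: f64) -> Result<f64, String> {
    let current = percentile(current_samples, 95.0).ok_or("no samples")?;
    let limit = baseline_p95_ms * (1.0 + max_regression);
    if current > limit {
        Err(format!("p95 regression: {current:.1} ms > {limit:.1} ms (baseline {baseline_p95_ms:.1} ms)"))
    } else {
        Ok(current)
    }
}
```

With a committed baseline p95 of 8 ms, the limit is 9.6 ms, so a run whose p95 is 10 ms fails the gate.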

When this ships, every critical code path has at least one test, and performance regressions are automatically caught.


v0.8.0 — “Harden It” (Error Quality)

Theme: Stop lying to operators. Replace every generic error message with context that helps debug production. Safe to do because v0.7 tests catch regressions.

A codebase audit revealed: 371 ApiError::internal_error calls, 86% with generic messages like “Entity deserialization failed” (repeated 68 times in one file). 100+ production unwrap() calls that can panic. 170 silently swallowed errors in the storage layer with zero logging.

The error architecture is solid (40-type ErrorCode enum, proper response shape, production redaction). The content is garbage.

What ships:

  • generic.rs contextualization — 70 identical error messages replaced with entity type + field context
  • driver.rs error quality — 43 generic errors + 41 silent let _ = patterns replaced with structured messages and debug logging
  • Storage layer visibility — 19 silent delete/evict failures now logged (cache corruption signals, eviction contention)
  • Production unwrap elimination — zero unwrap() calls in non-test server code
  • CHANGELOG.md — backfilled from v0.5.0 through v0.8.0
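The difference between a generic and a contextual error, and between a silent let _ = and a logged failure, is shown in the sketch below. The ApiError struct here is a hypothetical stand-in, because the real type and its ErrorCode enum are not shown in this roadmap. The field name token_budget is also illustrative only. eprintln! stands in for whatever logging macro the server actually uses.

```rust
use std::fmt;

/// Hypothetical stand-in for the server's ApiError.
#[derive(Debug)]
pub struct ApiError {
    pub code: &'static str,
    pub message: String,
}

impl fmt::Display for ApiError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}] {}", self.code, self.message)
    }
}

/// Before: every failure collapses to the same string.
pub fn deserialize_generic(raw: &str) -> Result<u64, ApiError> {
    raw.trim().parse::<u64>().map_err(|_| ApiError {
        code: "INTERNAL_ERROR",
        message: "Entity deserialization failed".to_string(),
    })
}

/// After: the message names the entity type, the field, the raw value, and the cause.
pub fn deserialize_field(entity: &str, field: &str, raw: &str) -> Result<u64, ApiError> {
    raw.trim().parse::<u64>().map_err(|e| ApiError {
        code: "INTERNAL_ERROR",
        message: format!("{entity} deserialization failed: field `{field}` = {raw:?}: {e}"),
    })
}

/// Before: `let _ = evict(key);` dropped the error silently.
/// After: the failure stays non-fatal, but it is logged and reported to the caller.
pub fn evict_logged(key: &str, evict: impl Fn(&str) -> Result<(), String>) -> bool {
    match evict(key) {
        Ok(()) => true,
        Err(e) => {
            eprintln!("cache evict failed for key {key}: {e}");
            false
        }
    }
}
```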

When this ships, a production error log entry tells you what went wrong, not just that something did.


v0.9.0 — “Document It” (API Surface)

Theme: The API is stable, tested, and well-errored. Now make it discoverable.

A codebase audit revealed: 2,960 public items across all crates, 8.5% with doc comments. 53 HTTP endpoints missing from the OpenAPI spec. The Rust SDK README is 49 lines.

What ships:

  • OpenAPI 100% coverage — all 196 endpoints annotated and registered
  • Core types documentation — every pub struct, pub enum, pub trait in cellstate-core has /// comments
  • Server public API documentation — route handlers, pipeline stages, middleware, request/response types (target: 40%+ doc-comment coverage, up from 8.5%)
  • SDK documentation — Rust SDK comprehensive README, Python/TypeScript API reference sections
  • A2A SSE push via LISTEN/NOTIFY — replace 1-second polling with DB-native push (the one optimization in this release, because documenting a polling API as the contract would set a bad precedent)

When this ships, someone reading the docs can use CellState without reading the source.


v1.0.0 — “The Contract”

1.0 is not a marketing event. It’s a promise: this API will not break under you without a major version bump.

Requirements:

  • Public API surface frozen — every path, type, error code in a versioned OpenAPI spec committed to the repo. CI check: any spec drift = fail.
  • Schema migration CLI: cellstate migrate status, cellstate migrate up, cellstate migrate validate. Operators can upgrade without hoping.
  • External validation — at least one team outside core running CellState for 30+ days
  • SDK reference docs — TypeScript, Python, Rust all published with getting-started examples
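The roadmap names the migration commands but does not describe how they work. The sketch below assumes a common versioned-migration model: numbered migration files (as in the V57 migration mentioned earlier), plus a recorded history of applied versions with checksums. Under that assumption, status lists pending versions, up would apply those pending versions in order, and validate flags applied migrations whose files changed or disappeared. All names are hypothetical.

```rust
use std::collections::BTreeMap;

/// Hypothetical migration record: version number (V57 -> 57) and a checksum of its SQL.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Migration {
    pub version: u32,
    pub checksum: u64,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Status {
    pub applied: Vec<u32>,
    pub pending: Vec<u32>,
}

/// `migrate status`: which available migrations have not been applied yet.
pub fn status(available: &[Migration], applied: &[Migration]) -> Status {
    let done: BTreeMap<u32, u64> = applied.iter().map(|m| (m.version, m.checksum)).collect();
    let mut pending: Vec<u32> = available
        .iter()
        .map(|m| m.version)
        .filter(|v| !done.contains_key(v))
        .collect();
    pending.sort_unstable();
    Status { applied: done.keys().copied().collect(), pending }
}

/// `migrate validate`: an applied migration whose file changed or disappeared is an error.
pub fn validate(available: &[Migration], applied: &[Migration]) -> Vec<String> {
    let files: BTreeMap<u32, u64> = available.iter().map(|m| (m.version, m.checksum)).collect();
    let mut problems = Vec::new();
    for m in applied {
        match files.get(&m.version) {
            None => problems.push(format!("V{} applied but missing from disk", m.version)),
            Some(&sum) if sum != m.checksum => {
                problems.push(format!("V{} checksum changed after apply", m.version))
            }
            Some(_) => {}
        }
    }
    problems
}
```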

What we decided NOT to do before 1.0:

  • Crate decomposition (module isolation is enforced; separate crates are premature until boundaries prove stable)
  • MCP JSON-RPC transport (HTTP POST works; JSON-RPC is a v1.x addition)
  • MCP SSE transport (same reasoning)
  • Hosted service (post-1.0)

After 1.0:

  • Releases are milestone-based, not calendar-based
  • Patches ship as needed (same-day for security)
  • Minor versions ship when a coherent capability set lands
  • Major versions only when breaking changes are genuinely necessary
  • Deprecated items survive at least two minor versions before removal

Beyond 1.0: Directions, Not Promises

These are areas of interest, not commitments. They’ll become concrete milestones when the foundation is stable enough to build on.

  • CellState Hosted — managed service where the runtime lives on owned infrastructure and agent shells deploy to edge. The Webflow-for-agents model.
  • Crate decomposition — if v0.9 module boundaries held, split cellstate-server into protocol-aligned crates (cellstate-mcp, cellstate-a2a, cellstate-agui). If not, document why and defer.
  • Pack Editor as CellState agent — the configuration tool itself runs on CellState, using its own working sets and event DAGs. Full dogfooding.
  • cellstate-rs crate — published Rust crate for agent developers who want typed state machines, event DAGs, and mutation pipelines without running the full server.
  • Ecosystem integrations — first-class connectors for major agent frameworks, LLM providers, and orchestration platforms.

How to Follow Along

  • Changelog: See CHANGELOG.md for the structured record of every change
  • Releases: GitHub Releases include human-written notes explaining what each version means
  • Versioning philosophy: See VERSIONING.md for the full framework behind these stages