Release Audit v0.5.5

0.5.5 Release Audit Ledger

This pass focused on concrete public-release blockers found while auditing the recent 0.5.5 release train and the current release path.

Findings

Blocker, Fixed: release preflight treated Fly templates as release-critical input

  • Evidence: bash ./scripts/release/preflight.sh v0.5.5 --skip-lint --skip-openapi --skip-package-dry-runs originally failed on missing root fly.api.toml.
  • Evidence: the same script used a stricter Python __version__ regex than the release workflow and failed against the current packages/python/cellstate/__init__.py.
  • Remediation: scripts/release/preflight.sh no longer treats Fly configs as release-critical, uses the same whitespace-tolerant Python version regex as the release workflow, and also validates packages/convex/package.json.
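
As a rough illustration of the whitespace-tolerant match described above, the sketch below extracts __version__ from module source. The pattern and the extract_version helper are assumptions made for illustration; the actual regex shared by scripts/release/preflight.sh and the release workflow is not reproduced here.

```python
import re
from typing import Optional

# Hypothetical pattern: allows any spacing around "=" and either quote style.
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*["']([^"']+)["']""",
    re.MULTILINE,
)


def extract_version(source: str) -> Optional[str]:
    """Return the __version__ value found in module source, or None."""
    match = VERSION_RE.search(source)
    return match.group(1) if match else None
```

A stricter pattern that requires exactly one space on each side of "=" would reject a line such as __version__="0.5.5", which is the kind of mismatch between preflight and release that the finding describes.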

Blocker, Fixed: installer and release artifacts were on different contracts

  • Evidence: scripts/install.sh pointed at heyoub/cellstate, looked for cellstate-${version}-${target} assets, claimed universal platform support, and claimed SHA256 verification even though no checksum was published or verified.
  • Evidence: .github/workflows/release.yml only published cellstate-server-linux-amd64-${tag}.tar.gz.
  • Remediation: the installer now normalizes X.Y.Z and vX.Y.Z, points at the actual GitHub repo, targets the published Linux amd64 artifact, downloads and verifies a published .sha256, and fails clearly for unsupported platforms instead of pretending support exists.
  • Remediation: the release workflow now generates and uploads a SHA256 checksum beside the Linux binary, and the release body now leads with the checksum-verified extraction flow instead of an unchecked curl | tar.
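
The installer contract above comes down to two checks: normalize the requested version, then compare the downloaded bytes against the published checksum. A minimal sketch follows, assuming release tags use the vX.Y.Z form and the .sha256 file uses the sha256sum layout (hex digest first, then the file name). normalize_tag and verify_sha256 are hypothetical helper names, not functions from scripts/install.sh.

```python
import hashlib
import re


def normalize_tag(version: str) -> str:
    """Accept X.Y.Z or vX.Y.Z and return the vX.Y.Z form (assumed tag style)."""
    match = re.fullmatch(r"v?(\d+\.\d+\.\d+)", version.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    return "v" + match.group(1)


def verify_sha256(payload: bytes, checksum_line: str) -> bool:
    """Compare payload's SHA256 with the first field of a sha256sum-style line."""
    fields = checksum_line.split()
    if not fields:
        # An empty or missing checksum must never count as a pass.
        return False
    return hashlib.sha256(payload).hexdigest() == fields[0].lower()
```

The point of the design is that a missing or mismatched checksum fails the install rather than falling back to an unverified extraction.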

High, Fixed: release workflow could report success while public SDK delivery failed

  • Evidence: enabled sdk-typescript, sdk-python, smoke-typescript, and smoke-python jobs were marked continue-on-error: true.
  • Risk: a green release could still ship with broken npm/PyPI publication or broken consumer import paths.
  • Remediation: removed continue-on-error from the enabled TypeScript/Python publish and smoke jobs so the release fails closed.

High, Fixed: published cellstate-pg image was never actually smoke-tested

  • Evidence: release built and published ghcr.io/.../cellstate-pg, but smoke-docker booted pgvector/pgvector:pg18 instead of the published PG image.
  • Risk: release could advertise a cellstate-pg artifact that had never been exercised in the release pipeline.
  • Remediation: smoke-docker now depends on build-pg, pulls the published cellstate-pg tag, and uses it for the container smoke environment.

High, Fixed: CI did not compile or typecheck the Convex package

  • Evidence: make ci-ts covered root Bun checks, the TypeScript SDK build, and contract tests, but not packages/convex.
  • Evidence: once packages/convex was typechecked directly, packages/convex/src/component/lib.ts failed on unchecked unknown[] filtering and stringly typed _id usage.
  • Remediation: the ci-ts target in the Makefile now also runs cd packages/convex && bun run typecheck:all && bun run build.
  • Remediation: packages/convex/src/component/lib.ts now uses explicit document-narrowing helpers and typed stored-document IDs so the existing logic typechecks without changing behavior.

Medium, Fixed: deployment docs blurred production reality with optional templates

  • Evidence: deployment docs and checklists gave Fly example configs first-class treatment even though production is on bare-metal Linode.
  • Remediation: docs and release language now state Linode bare metal is the current production path and treat Fly/Railway/Helm as examples unless explicitly used.

Medium, Fixed: Helm example publishing blocked the core release path

  • Evidence: the release workflow required the Helm job before creating the GitHub release, even though Helm is an example deployment surface rather than the primary production path.
  • Risk: an example chart failure could block a valid bare-metal release.
  • Remediation: Helm remains publishable as an example artifact, but the core release job no longer waits on it.

Medium, Fixed: final release job was redundantly re-uploading assets that earlier jobs already published

  • Evidence: build-binary, openapi, and docs-bundle each uploaded their own release assets, and the final release job then downloaded those artifacts and uploaded them again.
  • Risk: extra release coupling, wasted CI time, and more chances for asset/update races while adding no real verification.
  • Remediation: the final release job now only creates/updates release notes. Asset-producing jobs remain responsible for uploading their own artifacts.

Medium, Fixed: TypeScript CI did not wake up for SDK-pipeline changes outside package directories

  • Evidence: the docs-guard TypeScript change filter skipped scripts/generate-sdk.sh and workflow changes, so a PR could alter the TS SDK pipeline while skipping make ci-ts.
  • Risk: package and generator regressions could survive PR CI and only show up at tag time.
  • Remediation: the TypeScript change filter now includes the SDK generation script and CI/release workflow files so make ci-ts runs when the TS delivery path changes.
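
The widened filter can be pictured as a prefix check over the changed paths in a PR. This is only a sketch: the TS_TRIGGERS list and should_run_ci_ts are hypothetical, and the real docs-guard filter may use different patterns and matching rules.

```python
from typing import Iterable, Tuple

# Hypothetical trigger list; the entries beyond the package directory mirror
# the remediation above (SDK generation script and workflow files).
TS_TRIGGERS: Tuple[str, ...] = (
    "packages/typescript/",
    "scripts/generate-sdk.sh",
    ".github/workflows/",
)


def should_run_ci_ts(
    changed_paths: Iterable[str],
    triggers: Tuple[str, ...] = TS_TRIGGERS,
) -> bool:
    """Return True if any changed path starts with one of the trigger prefixes."""
    return any(path.startswith(triggers) for path in changed_paths)
```

With only the package directory in the list, a PR touching scripts/generate-sdk.sh alone would skip make ci-ts, which is the gap this finding closes.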

Low, Fixed: extension SQL CI was waking up on unrelated script churn

  • Evidence: the docs-guard PostgreSQL filter marked any scripts/** change as a PG-extension change, which triggered the heavyweight extension-sql job even for unrelated helper-script edits.
  • Risk: wasted CI time and noisier signals without improving extension confidence.
  • Remediation: the PG-extension filter now keys off the actual extension/build surfaces instead of all scripts.

High, Fixed: the real Linode deploy script could skip brand-new migrations on first deploy

  • Evidence: examples/deploy/linode/deploy.sh originally copied repo migrations into /opt/cellstate/migrations only after it had already computed MAX(version) and iterated the files to apply.
  • Risk: a release with a new SQL migration could deploy the new binary without applying the new migration until a second deploy or manual rerun.
  • Remediation: the deploy script now syncs migrations from the current repo checkout before deciding what to apply, preserving forward-only/idempotent behavior on retry.
  • Remediation: the same deploy script now verifies the published release tarball checksum before installation instead of piping an unchecked download straight into extraction.
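
The ordering bug is easiest to see when the file list is an explicit input to the selection step. The sketch below assumes migration files carry a numeric prefix (for example 0002_add_index.sql) that corresponds to the version compared against MAX(version). Both helpers are hypothetical and are not taken from examples/deploy/linode/deploy.sh.

```python
import re
from typing import Iterable, List


def migration_version(filename: str) -> int:
    """Parse the leading integer of a migration file name (assumed naming scheme)."""
    match = re.match(r"(\d+)", filename)
    if match is None:
        raise ValueError(f"no numeric version prefix in {filename!r}")
    return int(match.group(1))


def pending_migrations(available: Iterable[str], max_applied: int) -> List[str]:
    """Return files newer than max_applied, in ascending version order."""
    newer = [name for name in available if migration_version(name) > max_applied]
    return sorted(newer, key=migration_version)
```

If the list passed in is the stale contents of /opt/cellstate/migrations, a new migration in the repo is invisible and nothing is applied. Passing the freshly synced list selects it, and a retry after it has been applied selects nothing, which matches the forward-only/idempotent behavior described above.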

High, Fixed: @cellstate/convex was public in npm terms but missing from the release contract

  • Evidence: the repo has a versioned public package at packages/convex, CI builds it, preflight checks its version and packability, and the package is intended to ship publicly on npm.
  • Evidence: .github/workflows/release.yml previously published/smoked @cellstate/sdk and cellstate (Python), but not @cellstate/convex.
  • Risk: tag releases could appear green while the public Convex package lagged the tagged version, failed publication, or had broken install/import paths.
  • Remediation: the release workflow now publishes @cellstate/convex to npm, waits for a consumer install/import smoke test, and lists it in the release notes SDK section.
  • Remediation: packages/convex/package.json now points npm repository metadata at packages/convex instead of the nonexistent top-level convex directory.

Medium, Open: binary installer support is intentionally narrowed to Linux x86_64

  • Evidence: release still publishes only a Linux amd64 binary tarball.
  • Remediation in this pass: installer now reflects reality and works for the published artifact instead of advertising unsupported targets.
  • Follow-up if desired: add macOS/arm64/Windows release builds before widening installer claims again.

Medium, Validation Gap: Python package build could not be fully re-run in this sandbox

  • Evidence: the build module is installed locally, but python3 -m build failed because the isolated build environment setup needed to fetch hatchling and network access is blocked here.
  • Evidence: python3 -m build --no-isolation also failed because hatchling is not installed in the local interpreter.
  • Impact: Python packaging is not marked broken, but it is not fully re-verified from this environment.

Local Validation Performed

  • bash -n scripts/install.sh scripts/release/preflight.sh
  • bash ./scripts/release/preflight.sh v0.5.5 --allow-dirty --skip-lint --skip-openapi --skip-package-dry-runs
  • cd packages/convex && bun run typecheck:all
  • cd packages/convex && bun run build
  • bun run build:sdk
  • bun test ./tests/contracts/
  • npm pack --dry-run in packages/typescript
  • npm pack --dry-run in packages/convex

Required Follow-up Before Tagging

  • Run the Rust/DB-backed/security/live-API CI jobs for the final release commit.
  • Re-run Python package build and twine check in a networked CI/release environment.
  • Decide explicitly which non-primary artifacts you want to keep publishing every release: Docker API image, cellstate-pg image, Helm chart example, docs bundle.
  • Decide whether the docs bundle should remain a blocking CI artifact or become optional there as well, as it already is in the release notes path.