colibri/docs/VAULT-PROVISION-RUNBOOK.md
Sam & Claude 064079e3fc
docs: harness-agnostic + plainer doc names; codify naming principle
- ZOT-RPC-TRANSCRIPT.md → AGENT-EVENTS-REFERENCE.md: neutral, per-harness event
  reference (currently documents zot; pi uses pi --mode json). Avoids baking the
  current default harness into a name — same lesson as the pi_* renames. Adds a
  'Developer reference — operators can skip' header.
- VAULT-PROVISION-FIRST-PROOF.md → VAULT-PROVISION-RUNBOOK.md: it's a runbook;
  'first-proof' was redundant.
- Updated referrers: spawner.rs, wiki/agent-harness.md, docs/README.md.
- wiki/naming-decisions.md: new 'Naming principle — harness-agnostic by default'
  section (neutral concept → neutral name + configurable value; harness-specific
  → harness in the name, kept symmetric zot_/pi_).
- Fixed US/ISO prose dates → DD.mon.YYYY (21.jun.2026) per AGENTS.md; left the
  literal JSON "time" timestamps in the captured transcript as-is (data).

Gates: wiki-lint --strict clean; markdown format clean.
2026-06-24 16:33:40 +02:00


Vault Provision Runbook (osa)

Status. The spawn → vault-provision → .env chain is wired, hardened, and drivable from the CLI. The three gaps that previously forced a manual path are all closed:

  • #101 colibri register-tenant + list-tenants socket command / CLI verb landed (PR #107).
  • #102 colibri spawn-agent/spawn-local accept --jail-name / --jail-root flags (PR #107).
  • #92 provision-target containment guard landed — colibri-vault::provision canonicalizes the target and asserts it is strictly under the allowed jail-root base before any write (PR #119).

This runbook proves the chain live on osa using the clean CLI — no raw SQLite, no nc -U JSON. It validates the production deployment pattern (Bastille jail + provisioned .env); see AGENTS.md Project Identity — the bare-metal Clawdie service runs exactly this model.

First-proof policy (see layered-soul/docs/HIVE-ONBOARDING.md): use a scratch jail + throwaway test collection only — no real tenant data.


How the chain actually resolves (so the setup is correct)

  • The hook is provision_tenant_env(jail_name, jail_root_path). It looks up store.get_tenant(jail_name); if no tenant row matches, it no-ops.
  • It then requires tenant.jail_root_path == spawned root (trailing-slash-normalized); on a mismatch, provisioning is refused.
  • colibri-vault::provision then canonicalizes the target and asserts it is strictly under the allowed jail-root base (COLIBRI_JAIL_ROOT_BASE, defaults to /usr/local/bastille/jails on FreeBSD) — a traversal/symlink escape is refused with TargetEscapesRoot before any directory or file is created (#92/#119).
  • It calls the bw CLI to fetch items by name from the collection named tenant_id, so tenant_id = jail name = collection name (the 1:1:1 contract).
  • On success it writes <jail_root>/.env at 0600 and flips tenant status → active.
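
The two path checks in the list above can be sketched in shell. This is an illustration under stated assumptions, not the daemon's code: normalize_root, roots_match, and is_strictly_under are hypothetical names, and realpath stands in for whatever canonicalization colibri-vault::provision actually performs.

```shell
#!/bin/sh
# Hypothetical helpers that mirror the checks described above.

# Strip trailing slashes so "/jails/x/root/" and "/jails/x/root" compare equal.
normalize_root() {
    p=$1
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    printf '%s\n' "$p"
}

# Succeed only if the registered root and the spawned root match after normalization.
roots_match() {
    [ "$(normalize_root "$1")" = "$(normalize_root "$2")" ]
}

# Succeed only if the canonical target lies strictly under the canonical base.
# Symlinks are resolved first, so a link pointing outside the base is rejected.
is_strictly_under() {
    target=$(realpath "$1") || return 1
    base=$(realpath "$2") || return 1
    case "$target" in
        "$base"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, is_strictly_under "/usr/local/bastille/jails/proof0/root" "/usr/local/bastille/jails" would succeed on a host where that jail exists, while a target equal to the base itself fails because the check is strict.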

Paths (FreeBSD daemon)

  • Socket: /var/run/colibri/colibri.sock
  • DB: /var/db/colibri/colibri.sqlite
  • Provider env (bootstrap creds): /usr/local/etc/colibri/provider.env

Prerequisites

  1. colibri-daemon running on osa (colibri ≥ 0.11.0 — has the CLI verbs + flags).
  2. /usr/local/etc/colibri/provider.env (mode 0600) has BW_SERVER plus the three bootstrap secrets BW_CLIENTID / BW_CLIENTSECRET / BW_PASSWORD (PR #69), and the daemon has them in its environment (the rc.d script loads provider.env).
  3. bw CLI on the daemon's PATH.
  4. Pick a scratch tenant id, e.g. T=proof0.
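
Prerequisite 2 can be checked before starting with a small pre-flight helper. The sketch below is hypothetical (check_provider_env is not a colibri command) and assumes provider.env uses plain KEY=value lines; adjust the pattern if your file uses a different syntax. It checks only that the keys exist and never prints their values.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; the function name and messages are illustrative.
check_provider_env() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    # The first 10 characters of ls -l output are the file type and permission bits.
    perms=$(ls -ld "$f" | cut -c1-10)
    [ "$perms" = "-rw-------" ] || { echo "bad mode: $perms"; return 1; }
    # Assumes KEY=value lines; only key presence is tested, values are not shown.
    for key in BW_SERVER BW_CLIENTID BW_CLIENTSECRET BW_PASSWORD; do
        grep -q "^${key}=" "$f" || { echo "missing key: $key"; return 1; }
    done
    echo "ok"
}
```

Run it as a user that can read the file, for example: sudo sh -c '. ./preflight.sh; check_provider_env /usr/local/etc/colibri/provider.env' (preflight.sh is a placeholder file name).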

Step 1 — scratch jail + bootstrap

T=proof0
sudo bastille create "$T" 15.0-RELEASE-p10 <ip>   # your standard Bastille create
sudo agent-jail-bootstrap.sh "$T"                 # runtime pkgs + colibri binaries
# jail root is /usr/local/bastille/jails/$T/root

Step 2 — test collection in Vaultwarden

In the web UI, create a Collection named exactly $T, and add one Login item:

  • Name = a harmless env var, e.g. FIRST_PROOF_KEY
  • Password field = a throwaway value (this validates the name-based contract)

The bootstrap account must have read access to that collection.

Step 3 — register the tenant (CLI — #101)

T=proof0
sudo colibri register-tenant "$T" "/usr/local/bastille/jails/$T/root" "$T"
# expect JSON: {"tenant_id":"proof0","jail_root_path":"...","collection_id":"proof0","status":"provisioned",...}
  • jail_root_path must exactly match the spawned root (the hook compares them).
  • The collection is resolved by tenant_id (name) at provision time; pass collection_id = tenant_id to keep the 1:1:1 contract explicit.
  • Verify anytime without raw SQLite: sudo colibri list-tenants.
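
Because the hook compares the registered root with the spawned root, it can help to derive the path once and reuse it in Steps 3 and 4. A minimal sketch, assuming the operator's shell either leaves COLIBRI_JAIL_ROOT_BASE unset or sets it to the same base the daemon uses; jail_root_for is a hypothetical helper, not part of the colibri CLI.

```shell
#!/bin/sh
# Hypothetical helper: build <base>/<tenant>/root in one place so that
# register-tenant and spawn-agent receive the identical string.
jail_root_for() {
    base=${COLIBRI_JAIL_ROOT_BASE:-/usr/local/bastille/jails}
    # Drop one trailing slash from the base to avoid "//" in the result.
    printf '%s/%s/root\n' "${base%/}" "$1"
}

# Usage (commands as given in this runbook):
#   T=proof0; R=$(jail_root_for "$T")
#   sudo colibri register-tenant "$T" "$R" "$T"
```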

Step 4 — trigger a jailed spawn (CLI — #102)

T=proof0
sudo colibri spawn-agent local /usr/local/bin/colibri-test-agent \
    --jail-name "$T" \
    --jail-root "/usr/local/bastille/jails/$T/root" \
    --session-id "$T-proof"
  • The local provider argument uses the colibri-test-agent binary copied into the jail by agent-jail-bootstrap.sh, so the proof does not depend on provider API keys or a separate COLIBRI_AGENT_BINARY being present in the jail.
  • --jail-name enters the existing jail (jexec); --jail-root is the host-visible root where the hook writes .env. Both are required to trigger provisioning — a spawn without --jail-name/--jail-root skips the provision hook entirely.
  • The provision hook fires after the local test agent spawns successfully.

Step 5 — verify

T=proof0; R=/usr/local/bastille/jails/$T/root
# daemon log shows: "provisioning tenant env from vault" then "vault provision complete"
sudo stat -f '%Sp %N' "$R/.env"                       # expect -rw------- (0600)
sudo grep -c '^FIRST_PROOF_KEY=' "$R/.env"             # expect 1 (value not printed)
sudo colibri list-tenants | grep "$T" | grep active    # expect status=active

Pass = .env at 0600, key present, tenant status=active.
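
The three pass criteria can be folded into a single check. The sketch below is hypothetical (verify_proof is not a colibri command) and takes the tenant status as an argument, since the exact output format of colibri list-tenants is not assumed here.

```shell
#!/bin/sh
# Hypothetical wrapper around the pass criteria: mode 0600, key present once,
# tenant status "active". The status is supplied by the caller.
verify_proof() {
    env_file=$1; key=$2; status=$3
    # Permission bits as printed by ls -l (first 10 characters).
    perms=$(ls -ld "$env_file" 2>/dev/null | cut -c1-10)
    # Count matching lines without printing the secret value.
    count=$(grep -c "^${key}=" "$env_file" 2>/dev/null)
    if [ "$perms" = "-rw-------" ] && [ "$count" = "1" ] && [ "$status" = "active" ]; then
        echo PASS
    else
        echo FAIL
        return 1
    fi
}
```

For instance, verify_proof "$R/.env" FIRST_PROOF_KEY active, run with enough privilege to read the file, prints PASS only when all three conditions hold.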

Cleanup (scratch proof)

T=proof0
sudo rm -f "/usr/local/bastille/jails/$T/root/.env"
sudo bastille destroy "$T"
# delete the test Collection + item in Vaultwarden
# tenant row: list-tenants will stop showing it after bastille destroy; remove the
# row from the SQLite store if you want it gone immediately (no CLI verb yet for delete).

Security notes

  • Scratch jail + test collection only (first-proof policy) — no real tenant secrets.
  • Bootstrap creds (BW_*) remain confined to the daemon's provider.env (0600); only the resolved .env enters the jail.
  • Provision target is containment-checked (#92/#119): canonicalized and asserted under the allowed jail-root base before any write.

What landed (closed)

  • #101 register-tenant socket command + CLI → step 3 is now colibri register-tenant.
  • #102 --jail-name / --jail-root on colibri spawn-agent → step 4 is now colibri spawn-agent … --jail-name.
  • #92 path canonicalization/containment guard in colibri-vault::provision.
  • #100 crate bw hardening (server-match fail-closed, serialize, note-key validation).

Still open (not blockers for this proof)

  • A colibri delete-tenant / unregister-tenant CLI verb; cleanup is manual today (see Cleanup (scratch proof)).
  • The Forgejo Actions runner has been intermittently down; CI has not been gating merges reliably. Verify gates locally (cargo fmt --check, cargo clippy --workspace --all-targets -- -D warnings, cargo test --workspace) until it recovers.