From 524ccbff400826963e7cc01aeb707d7ba1320e3e Mon Sep 17 00:00:00 2001
From: Sam & Claude
Date: Wed, 24 Jun 2026 16:58:49 +0200
Subject: [PATCH] docs: delete 3 stale docs; repoint refs to successor
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Remove three docs that are genuinely stale (their decisions and evidence
now live elsewhere):

- TRUSS-SPAWN-ANALYSIS.md: debug trace of a jail-spawn bug that has since
  been fixed
- PLAN-MOTHER-MCP-VAULT-KEYS.md: plan for a Vaultwarden pubkey exchange;
  the mother MCP that shipped is seed-based instead (see wiki/mother-hive
  and MOTHER-SETUP)
- PRIORITY-HANDOFF-ISO-SPAWN-COST.md: its own header already marks it as
  superseded by MULTI-AGENT-HOST-PLAN

Repointed the references in README, AGENTS, and FREEBSD-BUILD-LANE-HANDOFF
to MULTI-AGENT-HOST-PLAN, and dropped the three deleted docs from the
docs/README index.

Fixed the wiki ADR note, which still claimed the missing ADR was
referenced in stage-colibri-iso.sh; those references were already
cleaned up.

KEPT the two design docs (COLIBRI-JAILED-AGENT-SPAWN-DESIGN,
COLIBRI-EXTERNAL-MCP-PROTOTYPE): on closer inspection they hold
how-it-works detail that the wiki only summarizes and links to, so
folding them into the wiki would either lose that detail or bloat it.

Gates: wiki-lint --strict (131) and markdown formatting are clean.
---
 AGENTS.md                               |   2 +-
 README.md                               |   2 +-
 docs/FREEBSD-BUILD-LANE-HANDOFF.md      |   4 +-
 docs/PLAN-MOTHER-MCP-VAULT-KEYS.md      | 162 ------------
 docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md | 313 ------------------------
 docs/README.md                          |   3 -
 docs/TRUSS-SPAWN-ANALYSIS.md            |  50 ----
 docs/wiki/agent-harness.md              |   4 +-
 8 files changed, 6 insertions(+), 534 deletions(-)
 delete mode 100644 docs/PLAN-MOTHER-MCP-VAULT-KEYS.md
 delete mode 100644 docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md
 delete mode 100644 docs/TRUSS-SPAWN-ANALYSIS.md

diff --git a/AGENTS.md b/AGENTS.md
index 2be39e4..d8d8e5e 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -128,7 +128,7 @@ was skipped; do not skip it.)
| 4 — Write takeover | Colibri owns scheduling and dispatch | open | | 5 — TS retirement | One ISO candidate with Colibri as sole control service | open | -Current sprint priorities: `docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md`. +Current sprint priorities: `docs/MULTI-AGENT-HOST-PLAN.md`. ## Jail Confinement diff --git a/README.md b/README.md index 16742c6..638cd1f 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ VMs, Bastille jails). Unifies coordination (task board, agent registry, skills catalog) with cache-first cost discipline (byte-stable prompt prefixes, cache-hit metering). -**Status:** workspace gates are fmt/clippy/test/release green. Round 2 audit is closed. Current priorities: ISO boot/runtime validation, Pi spawn end-to-end, and cost-mode enforcement (see [`docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md)). Always query live state: see the crate table below and run the gate commands for current counts. +**Status:** workspace gates are fmt/clippy/test/release green. Round 2 audit is closed. Current priorities: ISO boot/runtime validation and multi-agent, multi-host coordination (see [`docs/MULTI-AGENT-HOST-PLAN.md`](docs/MULTI-AGENT-HOST-PLAN.md)); Pi spawn end-to-end and cost-mode enforcement are done. Always query live state: see the crate table below and run the gate commands for current counts. FreeBSD build lane handoff: `docs/FREEBSD-BUILD-LANE-HANDOFF.md`. ISO acceptance runbook: `docs/ISO-ACCEPTANCE-RUNBOOK.md`. diff --git a/docs/FREEBSD-BUILD-LANE-HANDOFF.md b/docs/FREEBSD-BUILD-LANE-HANDOFF.md index 64cfa0b..867c92c 100644 --- a/docs/FREEBSD-BUILD-LANE-HANDOFF.md +++ b/docs/FREEBSD-BUILD-LANE-HANDOFF.md @@ -3,8 +3,8 @@ **For:** Codex (FreeBSD 15 host) · **Goal:** produce a Colibri-backed ISO candidate and prove ISO Gate 1 (passive service) on real FreeBSD. -This is the runtime-proof step for **Priority 1** of -`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`.
The build-side wiring is already done on +This is the runtime-proof step for the FreeBSD ISO validation tracked in +`MULTI-AGENT-HOST-PLAN.md`. The build-side wiring is already done on Linux — `clawdie-iso build.sh` stages the Colibri binaries, installs the rc.d script, creates the `colibri` user, and enables the service. What remains is work only a FreeBSD host can do: build the FreeBSD binaries, run the image diff --git a/docs/PLAN-MOTHER-MCP-VAULT-KEYS.md b/docs/PLAN-MOTHER-MCP-VAULT-KEYS.md deleted file mode 100644 index 68e6b3b..0000000 --- a/docs/PLAN-MOTHER-MCP-VAULT-KEYS.md +++ /dev/null @@ -1,162 +0,0 @@ -# Plan: Mother MCP Link — Vaultwarden Pubkey Exchange - -**Direction:** B — we (ISO agent) call mother via SSH. Pubkeys exchanged -through Vaultwarden, no operator copy-paste. - -**Scope:** OSA acts as mother until dedicated machine exists. Shell scripts -first, colibri-vault crate later when the contract is solid. - ---- - -## Flow - -``` -ISO (agent) VAULTWARDEN MOTHER (OSA) -─────────────────────────────────────── ─────────────── ───────────────────────────── - -[Enable Mother clicked] - -1. ssh-keygen -t ed25519 - (if no key yet) - -2. bw publish pubkey ──────────────► hive-pubkeys item - name: - notes: ssh-ed25519 AAAA... - - [cron: @every 5m] - - 3. bw list hive-pubkeys - - 4. rebuild authorized_keys.hive - command="colibri-mcp",restrict - one line per agent - -5. 
update external-mcp.json ──────────────────────────────────────► ready for connections - ssh -i key colibri@ colibri-mcp -``` - -## Files - -### Our side (clawdie-iso) - -| File | Change | What | -| ------------------------------- | ------ | --------------------------------------------------------------- | -| `clawdie-enable-mother.sh` | Extend | Add keygen + vault publish BEFORE the external-mcp.json update | -| `clawdie-vault-fetch` (colibri) | Extend | Add `--publish-pubkey` mode: create/update item in hive-pubkeys | - -### Mother side (OSA, new) - -| File | What | -| ------------------------------ | ---------------------------------------------------------------------- | -| `mother-sync-hive-keys.sh` | Pull all pubkeys from vault → rebuild authorized_keys.hive | -| `/etc/cron.d/mother-hive-keys` | `@every 5m` cron entry | -| sshd_config change | Add `AuthorizedKeysFile ... /var/db/colibri/.ssh/authorized_keys.hive` | - ---- - -## Step-by-step - -### A. Key generation (our side) - -``` -clawdie-enable-mother.sh, new step [1/3]: - -if [ ! -f ~/.ssh/id_ed25519 ]; then - ssh-keygen -t ed25519 -N "" -C "colibri@$(hostname)" -f ~/.ssh/id_ed25519 -fi -PUBKEY=$(cat ~/.ssh/id_ed25519.pub) -``` - -### B. Publish to Vaultwarden (our side) - -``` -clawdie-enable-mother.sh, new step [2/3]: - -clawdie-vault-fetch --publish-pubkey "$PUBKEY" --collection hive-pubkeys - -What this does: - - bw login (using BW_* from provider.env) - - bw get collection hive-pubkeys (create if absent) - - bw get item "$(hostname)" --collectionid - → if exists: bw edit item notes="$PUBKEY" - → if not: bw create item --name "$(hostname)" --notes "$PUBKEY" --collectionid -``` - -### C. 
Update external-mcp.json (our side) - -``` -clawdie-enable-mother.sh, step [3/3] (existing, with new identity file): - -jq --arg key "$HOME/.ssh/id_ed25519" \ - '.servers.mother = { - "command": "ssh", - "args": ["-i", $key, "-o", "StrictHostKeyChecking=accept-new", - "colibri@${MOTHER_TS_IP}", "colibri-mcp"], - "env": {} - }' "$EXTERNAL_MCP" > "$tmp" && mv "$tmp" "$EXTERNAL_MCP" -``` - -### D. Mother sync (cron) - -``` -mother-sync-hive-keys.sh: - -1. bw login (BW_* from provider.env) -2. COLLECTION_ID=$(bw get collection hive-pubkeys --id) -3. bw list items --collectionid $COLLECTION_ID -4. For each item: - - HOSTNAME=$(bw get item $id | jq -r '.name') - - PUBKEY=$(bw get item $id | jq -r '.notes') - - echo "command=\"colibri-mcp\",restrict,no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding $PUBKEY colibri@$HOSTNAME" -5. Write all to /var/db/colibri/.ssh/authorized_keys.hive (atomic: mktemp + mv) -6. chmod 600 - -Cron: @every 5m root /usr/local/sbin/mother-sync-hive-keys.sh -``` - -### E. 
sshd configuration (mother, one-time) - -``` -# /etc/ssh/sshd_config addition: -AuthorizedKeysFile .ssh/authorized_keys /var/db/colibri/.ssh/authorized_keys.hive - -# Reload: -service sshd reload -``` - ---- - -## Security properties - -| Property | How | -| ----------------------------- | ------------------------------------------------------------------- | -| Rebuild, not append | Each sync regenerates the file — deleting a vault item = revocation | -| Restriction applied by mother | `command="colibri-mcp",restrict` — not baked by publisher | -| Dedicated key file | `authorized_keys.hive` separate from operator keys | -| No shell access | `restrict` blocks everything except the forced command | -| Atomic write | `mktemp` + `mv` — no partial reads | -| TOFU on first connect | `StrictHostKeyChecking=accept-new` — auto-trust on first connection | - ---- - -## Acceptance - -- [ ] Click "Enable Mother" → keypair created if absent -- [ ] Pubkey published to Vaultwarden (verify: `bw get item `) -- [ ] external-mcp.json updated with SSH + identity file -- [ ] Mother cron syncs within 5 minutes -- [ ] `authorized_keys.hive` contains the restricted entry -- [ ] Pi can call mother's tools via `ssh -i key colibri@ colibri-mcp` -- [ ] Delete vault item → next sync removes access (revocation tested) - ---- - -## Sequencing - -| Step | Repo | Content | -| ---- | ----------- | ------------------------------------------------------ | -| 1 | colibri | Extend `clawdie-vault-fetch` with `--publish-pubkey` | -| 2 | clawdie-iso | Extend `clawdie-enable-mother.sh` — keygen + publish | -| 3 | — | Create `mother-sync-hive-keys.sh` on OSA | -| 4 | — | Wire cron + sshd_config on OSA | -| 5 | — | End-to-end test: ISO → vault → OSA → SSH → colibri-mcp | diff --git a/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md b/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md deleted file mode 100644 index 8d288ef..0000000 --- a/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md +++ /dev/null @@ -1,313 +0,0 @@ -# Priority Handoff — 
Three Focus Items Toward ISO Gate 1 - -**Created:** 14.jun.2026 (Sam & Hermes) · **Updated:** 19.jun.2026 -**Status:** Priorities 2 & 3 **done** · Priority 1 **staged for FreeBSD build** -**Superseded by:** `MULTI-AGENT-HOST-PLAN.md` for the next sprint - -Round 2 audit is fully closed. All repos are green (211 tests, clippy clean, -fmt clean). The three items below were the highest-leverage work toward getting -a Colibri-backed ISO candidate and delivering on the core cost-discipline -promise. - -**Current status of each item:** - -- **Priority 1 (ISO boot validation):** Build wiring done, release runbook - landed (`clawdie-iso/docs/RELEASE-BUILD-RUNBOOK.md`), artifacts built on - FreeBSD host. Awaiting the 0.10.0 release build execution. -- **Priority 2 (Pi spawn end-to-end):** **Done** — `poll_tasks()` wired in - `9d443a4`, integration test `poll_tasks_spawns_agent_for_claimed_task` passes. -- **Priority 3 (Cost mode enforcement):** **Done** — cost mode is single source - of truth; `session_max_bytes`/`max_uncompacted_turns` removed from - `DaemonConfig`; per-append compaction derives from `CostMode::parse()`. - -The next sprint is multi-agent multi-host coordination — see -[`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md). - ---- - -## Priority 1: Validate the staged colibri_daemon boots and runs on the ISO - -### Why this is #1 - -The build-side wiring is **done**. The clawdie-iso build now stages the -Colibri binaries, installs the rc.d script, creates the `colibri` user, and -enables the service. What has **not** happened yet is booting a freshly built -image and confirming `colibri_daemon` actually starts and the acceptance -runbook passes on real FreeBSD. Until that boot/runtime validation is done, -Gate 1 (passive service) is unproven. 
- -### What's done (build wiring) - -| Artifact / step | Location | Status | -| -------------------- | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- | -| staging script | clawdie-iso `scripts/stage-colibri-iso.sh` | done — copies `colibri-daemon`, `colibri`, `colibri-test-agent`, rc.d, newsyslog, creates dirs (canonical; lives in clawdie-iso) | -| rc.d script | `packaging/freebsd/colibri_daemon.in` | done — `start_precmd`, pidfile, daemon(8) wrapper, `COLIBRI_COST_MODE` propagation | -| newsyslog config | `packaging/freebsd/newsyslog-colibri.conf` | done | -| rc.conf.sample | generated by staging script | done | -| acceptance runbook | `docs/ISO-ACCEPTANCE-RUNBOOK.md` | done | -| build integration | clawdie-iso `build.sh::install_colibri_service` | done — calls `stage-colibri-iso.sh` against the image root | -| `colibri` user/group | clawdie-iso `build.sh` (`pw useradd colibri`) | done — created in the image during build | -| service enable | clawdie-iso `build.sh` (`colibri_daemon_enable`) | done — written into image rc.conf | -| prebuilt binaries | build-host Rust toolchain (preflight-gated) | done — `build.sh` stages prebuilt release binaries and fails preflight if missing | - -### What's missing (boot/runtime validation) - -1. **Boot a freshly built image on FreeBSD** (bhyve VM or hardware) and confirm - the `colibri` user, binaries, rc.d script, and rc.conf entry are present in - the running system. - -2. **Run the acceptance runbook on the booted image:** - - ```sh - service colibri_daemon start - colibri status - colibri create-task --title "iso check" - colibri list-tasks --status queued - colibri intake-task --title "iso intake check" --capability freebsd - # wait one scheduler tick - colibri list-tasks --status queued - service colibri_daemon stop - ``` - -3. 
**Confirm logging + lifecycle:** pidfile is created, newsyslog rotation - config is in place, and `service colibri_daemon stop` cleanly stops the - daemon and removes the pidfile. - -4. **Validate the Hermes rc.d service** (`hermes-bsd`, merged 14.jun.2026 as - `fc4b57ade`). The `hermes_daemon` rc.d script runs `hermes gateway run` - under `daemon(8)` with a dedicated user, persistent `HERMES_HOME`, and - supervisor/child pidfile separation — but it has not been booted on real - FreeBSD yet. On the same image run: - - ```sh - # one-time: create user + install the rc.d script per README-FreeBSD.md - service hermes_daemon start # must abort cleanly if config.yaml is missing - service hermes_daemon health - service hermes_daemon stop # supervisor exits, child does not respawn - ``` - - Confirm: prestart aborts (exit 1, no crash loop) when - `/var/db/hermes/config.yaml` is absent; once configured, start/health/stop - work and both the supervisor and child pidfiles are cleaned up on stop. - -### Key files - -- clawdie-iso `scripts/stage-colibri-iso.sh` — the staging script (dir creation, bin copy, rc.d install, rc.conf.sample generation). Canonical copy lives in clawdie-iso; the colibri repo no longer keeps a duplicate. -- `packaging/freebsd/colibri_daemon.in` — rc.d script -- `docs/ISO-ACCEPTANCE-RUNBOOK.md` — acceptance commands to run on the booted image -- `docs/FREEBSD-BUILD-LANE-HANDOFF.md` — step-by-step build/boot/validate handoff for the FreeBSD agent -- clawdie-iso `build.sh` — `install_colibri_service()` already wires staging, user creation, and service enable -- `hermes-bsd` `packaging/freebsd/hermes_daemon.in` + `README-FreeBSD.md` — Hermes rc.d service and setup steps - -### Suggested owner - -ISO/build lane — FreeBSD agent (Codex) or Sam boots a built image and runs the -acceptance runbook plus the Hermes rc.d checks. No Linux-side code change is -required; this is a runtime-proof step. 
- ---- - -## Priority 2: Prove the Pi spawn path end-to-end - -### Why this is #2 - -The daemon has a full `Spawner` with provider routing, jail confinement, -retry/backoff, and `AgentHandle` that captures stdout for glasspane. But the -**daemon loop's `poll_tasks()` is a stub** (`daemon.rs:274-277`): - -```rust -pub async fn poll_tasks(state: &SharedState) { - debug!("task polling tick"); - let _spawner = Spawner::new(state.config.clone().into()); -} -``` - -It creates a `Spawner` and does nothing with it. No agent is ever spawned from -the daemon loop. This blocks Gate 2 (agent observation parity): glasspane -supervision requires a real spawned process whose JSONL events flow through -to state transitions before it can be validated. - -### What exists - -| Capability | Location | Status | -| -------------------- | ------------------------------------------ | ---------------------------------------------------------- | -| `Spawner::spawn()` | `crates/colibri-daemon/src/spawner.rs:585` | done — provider routing, jail wrap, retry/backoff | -| `AgentHandle` | `crates/colibri-daemon/src/spawner.rs:465` | done — tracks child, stdout for glasspane, kill, poll_exit | -| `take_stdout()` | `crates/colibri-daemon/src/spawner.rs:500` | done — hands stdout to glasspane supervision | -| Jail confinement | `crates/colibri-daemon/src/spawner.rs:332` | done — named/ephemeral, staged env payload, priv modes | -| `sample-pi-agent.py` | `scripts/sample-pi-agent.py` | exists — emits JSONL events for testing | -| Glasspane ingestion | `crates/colibri-glasspane/` | done — ingests JSONL, tracks pane state | - -### What's missing - -1. **Wire `poll_tasks()` to actually spawn agents.** - The scheduler drains `intake-task` into SQLite on tick, but no agent is - spawned to work on the task. 
The poll_tasks stub needs to: - - Query tasks in `queued` status with a capability match - - Build an `AgentSpawnConfig` for each - - Call `Spawner::spawn()` - - Register the `AgentHandle` in daemon state - - Hand stdout to glasspane - -2. **End-to-end integration test.** - Using `scripts/sample-pi-agent.py` (or a Rust mock binary): - - Start daemon - - Create a task + intake it - - Wait for scheduler tick + spawn - - Verify glasspane observes `Starting` → `Running` → `Stopped` lifecycle - - Verify session JSONL is written - - Verify agent appears in `colibri status` / `colibri snapshot` - -3. **`spawn-local` socket command (if not present).** - An operator CLI path to manually spawn a local binary for debugging: - - ```sh - colibri spawn-local /path/to/pi --session-id test-1 - ``` - - This may already exist as a socket command — check `socket.rs` for - `SpawnLocal` or `Spawn` command variants. - -4. **Process kill/cleanup verification.** - Confirm that `AgentHandle::kill()` reliably kills the child and any jail - wrapper, and that glasspane transitions to `Stopped`. - -### Key files - -- `crates/colibri-daemon/src/daemon.rs:274` — `poll_tasks()` stub (THE gap) -- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (working, good reference for how other background loops iterate state) -- `crates/colibri-daemon/src/spawner.rs:585` — `Spawner::spawn()` (working) -- `crates/colibri-daemon/src/socket.rs` — socket command dispatch (check for spawn commands) -- `scripts/sample-pi-agent.py` — test agent that emits JSONL -- `crates/colibri-glasspane/src/` — JSONL ingestion + pane state machine - -### Suggested owner - -Rust lane (Hermes on Linux). Can implement and test fully on Linux with -`sample-pi-agent.py`. FreeBSD validation confirms jail path works. - ---- - -## Priority 3: Wire cost mode into actual enforcement - -### Why this is #3 - -Cost modes (`Fast`/`Smart`/`Max`) are the core design promise of Colibri — -"cache-first cost discipline." 
The code has all the pieces (thresholds, -escalation, compaction, trimming) but **they are not connected**. Right now -changing the cost mode does nothing to actual session behavior. - -This is the most subtle gap because the code _looks_ like it's wired up — the -functions exist and have tests — but the call sites are missing or duplicated. - -### The disconnection (detailed) - -There are **two compaction paths** that use different sources of truth: - -**Path A — per-append (session.rs):** - -`session.rs:397-398` in `maybe_compact_or_rollover()`: - -```rust -let needs_compaction = byte_count > self.config.session_max_bytes - || turn_count > self.config.max_uncompacted_turns; -``` - -This reads `self.config.session_max_bytes` and -`self.config.max_uncompacted_turns` — these are **static fields** in -`DaemonConfig` loaded once from env vars (`COLIBRI_SESSION_MAX_BYTES`, -`COLIBRI_MAX_UNCOMPACTED_TURNS`). They default to 2,000,000 and 20 (Smart -values) regardless of the cost mode string. - -**Path B — background rotation (daemon.rs):** - -`daemon.rs:242-261` in `session_rotation()`: - -```rust -let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default(); -let max_bytes = cost_mode.session_max_bytes(); -let max_turns = cost_mode.max_uncompacted_turns(); -``` - -This correctly derives thresholds from the cost mode. But it runs on a -background timer, not per-append, so it's a lagging check. - -**Result:** if you set `COLIBRI_COST_MODE=fast`, the background loop will use -500K/5 thresholds, but the per-append check still uses the static 2M/20 -config values. The session can grow past the Fast budget before the background -loop catches up. 
- -### What's never called - -| Function | Location | Problem | -| ---------------------------------- | ---------------- | --------------------------------------------------------------------------------------- | -| `auto_escalate()` | `cost.rs:131` | Tested but **never called** from daemon loop or session code | -| `compact_tool_result()` | `cost.rs:165` | Tested but **never called** when appending `ToolResult` entries | -| `PromptAssembly::trim_to_budget()` | `session.rs:117` | Tested but **never called** from `build_prompt_assembly()` or `build_prompt_messages()` | -| `EscalationTrigger` | `cost.rs:117` | Type exists, tested, never constructed in production code | - -### What `set-cost-mode` does - -`socket.rs:657` updates `state.config.cost_mode` (the string), but does NOT -update `state.config.session_max_bytes` or `state.config.max_uncompacted_turns` -(the numeric fields). So after a mode change, the per-append compaction path -still uses the old thresholds. - -### Fix plan - -1. **Make per-append compaction cost-mode-aware.** - In `session.rs`, change `maybe_compact_or_rollover()` to derive thresholds - from `CostMode::parse(&self.config.cost_mode)` instead of reading the static - fields directly. Or better: remove the static fields from `DaemonConfig` - entirely and always derive from `cost_mode`. - -2. **Wire `compact_tool_result()` into the append path.** - When `SessionEntry::ToolResult` is appended and - `cost_mode.compact_tool_results()` is true, run the result through - `compact_tool_result()` before writing to JSONL. - -3. **Wire `auto_escalate()` into `session_rotation()`.** - After compaction, if the session is still over budget, construct an - `EscalationTrigger::CompactionInsufficient` and call `auto_escalate()`. - If escalation succeeds, log it visibly and update `state.config.cost_mode`. - -4. 
**Wire `trim_to_budget()` into prompt assembly.** - In `build_prompt_assembly()` or `build_prompt_messages()`, call - `trim_to_budget(cost_mode)` after constructing the assembly. - -5. **Make `set-cost-mode` update derived thresholds.** - When the socket command changes `cost_mode`, also update - `session_max_bytes` and `max_uncompacted_turns` to match (or remove those - fields entirely and always derive). - -6. **Remove `COLIBRI_SESSION_MAX_BYTES` / `COLIBRI_MAX_UNCOMPACTED_TURNS` env vars.** - These shadow the cost mode system and cause confusion. The cost mode - string (`COLIBRI_COST_MODE=fast|smart|max`) should be the single source of - truth for thresholds. - -### Key files - -- `crates/colibri-daemon/src/cost.rs` — cost mode logic (thresholds, escalation, compaction, headroom sidecar) -- `crates/colibri-daemon/src/session.rs:390` — `maybe_compact_or_rollover()` (uses static config, not cost mode) -- `crates/colibri-daemon/src/session.rs:492` — `build_prompt_assembly()` (doesn't call `trim_to_budget()`) -- `crates/colibri-daemon/src/config.rs:21,43` — `session_max_bytes` / `max_uncompacted_turns` static fields -- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (correctly uses cost mode, good reference) -- `crates/colibri-daemon/src/socket.rs:657` — `cmd_set_cost_mode()` (updates string only, not derived values) - -### Suggested owner - -Rust lane (Hermes on Linux). Fully testable on Linux — this is pure logic -wiring, no platform-specific behavior. - ---- - -## Summary table - -| # | Item | Blocks | Linux-doable | Effort | -| --- | --------------------------- | ------------------- | ----------------------------- | ------ | -| 1 | ISO boot/runtime validation | Gate 1 | no (needs FreeBSD boot) | small | -| 2 | Pi spawn end-to-end | Gate 2 | yes (with sample-pi-agent.py) | medium | -| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium | - -All three are medium effort and can be worked in parallel. 
None require -FreeBSD to implement — only to validate the final result. diff --git a/docs/README.md b/docs/README.md index acc73b6..bd6ab0f 100644 --- a/docs/README.md +++ b/docs/README.md @@ -15,7 +15,4 @@ A quick-reference guide to every document in this folder. | [`ISO-ACCEPTANCE-RUNBOOK.md`](ISO-ACCEPTANCE-RUNBOOK.md) | Post-boot acceptance commands after staging Colibri into an ISO | Codex (FreeBSD) | | [`ISO-SERVICE-LAYOUT.md`](ISO-SERVICE-LAYOUT.md) | `rc.conf` service layout for the ISO image | All | | [`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md) | **Current sprint**: multi-agent task-board tests + CLI surface gaps | All agents | -| [`PLAN-MOTHER-MCP-VAULT-KEYS.md`](PLAN-MOTHER-MCP-VAULT-KEYS.md) | Vaultwarden pubkey exchange for mother MCP link (direction B) | Sam & Hermes | -| [`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](PRIORITY-HANDOFF-ISO-SPAWN-COST.md) | ISO boot validation, Pi spawn path, cost mode enforcement (P2/P3 done) | All agents | -| [`TRUSS-SPAWN-ANALYSIS.md`](TRUSS-SPAWN-ANALYSIS.md) | truss trace of jail-spawn Permission Denied — root cause + fix | Debugging | | [`VAULT-PROVISION-RUNBOOK.md`](VAULT-PROVISION-RUNBOOK.md) | First-proof runbook: vault → jail → `.env` chain (clean CLI) | Agents, Sam | diff --git a/docs/TRUSS-SPAWN-ANALYSIS.md b/docs/TRUSS-SPAWN-ANALYSIS.md deleted file mode 100644 index fe46b48..0000000 --- a/docs/TRUSS-SPAWN-ANALYSIS.md +++ /dev/null @@ -1,50 +0,0 @@ -# truss Analysis — colibri-daemon Jail Spawn (21.jun.2026) - -**Trace saved:** `/tmp/daemon.truss` (1964 lines, captured during successful spawn) - -## The Bug - -The daemon could not spawn agents inside jails. `colibri spawn-agent --jail-name` -returned "Permission denied (os error 13)" even though `sudo -n jexec proof0 ...` -worked fine from the shell. - -## What truss Revealed - -Two independent issues, both masked by the same EACCES error: - -### 1. 
Bare command names in daemon(8) PATH - -The daemon constructed spawn commands with bare names (`sudo`, `jexec`). -Under `daemon(8) -u clawdie`, the inherited PATH may be empty or reordered, -so `execvp` missed `/usr/local/bin/sudo` and returned EACCES. - -**Fix:** `resolve_program()` — absolutizes bare names by searching a fixed -list (`/usr/local/sbin`, `/usr/local/bin`, `/usr/sbin`, `/usr/bin`, `/sbin`, -`/bin`), returning the first executable found. PR #131. - -### 2. Staging directory owned by root - -For jailed spawns with environment variables, the daemon's -`prepare_spawn_command` stages files under the jail root at -`/var/run/colibri-stage//`. This directory was -created by a previous run (as root) and was mode 755 root:wheel. -The daemon runs as `clawdie` and could not write staging files there. - -**Fix (history):** initially `chmod 777 /var/run/colibri-stage`, then -`agent-jail-bootstrap.sh` pre-created it clawdie-owned `0700` (#134). **Final -(#135):** staging moved out of root-owned `/var/run` to the daemon user's home -at `/home/clawdie/.cache/colibri/stage//`, so the daemon creates it itself -with no privileged pre-creation step (overridable via `COLIBRI_JAIL_STAGE_DIR`). - -## The Winning Spawn - -``` -program=/usr/local/bin/sudo requested=sudo -args=["-n", "jexec", "proof0", "/bin/sh", - "/var/run/colibri-stage//launch.sh", - "/var/run/colibri-stage//env.sh", "-", - "/usr/local/bin/colibri-test-agent"] -path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin -``` - -Agent spawned, vault provision ran, `.env` written. Track A complete. diff --git a/docs/wiki/agent-harness.md b/docs/wiki/agent-harness.md index defc67e..16ebd0b 100644 --- a/docs/wiki/agent-harness.md +++ b/docs/wiki/agent-harness.md @@ -14,8 +14,8 @@ Two binaries, not one (Sam rejected merging them, 13.jun.2026): Canonical statement: `AGENTS.md` (lines ~18–32). `clawdie-ai` (TS) is being pruned; surviving features move to zot/Colibri. 
-> There is **no** `ADR-agent-harness-consolidation.md` despite references to it -> in `clawdie-iso/scripts/stage-colibri-iso.sh`. Treat `AGENTS.md` as the ADR. +> There is **no** `ADR-agent-harness-consolidation.md` (clawdie-iso scripts once +> referenced it; those references have since been removed). Treat `AGENTS.md` as the ADR. ## Runtimes -- 2.45.3