From a6f1a8b4f59401965300674511e621757af7a55f Mon Sep 17 00:00:00 2001
From: Sam & Claude
Date: Sun, 14 Jun 2026 12:57:02 +0200
Subject: [PATCH] =?UTF-8?q?docs:=20priority=20handoff=20=E2=80=94=20ISO=20?=
 =?UTF-8?q?staging,=20Pi=20spawn,=20cost=20mode=20enforcement=20(Sam=20&?=
 =?UTF-8?q?=20Hermes)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md | 290 ++++++++++++++++++++++++
 1 file changed, 290 insertions(+)
 create mode 100644 docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md

diff --git a/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md b/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md
new file mode 100644
index 0000000..0337a82
--- /dev/null
+++ b/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md
@@ -0,0 +1,290 @@
+# Priority Handoff: Three Focus Items Toward ISO Gate 1
+
+**Created:** 2026-06-14 (Sam & Hermes)
+**Status:** open for any agent to pick up
+**Replaces:** ad-hoc priorities from `ISO-INTEGRATION-PLAN.md` work lanes
+
+The Round 2 audit is fully closed. All repos are green (164 tests, clippy
+clean, fmt clean). The three items below are the highest-leverage work toward
+getting a Colibri-backed ISO candidate and delivering on the core
+cost-discipline promise.
+
+Each item is independently implementable on Linux with FreeBSD validation as
+the final step. Items can be worked in parallel by different agents.
+
+---
+
+## Priority 1: Wire the ISO staging script into the clawdie-iso build
+
+### Why this is #1
+
+`scripts/stage-colibri-iso.sh` already exists and copies binaries, rc.d,
+directories, and a sample rc.conf into a DESTDIR. But it has **never been run
+against the actual ISO image root**, and the clawdie-iso build process does
+not call it. Without this wiring, `colibri_daemon` cannot start on the ISO,
+which blocks Gate 1 (passive service) entirely.
+
+### What exists
+
+| Artifact | Location | Status |
+| -------- | -------- | ------ |
+| staging script | `scripts/stage-colibri-iso.sh` | done: copies `colibri-daemon`, `colibri`, `colibri-smoke-agent`, rc.d, newsyslog, creates dirs |
+| rc.d script | `packaging/freebsd/colibri_daemon.in` | done: `start_precmd`, pidfile, daemon(8) wrapper, `COLIBRI_COST_MODE` propagation |
+| newsyslog config | `packaging/freebsd/newsyslog-colibri.conf` | done |
+| rc.conf.sample | generated by staging script | done |
+| acceptance runbook | `docs/ISO-ACCEPTANCE-RUNBOOK.md` | done |
+
+### What's missing
+
+1. **`colibri` user/group creation in the image.**
+   The staging script documents this as a manual step in
+   `ETC_DIR/README.iso` but does not create the user. Options:
+   - Add `pw useradd`/`pw groupadd` to the staging script (requires root, or
+     runs during image build which has root)
+   - Add it to the clawdie-iso firstboot wizard
+   - Add it as an `etc/passwd`/`etc/group` entry in the image root (on
+     FreeBSD, `etc/passwd` is generated from `etc/master.passwd` by
+     `pwd_mkdb`, so the user entry belongs there)
+
+2. **clawdie-iso build integration.**
+   The clawdie-iso build process needs to:
+   - Cross-compile (or pre-build) `colibri-daemon` for `x86_64-unknown-freebsd`
+   - Call `stage-colibri-iso.sh` against the image root
+   - Merge `rc.conf.sample` into `/etc/rc.conf.d/colibri_daemon` or `/etc/rc.conf`
+   - Ensure the `colibri` user exists in `/etc/passwd` and `/etc/group` in the image
+
+3. **Pre-built FreeBSD binaries.**
+   Linux agents cannot produce FreeBSD binaries directly (no cross-compile
+   target in the workspace). Either:
+   - The FreeBSD agent (Codex) runs `cargo build --workspace --release` on
+     FreeBSD and the ISO build consumes the binaries under `target/release/`
+   - A CI step on a FreeBSD runner produces the artifacts
+   - The ISO build host has Rust installed and builds in-place
+
+4. **Verification on FreeBSD.**
+   After staging, run the acceptance runbook commands on the booted image:
+   ```sh
+   service colibri_daemon start
+   colibri status
+   colibri create-task --title "iso smoke"
+   colibri list-tasks --status queued
+   colibri intake-task --title "iso intake smoke" --capability freebsd
+   # wait one scheduler tick
+   colibri list-tasks --status queued
+   service colibri_daemon stop
+   ```
+
+### Key files
+
+- `scripts/stage-colibri-iso.sh`: the staging script (lines 44-93: dir creation, bin copy, rc.d install, rc.conf.sample generation)
+- `packaging/freebsd/colibri_daemon.in`: rc.d script
+- `docs/ISO-ACCEPTANCE-RUNBOOK.md`: acceptance commands
+- `docs/ISO-INTEGRATION-PLAN.md` §Lane A: full plan with gap audit
+- clawdie-iso repo: image build scripts that need to call the staging script
+
+### Suggested owner
+
+ISO/build lane (Codex on FreeBSD for binary production + Sam for build
+integration). Linux agents can prepare the staging script changes and
+clawdie-iso build wiring.
+
+---
+
+## Priority 2: Prove the Pi spawn path end-to-end
+
+### Why this is #2
+
+The daemon has a full `Spawner` with provider routing, jail confinement,
+retry/backoff, and an `AgentHandle` that captures stdout for glasspane. But the
+**daemon loop's `poll_tasks()` is a stub** (`daemon.rs:274-277`):
+
+```rust
+pub async fn poll_tasks(state: &SharedState) {
+    debug!("task polling tick");
+    let _spawner = Spawner::new(state.config.clone().into());
+}
+```
+
+It creates a `Spawner` and does nothing with it. No agent is ever spawned from
+the daemon loop. This blocks Gate 2 (agent observation parity): we cannot
+claim glasspane supervision works until a real process is spawned and its
+JSONL events flow through to state transitions.
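
As a rough illustration of the missing link described above, the sketch below spawns a local program with piped stdout and turns its output lines into lifecycle transitions. It is not the daemon's code: `PaneState`, `observe`, and `spawn_and_observe` are hypothetical names made up for this example, only the standard library is used, and the rule "first non-empty line means `Running`" is an assumption rather than glasspane's actual ingestion logic.

```rust
use std::io::{BufRead, BufReader};
use std::process::{Command, Stdio};

/// Hypothetical lifecycle states, named after the ones this document expects
/// glasspane to report; the real type in `colibri-glasspane` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PaneState {
    Starting,
    Running,
    Stopped,
}

/// Read an agent's stdout line by line and record each state transition.
/// Assumption: the first non-empty line moves the pane to `Running`, and
/// end of stream (the child closed stdout) moves it to `Stopped`.
fn observe<R: BufRead>(stdout: R) -> Vec<PaneState> {
    let mut seen = vec![PaneState::Starting];
    for line in stdout.lines() {
        // Stop reading on an I/O or UTF-8 error instead of panicking.
        let Ok(line) = line else { break };
        if !line.trim().is_empty() && seen.last() == Some(&PaneState::Starting) {
            seen.push(PaneState::Running);
        }
    }
    seen.push(PaneState::Stopped);
    seen
}

/// Spawn a local program with piped stdout, observe it, then reap the child.
fn spawn_and_observe(program: &str, args: &[&str]) -> std::io::Result<Vec<PaneState>> {
    let mut child = Command::new(program)
        .args(args)
        .stdout(Stdio::piped())
        .spawn()?;
    let stdout = child.stdout.take().expect("stdout was requested as piped");
    let states = observe(BufReader::new(stdout));
    child.wait()?; // reap the child so no zombie process is left behind
    Ok(states)
}
```

Keeping `observe` generic over `BufRead` means the transition logic can be exercised with an in-memory buffer, while `spawn_and_observe` could be pointed at a test agent such as `scripts/fake-pi-agent.py` (its arguments are not covered here).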
+
+### What exists
+
+| Capability | Location | Status |
+| ---------- | -------- | ------ |
+| `Spawner::spawn()` | `crates/colibri-daemon/src/spawner.rs:585` | done: provider routing, jail wrap, retry/backoff |
+| `AgentHandle` | `crates/colibri-daemon/src/spawner.rs:465` | done: tracks child, stdout for glasspane, kill, poll_exit |
+| `take_stdout()` | `crates/colibri-daemon/src/spawner.rs:500` | done: hands stdout to glasspane supervision |
+| Jail confinement | `crates/colibri-daemon/src/spawner.rs:332` | done: named/ephemeral, staged env payload, priv modes |
+| `fake-pi-agent.py` | `scripts/fake-pi-agent.py` | exists: emits JSONL events for testing |
+| Glasspane ingestion | `crates/colibri-glasspane/` | done: ingests JSONL, tracks pane state |
+
+### What's missing
+
+1. **Wire `poll_tasks()` to actually spawn agents.**
+   The scheduler drains `intake-task` into SQLite on tick, but no agent is
+   spawned to work on the task. The `poll_tasks()` stub needs to:
+   - Query tasks in `queued` status with a capability match
+   - Build an `AgentSpawnConfig` for each
+   - Call `Spawner::spawn()`
+   - Register the `AgentHandle` in daemon state
+   - Hand stdout to glasspane
+
+2. **End-to-end integration test.**
+   Using `scripts/fake-pi-agent.py` (or a Rust mock binary):
+   - Start daemon
+   - Create a task + intake it
+   - Wait for scheduler tick + spawn
+   - Verify glasspane observes `Starting` → `Running` → `Stopped` lifecycle
+   - Verify session JSONL is written
+   - Verify agent appears in `colibri status` / `colibri snapshot`
+
+3. **`spawn-local` socket command (if not present).**
+   An operator CLI path to manually spawn a local binary for debugging:
+   ```sh
+   colibri spawn-local /path/to/pi --session-id test-1
+   ```
+   This may already exist as a socket command; check `socket.rs` for
+   `SpawnLocal` or `Spawn` command variants.
+
+4. **Process kill/cleanup verification.**
+   Confirm that `AgentHandle::kill()` reliably kills the child and any jail
+   wrapper, and that glasspane transitions to `Stopped`.
+
+### Key files
+
+- `crates/colibri-daemon/src/daemon.rs:274`: `poll_tasks()` stub (THE gap)
+- `crates/colibri-daemon/src/daemon.rs:242`: `session_rotation()` (working, good reference for how other background loops iterate state)
+- `crates/colibri-daemon/src/spawner.rs:585`: `Spawner::spawn()` (working)
+- `crates/colibri-daemon/src/socket.rs`: socket command dispatch (check for spawn commands)
+- `scripts/fake-pi-agent.py`: test agent that emits JSONL
+- `crates/colibri-glasspane/src/`: JSONL ingestion + pane state machine
+
+### Suggested owner
+
+Rust lane (Hermes on Linux). Can implement and test fully on Linux with
+`fake-pi-agent.py`. FreeBSD validation confirms jail path works.
+
+---
+
+## Priority 3: Wire cost mode into actual enforcement
+
+### Why this is #3
+
+Cost modes (`Fast`/`Smart`/`Max`) are the core design promise of Colibri:
+"cache-first cost discipline." The code has all the pieces (thresholds,
+escalation, compaction, trimming) but **they are not connected**. Right now
+changing the cost mode does nothing to actual session behavior.
+
+This is the most subtle gap because the code *looks* like it's wired up (the
+functions exist and have tests), but the call sites are missing or duplicated.
+
+### The disconnection (detailed)
+
+There are **two compaction paths** that use different sources of truth:
+
+**Path A, per-append (`session.rs`):**
+
+`session.rs:397-398` in `maybe_compact_or_rollover()`:
+```rust
+let needs_compaction = byte_count > self.config.session_max_bytes
+    || turn_count > self.config.max_uncompacted_turns;
+```
+
+This reads `self.config.session_max_bytes` and
+`self.config.max_uncompacted_turns`; these are **static fields** in
+`DaemonConfig` loaded once from env vars (`COLIBRI_SESSION_MAX_BYTES`,
+`COLIBRI_MAX_UNCOMPACTED_TURNS`). They default to 2,000,000 and 20 (Smart
+values) regardless of the cost mode string.
+
+**Path B, background rotation (`daemon.rs`):**
+
+`daemon.rs:242-261` in `session_rotation()`:
+```rust
+let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default();
+let max_bytes = cost_mode.session_max_bytes();
+let max_turns = cost_mode.max_uncompacted_turns();
+```
+
+This correctly derives thresholds from the cost mode. But it runs on a
+background timer, not per-append, so it's a lagging check.
+
+**Result:** if you set `COLIBRI_COST_MODE=fast`, the background loop will use
+500K/5 thresholds, but the per-append check still uses the static 2M/20
+config values. The session can grow past the Fast budget before the background
+loop catches up.
+
+### What's never called
+
+| Function | Location | Problem |
+| -------- | -------- | ------- |
+| `auto_escalate()` | `cost.rs:131` | Tested but **never called** from daemon loop or session code |
+| `compact_tool_result()` | `cost.rs:165` | Tested but **never called** when appending `ToolResult` entries |
+| `PromptAssembly::trim_to_budget()` | `session.rs:117` | Tested but **never called** from `build_prompt_assembly()` or `build_prompt_messages()` |
+| `EscalationTrigger` | `cost.rs:117` | Type exists, tested, never constructed in production code |
+
+### What `set-cost-mode` does
+
+`socket.rs:657` updates `state.config.cost_mode` (the string), but does NOT
+update `state.config.session_max_bytes` or `state.config.max_uncompacted_turns`
+(the numeric fields). So after a mode change, the per-append compaction path
+still uses the old thresholds.
+
+### Fix plan
+
+1. **Make per-append compaction cost-mode-aware.**
+   In `session.rs`, change `maybe_compact_or_rollover()` to derive thresholds
+   from `CostMode::parse(&self.config.cost_mode)` instead of reading the static
+   fields directly. Or better: remove the static fields from `DaemonConfig`
+   entirely and always derive from `cost_mode`.
+
+2. **Wire `compact_tool_result()` into the append path.**
+   When `SessionEntry::ToolResult` is appended and
+   `cost_mode.compact_tool_results()` is true, run the result through
+   `compact_tool_result()` before writing to JSONL.
+
+3. **Wire `auto_escalate()` into `session_rotation()`.**
+   After compaction, if the session is still over budget, construct an
+   `EscalationTrigger::CompactionInsufficient` and call `auto_escalate()`.
+   If escalation succeeds, log it visibly and update `state.config.cost_mode`.
+
+4. **Wire `trim_to_budget()` into prompt assembly.**
+   In `build_prompt_assembly()` or `build_prompt_messages()`, call
+   `trim_to_budget(cost_mode)` after constructing the assembly.
+
+5. **Make `set-cost-mode` update derived thresholds.**
+   When the socket command changes `cost_mode`, also update
+   `session_max_bytes` and `max_uncompacted_turns` to match (or remove those
+   fields entirely and always derive).
+
+6. **Remove `COLIBRI_SESSION_MAX_BYTES` / `COLIBRI_MAX_UNCOMPACTED_TURNS` env vars.**
+   These shadow the cost mode system and cause confusion. The cost mode
+   string (`COLIBRI_COST_MODE=fast|smart|max`) should be the single source of
+   truth for thresholds.
+
+### Key files
+
+- `crates/colibri-daemon/src/cost.rs`: cost mode logic (thresholds, escalation, compaction, headroom sidecar)
+- `crates/colibri-daemon/src/session.rs:390`: `maybe_compact_or_rollover()` (uses static config, not cost mode)
+- `crates/colibri-daemon/src/session.rs:492`: `build_prompt_assembly()` (doesn't call `trim_to_budget()`)
+- `crates/colibri-daemon/src/config.rs:21,43`: `session_max_bytes` / `max_uncompacted_turns` static fields
+- `crates/colibri-daemon/src/daemon.rs:242`: `session_rotation()` (correctly uses cost mode, good reference)
+- `crates/colibri-daemon/src/socket.rs:657`: `cmd_set_cost_mode()` (updates string only, not derived values)
+
+### Suggested owner
+
+Rust lane (Hermes on Linux). Fully testable on Linux: this is pure logic
+wiring, no platform-specific behavior.
+
+---
+
+## Summary table
+
+| # | Item | Blocks | Linux-doable | Effort |
+| - | ---- | ------ | ------------ | ------ |
+| 1 | ISO staging wiring | Gate 1 | partially (needs FreeBSD binaries) | medium |
+| 2 | Pi spawn end-to-end | Gate 2 | yes (with fake-pi-agent.py) | medium |
+| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium |
+
+All three are medium effort and can be worked in parallel. None requires
+FreeBSD to implement; FreeBSD is needed only to build the Priority 1 binaries
+and to validate the final result.
-- 
2.45.3