colibri/docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md

Priority Handoff — Three Focus Items Toward ISO Gate 1

Created: 2026-06-14 (Sam & Hermes)
Status: open for any agent to pick up
Replaces: ad-hoc ISO work-lane priorities

Round 2 audit is fully closed. All repos are green (164 tests, clippy clean, fmt clean). The three items below are the highest-leverage work toward getting a Colibri-backed ISO candidate and delivering on the core cost-discipline promise.

Items 2 and 3 are independently implementable on Linux, with FreeBSD validation as the final step. Item 1 is a boot/runtime validation that needs a FreeBSD boot. Items can be worked in parallel by different agents.


Priority 1: Validate the staged colibri_daemon boots and runs on the ISO

Why this is #1

The build-side wiring is done. The clawdie-iso build now stages the Colibri binaries, installs the rc.d script, creates the colibri user, and enables the service. What has not happened yet is booting a freshly built image and confirming colibri_daemon actually starts and the acceptance runbook passes on real FreeBSD. Until that boot/runtime validation is done, Gate 1 (passive service) is unproven.

What's done (build wiring)

| Artifact / step | Location | Status |
| --- | --- | --- |
| staging script | scripts/stage-colibri-iso.sh | done: copies colibri-daemon, colibri, colibri-test-agent, rc.d, newsyslog, creates dirs |
| rc.d script | packaging/freebsd/colibri_daemon.in | done: start_precmd, pidfile, daemon(8) wrapper, COLIBRI_COST_MODE propagation |
| newsyslog config | packaging/freebsd/newsyslog-colibri.conf | done |
| rc.conf.sample | generated by staging script | done |
| acceptance runbook | docs/ISO-ACCEPTANCE-RUNBOOK.md | done |
| build integration | clawdie-iso build.sh::install_colibri_service | done: calls stage-colibri-iso.sh against the image root |
| colibri user/group | clawdie-iso build.sh (pw useradd colibri) | done: created in the image during build |
| service enable | clawdie-iso build.sh (colibri_daemon_enable) | done: written into image rc.conf |
| prebuilt binaries | build-host Rust toolchain (preflight-gated) | done: build.sh stages prebuilt release binaries and fails preflight if missing |

What's missing (boot/runtime validation)

  1. Boot a freshly built image on FreeBSD (bhyve VM or hardware) and confirm the colibri user, binaries, rc.d script, and rc.conf entry are present in the running system.

  2. Run the acceptance runbook on the booted image:

    service colibri_daemon start
    colibri status
    colibri create-task --title "iso check"
    colibri list-tasks --status queued
    colibri intake-task --title "iso intake check" --capability freebsd
    # wait one scheduler tick
    colibri list-tasks --status queued
    service colibri_daemon stop
    
  3. Confirm logging + lifecycle: pidfile is created, newsyslog rotation config is in place, and service colibri_daemon stop cleanly stops the daemon and removes the pidfile.

  4. Validate the Hermes rc.d service (hermes-bsd, merged 2026-06-14 as fc4b57ade). The hermes_daemon rc.d script runs hermes gateway run under daemon(8) with a dedicated user, persistent HERMES_HOME, and supervisor/child pidfile separation — but it has not been booted on real FreeBSD yet. On the same image run:

    # one-time: create user + install the rc.d script per README-FreeBSD.md
    service hermes_daemon start    # must abort cleanly if config.yaml is missing
    service hermes_daemon health
    service hermes_daemon stop     # supervisor exits, child does not respawn
    

    Confirm: prestart aborts (exit 1, no crash loop) when /var/db/hermes/config.yaml is absent; once configured, start/health/stop work and both the supervisor and child pidfiles are cleaned up on stop.

Key files

  • scripts/stage-colibri-iso.sh — the staging script (dir creation, bin copy, rc.d install, rc.conf.sample generation)
  • packaging/freebsd/colibri_daemon.in — rc.d script
  • docs/ISO-ACCEPTANCE-RUNBOOK.md — acceptance commands to run on the booted image
  • docs/FREEBSD-BUILD-LANE-HANDOFF.md — step-by-step build/boot/validate handoff for the FreeBSD agent
  • clawdie-iso build.sh: install_colibri_service() already wires staging, user creation, and service enable
  • hermes-bsd packaging/freebsd/hermes_daemon.in + README-FreeBSD.md — Hermes rc.d service and setup steps

Suggested owner

ISO/build lane — FreeBSD agent (Codex) or Sam boots a built image and runs the acceptance runbook plus the Hermes rc.d checks. No Linux-side code change is required; this is a runtime-proof step.


Priority 2: Prove the Pi spawn path end-to-end

Why this is #2

The daemon has a full Spawner with provider routing, jail confinement, retry/backoff, and AgentHandle that captures stdout for glasspane. But the daemon loop's poll_tasks() is a stub (daemon.rs:274-277):

pub async fn poll_tasks(state: &SharedState) {
    debug!("task polling tick");
    let _spawner = Spawner::new(state.config.clone().into());
}

It creates a Spawner and does nothing with it. No agent is ever spawned from the daemon loop. This blocks Gate 2 (agent observation parity) — we cannot claim glasspane supervision works until a real process is spawned and its JSONL events flow through to state transitions.

What exists

| Capability | Location | Status |
| --- | --- | --- |
| Spawner::spawn() | crates/colibri-daemon/src/spawner.rs:585 | done: provider routing, jail wrap, retry/backoff |
| AgentHandle | crates/colibri-daemon/src/spawner.rs:465 | done: tracks child, stdout for glasspane, kill, poll_exit |
| take_stdout() | crates/colibri-daemon/src/spawner.rs:500 | done: hands stdout to glasspane supervision |
| Jail confinement | crates/colibri-daemon/src/spawner.rs:332 | done: named/ephemeral, staged env payload, priv modes |
| fake-pi-agent.py | scripts/fake-pi-agent.py | exists: emits JSONL events for testing |
| Glasspane ingestion | crates/colibri-glasspane/ | done: ingests JSONL, tracks pane state |

What's missing

  1. Wire poll_tasks() to actually spawn agents. The scheduler drains intake-task into SQLite on tick, but no agent is spawned to work on the task. The poll_tasks stub needs to:

    • Query tasks in queued status with a capability match
    • Build an AgentSpawnConfig for each
    • Call Spawner::spawn()
    • Register the AgentHandle in daemon state
    • Hand stdout to glasspane
  2. End-to-end integration test. Using scripts/fake-pi-agent.py (or a Rust mock binary):

    • Start daemon
    • Create a task + intake it
    • Wait for scheduler tick + spawn
    • Verify glasspane observes Starting → Running → Stopped lifecycle
    • Verify session JSONL is written
    • Verify agent appears in colibri status / colibri snapshot
  3. spawn-local socket command (if not present). An operator CLI path to manually spawn a local binary for debugging:

    colibri spawn-local /path/to/pi --session-id test-1
    

    This may already exist as a socket command — check socket.rs for SpawnLocal or Spawn command variants.

  4. Process kill/cleanup verification. Confirm that AgentHandle::kill() reliably kills the child and any jail wrapper, and that glasspane transitions to Stopped.
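
The wiring described in item 1 can be sketched roughly as below. This is a minimal, synchronous illustration and not the daemon's actual code: `Task`, `TaskStatus`, `SpawnConfig`, `Handle`, the `SpawnAgent` trait, and `RecordingSpawner` are all hypothetical stand-ins (the real `AgentSpawnConfig`, `AgentHandle`, and `Spawner::spawn()` signatures are not shown in this document), and the hand-off of stdout to glasspane is left out.

```rust
use std::collections::HashMap;

/// Hypothetical task status; only `Queued` tasks are eligible for spawning.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Running,
}

/// Hypothetical task row as it might come back from the SQLite store.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: u64,
    pub capability: String,
    pub status: TaskStatus,
}

/// Hypothetical stand-in for the real AgentSpawnConfig.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SpawnConfig {
    pub session_id: String,
    pub capability: String,
}

/// Hypothetical stand-in for the real AgentHandle.
#[derive(Debug, PartialEq, Eq)]
pub struct Handle {
    pub session_id: String,
}

/// Abstraction over the spawn call so the tick logic can be tested without processes.
pub trait SpawnAgent {
    fn spawn(&mut self, config: &SpawnConfig) -> Result<Handle, String>;
}

/// One polling tick: spawn an agent for every queued task whose capability
/// this host offers, mark it running, and register the handle by task id.
/// Returns the number of agents spawned.
pub fn poll_tasks_once<S: SpawnAgent>(
    tasks: &mut [Task],
    host_capabilities: &[&str],
    spawner: &mut S,
    registry: &mut HashMap<u64, Handle>,
) -> usize {
    let mut spawned = 0;
    for task in tasks.iter_mut() {
        let eligible = task.status == TaskStatus::Queued
            && host_capabilities.contains(&task.capability.as_str());
        if !eligible {
            continue;
        }
        let config = SpawnConfig {
            session_id: format!("task-{}", task.id),
            capability: task.capability.clone(),
        };
        // A failed spawn leaves the task queued so a later tick can retry it.
        if let Ok(handle) = spawner.spawn(&config) {
            task.status = TaskStatus::Running;
            registry.insert(task.id, handle);
            spawned += 1;
        }
    }
    spawned
}

/// Test double that records every config it was asked to spawn.
#[derive(Default)]
pub struct RecordingSpawner {
    pub seen: Vec<SpawnConfig>,
}

impl SpawnAgent for RecordingSpawner {
    fn spawn(&mut self, config: &SpawnConfig) -> Result<Handle, String> {
        self.seen.push(config.clone());
        Ok(Handle { session_id: config.session_id.clone() })
    }
}
```

Putting the spawn call behind a trait is one possible design choice: it lets the tick logic be unit-tested on Linux with a recording double, while the real spawner and fake-pi-agent.py cover the process-level path in the integration test.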

Key files

  • crates/colibri-daemon/src/daemon.rs:274: poll_tasks() stub (THE gap)
  • crates/colibri-daemon/src/daemon.rs:242: session_rotation() (working, good reference for how other background loops iterate state)
  • crates/colibri-daemon/src/spawner.rs:585: Spawner::spawn() (working)
  • crates/colibri-daemon/src/socket.rs — socket command dispatch (check for spawn commands)
  • scripts/fake-pi-agent.py — test agent that emits JSONL
  • crates/colibri-glasspane/src/ — JSONL ingestion + pane state machine
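
For the end-to-end test, the lifecycle assertion could take a shape like the following. It is only a sketch: `PaneState` and `is_clean_lifecycle` are hypothetical names, and the real glasspane state machine may expose more states or a different API.

```rust
/// Hypothetical pane states, mirroring the lifecycle named in the test plan.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PaneState {
    Starting,
    Running,
    Stopped,
}

/// Returns true when the observed states, with consecutive repeats collapsed,
/// are exactly Starting, Running, Stopped in that order.
pub fn is_clean_lifecycle(observed: &[PaneState]) -> bool {
    let mut collapsed: Vec<PaneState> = Vec::new();
    for &state in observed {
        // Skip a state that merely repeats the previous observation.
        if collapsed.last() != Some(&state) {
            collapsed.push(state);
        }
    }
    collapsed == [PaneState::Starting, PaneState::Running, PaneState::Stopped]
}
```

Collapsing repeats keeps the check independent of how often the test samples the pane while the agent is running.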

Suggested owner

Rust lane (Hermes on Linux). Can implement and test fully on Linux with fake-pi-agent.py. FreeBSD validation confirms jail path works.


Priority 3: Wire cost mode into actual enforcement

Why this is #3

Cost modes (Fast/Smart/Max) are the core design promise of Colibri — "cache-first cost discipline." The code has all the pieces (thresholds, escalation, compaction, trimming) but they are not connected. Right now changing the cost mode does nothing to actual session behavior.

This is the most subtle gap because the code looks like it's wired up — the functions exist and have tests — but the call sites are missing or duplicated.

The disconnection (detailed)

There are two compaction paths that use different sources of truth:

Path A — per-append (session.rs):

session.rs:397-398 in maybe_compact_or_rollover():

let needs_compaction = byte_count > self.config.session_max_bytes
    || turn_count > self.config.max_uncompacted_turns;

This reads self.config.session_max_bytes and self.config.max_uncompacted_turns — these are static fields in DaemonConfig loaded once from env vars (COLIBRI_SESSION_MAX_BYTES, COLIBRI_MAX_UNCOMPACTED_TURNS). They default to 2,000,000 and 20 (Smart values) regardless of the cost mode string.

Path B — background rotation (daemon.rs):

daemon.rs:242-261 in session_rotation():

let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default();
let max_bytes = cost_mode.session_max_bytes();
let max_turns = cost_mode.max_uncompacted_turns();

This correctly derives thresholds from the cost mode. But it runs on a background timer, not per-append, so it's a lagging check.

Result: if you set COLIBRI_COST_MODE=fast, the background loop will use 500K/5 thresholds, but the per-append check still uses the static 2M/20 config values. The session can grow past the Fast budget before the background loop catches up.
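
The disagreement between the two paths can be reproduced in isolation. The sketch below is illustrative only: this `CostMode` is a simplified mirror of the real type, `thresholds()` and `needs_compaction()` are hypothetical helpers, "500K" is read as 500,000 bytes, and the Max pair is a placeholder because this document does not state the Max values.

```rust
/// Hypothetical, simplified mirror of the daemon's cost modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CostMode {
    Fast,
    Smart,
    Max,
}

impl CostMode {
    /// Parses a COLIBRI_COST_MODE-style string; unknown values yield None.
    pub fn parse(s: &str) -> Option<CostMode> {
        match s.trim().to_ascii_lowercase().as_str() {
            "fast" => Some(CostMode::Fast),
            "smart" => Some(CostMode::Smart),
            "max" => Some(CostMode::Max),
            _ => None,
        }
    }

    /// (max bytes, max uncompacted turns). Fast and Smart follow the figures
    /// quoted above; the Max pair is a placeholder, not taken from the source.
    pub fn thresholds(self) -> (u64, u32) {
        match self {
            CostMode::Fast => (500_000, 5),
            CostMode::Smart => (2_000_000, 20),
            CostMode::Max => (8_000_000, 80),
        }
    }
}

/// The predicate both paths apply, parameterised by where the limits come from.
pub fn needs_compaction(byte_count: u64, turn_count: u32, limits: (u64, u32)) -> bool {
    byte_count > limits.0 || turn_count > limits.1
}
```

With a session of 800,000 bytes and 8 turns, the static limits `(2_000_000, 20)` report no compaction needed, while the limits derived from `fast` report that compaction is overdue. That is the window in which a Fast session overshoots its budget.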

What's never called

| Function | Location | Problem |
| --- | --- | --- |
| auto_escalate() | cost.rs:131 | Tested but never called from daemon loop or session code |
| compact_tool_result() | cost.rs:165 | Tested but never called when appending ToolResult entries |
| PromptAssembly::trim_to_budget() | session.rs:117 | Tested but never called from build_prompt_assembly() or build_prompt_messages() |
| EscalationTrigger | cost.rs:117 | Type exists, tested, never constructed in production code |

What set-cost-mode does

socket.rs:657 updates state.config.cost_mode (the string), but does NOT update state.config.session_max_bytes or state.config.max_uncompacted_turns (the numeric fields). So after a mode change, the per-append compaction path still uses the old thresholds.

Fix plan

  1. Make per-append compaction cost-mode-aware. In session.rs, change maybe_compact_or_rollover() to derive thresholds from CostMode::parse(&self.config.cost_mode) instead of reading the static fields directly. Or better: remove the static fields from DaemonConfig entirely and always derive from cost_mode.

  2. Wire compact_tool_result() into the append path. When SessionEntry::ToolResult is appended and cost_mode.compact_tool_results() is true, run the result through compact_tool_result() before writing to JSONL.

  3. Wire auto_escalate() into session_rotation(). After compaction, if the session is still over budget, construct an EscalationTrigger::CompactionInsufficient and call auto_escalate(). If escalation succeeds, log it visibly and update state.config.cost_mode.

  4. Wire trim_to_budget() into prompt assembly. In build_prompt_assembly() or build_prompt_messages(), call trim_to_budget(cost_mode) after constructing the assembly.

  5. Make set-cost-mode update derived thresholds. When the socket command changes cost_mode, also update session_max_bytes and max_uncompacted_turns to match (or remove those fields entirely and always derive).

  6. Remove COLIBRI_SESSION_MAX_BYTES / COLIBRI_MAX_UNCOMPACTED_TURNS env vars. These shadow the cost mode system and cause confusion. The cost mode string (COLIBRI_COST_MODE=fast|smart|max) should be the single source of truth for thresholds.
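
Steps 1, 3, and 5 share one idea: store only the mode, derive limits on read, and escalate when compaction is not enough. A rough sketch under stated assumptions follows. `Mode`, `Config`, `next()`, and `rotate_session()` are hypothetical names rather than the real `CostMode`, `DaemonConfig`, `auto_escalate()`, or `session_rotation()` signatures, the Max budget is a placeholder, and compaction is modelled as a caller-supplied closure.

```rust
/// Hypothetical cost mode ladder: Fast, then Smart, then Max.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Fast,
    Smart,
    Max,
}

impl Mode {
    /// Byte budget per mode. Fast and Smart follow the figures quoted in this
    /// document; the Max value is a placeholder, not taken from the source.
    pub fn session_max_bytes(self) -> u64 {
        match self {
            Mode::Fast => 500_000,
            Mode::Smart => 2_000_000,
            Mode::Max => 8_000_000,
        }
    }

    /// Next mode up the ladder, or None when already at the top.
    pub fn next(self) -> Option<Mode> {
        match self {
            Mode::Fast => Some(Mode::Smart),
            Mode::Smart => Some(Mode::Max),
            Mode::Max => None,
        }
    }
}

/// Config that stores only the mode; the threshold is derived on every read,
/// so changing the mode can never leave a stale numeric field behind.
pub struct Config {
    pub mode: Mode,
}

impl Config {
    pub fn session_max_bytes(&self) -> u64 {
        self.mode.session_max_bytes()
    }
}

/// Stand-in for one rotation pass: compact if over budget, then escalate one
/// step if compaction was insufficient. Returns (resulting size, escalated?).
pub fn rotate_session(
    config: &mut Config,
    byte_count: u64,
    compact: impl Fn(u64) -> u64,
) -> (u64, bool) {
    if byte_count <= config.session_max_bytes() {
        return (byte_count, false);
    }
    let after = compact(byte_count);
    if after <= config.session_max_bytes() {
        return (after, false);
    }
    // Still over budget after compaction: move one step up the ladder.
    match config.mode.next() {
        Some(next) => {
            config.mode = next;
            (after, true)
        }
        None => (after, false),
    }
}
```

Because `Config` has no numeric threshold fields, a set-cost-mode handler only has to assign `mode`, and both the per-append check and the background loop then read the same derived value.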

Key files

  • crates/colibri-daemon/src/cost.rs — cost mode logic (thresholds, escalation, compaction, headroom sidecar)
  • crates/colibri-daemon/src/session.rs:390: maybe_compact_or_rollover() (uses static config, not cost mode)
  • crates/colibri-daemon/src/session.rs:492: build_prompt_assembly() (doesn't call trim_to_budget())
  • crates/colibri-daemon/src/config.rs:21,43: session_max_bytes / max_uncompacted_turns static fields
  • crates/colibri-daemon/src/daemon.rs:242: session_rotation() (correctly uses cost mode, good reference)
  • crates/colibri-daemon/src/socket.rs:657: cmd_set_cost_mode() (updates string only, not derived values)

Suggested owner

Rust lane (Hermes on Linux). Fully testable on Linux — this is pure logic wiring, no platform-specific behavior.


Summary table

| # | Item | Blocks | Linux-doable | Effort |
| --- | --- | --- | --- | --- |
| 1 | ISO boot/runtime validation | Gate 1 | no (needs FreeBSD boot) | small |
| 2 | Pi spawn end-to-end | Gate 2 | yes (with fake-pi-agent.py) | medium |
| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium |

All three can be worked in parallel. Item 1 is small but needs a FreeBSD boot. Items 2 and 3 are medium effort and can be implemented entirely on Linux, with FreeBSD needed only to validate the final result.