# Priority Handoff: Three Focus Items Toward ISO Gate 1
Created: 2026-06-14 (Sam & Hermes)

Status: open for any agent to pick up

Replaces: ad-hoc ISO work-lane priorities
Round 2 audit is fully closed. All repos are green (164 tests, clippy clean, fmt clean). The three items below are the highest-leverage work toward getting a Colibri-backed ISO candidate and delivering on the core cost-discipline promise.
Items 2 and 3 are independently implementable on Linux, with FreeBSD validation as the final step; item 1 is a runtime-validation step that needs a FreeBSD boot. All three can be worked in parallel by different agents.
## Priority 1: Validate that the staged colibri_daemon boots and runs on the ISO

### Why this is #1
The build-side wiring is done. The clawdie-iso build now stages the
Colibri binaries, installs the rc.d script, creates the colibri user, and
enables the service. What has not happened yet is booting a freshly built
image and confirming colibri_daemon actually starts and the acceptance
runbook passes on real FreeBSD. Until that boot/runtime validation is done,
Gate 1 (passive service) is unproven.
### What's done (build wiring)

| Artifact / step | Location | Status |
|---|---|---|
| staging script | `scripts/stage-colibri-iso.sh` | done: copies `colibri-daemon`, `colibri`, `colibri-test-agent`, rc.d, newsyslog; creates dirs |
| rc.d script | `packaging/freebsd/colibri_daemon.in` | done: `start_precmd`, pidfile, daemon(8) wrapper, `COLIBRI_COST_MODE` propagation |
| newsyslog config | `packaging/freebsd/newsyslog-colibri.conf` | done |
| rc.conf.sample | generated by staging script | done |
| acceptance runbook | `docs/ISO-ACCEPTANCE-RUNBOOK.md` | done |
| build integration | clawdie-iso `build.sh::install_colibri_service` | done: calls `stage-colibri-iso.sh` against the image root |
| colibri user/group | clawdie-iso `build.sh` (`pw useradd colibri`) | done: created in the image during build |
| service enable | clawdie-iso `build.sh` (`colibri_daemon_enable`) | done: written into image rc.conf |
| prebuilt binaries | build-host Rust toolchain (preflight-gated) | done: `build.sh` stages prebuilt release binaries and fails preflight if missing |
### What's missing (boot/runtime validation)

- Boot a freshly built image on FreeBSD (bhyve VM or hardware) and confirm the `colibri` user, binaries, rc.d script, and rc.conf entry are present in the running system.
- Run the acceptance runbook on the booted image:
  ```sh
  service colibri_daemon start
  colibri status
  colibri create-task --title "iso check"
  colibri list-tasks --status queued
  colibri intake-task --title "iso intake check" --capability freebsd
  # wait one scheduler tick
  colibri list-tasks --status queued
  service colibri_daemon stop
  ```
- Confirm logging + lifecycle: the pidfile is created, the newsyslog rotation config is in place, and `service colibri_daemon stop` cleanly stops the daemon and removes the pidfile.
- Validate the Hermes rc.d service (`hermes-bsd`, merged 2026-06-14 as `fc4b57ade`). The `hermes_daemon` rc.d script runs `hermes gateway run` under daemon(8) with a dedicated user, persistent `HERMES_HOME`, and supervisor/child pidfile separation, but it has not been booted on real FreeBSD yet. On the same image run:
  ```sh
  # one-time: create user + install the rc.d script per README-FreeBSD.md
  service hermes_daemon start    # must abort cleanly if config.yaml is missing
  service hermes_daemon health
  service hermes_daemon stop     # supervisor exits, child does not respawn
  ```
  Confirm: prestart aborts (exit 1, no crash loop) when `/var/db/hermes/config.yaml` is absent; once configured, start/health/stop work and both the supervisor and child pidfiles are cleaned up on stop.
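The runbook steps are a fixed command sequence followed by a pidfile check, so they can be scripted for repeat runs. A minimal sketch, assuming each step is run through `sh -c` on the booted image; `run_steps` and `leftover_pidfiles` are hypothetical helpers, and the actual pidfile paths must be taken from the rc.d scripts because this handoff does not list them.

```rust
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: runs each runbook step through `sh -c` in order and
/// stops at the first step that cannot be launched or exits non-zero.
fn run_steps(steps: &[&str]) -> Result<(), String> {
    for step in steps {
        let status = Command::new("sh")
            .arg("-c")
            .arg(step)
            .status()
            .map_err(|e| format!("{step}: could not launch sh: {e}"))?;
        if !status.success() {
            return Err(format!("{step}: {status}"));
        }
    }
    Ok(())
}

/// Hypothetical helper: returns the pidfiles that still exist. After
/// `service ... stop` this should be empty (for Hermes, both the
/// supervisor and the child pidfile).
fn leftover_pidfiles(paths: &[&str]) -> Vec<String> {
    paths
        .iter()
        .filter(|p| Path::new(p).exists())
        .map(|p| p.to_string())
        .collect()
}
```

On the image, pass the runbook commands as steps (replacing the "wait one scheduler tick" comment with a `sleep` matching the configured tick), then pass the pidfile paths to `leftover_pidfiles`. The Hermes missing-config start is expected to fail, so run it separately and treat a non-zero exit as the passing outcome.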
### Key files

- `scripts/stage-colibri-iso.sh`: the staging script (dir creation, bin copy, rc.d install, rc.conf.sample generation)
- `packaging/freebsd/colibri_daemon.in`: rc.d script
- `docs/ISO-ACCEPTANCE-RUNBOOK.md`: acceptance commands to run on the booted image
- `docs/FREEBSD-BUILD-LANE-HANDOFF.md`: step-by-step build/boot/validate handoff for the FreeBSD agent
- clawdie-iso `build.sh`: `install_colibri_service()` already wires staging, user creation, and service enable
- hermes-bsd `packaging/freebsd/hermes_daemon.in` + `README-FreeBSD.md`: Hermes rc.d service and setup steps
### Suggested owner

ISO/build lane: the FreeBSD agent (Codex) or Sam boots a built image and runs the acceptance runbook plus the Hermes rc.d checks. No Linux-side code change is required; this is a runtime-proof step.
## Priority 2: Prove the Pi spawn path end-to-end

### Why this is #2
The daemon has a full Spawner with provider routing, jail confinement,
retry/backoff, and AgentHandle that captures stdout for glasspane. But the
daemon loop's poll_tasks() is a stub (daemon.rs:274-277):
```rust
pub async fn poll_tasks(state: &SharedState) {
    debug!("task polling tick");
    let _spawner = Spawner::new(state.config.clone().into());
}
```
It creates a Spawner and does nothing with it. No agent is ever spawned from
the daemon loop. This blocks Gate 2 (agent observation parity) — we cannot
claim glasspane supervision works until a real process is spawned and its
JSONL events flow through to state transitions.
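For orientation, the work one scheduler tick should do can be sketched synchronously: pick queued tasks with a supported capability, build a spawn config, spawn, hand stdout to glasspane, and register the handle. This is a minimal sketch, not the daemon's code: `Task`, `SpawnConfig`, `Handle`, the `Spawn` trait and `poll_tasks_once` are hypothetical stand-ins, and the real `Spawner::spawn()` signature and `AgentHandle` stdout stream are not reproduced here (stdout is modelled as a `String`).

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-ins for the daemon's task and spawn types.
#[derive(Clone, Debug)]
struct Task {
    id: u64,
    capability: String,
}

#[derive(Debug)]
struct SpawnConfig {
    task_id: u64,
    capability: String,
}

#[derive(Debug)]
struct Handle {
    task_id: u64,
    stdout: Option<String>, // stands in for the child's stdout stream
}

trait Spawn {
    fn spawn(&mut self, cfg: &SpawnConfig) -> Result<Handle, String>;
}

/// One tick: spawn an agent for every queued task whose capability is
/// supported and that has no registered agent yet. Returns spawn errors.
fn poll_tasks_once(
    queued: &[Task],
    supported: &[&str],
    spawner: &mut impl Spawn,
    running: &mut HashMap<u64, Handle>,
    glasspane: &mut Vec<(u64, String)>,
) -> Vec<String> {
    let mut errors = Vec::new();
    for task in queued {
        if running.contains_key(&task.id) || !supported.contains(&task.capability.as_str()) {
            continue;
        }
        let cfg = SpawnConfig { task_id: task.id, capability: task.capability.clone() };
        match spawner.spawn(&cfg) {
            Ok(mut handle) => {
                // Hand stdout to glasspane (mirrors take_stdout()), then register.
                if let Some(out) = handle.stdout.take() {
                    glasspane.push((task.id, out));
                }
                running.insert(task.id, handle);
            }
            Err(e) => errors.push(format!("task {}: {e}", task.id)),
        }
    }
    errors
}
```

Skipping tasks already present in `running` keeps a later tick from spawning a duplicate while the task is still `queued`; in the real daemon the task status would presumably also move out of `queued` once an agent is assigned.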
### What exists

| Capability | Location | Status |
|---|---|---|
| `Spawner::spawn()` | `crates/colibri-daemon/src/spawner.rs:585` | done: provider routing, jail wrap, retry/backoff |
| `AgentHandle` | `crates/colibri-daemon/src/spawner.rs:465` | done: tracks child, stdout for glasspane, kill, `poll_exit` |
| `take_stdout()` | `crates/colibri-daemon/src/spawner.rs:500` | done: hands stdout to glasspane supervision |
| Jail confinement | `crates/colibri-daemon/src/spawner.rs:332` | done: named/ephemeral, staged env payload, priv modes |
| `fake-pi-agent.py` | `scripts/fake-pi-agent.py` | exists: emits JSONL events for testing |
| Glasspane ingestion | `crates/colibri-glasspane/` | done: ingests JSONL, tracks pane state |
### What's missing

- Wire `poll_tasks()` to actually spawn agents. The scheduler drains `intake-task` into SQLite on tick, but no agent is spawned to work on the task. The `poll_tasks` stub needs to:
  - Query tasks in `queued` status with a capability match
  - Build an `AgentSpawnConfig` for each
  - Call `Spawner::spawn()`
  - Register the `AgentHandle` in daemon state
  - Hand stdout to glasspane
- End-to-end integration test. Using `scripts/fake-pi-agent.py` (or a Rust mock binary):
  - Start daemon
  - Create a task + intake it
  - Wait for scheduler tick + spawn
  - Verify glasspane observes the `Starting` → `Running` → `Stopped` lifecycle
  - Verify session JSONL is written
  - Verify the agent appears in `colibri status` / `colibri snapshot`
- `spawn-local` socket command (if not present). An operator CLI path to manually spawn a local binary for debugging:
  ```sh
  colibri spawn-local /path/to/pi --session-id test-1
  ```
  This may already exist as a socket command; check `socket.rs` for `SpawnLocal` or `Spawn` command variants.
- Process kill/cleanup verification. Confirm that `AgentHandle::kill()` reliably kills the child and any jail wrapper, and that glasspane transitions to `Stopped`.
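The lifecycle assertion in the integration test reduces to a check over the ordered list of pane states the test collects, with repeated reports of the same state ignored. A minimal sketch, assuming the test can record each observed state in order; `PaneState` and `lifecycle_ok` are hypothetical names, not the `colibri-glasspane` API, and the real state machine may have more states.

```rust
/// Hypothetical mirror of the three lifecycle states named in this handoff.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PaneState {
    Starting,
    Running,
    Stopped,
}

/// True if the observed states, after collapsing consecutive duplicates,
/// are exactly Starting -> Running -> Stopped.
fn lifecycle_ok(observed: &[PaneState]) -> bool {
    let mut collapsed: Vec<PaneState> = Vec::new();
    for &s in observed {
        if collapsed.last() != Some(&s) {
            collapsed.push(s);
        }
    }
    collapsed == [PaneState::Starting, PaneState::Running, PaneState::Stopped]
}
```

Requiring the exact sequence means an agent that dies before reporting `Running`, or a pane that flips back to `Running` after `Stopped`, fails the test instead of passing silently.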
### Key files

- `crates/colibri-daemon/src/daemon.rs:274`: `poll_tasks()` stub (THE gap)
- `crates/colibri-daemon/src/daemon.rs:242`: `session_rotation()` (working; a good reference for how other background loops iterate state)
- `crates/colibri-daemon/src/spawner.rs:585`: `Spawner::spawn()` (working)
- `crates/colibri-daemon/src/socket.rs`: socket command dispatch (check for spawn commands)
- `scripts/fake-pi-agent.py`: test agent that emits JSONL
- `crates/colibri-glasspane/src/`: JSONL ingestion + pane state machine
### Suggested owner

Rust lane (Hermes on Linux). Can implement and test fully on Linux with
`fake-pi-agent.py`. FreeBSD validation then confirms the jail path works.
## Priority 3: Wire cost mode into actual enforcement

### Why this is #3
Cost modes (Fast/Smart/Max) are the core design promise of Colibri —
"cache-first cost discipline." The code has all the pieces (thresholds,
escalation, compaction, trimming) but they are not connected. Right now
changing the cost mode does nothing to actual session behavior.
This is the most subtle gap because the code looks like it's wired up — the functions exist and have tests — but the call sites are missing or duplicated.
### The disconnection (detailed)
There are two compaction paths that use different sources of truth:
Path A: per-append (`session.rs`).

`session.rs:397-398` in `maybe_compact_or_rollover()`:

```rust
let needs_compaction = byte_count > self.config.session_max_bytes
    || turn_count > self.config.max_uncompacted_turns;
```
This reads self.config.session_max_bytes and
self.config.max_uncompacted_turns — these are static fields in
DaemonConfig loaded once from env vars (COLIBRI_SESSION_MAX_BYTES,
COLIBRI_MAX_UNCOMPACTED_TURNS). They default to 2,000,000 and 20 (Smart
values) regardless of the cost mode string.
Path B: background rotation (`daemon.rs`).

`daemon.rs:242-261` in `session_rotation()`:

```rust
let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default();
let max_bytes = cost_mode.session_max_bytes();
let max_turns = cost_mode.max_uncompacted_turns();
```
This correctly derives thresholds from the cost mode. But it runs on a background timer, not per-append, so it's a lagging check.
Result: if you set COLIBRI_COST_MODE=fast, the background loop will use
500K/5 thresholds, but the per-append check still uses the static 2M/20
config values. The session can grow past the Fast budget before the background
loop catches up.
### What's never called

| Function / type | Location | Problem |
|---|---|---|
| `auto_escalate()` | `cost.rs:131` | Tested but never called from daemon loop or session code |
| `compact_tool_result()` | `cost.rs:165` | Tested but never called when appending `ToolResult` entries |
| `PromptAssembly::trim_to_budget()` | `session.rs:117` | Tested but never called from `build_prompt_assembly()` or `build_prompt_messages()` |
| `EscalationTrigger` | `cost.rs:117` | Type exists, tested, never constructed in production code |
### What `set-cost-mode` does
socket.rs:657 updates state.config.cost_mode (the string), but does NOT
update state.config.session_max_bytes or state.config.max_uncompacted_turns
(the numeric fields). So after a mode change, the per-append compaction path
still uses the old thresholds.
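If the numeric fields are kept, the fix is to validate the new mode first and then update the string and both thresholds in one step, so no code path ever sees a mode without matching numbers. A minimal sketch, assuming the same Fast/Smart values quoted above; `Mode`, `Config` and `set_cost_mode` are hypothetical stand-ins for `CostMode`, `DaemonConfig` and `cmd_set_cost_mode()`, and in the daemon the update would happen under whatever lock guards `state.config`.

```rust
/// Hypothetical subset of the cost modes with the thresholds quoted above.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Fast,
    Smart,
}

impl Mode {
    fn parse(s: &str) -> Option<Mode> {
        match s {
            "fast" => Some(Mode::Fast),
            "smart" => Some(Mode::Smart),
            _ => None,
        }
    }

    /// (session_max_bytes, max_uncompacted_turns)
    fn thresholds(self) -> (u64, u32) {
        match self {
            Mode::Fast => (500_000, 5),
            Mode::Smart => (2_000_000, 20),
        }
    }
}

/// Hypothetical trimmed-down config holding the three fields named above.
#[derive(Debug)]
struct Config {
    cost_mode: String,
    session_max_bytes: u64,
    max_uncompacted_turns: u32,
}

impl Config {
    /// Reject unknown modes without touching state; otherwise update the
    /// string and the derived thresholds together.
    fn set_cost_mode(&mut self, new_mode: &str) -> Result<(), String> {
        let mode = Mode::parse(new_mode).ok_or_else(|| format!("unknown cost mode: {new_mode}"))?;
        let (bytes, turns) = mode.thresholds();
        self.cost_mode = new_mode.to_string();
        self.session_max_bytes = bytes;
        self.max_uncompacted_turns = turns;
        Ok(())
    }
}
```

Removing the numeric fields and always deriving them from `cost_mode` avoids the problem entirely; this variant is the smaller change if the fields must stay for now.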
### Fix plan

- Make per-append compaction cost-mode-aware. In `session.rs`, change `maybe_compact_or_rollover()` to derive thresholds from `CostMode::parse(&self.config.cost_mode)` instead of reading the static fields directly. Or better: remove the static fields from `DaemonConfig` entirely and always derive from `cost_mode`.
- Wire `compact_tool_result()` into the append path. When `SessionEntry::ToolResult` is appended and `cost_mode.compact_tool_results()` is true, run the result through `compact_tool_result()` before writing to JSONL.
- Wire `auto_escalate()` into `session_rotation()`. After compaction, if the session is still over budget, construct an `EscalationTrigger::CompactionInsufficient` and call `auto_escalate()`. If escalation succeeds, log it visibly and update `state.config.cost_mode`.
- Wire `trim_to_budget()` into prompt assembly. In `build_prompt_assembly()` or `build_prompt_messages()`, call `trim_to_budget(cost_mode)` after constructing the assembly.
- Make `set-cost-mode` update derived thresholds. When the socket command changes `cost_mode`, also update `session_max_bytes` and `max_uncompacted_turns` to match (or remove those fields entirely and always derive).
- Remove the `COLIBRI_SESSION_MAX_BYTES` / `COLIBRI_MAX_UNCOMPACTED_TURNS` env vars. These shadow the cost mode system and cause confusion. The cost mode string (`COLIBRI_COST_MODE=fast|smart|max`) should be the single source of truth for thresholds.
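The escalation hook in the rotation loop can be sketched as: do nothing if compaction brought the session under budget; otherwise build a `CompactionInsufficient` trigger, ask for the next mode, and return a message to log. A minimal sketch, assuming escalation moves one mode up (Fast to Smart to Max) and that Max cannot escalate further; `Mode`, `Trigger`, `escalate` and `after_compaction` are hypothetical stand-ins for `CostMode`, `EscalationTrigger`, `auto_escalate()` and the call site in `session_rotation()`.

```rust
/// Hypothetical stand-ins; the Fast -> Smart -> Max order is an assumption.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Mode {
    Fast,
    Smart,
    Max,
}

#[derive(Debug)]
enum Trigger {
    CompactionInsufficient { bytes: u64, budget: u64 },
}

/// Returns the next mode for this trigger, or None if already at the top.
fn escalate(mode: Mode, trigger: &Trigger) -> Option<Mode> {
    match trigger {
        Trigger::CompactionInsufficient { .. } => match mode {
            Mode::Fast => Some(Mode::Smart),
            Mode::Smart => Some(Mode::Max),
            Mode::Max => None,
        },
    }
}

/// Called after compaction: escalate only if still over budget, update the
/// mode in place, and return a message for the visible log line.
fn after_compaction(mode: &mut Mode, bytes_after: u64, budget: u64) -> Option<String> {
    if bytes_after <= budget {
        return None; // compaction was enough; nothing to escalate
    }
    let trigger = Trigger::CompactionInsufficient { bytes: bytes_after, budget };
    let next = escalate(*mode, &trigger)?;
    let msg = format!("cost mode escalated {:?} -> {:?} after {:?}", *mode, next, trigger);
    *mode = next;
    Some(msg)
}
```

In the daemon, the caller would log the returned message and write the new mode back to `state.config.cost_mode`, so the per-append thresholds follow the escalation.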
### Key files

- `crates/colibri-daemon/src/cost.rs`: cost mode logic (thresholds, escalation, compaction, headroom sidecar)
- `crates/colibri-daemon/src/session.rs:390`: `maybe_compact_or_rollover()` (uses static config, not cost mode)
- `crates/colibri-daemon/src/session.rs:492`: `build_prompt_assembly()` (doesn't call `trim_to_budget()`)
- `crates/colibri-daemon/src/config.rs:21,43`: `session_max_bytes` / `max_uncompacted_turns` static fields
- `crates/colibri-daemon/src/daemon.rs:242`: `session_rotation()` (correctly uses cost mode; good reference)
- `crates/colibri-daemon/src/socket.rs:657`: `cmd_set_cost_mode()` (updates string only, not derived values)
### Suggested owner

Rust lane (Hermes on Linux). Fully testable on Linux: this is pure logic wiring with no platform-specific behavior.
## Summary table
| # | Item | Blocks | Linux-doable | Effort |
|---|---|---|---|---|
| 1 | ISO boot/runtime validation | Gate 1 | no (needs FreeBSD boot) | small |
| 2 | Pi spawn end-to-end | Gate 2 | yes (with fake-pi-agent.py) | medium |
| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium |
Item 1 is small and items 2 and 3 are medium effort; all three can be worked in parallel. Items 2 and 3 need FreeBSD only to validate the final result, while item 1 is itself that FreeBSD validation.