2026-06-14 12:57:02 +02:00

# Priority Handoff — Three Focus Items Toward ISO Gate 1

**Created:** 2026-06-14 (Sam & Hermes)

**Status:** open for any agent to pick up

**Replaces:** ad-hoc ISO work-lane priorities

Round 2 audit is fully closed. All repos are green (164 tests, clippy clean,
fmt clean). The three items below are the highest-leverage work toward getting
a Colibri-backed ISO candidate and delivering on the core cost-discipline
promise.

Items 2 and 3 are independently implementable on Linux, with FreeBSD validation
as the final step; item 1 is itself a FreeBSD runtime check. All three can be
worked in parallel by different agents.

---

## Priority 1: Validate that the staged `colibri_daemon` boots and runs on the ISO

### Why this is #1

The build-side wiring is **done**. The clawdie-iso build now stages the
Colibri binaries, installs the rc.d script, creates the `colibri` user, and
enables the service. What has **not** happened yet is booting a freshly built
image and confirming `colibri_daemon` actually starts and the acceptance
runbook passes on real FreeBSD. Until that boot/runtime validation is done,
Gate 1 (passive service) is unproven.

### What's done (build wiring)

| Artifact / step | Location | Status |
| -------------------- | ------------------------------------------------ | ---------------------------------------------------------------------------------------------- |
| staging script | `scripts/stage-colibri-iso.sh` | done — copies `colibri-daemon`, `colibri`, `colibri-test-agent`, rc.d, newsyslog, creates dirs |
| rc.d script | `packaging/freebsd/colibri_daemon.in` | done — `start_precmd`, pidfile, daemon(8) wrapper, `COLIBRI_COST_MODE` propagation |
| newsyslog config | `packaging/freebsd/newsyslog-colibri.conf` | done |
| rc.conf.sample | generated by staging script | done |
| acceptance runbook | `docs/ISO-ACCEPTANCE-RUNBOOK.md` | done |
| build integration | clawdie-iso `build.sh::install_colibri_service` | done — calls `stage-colibri-iso.sh` against the image root |
| `colibri` user/group | clawdie-iso `build.sh` (`pw useradd colibri`) | done — created in the image during build |
| service enable | clawdie-iso `build.sh` (`colibri_daemon_enable`) | done — written into image rc.conf |
| prebuilt binaries | build-host Rust toolchain (preflight-gated) | done — `build.sh` stages prebuilt release binaries and fails preflight if missing |

### What's missing (boot/runtime validation)

1. **Boot a freshly built image on FreeBSD** (bhyve VM or hardware) and confirm
   the `colibri` user, binaries, rc.d script, and rc.conf entry are present in
   the running system.

2. **Run the acceptance runbook on the booted image:**

   ```sh
   service colibri_daemon start
   colibri status
   colibri create-task --title "iso check"
   colibri list-tasks --status queued
   colibri intake-task --title "iso intake check" --capability freebsd
   # wait one scheduler tick
   colibri list-tasks --status queued
   service colibri_daemon stop
   ```

3. **Confirm logging + lifecycle:** pidfile is created, newsyslog rotation
   config is in place, and `service colibri_daemon stop` cleanly stops the
   daemon and removes the pidfile.

4. **Validate the Hermes rc.d service** (`hermes-bsd`, merged 2026-06-14 as
   `fc4b57ade`). The `hermes_daemon` rc.d script runs `hermes gateway run`
   under `daemon(8)` with a dedicated user, persistent `HERMES_HOME`, and
   supervisor/child pidfile separation — but it has not been booted on real
   FreeBSD yet. On the same image, run:

   ```sh
   # one-time: create user + install the rc.d script per README-FreeBSD.md
   service hermes_daemon start # must abort cleanly if config.yaml is missing
   service hermes_daemon health
   service hermes_daemon stop # supervisor exits, child does not respawn
   ```

   Confirm: prestart aborts (exit 1, no crash loop) when
   `/var/db/hermes/config.yaml` is absent; once configured, start/health/stop
   work and both the supervisor and child pidfiles are cleaned up on stop.

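
Step 3 and the Hermes check both reduce to one question: is each pidfile present while the service runs, and gone after `stop`? A minimal Rust sketch of that check, for anyone who wants to script the validation instead of eyeballing it. It assumes a pidfile holds a single decimal PID; `PidfileState`, `pidfile_state`, and `all_cleaned_up` are hypothetical helpers, and the real pidfile paths are not confirmed by this handoff, so the demo uses a temp dir.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical classification of a pidfile during lifecycle checks.
#[derive(Debug, PartialEq)]
enum PidfileState {
    /// No file on disk: expected after a clean `service ... stop`.
    Absent,
    /// File holds a positive integer PID: expected while the service runs.
    Pid(u32),
    /// File exists but does not contain a usable PID.
    Garbage(String),
}

fn pidfile_state(path: &Path) -> io::Result<PidfileState> {
    match fs::read_to_string(path) {
        Ok(contents) => {
            let trimmed = contents.trim();
            match trimmed.parse::<u32>() {
                Ok(pid) if pid > 0 => Ok(PidfileState::Pid(pid)),
                _ => Ok(PidfileState::Garbage(trimmed.to_string())),
            }
        }
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(PidfileState::Absent),
        Err(e) => Err(e),
    }
}

/// After stop, every listed pidfile (e.g. Hermes supervisor + child) must be gone.
fn all_cleaned_up(paths: &[&Path]) -> io::Result<bool> {
    for p in paths {
        if pidfile_state(p)? != PidfileState::Absent {
            return Ok(false);
        }
    }
    Ok(true)
}

fn main() -> io::Result<()> {
    // Demo in a temp dir; the real pidfile locations are an assumption.
    let dir = std::env::temp_dir().join("pidfile-check-demo");
    fs::create_dir_all(&dir)?;
    let supervisor = dir.join("supervisor.pid");
    let child = dir.join("child.pid");

    // Simulate "service running": both pidfiles written.
    fs::write(&supervisor, "4242\n")?;
    fs::write(&child, "4243\n")?;
    println!("{:?}", pidfile_state(&supervisor)?); // Pid(4242)
    println!("cleaned up: {}", all_cleaned_up(&[supervisor.as_path(), child.as_path()])?); // false

    // Simulate "service stopped": both pidfiles removed.
    fs::remove_file(&supervisor)?;
    fs::remove_file(&child)?;
    println!("cleaned up: {}", all_cleaned_up(&[supervisor.as_path(), child.as_path()])?); // true
    Ok(())
}
```

This only inspects files, not processes; pairing it with `kill -0 <pid>` on the booted image would confirm the recorded PID is actually live.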

### Key files

- `scripts/stage-colibri-iso.sh` — the staging script (dir creation, bin copy, rc.d install, rc.conf.sample generation)
- `packaging/freebsd/colibri_daemon.in` — rc.d script
- `docs/ISO-ACCEPTANCE-RUNBOOK.md` — acceptance commands to run on the booted image
- `docs/FREEBSD-BUILD-LANE-HANDOFF.md` — step-by-step build/boot/validate handoff for the FreeBSD agent
- clawdie-iso `build.sh` — `install_colibri_service()` already wires staging, user creation, and service enable
- `hermes-bsd` `packaging/freebsd/hermes_daemon.in` + `README-FreeBSD.md` — Hermes rc.d service and setup steps

### Suggested owner

ISO/build lane — FreeBSD agent (Codex) or Sam boots a built image and runs the
acceptance runbook plus the Hermes rc.d checks. No Linux-side code change is
required; this is a runtime-proof step.

---

## Priority 2: Prove the Pi spawn path end-to-end

### Why this is #2

The daemon has a full `Spawner` with provider routing, jail confinement,
retry/backoff, and an `AgentHandle` that captures stdout for glasspane. But the
**daemon loop's `poll_tasks()` is a stub** (`daemon.rs:274-277`):

```rust
pub async fn poll_tasks(state: &SharedState) {
    debug!("task polling tick");
    let _spawner = Spawner::new(state.config.clone().into());
}
```

It creates a `Spawner` and does nothing with it. No agent is ever spawned from
the daemon loop. This blocks Gate 2 (agent observation parity) — we cannot
claim glasspane supervision works until a real process is spawned and its
JSONL events flow through to state transitions.

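
For orientation, a minimal synchronous sketch of what one non-stub tick could look like. Everything here is a stand-in: `Task`, the fields of `AgentSpawnConfig` and `AgentHandle`, the `Spawn` trait, `DaemonState`, and `FakeSpawner` are hypothetical simplifications (the real `poll_tasks()` is async, reads tasks from SQLite, and calls `Spawner::spawn()` via `SharedState`), so treat it as the shape of the loop rather than its implementation.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real colibri-daemon types.
#[derive(Clone, Debug)]
struct Task {
    id: u64,
    status: &'static str,
    capability: &'static str,
}

struct AgentSpawnConfig {
    task_id: u64,
    capability: String,
}

struct AgentHandle {
    task_id: u64,
}

trait Spawn {
    fn spawn(&mut self, cfg: &AgentSpawnConfig) -> Result<AgentHandle, String>;
}

struct DaemonState {
    tasks: Vec<Task>,
    agents: HashMap<u64, AgentHandle>,
    capabilities: Vec<&'static str>,
}

/// One scheduler tick: spawn an agent for each queued task whose capability
/// this host offers and that has no registered agent yet.
fn poll_tasks_once(state: &mut DaemonState, spawner: &mut impl Spawn) -> usize {
    let candidates: Vec<Task> = state
        .tasks
        .iter()
        .filter(|t| t.status == "queued")
        .filter(|t| state.capabilities.contains(&t.capability))
        .filter(|t| !state.agents.contains_key(&t.id))
        .cloned()
        .collect();

    let mut spawned = 0;
    for task in candidates {
        let cfg = AgentSpawnConfig {
            task_id: task.id,
            capability: task.capability.to_string(),
        };
        match spawner.spawn(&cfg) {
            Ok(handle) => {
                // Real code would also move the task out of `queued` and pass
                // handle.take_stdout() to glasspane here.
                state.agents.insert(task.id, handle);
                spawned += 1;
            }
            Err(e) => eprintln!("spawn failed for task {}: {e}", task.id),
        }
    }
    spawned
}

/// Test double that "spawns" instantly, in the spirit of fake-pi-agent.py.
struct FakeSpawner;

impl Spawn for FakeSpawner {
    fn spawn(&mut self, cfg: &AgentSpawnConfig) -> Result<AgentHandle, String> {
        if cfg.capability.is_empty() {
            return Err("task has no capability".to_string());
        }
        Ok(AgentHandle { task_id: cfg.task_id })
    }
}

fn main() {
    let mut state = DaemonState {
        tasks: vec![
            Task { id: 1, status: "queued", capability: "freebsd" },
            Task { id: 2, status: "queued", capability: "gpu" },
            Task { id: 3, status: "done", capability: "freebsd" },
        ],
        agents: HashMap::new(),
        capabilities: vec!["freebsd"],
    };
    let mut spawner = FakeSpawner;
    println!("tick 1 spawned {}", poll_tasks_once(&mut state, &mut spawner)); // 1
    println!("tick 2 spawned {}", poll_tasks_once(&mut state, &mut spawner)); // 0
    println!("agent for task {}", state.agents[&1].task_id); // 1
}
```

Skipping tasks that already have a registered handle keeps a second tick from double-spawning before the task's status is updated.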

### What exists

| Capability | Location | Status |
| ------------------- | ------------------------------------------ | ---------------------------------------------------------- |
| `Spawner::spawn()` | `crates/colibri-daemon/src/spawner.rs:585` | done — provider routing, jail wrap, retry/backoff |
| `AgentHandle` | `crates/colibri-daemon/src/spawner.rs:465` | done — tracks child, stdout for glasspane, kill, poll_exit |
| `take_stdout()` | `crates/colibri-daemon/src/spawner.rs:500` | done — hands stdout to glasspane supervision |
| Jail confinement | `crates/colibri-daemon/src/spawner.rs:332` | done — named/ephemeral, staged env payload, priv modes |
| `fake-pi-agent.py` | `scripts/fake-pi-agent.py` | exists — emits JSONL events for testing |
| Glasspane ingestion | `crates/colibri-glasspane/` | done — ingests JSONL, tracks pane state |

### What's missing

1. **Wire `poll_tasks()` to actually spawn agents.**
   The scheduler drains `intake-task` into SQLite on tick, but no agent is
   spawned to work on the task. The `poll_tasks()` stub needs to:
   - Query tasks in `queued` status with a capability match
   - Build an `AgentSpawnConfig` for each
   - Call `Spawner::spawn()`
   - Register the `AgentHandle` in daemon state
   - Hand stdout to glasspane

2. **End-to-end integration test.**
   Using `scripts/fake-pi-agent.py` (or a Rust mock binary):
   - Start the daemon
   - Create a task + intake it
   - Wait for scheduler tick + spawn
   - Verify glasspane observes `Starting` → `Running` → `Stopped` lifecycle
   - Verify session JSONL is written
   - Verify agent appears in `colibri status` / `colibri snapshot`

3. **`spawn-local` socket command (if not present).**
   An operator CLI path to manually spawn a local binary for debugging:

   ```sh
   colibri spawn-local /path/to/pi --session-id test-1
   ```

   This may already exist as a socket command — check `socket.rs` for
   `SpawnLocal` or `Spawn` command variants.

4. **Process kill/cleanup verification.**
   Confirm that `AgentHandle::kill()` reliably kills the child and any jail
   wrapper, and that glasspane transitions to `Stopped`.

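
Items 2 and 4 both end in assertions about glasspane's lifecycle. A minimal sketch of those assertions, assuming the integration test can collect the sequence of pane states it observed; `PaneState`, `collapse`, `saw_full_lifecycle`, and `ended_stopped` are hypothetical and only reuse the `Starting`/`Running`/`Stopped` names from item 2. The real colibri-glasspane state machine may have more states.

```rust
/// Hypothetical pane states named after the lifecycle in item 2.
#[derive(Clone, Copy, Debug, PartialEq)]
enum PaneState {
    Starting,
    Running,
    Stopped,
}

/// Collapse consecutive repeats, so several Running updates count once.
fn collapse(observed: &[PaneState]) -> Vec<PaneState> {
    let mut out: Vec<PaneState> = Vec::new();
    for &state in observed {
        if out.last() != Some(&state) {
            out.push(state);
        }
    }
    out
}

/// Item 2: the happy path must show exactly Starting -> Running -> Stopped.
fn saw_full_lifecycle(observed: &[PaneState]) -> bool {
    collapse(observed) == [PaneState::Starting, PaneState::Running, PaneState::Stopped]
}

/// Item 4: after kill(), whatever came before, the pane must end in Stopped.
fn ended_stopped(observed: &[PaneState]) -> bool {
    observed.last() == Some(&PaneState::Stopped)
}

fn main() {
    use PaneState::*;
    let normal = [Starting, Running, Running, Stopped];
    let killed_early = [Starting, Stopped];
    let still_running = [Starting, Running];

    println!("{}", saw_full_lifecycle(&normal)); // true
    println!("{}", saw_full_lifecycle(&killed_early)); // false
    println!("{}", ended_stopped(&killed_early)); // true
    println!("{}", ended_stopped(&still_running)); // false
}
```

Keeping the two checks separate means a kill that lands before `Running` still passes item 4 while failing the stricter item 2 assertion.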

### Key files

- `crates/colibri-daemon/src/daemon.rs:274` — `poll_tasks()` stub (THE gap)
- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (working, good reference for how other background loops iterate state)
- `crates/colibri-daemon/src/spawner.rs:585` — `Spawner::spawn()` (working)
- `crates/colibri-daemon/src/socket.rs` — socket command dispatch (check for spawn commands)
- `scripts/fake-pi-agent.py` — test agent that emits JSONL
- `crates/colibri-glasspane/src/` — JSONL ingestion + pane state machine

### Suggested owner

Rust lane (Hermes on Linux). Can implement and test fully on Linux with
`fake-pi-agent.py`. FreeBSD validation confirms the jail path works.

---

## Priority 3: Wire cost mode into actual enforcement

### Why this is #3

Cost modes (`Fast`/`Smart`/`Max`) are the core design promise of Colibri —
"cache-first cost discipline." The code has all the pieces (thresholds,
escalation, compaction, trimming) but **they are not connected**. Right now
changing the cost mode has no effect on per-append compaction, tool-result
compaction, escalation, or prompt trimming; only the lagging background
rotation loop honors it.

This is the most subtle gap because the code _looks_ like it's wired up — the
functions exist and have tests — but the call sites are missing or duplicated.

### The disconnection (detailed)

There are **two compaction paths** that use different sources of truth:

**Path A — per-append (session.rs):**

`session.rs:397-398` in `maybe_compact_or_rollover()`:

```rust
let needs_compaction = byte_count > self.config.session_max_bytes
    || turn_count > self.config.max_uncompacted_turns;
```

This reads `self.config.session_max_bytes` and
`self.config.max_uncompacted_turns` — these are **static fields** in
`DaemonConfig` loaded once from env vars (`COLIBRI_SESSION_MAX_BYTES`,
`COLIBRI_MAX_UNCOMPACTED_TURNS`). They default to 2,000,000 and 20 (Smart
values) regardless of the cost mode string.

**Path B — background rotation (daemon.rs):**

`daemon.rs:242-261` in `session_rotation()`:

```rust
let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default();
let max_bytes = cost_mode.session_max_bytes();
let max_turns = cost_mode.max_uncompacted_turns();
```

This correctly derives thresholds from the cost mode. But it runs on a
background timer, not per-append, so it's a lagging check.

**Result:** if you set `COLIBRI_COST_MODE=fast`, the background loop will use
500K/5 thresholds, but the per-append check still uses the static 2M/20
config values. The session can grow past the Fast budget before the background
loop catches up.

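
The divergence is easy to reproduce with the numbers above. A minimal sketch, assuming "500K" means 500,000 bytes; `Thresholds`, `SMART`, `FAST`, and `needs_compaction` are hypothetical helpers that restate the Path A predicate, not code from `session.rs`.

```rust
/// Compaction thresholds, using the figures quoted in this handoff.
#[derive(Clone, Copy, Debug)]
struct Thresholds {
    max_bytes: u64,
    max_turns: u32,
}

// Smart = 2,000,000 bytes / 20 turns (the static config defaults).
const SMART: Thresholds = Thresholds { max_bytes: 2_000_000, max_turns: 20 };
// Fast = "500K" / 5 turns, read here as 500_000 bytes.
const FAST: Thresholds = Thresholds { max_bytes: 500_000, max_turns: 5 };

/// Same predicate as Path A: compact when either limit is strictly exceeded.
fn needs_compaction(byte_count: u64, turn_count: u32, t: Thresholds) -> bool {
    byte_count > t.max_bytes || turn_count > t.max_turns
}

fn main() {
    // A session that has outgrown the Fast budget but not the Smart one.
    let (bytes, turns) = (800_000, 8);

    // Path A today: static config fields, which default to Smart values.
    let per_append = needs_compaction(bytes, turns, SMART);
    // Path B: thresholds derived from COLIBRI_COST_MODE=fast.
    let background = needs_compaction(bytes, turns, FAST);

    println!("per-append compacts: {per_append}"); // false
    println!("background compacts: {background}"); // true
}
```

Until the next background tick, nothing compacts this session even though the operator asked for Fast.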

### What's never called

| Function | Location | Problem |
| ---------------------------------- | ---------------- | --------------------------------------------------------------------------------------- |
| `auto_escalate()` | `cost.rs:131` | Tested but **never called** from daemon loop or session code |
| `compact_tool_result()` | `cost.rs:165` | Tested but **never called** when appending `ToolResult` entries |
| `PromptAssembly::trim_to_budget()` | `session.rs:117` | Tested but **never called** from `build_prompt_assembly()` or `build_prompt_messages()` |
| `EscalationTrigger` | `cost.rs:117` | Type exists, tested, never constructed in production code |

### What `set-cost-mode` does

`socket.rs:657` updates `state.config.cost_mode` (the string), but does NOT
update `state.config.session_max_bytes` or `state.config.max_uncompacted_turns`
(the numeric fields). So after a mode change, the per-append compaction path
still uses the old thresholds.

### Fix plan

1. **Make per-append compaction cost-mode-aware.**
   In `session.rs`, change `maybe_compact_or_rollover()` to derive thresholds
   from `CostMode::parse(&self.config.cost_mode)` instead of reading the static
   fields directly. Or better: remove the static fields from `DaemonConfig`
   entirely and always derive from `cost_mode`.

2. **Wire `compact_tool_result()` into the append path.**
   When `SessionEntry::ToolResult` is appended and
   `cost_mode.compact_tool_results()` is true, run the result through
   `compact_tool_result()` before writing to JSONL.

3. **Wire `auto_escalate()` into `session_rotation()`.**
   After compaction, if the session is still over budget, construct an
   `EscalationTrigger::CompactionInsufficient` and call `auto_escalate()`.
   If escalation succeeds, log it visibly and update `state.config.cost_mode`.

4. **Wire `trim_to_budget()` into prompt assembly.**
   In `build_prompt_assembly()` or `build_prompt_messages()`, call
   `trim_to_budget(cost_mode)` after constructing the assembly.

5. **Make `set-cost-mode` update derived thresholds.**
   When the socket command changes `cost_mode`, also update
   `session_max_bytes` and `max_uncompacted_turns` to match (or remove those
   fields entirely and always derive).

6. **Remove `COLIBRI_SESSION_MAX_BYTES` / `COLIBRI_MAX_UNCOMPACTED_TURNS` env vars.**
   These shadow the cost mode system and cause confusion. The cost mode
   string (`COLIBRI_COST_MODE=fast|smart|max`) should be the single source of
   truth for thresholds.

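
Items 1, 5, and 6 together amount to storing only the mode and deriving thresholds on every read. A minimal sketch under stated assumptions: `CostMode` here is a two-variant stand-in for `crate::cost::CostMode` (Max is left out because its thresholds are not quoted in this handoff), its `parse` returns an `Option` (Path B's `unwrap_or_default()` works on either `Option` or `Result`, so the real return type is not pinned down), and `DaemonConfig` is stripped down to the one field that matters.

```rust
/// Hypothetical stand-in for crate::cost::CostMode (Max omitted).
#[derive(Clone, Copy, Debug, PartialEq)]
enum CostMode {
    Fast,
    Smart,
}

impl CostMode {
    fn parse(s: &str) -> Option<CostMode> {
        match s.to_ascii_lowercase().as_str() {
            "fast" => Some(CostMode::Fast),
            "smart" => Some(CostMode::Smart),
            _ => None,
        }
    }

    fn session_max_bytes(self) -> u64 {
        match self {
            CostMode::Fast => 500_000,
            CostMode::Smart => 2_000_000,
        }
    }

    fn max_uncompacted_turns(self) -> u32 {
        match self {
            CostMode::Fast => 5,
            CostMode::Smart => 20,
        }
    }
}

/// Config with no static threshold fields: the mode is the only input.
struct DaemonConfig {
    cost_mode: CostMode,
}

impl DaemonConfig {
    /// Path A and Path B would both call this, so they cannot disagree.
    fn needs_compaction(&self, byte_count: u64, turn_count: u32) -> bool {
        byte_count > self.cost_mode.session_max_bytes()
            || turn_count > self.cost_mode.max_uncompacted_turns()
    }

    /// What `set-cost-mode` would do: replace the mode; nothing derived to sync.
    /// An unknown mode returns Err and leaves the previous mode in place.
    fn set_cost_mode(&mut self, raw: &str) -> Result<(), String> {
        self.cost_mode =
            CostMode::parse(raw).ok_or_else(|| format!("unknown cost mode: {raw}"))?;
        Ok(())
    }
}

fn main() {
    let mut cfg = DaemonConfig { cost_mode: CostMode::Smart };
    println!("{}", cfg.needs_compaction(800_000, 8)); // false
    cfg.set_cost_mode("fast").unwrap();
    println!("{}", cfg.needs_compaction(800_000, 8)); // true
    println!("{:?}", cfg.set_cost_mode("turbo")); // Err("unknown cost mode: turbo")
    println!("{:?}", cfg.cost_mode); // Fast
}
```

Because the mode switch only replaces the enum, the per-append path sees the new thresholds on its very next check, which is what the "remove those fields entirely" option in item 5 buys.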

### Key files

- `crates/colibri-daemon/src/cost.rs` — cost mode logic (thresholds, escalation, compaction, headroom sidecar)
- `crates/colibri-daemon/src/session.rs:390` — `maybe_compact_or_rollover()` (uses static config, not cost mode)
- `crates/colibri-daemon/src/session.rs:492` — `build_prompt_assembly()` (doesn't call `trim_to_budget()`)
- `crates/colibri-daemon/src/config.rs:21,43` — `session_max_bytes` / `max_uncompacted_turns` static fields
- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (correctly uses cost mode, good reference)
- `crates/colibri-daemon/src/socket.rs:657` — `cmd_set_cost_mode()` (updates string only, not derived values)

### Suggested owner

Rust lane (Hermes on Linux). Fully testable on Linux — this is pure logic
wiring, no platform-specific behavior.

---

## Summary table

| # | Item | Blocks | Linux-doable | Effort |
| --- | --------------------------- | ------------------- | --------------------------- | ------ |
| 1 | ISO boot/runtime validation | Gate 1 | no (needs FreeBSD boot) | small |
| 2 | Pi spawn end-to-end | Gate 2 | yes (with fake-pi-agent.py) | medium |
| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium |

Item 1 is small and items 2 and 3 are medium effort; all three can be worked in
parallel. Items 2 and 3 do not require FreeBSD to implement, only to validate
the final result; item 1 is itself the FreeBSD boot/runtime validation.