docs: delete 3 stale docs (TRUSS, PLAN-MOTHER-MCP, PRIORITY-HANDOFF) #175
8 changed files with 6 additions and 534 deletions
|
|
@@ -128,7 +128,7 @@ was skipped; do not skip it.)
|
|||
| 4 — Write takeover | Colibri owns scheduling and dispatch | open |
|
||||
| 5 — TS retirement | One ISO candidate with Colibri as sole control service | open |
|
||||
|
||||
-Current sprint priorities: `docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md`.
+Current sprint priorities: `docs/MULTI-AGENT-HOST-PLAN.md`.
|
||||
|
||||
## Jail Confinement
|
||||
|
||||
|
|
|
|||
|
|
@@ -7,7 +7,7 @@ VMs, Bastille jails). Unifies coordination (task board, agent registry, skills
|
|||
catalog) with cache-first cost discipline (byte-stable prompt prefixes,
|
||||
cache-hit metering).
|
||||
|
||||
-**Status:** workspace gates are fmt/clippy/test/release green. Round 2 audit is closed. Current priorities: ISO boot/runtime validation, Pi spawn end-to-end, and cost-mode enforcement (see [`docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](docs/PRIORITY-HANDOFF-ISO-SPAWN-COST.md)). Always query live state: see the crate table below and run the gate commands for current counts.
+**Status:** workspace gates are fmt/clippy/test/release green. Round 2 audit is closed. Current priorities: ISO boot/runtime validation, Pi spawn end-to-end, and cost-mode enforcement (see [`docs/MULTI-AGENT-HOST-PLAN.md`](docs/MULTI-AGENT-HOST-PLAN.md)). Always query live state: see the crate table below and run the gate commands for current counts.
|
||||
|
||||
FreeBSD build lane handoff: `docs/FREEBSD-BUILD-LANE-HANDOFF.md`.
|
||||
ISO acceptance runbook: `docs/ISO-ACCEPTANCE-RUNBOOK.md`.
|
||||
|
|
|
|||
|
|
@@ -3,8 +3,8 @@
|
|||
**For:** Codex (FreeBSD 15 host) · **Goal:** produce a Colibri-backed ISO
|
||||
candidate and prove ISO Gate 1 (passive service) on real FreeBSD.
|
||||
|
||||
-This is the runtime-proof step for **Priority 1** of
-`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`. The build-side wiring is already done on
+This is the runtime-proof step for the FreeBSD ISO validation tracked in
+`MULTI-AGENT-HOST-PLAN.md`. The build-side wiring is already done on
|
||||
Linux — `clawdie-iso build.sh` stages the Colibri binaries, installs the rc.d
|
||||
script, creates the `colibri` user, and enables the service. What remains is
|
||||
work only a FreeBSD host can do: build the FreeBSD binaries, run the image
|
||||
|
|
|
|||
|
|
@@ -1,162 +0,0 @@
|
|||
# Plan: Mother MCP Link — Vaultwarden Pubkey Exchange
|
||||
|
||||
**Direction:** B — we (ISO agent) call mother via SSH. Pubkeys exchanged
|
||||
through Vaultwarden, no operator copy-paste.
|
||||
|
||||
**Scope:** OSA acts as mother until dedicated machine exists. Shell scripts
|
||||
first, colibri-vault crate later when the contract is solid.
|
||||
|
||||
---
|
||||
|
||||
## Flow
|
||||
|
||||
```
|
||||
ISO (agent) VAULTWARDEN MOTHER (OSA)
|
||||
─────────────────────────────────────── ─────────────── ─────────────────────────────
|
||||
|
||||
[Enable Mother clicked]
|
||||
|
||||
1. ssh-keygen -t ed25519
|
||||
(if no key yet)
|
||||
|
||||
2. bw publish pubkey ──────────────► hive-pubkeys item
|
||||
name: <hostname>
|
||||
notes: ssh-ed25519 AAAA...
|
||||
|
||||
[cron: every 5 minutes]
|
||||
|
||||
3. bw list hive-pubkeys
|
||||
|
||||
4. rebuild authorized_keys.hive
|
||||
command="colibri-mcp",restrict
|
||||
one line per agent
|
||||
|
||||
5. update external-mcp.json ──────────────────────────────────────► ready for connections
|
||||
ssh -i key colibri@<ts-ip> colibri-mcp
|
||||
```
|
||||
|
||||
## Files
|
||||
|
||||
### Our side (clawdie-iso)
|
||||
|
||||
| File | Change | What |
|
||||
| ------------------------------- | ------ | --------------------------------------------------------------- |
|
||||
| `clawdie-enable-mother.sh` | Extend | Add keygen + vault publish BEFORE the external-mcp.json update |
|
||||
| `clawdie-vault-fetch` (colibri) | Extend | Add `--publish-pubkey` mode: create/update item in hive-pubkeys |
|
||||
|
||||
### Mother side (OSA, new)
|
||||
|
||||
| File | What |
|
||||
| ------------------------------ | ---------------------------------------------------------------------- |
|
||||
| `mother-sync-hive-keys.sh` | Pull all pubkeys from vault → rebuild authorized_keys.hive |
|
||||
| `/etc/cron.d/mother-hive-keys` | `*/5 * * * *` cron entry (every 5 minutes)                            |
|
||||
| sshd_config change | Add `AuthorizedKeysFile ... /var/db/colibri/.ssh/authorized_keys.hive` |
|
||||
|
||||
---
|
||||
|
||||
## Step-by-step
|
||||
|
||||
### A. Key generation (our side)
|
||||
|
||||
```
|
||||
clawdie-enable-mother.sh, new step [1/3]:
|
||||
|
||||
if [ ! -f ~/.ssh/id_ed25519 ]; then
|
||||
ssh-keygen -t ed25519 -N "" -C "colibri@$(hostname)" -f ~/.ssh/id_ed25519
|
||||
fi
|
||||
PUBKEY=$(cat ~/.ssh/id_ed25519.pub)
|
||||
```
|
||||
|
||||
### B. Publish to Vaultwarden (our side)
|
||||
|
||||
```
|
||||
clawdie-enable-mother.sh, new step [2/3]:
|
||||
|
||||
clawdie-vault-fetch --publish-pubkey "$PUBKEY" --collection hive-pubkeys
|
||||
|
||||
What this does:
|
||||
- bw login (using BW_* from provider.env)
|
||||
- bw get collection hive-pubkeys (create if absent)
|
||||
- bw get item "$(hostname)" --collectionid <id>
|
||||
→ if exists: bw edit item <id> notes="$PUBKEY"
|
||||
→ if not: bw create item --name "$(hostname)" --notes "$PUBKEY" --collectionid <id>
|
||||
```
|
||||
|
||||
### C. Update external-mcp.json (our side)
|
||||
|
||||
```
|
||||
clawdie-enable-mother.sh, step [3/3] (existing, with new identity file):
|
||||
|
||||
# MOTHER_TS_IP is passed in with --arg: the shell does not expand variables
# inside the single-quoted jq program.
jq --arg key "$HOME/.ssh/id_ed25519" \
   --arg target "colibri@${MOTHER_TS_IP}" \
  '.servers.mother = {
    "command": "ssh",
    "args": ["-i", $key, "-o", "StrictHostKeyChecking=accept-new",
             $target, "colibri-mcp"],
    "env": {}
  }' "$EXTERNAL_MCP" > "$tmp" && mv "$tmp" "$EXTERNAL_MCP"
|
||||
```
|
||||
|
||||
### D. Mother sync (cron)
|
||||
|
||||
```
|
||||
mother-sync-hive-keys.sh:
|
||||
|
||||
1. bw login (BW_* from provider.env)
|
||||
2. COLLECTION_ID=$(bw get collection hive-pubkeys | jq -r '.id')
|
||||
3. bw list items --collectionid $COLLECTION_ID
|
||||
4. For each item:
|
||||
- HOSTNAME=$(bw get item $id | jq -r '.name')
|
||||
- PUBKEY=$(bw get item $id | jq -r '.notes')
|
||||
- echo "command=\"colibri-mcp\",restrict,no-pty,no-port-forwarding,no-X11-forwarding,no-agent-forwarding $PUBKEY colibri@$HOSTNAME"
|
||||
5. Write all to /var/db/colibri/.ssh/authorized_keys.hive (atomic: mktemp + mv)
|
||||
6. chmod 600
|
||||
|
||||
Cron (/etc/cron.d/mother-hive-keys): */5 * * * * root /usr/local/sbin/mother-sync-hive-keys.sh
|
||||
```
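
The plan keeps this in shell for now and defers a `colibri-vault` crate until the contract settles. As a rough sketch of what steps 4 to 6 could look like in Rust later, the block below rebuilds the file contents from scratch and writes them with a temp-file-plus-rename. `HiveKey`, `render_authorized_keys`, and `write_atomic` are hypothetical names, not existing code. Two assumptions differ from the shell outline: the publisher's own key comment is dropped so each line ends in a single `colibri@<hostname>`, and a fixed temp name stands in for `mktemp`.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// One Vaultwarden item: item name = hostname, notes = public key.
/// (Hypothetical type, for illustration only.)
pub struct HiveKey {
    pub hostname: String,
    pub pubkey: String,
}

/// Options are applied by mother, never by the publisher.
const OPTIONS: &str = "command=\"colibri-mcp\",restrict";

/// Render the whole file from scratch so a deleted vault item disappears.
pub fn render_authorized_keys(keys: &[HiveKey]) -> String {
    let mut out = String::new();
    for key in keys {
        // Keep only "<type> <base64>"; drop any comment the publisher sent.
        let mut fields = key.pubkey.split_whitespace();
        let (Some(kind), Some(blob)) = (fields.next(), fields.next()) else {
            continue; // skip malformed notes rather than write a broken line
        };
        out.push_str(&format!(
            "{} {} {} colibri@{}\n",
            OPTIONS, kind, blob, key.hostname
        ));
    }
    out
}

/// Write to a temporary file next to the target, then rename over it.
/// Readers see either the old file or the new one, never a partial write.
pub fn write_atomic(target: &Path, contents: &str) -> std::io::Result<()> {
    // Assumption: a fixed temp name is enough because only one sync runs at a time.
    let tmp = target.with_extension("hive.tmp");
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .mode(0o600)
        .open(&tmp)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp, target)
}
```

Rendering the full file on every run is what makes deleting a vault item act as revocation; nothing is ever appended.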
|
||||
|
||||
### E. sshd configuration (mother, one-time)
|
||||
|
||||
```
|
||||
# /etc/ssh/sshd_config addition:
|
||||
AuthorizedKeysFile .ssh/authorized_keys /var/db/colibri/.ssh/authorized_keys.hive
|
||||
|
||||
# Reload:
|
||||
service sshd reload
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Security properties
|
||||
|
||||
| Property | How |
|
||||
| ----------------------------- | ------------------------------------------------------------------- |
|
||||
| Rebuild, not append | Each sync regenerates the file — deleting a vault item = revocation |
|
||||
| Restriction applied by mother | `command="colibri-mcp",restrict` — not baked by publisher |
|
||||
| Dedicated key file | `authorized_keys.hive` separate from operator keys |
|
||||
| No shell access | `restrict` blocks everything except the forced command |
|
||||
| Atomic write | `mktemp` + `mv` — no partial reads |
|
||||
| TOFU on first connect | `StrictHostKeyChecking=accept-new` — auto-trust on first connection |
|
||||
|
||||
---
|
||||
|
||||
## Acceptance
|
||||
|
||||
- [ ] Click "Enable Mother" → keypair created if absent
|
||||
- [ ] Pubkey published to Vaultwarden (verify: `bw get item <hostname>`)
|
||||
- [ ] external-mcp.json updated with SSH + identity file
|
||||
- [ ] Mother cron syncs within 5 minutes
|
||||
- [ ] `authorized_keys.hive` contains the restricted entry
|
||||
- [ ] Pi can call mother's tools via `ssh -i key colibri@<ts-ip> colibri-mcp`
|
||||
- [ ] Delete vault item → next sync removes access (revocation tested)
|
||||
|
||||
---
|
||||
|
||||
## Sequencing
|
||||
|
||||
| Step | Repo | Content |
|
||||
| ---- | ----------- | ------------------------------------------------------ |
|
||||
| 1 | colibri | Extend `clawdie-vault-fetch` with `--publish-pubkey` |
|
||||
| 2 | clawdie-iso | Extend `clawdie-enable-mother.sh` — keygen + publish |
|
||||
| 3 | — | Create `mother-sync-hive-keys.sh` on OSA |
|
||||
| 4 | — | Wire cron + sshd_config on OSA |
|
||||
| 5 | — | End-to-end test: ISO → vault → OSA → SSH → colibri-mcp |
|
||||
|
|
@@ -1,313 +0,0 @@
|
|||
# Priority Handoff — Three Focus Items Toward ISO Gate 1
|
||||
|
||||
**Created:** 14.jun.2026 (Sam & Hermes) · **Updated:** 19.jun.2026
|
||||
**Status:** Priorities 2 & 3 **done** · Priority 1 **staged for FreeBSD build**
|
||||
**Superseded by:** `MULTI-AGENT-HOST-PLAN.md` for the next sprint
|
||||
|
||||
Round 2 audit is fully closed. All repos are green (211 tests, clippy clean,
|
||||
fmt clean). The three items below were the highest-leverage work toward getting
|
||||
a Colibri-backed ISO candidate and delivering on the core cost-discipline
|
||||
promise.
|
||||
|
||||
**Current status of each item:**
|
||||
|
||||
- **Priority 1 (ISO boot validation):** Build wiring done, release runbook
|
||||
landed (`clawdie-iso/docs/RELEASE-BUILD-RUNBOOK.md`), artifacts built on
|
||||
FreeBSD host. Awaiting the 0.10.0 release build execution.
|
||||
- **Priority 2 (Pi spawn end-to-end):** **Done** — `poll_tasks()` wired in
|
||||
`9d443a4`, integration test `poll_tasks_spawns_agent_for_claimed_task` passes.
|
||||
- **Priority 3 (Cost mode enforcement):** **Done** — cost mode is single source
|
||||
of truth; `session_max_bytes`/`max_uncompacted_turns` removed from
|
||||
`DaemonConfig`; per-append compaction derives from `CostMode::parse()`.
|
||||
|
||||
The next sprint is multi-agent multi-host coordination — see
|
||||
[`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md).
|
||||
|
||||
---
|
||||
|
||||
## Priority 1: Validate the staged colibri_daemon boots and runs on the ISO
|
||||
|
||||
### Why this is #1
|
||||
|
||||
The build-side wiring is **done**. The clawdie-iso build now stages the
|
||||
Colibri binaries, installs the rc.d script, creates the `colibri` user, and
|
||||
enables the service. What has **not** happened yet is booting a freshly built
|
||||
image and confirming `colibri_daemon` actually starts and the acceptance
|
||||
runbook passes on real FreeBSD. Until that boot/runtime validation is done,
|
||||
Gate 1 (passive service) is unproven.
|
||||
|
||||
### What's done (build wiring)
|
||||
|
||||
| Artifact / step | Location | Status |
|
||||
| -------------------- | ------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| staging script | clawdie-iso `scripts/stage-colibri-iso.sh` | done — copies `colibri-daemon`, `colibri`, `colibri-test-agent`, rc.d, newsyslog, creates dirs (canonical; lives in clawdie-iso) |
|
||||
| rc.d script | `packaging/freebsd/colibri_daemon.in` | done — `start_precmd`, pidfile, daemon(8) wrapper, `COLIBRI_COST_MODE` propagation |
|
||||
| newsyslog config | `packaging/freebsd/newsyslog-colibri.conf` | done |
|
||||
| rc.conf.sample | generated by staging script | done |
|
||||
| acceptance runbook | `docs/ISO-ACCEPTANCE-RUNBOOK.md` | done |
|
||||
| build integration | clawdie-iso `build.sh::install_colibri_service` | done — calls `stage-colibri-iso.sh` against the image root |
|
||||
| `colibri` user/group | clawdie-iso `build.sh` (`pw useradd colibri`) | done — created in the image during build |
|
||||
| service enable | clawdie-iso `build.sh` (`colibri_daemon_enable`) | done — written into image rc.conf |
|
||||
| prebuilt binaries | build-host Rust toolchain (preflight-gated) | done — `build.sh` stages prebuilt release binaries and fails preflight if missing |
|
||||
|
||||
### What's missing (boot/runtime validation)
|
||||
|
||||
1. **Boot a freshly built image on FreeBSD** (bhyve VM or hardware) and confirm
|
||||
the `colibri` user, binaries, rc.d script, and rc.conf entry are present in
|
||||
the running system.
|
||||
|
||||
2. **Run the acceptance runbook on the booted image:**
|
||||
|
||||
```sh
|
||||
service colibri_daemon start
|
||||
colibri status
|
||||
colibri create-task --title "iso check"
|
||||
colibri list-tasks --status queued
|
||||
colibri intake-task --title "iso intake check" --capability freebsd
|
||||
# wait one scheduler tick
|
||||
colibri list-tasks --status queued
|
||||
service colibri_daemon stop
|
||||
```
|
||||
|
||||
3. **Confirm logging + lifecycle:** pidfile is created, newsyslog rotation
|
||||
config is in place, and `service colibri_daemon stop` cleanly stops the
|
||||
daemon and removes the pidfile.
|
||||
|
||||
4. **Validate the Hermes rc.d service** (`hermes-bsd`, merged 14.jun.2026 as
|
||||
`fc4b57ade`). The `hermes_daemon` rc.d script runs `hermes gateway run`
|
||||
under `daemon(8)` with a dedicated user, persistent `HERMES_HOME`, and
|
||||
supervisor/child pidfile separation — but it has not been booted on real
|
||||
FreeBSD yet. On the same image run:
|
||||
|
||||
```sh
|
||||
# one-time: create user + install the rc.d script per README-FreeBSD.md
|
||||
service hermes_daemon start # must abort cleanly if config.yaml is missing
|
||||
service hermes_daemon health
|
||||
service hermes_daemon stop # supervisor exits, child does not respawn
|
||||
```
|
||||
|
||||
Confirm: prestart aborts (exit 1, no crash loop) when
|
||||
`/var/db/hermes/config.yaml` is absent; once configured, start/health/stop
|
||||
work and both the supervisor and child pidfiles are cleaned up on stop.
|
||||
|
||||
### Key files
|
||||
|
||||
- clawdie-iso `scripts/stage-colibri-iso.sh` — the staging script (dir creation, bin copy, rc.d install, rc.conf.sample generation). Canonical copy lives in clawdie-iso; the colibri repo no longer keeps a duplicate.
|
||||
- `packaging/freebsd/colibri_daemon.in` — rc.d script
|
||||
- `docs/ISO-ACCEPTANCE-RUNBOOK.md` — acceptance commands to run on the booted image
|
||||
- `docs/FREEBSD-BUILD-LANE-HANDOFF.md` — step-by-step build/boot/validate handoff for the FreeBSD agent
|
||||
- clawdie-iso `build.sh` — `install_colibri_service()` already wires staging, user creation, and service enable
|
||||
- `hermes-bsd` `packaging/freebsd/hermes_daemon.in` + `README-FreeBSD.md` — Hermes rc.d service and setup steps
|
||||
|
||||
### Suggested owner
|
||||
|
||||
ISO/build lane — FreeBSD agent (Codex) or Sam boots a built image and runs the
|
||||
acceptance runbook plus the Hermes rc.d checks. No Linux-side code change is
|
||||
required; this is a runtime-proof step.
|
||||
|
||||
---
|
||||
|
||||
## Priority 2: Prove the Pi spawn path end-to-end
|
||||
|
||||
### Why this is #2
|
||||
|
||||
The daemon has a full `Spawner` with provider routing, jail confinement,
|
||||
retry/backoff, and `AgentHandle` that captures stdout for glasspane. But the
|
||||
**daemon loop's `poll_tasks()` is a stub** (`daemon.rs:274-277`):
|
||||
|
||||
```rust
|
||||
pub async fn poll_tasks(state: &SharedState) {
|
||||
debug!("task polling tick");
|
||||
let _spawner = Spawner::new(state.config.clone().into());
|
||||
}
|
||||
```
|
||||
|
||||
It creates a `Spawner` and does nothing with it. No agent is ever spawned from
|
||||
the daemon loop. This blocks Gate 2 (agent observation parity): glasspane
|
||||
supervision requires a real spawned process whose JSONL events flow through
|
||||
to state transitions before it can be validated.
|
||||
|
||||
### What exists
|
||||
|
||||
| Capability | Location | Status |
|
||||
| -------------------- | ------------------------------------------ | ---------------------------------------------------------- |
|
||||
| `Spawner::spawn()` | `crates/colibri-daemon/src/spawner.rs:585` | done — provider routing, jail wrap, retry/backoff |
|
||||
| `AgentHandle` | `crates/colibri-daemon/src/spawner.rs:465` | done — tracks child, stdout for glasspane, kill, poll_exit |
|
||||
| `take_stdout()` | `crates/colibri-daemon/src/spawner.rs:500` | done — hands stdout to glasspane supervision |
|
||||
| Jail confinement | `crates/colibri-daemon/src/spawner.rs:332` | done — named/ephemeral, staged env payload, priv modes |
|
||||
| `sample-pi-agent.py` | `scripts/sample-pi-agent.py` | exists — emits JSONL events for testing |
|
||||
| Glasspane ingestion | `crates/colibri-glasspane/` | done — ingests JSONL, tracks pane state |
|
||||
|
||||
### What's missing
|
||||
|
||||
1. **Wire `poll_tasks()` to actually spawn agents.**
|
||||
The scheduler drains `intake-task` into SQLite on tick, but no agent is
|
||||
spawned to work on the task. The poll_tasks stub needs to:
|
||||
- Query tasks in `queued` status with a capability match
|
||||
- Build an `AgentSpawnConfig` for each
|
||||
- Call `Spawner::spawn()`
|
||||
- Register the `AgentHandle` in daemon state
|
||||
- Hand stdout to glasspane
|
||||
|
||||
2. **End-to-end integration test.**
|
||||
Using `scripts/sample-pi-agent.py` (or a Rust mock binary):
|
||||
- Start daemon
|
||||
- Create a task + intake it
|
||||
- Wait for scheduler tick + spawn
|
||||
- Verify glasspane observes `Starting` → `Running` → `Stopped` lifecycle
|
||||
- Verify session JSONL is written
|
||||
- Verify agent appears in `colibri status` / `colibri snapshot`
|
||||
|
||||
3. **`spawn-local` socket command (if not present).**
|
||||
An operator CLI path to manually spawn a local binary for debugging:
|
||||
|
||||
```sh
|
||||
colibri spawn-local /path/to/pi --session-id test-1
|
||||
```
|
||||
|
||||
This may already exist as a socket command — check `socket.rs` for
|
||||
`SpawnLocal` or `Spawn` command variants.
|
||||
|
||||
4. **Process kill/cleanup verification.**
|
||||
Confirm that `AgentHandle::kill()` reliably kills the child and any jail
|
||||
wrapper, and that glasspane transitions to `Stopped`.
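
The wiring described in item 1 can be sketched without any real processes. Everything below is a hypothetical stand-in: `Task`, `TaskStatus`, `DaemonState`, the `Spawn` trait, and `FakeSpawner` are illustrative names, and the real `Spawner`, `AgentSpawnConfig`, and `AgentHandle` in `spawner.rs` will have different shapes. The sketch is synchronous, omits the glasspane stdout handoff, and assumes a task moves from `queued` to a claimed state once its agent is spawned.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
}

pub struct Task {
    pub id: u64,
    pub capability: String,
    pub status: TaskStatus,
}

pub struct AgentSpawnConfig {
    pub session_id: String,
    pub capability: String,
}

pub struct AgentHandle {
    pub session_id: String,
}

/// Stand-in for the real spawner so the loop can be exercised without processes.
pub trait Spawn {
    fn spawn(&mut self, config: AgentSpawnConfig) -> Result<AgentHandle, String>;
}

pub struct DaemonState {
    pub tasks: Vec<Task>,
    pub capabilities: Vec<String>,
    pub agents: HashMap<u64, AgentHandle>,
}

/// One polling tick: spawn an agent for every queued task this host can serve.
/// Returns how many agents were spawned.
pub fn poll_tasks(state: &mut DaemonState, spawner: &mut dyn Spawn) -> usize {
    let mut spawned = 0;
    for task in state.tasks.iter_mut() {
        let eligible = task.status == TaskStatus::Queued
            && state.capabilities.contains(&task.capability);
        if !eligible {
            continue;
        }
        let config = AgentSpawnConfig {
            session_id: format!("task-{}", task.id),
            capability: task.capability.clone(),
        };
        // A failed spawn leaves the task queued so the next tick retries it.
        if let Ok(handle) = spawner.spawn(config) {
            task.status = TaskStatus::Claimed;
            state.agents.insert(task.id, handle);
            spawned += 1;
        }
    }
    spawned
}

/// Test double that records session ids instead of starting a child process.
pub struct FakeSpawner {
    pub seen: Vec<String>,
}

impl Spawn for FakeSpawner {
    fn spawn(&mut self, config: AgentSpawnConfig) -> Result<AgentHandle, String> {
        self.seen.push(config.session_id.clone());
        Ok(AgentHandle { session_id: config.session_id })
    }
}
```

Putting the spawner behind a trait is one way to keep the tick logic testable on Linux, with the jail path left to FreeBSD validation.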
|
||||
|
||||
### Key files
|
||||
|
||||
- `crates/colibri-daemon/src/daemon.rs:274` — `poll_tasks()` stub (THE gap)
|
||||
- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (working, good reference for how other background loops iterate state)
|
||||
- `crates/colibri-daemon/src/spawner.rs:585` — `Spawner::spawn()` (working)
|
||||
- `crates/colibri-daemon/src/socket.rs` — socket command dispatch (check for spawn commands)
|
||||
- `scripts/sample-pi-agent.py` — test agent that emits JSONL
|
||||
- `crates/colibri-glasspane/src/` — JSONL ingestion + pane state machine
|
||||
|
||||
### Suggested owner
|
||||
|
||||
Rust lane (Hermes on Linux). Can implement and test fully on Linux with
|
||||
`sample-pi-agent.py`. FreeBSD validation confirms jail path works.
|
||||
|
||||
---
|
||||
|
||||
## Priority 3: Wire cost mode into actual enforcement
|
||||
|
||||
### Why this is #3
|
||||
|
||||
Cost modes (`Fast`/`Smart`/`Max`) are the core design promise of Colibri —
|
||||
"cache-first cost discipline." The code has all the pieces (thresholds,
|
||||
escalation, compaction, trimming) but **they are not connected**. Right now
|
||||
changing the cost mode does nothing to actual session behavior.
|
||||
|
||||
This is the most subtle gap because the code _looks_ like it's wired up — the
|
||||
functions exist and have tests — but the call sites are missing or duplicated.
|
||||
|
||||
### The disconnection (detailed)
|
||||
|
||||
There are **two compaction paths** that use different sources of truth:
|
||||
|
||||
**Path A — per-append (session.rs):**
|
||||
|
||||
`session.rs:397-398` in `maybe_compact_or_rollover()`:
|
||||
|
||||
```rust
|
||||
let needs_compaction = byte_count > self.config.session_max_bytes
|
||||
|| turn_count > self.config.max_uncompacted_turns;
|
||||
```
|
||||
|
||||
This reads `self.config.session_max_bytes` and
|
||||
`self.config.max_uncompacted_turns` — these are **static fields** in
|
||||
`DaemonConfig` loaded once from env vars (`COLIBRI_SESSION_MAX_BYTES`,
|
||||
`COLIBRI_MAX_UNCOMPACTED_TURNS`). They default to 2,000,000 and 20 (Smart
|
||||
values) regardless of the cost mode string.
|
||||
|
||||
**Path B — background rotation (daemon.rs):**
|
||||
|
||||
`daemon.rs:242-261` in `session_rotation()`:
|
||||
|
||||
```rust
|
||||
let cost_mode = crate::cost::CostMode::parse(&state.config.cost_mode).unwrap_or_default();
|
||||
let max_bytes = cost_mode.session_max_bytes();
|
||||
let max_turns = cost_mode.max_uncompacted_turns();
|
||||
```
|
||||
|
||||
This correctly derives thresholds from the cost mode. But it runs on a
|
||||
background timer, not per-append, so it's a lagging check.
|
||||
|
||||
**Result:** if you set `COLIBRI_COST_MODE=fast`, the background loop will use
|
||||
500K/5 thresholds, but the per-append check still uses the static 2M/20
|
||||
config values. The session can grow past the Fast budget before the background
|
||||
loop catches up.
|
||||
|
||||
### What's never called
|
||||
|
||||
| Function | Location | Problem |
|
||||
| ---------------------------------- | ---------------- | --------------------------------------------------------------------------------------- |
|
||||
| `auto_escalate()` | `cost.rs:131` | Tested but **never called** from daemon loop or session code |
|
||||
| `compact_tool_result()` | `cost.rs:165` | Tested but **never called** when appending `ToolResult` entries |
|
||||
| `PromptAssembly::trim_to_budget()` | `session.rs:117` | Tested but **never called** from `build_prompt_assembly()` or `build_prompt_messages()` |
|
||||
| `EscalationTrigger` | `cost.rs:117` | Type exists, tested, never constructed in production code |
|
||||
|
||||
### What `set-cost-mode` does
|
||||
|
||||
`socket.rs:657` updates `state.config.cost_mode` (the string), but does NOT
|
||||
update `state.config.session_max_bytes` or `state.config.max_uncompacted_turns`
|
||||
(the numeric fields). So after a mode change, the per-append compaction path
|
||||
still uses the old thresholds.
|
||||
|
||||
### Fix plan
|
||||
|
||||
1. **Make per-append compaction cost-mode-aware.**
|
||||
In `session.rs`, change `maybe_compact_or_rollover()` to derive thresholds
|
||||
from `CostMode::parse(&self.config.cost_mode)` instead of reading the static
|
||||
fields directly. Or better: remove the static fields from `DaemonConfig`
|
||||
entirely and always derive from `cost_mode`.
|
||||
|
||||
2. **Wire `compact_tool_result()` into the append path.**
|
||||
When `SessionEntry::ToolResult` is appended and
|
||||
`cost_mode.compact_tool_results()` is true, run the result through
|
||||
`compact_tool_result()` before writing to JSONL.
|
||||
|
||||
3. **Wire `auto_escalate()` into `session_rotation()`.**
|
||||
After compaction, if the session is still over budget, construct an
|
||||
`EscalationTrigger::CompactionInsufficient` and call `auto_escalate()`.
|
||||
If escalation succeeds, log it visibly and update `state.config.cost_mode`.
|
||||
|
||||
4. **Wire `trim_to_budget()` into prompt assembly.**
|
||||
In `build_prompt_assembly()` or `build_prompt_messages()`, call
|
||||
`trim_to_budget(cost_mode)` after constructing the assembly.
|
||||
|
||||
5. **Make `set-cost-mode` update derived thresholds.**
|
||||
When the socket command changes `cost_mode`, also update
|
||||
`session_max_bytes` and `max_uncompacted_turns` to match (or remove those
|
||||
fields entirely and always derive).
|
||||
|
||||
6. **Remove `COLIBRI_SESSION_MAX_BYTES` / `COLIBRI_MAX_UNCOMPACTED_TURNS` env vars.**
|
||||
These shadow the cost mode system and cause confusion. The cost mode
|
||||
string (`COLIBRI_COST_MODE=fast|smart|max`) should be the single source of
|
||||
truth for thresholds.
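
Item 1 of the plan above amounts to one function that both compaction paths call. The sketch below is an illustration under stated assumptions, not the code in `cost.rs`: `parse` returns an `Option` here, "500K" is read as 500,000 bytes, the `Max` thresholds are placeholders because the handoff does not state them, and `needs_compaction` is a hypothetical helper name.

```rust
/// Cost modes named in the handoff; Smart is the default.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub enum CostMode {
    Fast,
    #[default]
    Smart,
    Max,
}

impl CostMode {
    /// Parse the `COLIBRI_COST_MODE` string (fast|smart|max), case-insensitively.
    pub fn parse(raw: &str) -> Option<Self> {
        match raw.trim().to_ascii_lowercase().as_str() {
            "fast" => Some(Self::Fast),
            "smart" => Some(Self::Smart),
            "max" => Some(Self::Max),
            _ => None,
        }
    }

    pub fn session_max_bytes(self) -> u64 {
        match self {
            Self::Fast => 500_000,
            Self::Smart => 2_000_000,
            Self::Max => 8_000_000, // placeholder: not stated in the handoff
        }
    }

    pub fn max_uncompacted_turns(self) -> usize {
        match self {
            Self::Fast => 5,
            Self::Smart => 20,
            Self::Max => 80, // placeholder: not stated in the handoff
        }
    }
}

/// Single threshold check shared by the per-append path and the background loop.
/// An unrecognized mode string falls back to the default (Smart).
pub fn needs_compaction(cost_mode: &str, byte_count: u64, turn_count: usize) -> bool {
    let mode = CostMode::parse(cost_mode).unwrap_or_default();
    byte_count > mode.session_max_bytes() || turn_count > mode.max_uncompacted_turns()
}
```

With the thresholds derived on every call, a `set-cost-mode` change takes effect on the next append without any numeric fields to keep in sync.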
|
||||
|
||||
### Key files
|
||||
|
||||
- `crates/colibri-daemon/src/cost.rs` — cost mode logic (thresholds, escalation, compaction, headroom sidecar)
|
||||
- `crates/colibri-daemon/src/session.rs:390` — `maybe_compact_or_rollover()` (uses static config, not cost mode)
|
||||
- `crates/colibri-daemon/src/session.rs:492` — `build_prompt_assembly()` (doesn't call `trim_to_budget()`)
|
||||
- `crates/colibri-daemon/src/config.rs:21,43` — `session_max_bytes` / `max_uncompacted_turns` static fields
|
||||
- `crates/colibri-daemon/src/daemon.rs:242` — `session_rotation()` (correctly uses cost mode, good reference)
|
||||
- `crates/colibri-daemon/src/socket.rs:657` — `cmd_set_cost_mode()` (updates string only, not derived values)
|
||||
|
||||
### Suggested owner
|
||||
|
||||
Rust lane (Hermes on Linux). Fully testable on Linux — this is pure logic
|
||||
wiring, no platform-specific behavior.
|
||||
|
||||
---
|
||||
|
||||
## Summary table
|
||||
|
||||
| # | Item | Blocks | Linux-doable | Effort |
|
||||
| --- | --------------------------- | ------------------- | ----------------------------- | ------ |
|
||||
| 1 | ISO boot/runtime validation | Gate 1 | no (needs FreeBSD boot) | small |
|
||||
| 2 | Pi spawn end-to-end | Gate 2 | yes (with sample-pi-agent.py) | medium |
|
||||
| 3 | Cost mode enforcement | core design promise | yes (pure logic) | medium |
|
||||
|
||||
Items 2 and 3 are medium effort and item 1 is small; all three can be worked in
parallel. None require FreeBSD to implement, only to validate the final result.
|
||||
|
|
@@ -15,7 +15,4 @@ A quick-reference guide to every document in this folder.
|
|||
| [`ISO-ACCEPTANCE-RUNBOOK.md`](ISO-ACCEPTANCE-RUNBOOK.md) | Post-boot acceptance commands after staging Colibri into an ISO | Codex (FreeBSD) |
|
||||
| [`ISO-SERVICE-LAYOUT.md`](ISO-SERVICE-LAYOUT.md) | `rc.conf` service layout for the ISO image | All |
|
||||
| [`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md) | **Current sprint**: multi-agent task-board tests + CLI surface gaps | All agents |
|
||||
-| [`PLAN-MOTHER-MCP-VAULT-KEYS.md`](PLAN-MOTHER-MCP-VAULT-KEYS.md) | Vaultwarden pubkey exchange for mother MCP link (direction B) | Sam & Hermes |
-| [`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](PRIORITY-HANDOFF-ISO-SPAWN-COST.md) | ISO boot validation, Pi spawn path, cost mode enforcement (P2/P3 done) | All agents |
-| [`TRUSS-SPAWN-ANALYSIS.md`](TRUSS-SPAWN-ANALYSIS.md) | truss trace of jail-spawn Permission Denied — root cause + fix | Debugging |
|
||||
| [`VAULT-PROVISION-RUNBOOK.md`](VAULT-PROVISION-RUNBOOK.md) | First-proof runbook: vault → jail → `.env` chain (clean CLI) | Agents, Sam |
|
||||
|
|
|
|||
|
|
@@ -1,50 +0,0 @@
|
|||
# truss Analysis — colibri-daemon Jail Spawn (21.jun.2026)
|
||||
|
||||
**Trace saved:** `/tmp/daemon.truss` (1964 lines, captured during successful spawn)
|
||||
|
||||
## The Bug
|
||||
|
||||
The daemon could not spawn agents inside jails. `colibri spawn-agent --jail-name`
|
||||
returned "Permission denied (os error 13)" even though `sudo -n jexec proof0 ...`
|
||||
worked fine from the shell.
|
||||
|
||||
## What truss Revealed
|
||||
|
||||
Two independent issues, both masked by the same EACCES error:
|
||||
|
||||
### 1. Bare command names in daemon(8) PATH
|
||||
|
||||
The daemon constructed spawn commands with bare names (`sudo`, `jexec`).
|
||||
Under `daemon(8) -u clawdie`, the inherited PATH may be empty or reordered,
|
||||
so `execvp` missed `/usr/local/bin/sudo` and returned EACCES.
|
||||
|
||||
**Fix:** `resolve_program()` — absolutizes bare names by searching a fixed
|
||||
list (`/usr/local/sbin`, `/usr/local/bin`, `/usr/sbin`, `/usr/bin`, `/sbin`,
|
||||
`/bin`), returning the first executable found. PR #131.
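
A minimal sketch of that lookup, assuming the behavior described above rather than quoting PR #131: the search list and the name `resolve_program` come from this analysis, while `is_executable`, the pass-through for names containing `/`, and the fallback to the bare name are assumptions made for illustration.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::{Path, PathBuf};

/// Fixed search list from the analysis, in order.
const SEARCH_DIRS: [&str; 6] = [
    "/usr/local/sbin",
    "/usr/local/bin",
    "/usr/sbin",
    "/usr/bin",
    "/sbin",
    "/bin",
];

/// True when `path` is a regular file with at least one execute bit set.
fn is_executable(path: &Path) -> bool {
    fs::metadata(path)
        .map(|meta| meta.is_file() && (meta.permissions().mode() & 0o111) != 0)
        .unwrap_or(false)
}

/// Absolutize a bare command name without consulting the inherited PATH.
/// Anything that already contains '/' is returned untouched.
pub fn resolve_program(requested: &str) -> PathBuf {
    if requested.contains('/') {
        return PathBuf::from(requested);
    }
    SEARCH_DIRS
        .iter()
        .map(|dir| Path::new(dir).join(requested))
        .find(|candidate| is_executable(candidate))
        // Assumption: fall back to the bare name so the exec error still names it.
        .unwrap_or_else(|| PathBuf::from(requested))
}
```

Because the list is fixed, the result does not depend on whatever PATH `daemon(8)` hands to the process.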
|
||||
|
||||
### 2. Staging directory owned by root
|
||||
|
||||
For jailed spawns with environment variables, the daemon's
|
||||
`prepare_spawn_command` stages files under the jail root at
|
||||
`<jail_root>/var/run/colibri-stage/<stage_id>/`. This directory was
|
||||
created by a previous run (as root) and was mode 755 root:wheel.
|
||||
The daemon runs as `clawdie` and could not write staging files there.
|
||||
|
||||
**Fix (history):** initially `chmod 777 <jail_root>/var/run/colibri-stage`, then
|
||||
`agent-jail-bootstrap.sh` pre-created it clawdie-owned `0700` (#134). **Final
|
||||
(#135):** staging moved out of root-owned `/var/run` to the daemon user's home
|
||||
at `/home/clawdie/.cache/colibri/stage/<id>/`, so the daemon creates it itself
|
||||
with no privileged pre-creation step (overridable via `COLIBRI_JAIL_STAGE_DIR`).
|
||||
|
||||
## The Winning Spawn
|
||||
|
||||
```
|
||||
program=/usr/local/bin/sudo requested=sudo
|
||||
args=["-n", "jexec", "proof0", "/bin/sh",
|
||||
"/var/run/colibri-stage/<id>/launch.sh",
|
||||
"/var/run/colibri-stage/<id>/env.sh", "-",
|
||||
"/usr/local/bin/colibri-test-agent"]
|
||||
path=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
|
||||
```
|
||||
|
||||
Agent spawned, vault provision ran, `.env` written. Track A complete.
|
||||
|
|
@@ -14,8 +14,8 @@ Two binaries, not one (Sam rejected merging them, 13.jun.2026):
|
|||
Canonical statement: `AGENTS.md` (lines ~18–32). `clawdie-ai` (TS) is being
|
||||
pruned; surviving features move to zot/Colibri.
|
||||
|
||||
-> There is **no** `ADR-agent-harness-consolidation.md` despite references to it
-> in `clawdie-iso/scripts/stage-colibri-iso.sh`. Treat `AGENTS.md` as the ADR.
+> There is **no** `ADR-agent-harness-consolidation.md` (it was referenced in the
+> past; those references have since been cleaned up). Treat `AGENTS.md` as the ADR.
|
||||
|
||||
## Runtimes
|
||||
|
||||
|
|
|
|||