docs: rewrite ADR + jail-spawn design to match shipped code

Both were written as proposals; the decisions are now working code, so slim them
to plain "how it works" docs (code is the source of truth).

- ADR-agent-harness-consolidation: Proposed -> Accepted/implemented; drop the
  migration plan + gates (all shipped), fold in the pi-demotion correction, and
  drop the dangling CLAWDIE-AGENT-WIKI reference (deleted in #34). 116 -> ~55 lines.
- COLIBRI-JAILED-AGENT-SPAWN-DESIGN: proposal -> implemented; describe the shipped
  spawner (name-vs-path lifecycle, command= syntax, PrivMode mdo/helper, socket
  wiring, external-MCP reuse) instead of the original code sketch.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sam & Claude 2026-06-13 21:56:01 +02:00
parent 6dbc9f6ada
commit 8eff3c6eff
2 changed files with 93 additions and 271 deletions

@@ -1,116 +1,52 @@
# ADR: Consolidate agent harnesses on zot + Colibri
# ADR: zot is the agent, Colibri is the control plane
**Status:** Proposed · **Date:** 13.jun.2026 · **Owner:** Sam & Claude
**Status:** Accepted — implemented · **Date:** 13.jun.2026 · **Owner:** Sam & Claude
> **Update (13.jun.2026):** The "remove Pi" / "stage-deprecate Pi" guidance below
> is **superseded**. Decision refined: **Pi is demoted, not removed** — it stays
> on the image as a _spawnable agent backend_ that zot (the primary harness) can
> launch (e.g. a jailed Pi via the Colibri spawner). The Node runtime therefore
> **stays**. zot becomes the default/primary harness; the Rust `clawdie` relay and
> mini-binary are still retired (done — the `clawdie` crate was removed). See
> `docs/COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`.
## What we decided
## Context
We had five overlapping things that each did some of "run an agent" or "supervise
agents": **pi, zot, codex, a Rust `clawdie` mini-binary, and the clawdie-ai
TypeScript app.** We collapsed that to **two clear roles**:
We have accumulated overlapping pieces that all do "agent" or "supervision":
- **zot — the agent** (the front door that talks to the model). One static Go
binary with built-in providers (DeepSeek + ~25 others), a Telegram bot, JSON
output mode, `SKILL.md` skills, subprocess extensions, and its own credential
store (`$ZOT_HOME/auth.json`).
- **Colibri — the control plane** (the supervisor). Watches agents via glasspane,
runs the task board, holds the skills catalog, tracks cost. It does **not**
contain zot — it spawns and observes it.
| Piece | What it is | Layer |
| ------------------------------------------------------ | ------------------------------------------------------------------- | ---------------------- |
| **Pi** (`@earendil-works/pi-coding-agent`) | Node coding-agent CLI; `--mode json` JSONL | harness |
| **zot** (`clawdie/zot`, mirror of `patriceckhart/zot`) | Go coding-agent, one static binary; TUI/print/json modes | harness |
| **codex** | OpenAI Codex CLI | harness |
| **clawdie** (Rust crate in this repo) | single binary: glasspane + Herdr socket + a Telegram→DeepSeek relay | mixed |
| **clawdie-ai** (TypeScript) | legacy control plane + Telegram + media features | mixed |
| **Colibri** | FreeBSD-native supervision (glasspane) + coordination + cost | control plane |
| **Herdr** | Linux-only supervision/pane UI | supervision (optional) |
"Clawdie" is the product name for _zot + Colibri together_, not a separate binary.
Two redundancies are now concrete, not theoretical:
## How it works now (shipped, not planned)
1. **zot already provides what the Rust `clawdie` binary reimplements** — a single
binary with a built-in Telegram bot, DeepSeek (and ~25 other providers), a JSON
run mode, `SKILL.md` skills, and subprocess JSON-RPC extensions. `clawdie`'s
`telegram.rs` + `deepseek.rs` are a single-provider subset of that.
2. **Pi and zot are interchangeable harnesses.** `colibri-glasspane` already models
both (`AgentRuntime { Pi, Zot }`) and `apply_zot_event` **delegates to
`apply_pi_event`** — identical event taxonomy. zot also covers Pi's lanes:
`openai_codex.go` (the gpt-5.5/codex lane), Anthropic/Gemini/Copilot/DeepSeek,
and OAuth (`auth/oauth.go`). zot is one static Go binary; Pi drags the Node
runtime into the ISO.
- **pi is kept as a spawnable backend, not the default.** zot is the primary
harness; Colibri can spawn pi as a worker (including jailed — see
`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`). The Node runtime stays for that.
glasspane treats pi and zot identically (`AgentRuntime { Pi, Zot }`).
- **The Rust `clawdie` mini-binary is gone** — it reimplemented Telegram +
DeepSeek + credentials zot already has. Removed (crate deleted).
- **Herdr** — unchanged; optional Linux display client, Colibri glasspane is the
FreeBSD-native core.
- **clawdie-ai (TS)** — being pruned; surviving features move to zot
skills/extensions or Colibri.
- **Credentials** live in zot's own surface (`$ZOT_HOME/auth.json` or the rc.d env
file), not baked into a custom binary.
Credentials are the same story: zot has a full credential manager
(`--api-key` → provider env var → `$ZOT_HOME/auth.json` [API key or OAuth, 0600],
plus interactive OAuth/api-key login). `clawdie`'s `resolve()` (env → build-flag
baked) is a strict subset.
## Why
## Decision
zot already did what our own code was reimplementing (providers, Telegram, OAuth,
skills, credentials), and pi/zot are interchangeable at the event level. Keeping
all of it was duplicate work and dragged the Node runtime into the image. One
agent + one control plane = less to maintain, smaller image, one credential model.
Converge on **two components**:
## Trade-off we accept
- **Agent / front door = zot** — providers (incl. DeepSeek), Telegram, tools,
JSON mode, skills, extensions, and credential management (`$ZOT_HOME/auth.json`).
- **Control plane = Colibri** — supervision (glasspane consumes zot JSON),
coordination (task board), skills catalog, cost discipline.
zot is an upstream mirror (`patriceckhart/zot`, MIT). We pin a tag for the ISO and
track upstream; MIT lets us fork if ever needed.
Consequences for the other pieces:
## References (code is the source of truth)
- **Rust `clawdie` binary:** stop reimplementing the agent. Either (a) slim it to a
thin supervisor that launches zot and wires it to Colibri, or (b) retire the
binary and let **"clawdie" be the product name** for _zot + Colibri_. Drop
`telegram.rs` / `deepseek.rs` / credential `resolve()` — zot owns those.
- **Pi:** stage-deprecate in favor of zot once the gates below pass. Keep it until
then — it is the currently-documented Colibri JSONL contract.
- **Herdr:** unchanged — optional Linux UI; Colibri glasspane is the FreeBSD core.
- **clawdie-ai (TS):** continue pruning; route surviving capabilities to **zot
extensions / `SKILL.md`** (the mechanism already exists upstream), the rest to
Colibri; retire over time.
Net: three "agents" (Pi, the Rust clawdie relay, clawdie-ai TS) collapse toward
**one agent (zot) + one control plane (Colibri)**, and the Node runtime leaves the
agent path.
## Credentials on the operator image
The ISO should populate zot's own credential surface instead of baking keys into a
bespoke binary:
- set `ZOT_HOME` for the operator (e.g. `/var/db/clawdie` or the operator home),
- export `DEEPSEEK_API_KEY` (+ the Telegram token zot reads) via the rc.d env file,
**or** stage a mode-0600 `$ZOT_HOME/auth.json`,
- the existing `clawdie.env` becomes the place those land.
This replaces `clawdie`'s build-flag baking and keeps secrets out of the binary.
## Migration gates (do not break Pi mid-flight)
1. Validate `colibri-glasspane`'s zot parser against **real** `zot --mode json`
output (capture a live JSONL sample; confirm state transitions match Pi's).
2. Confirm zot covers the required provider lanes on FreeBSD — DeepSeek, and the
codex/gpt-5.5 **OAuth** subscription lane (`auth.json`) — with a smoke per lane.
3. Pilot zot as the operator-USB agent behind a build flag; Pi stays default.
4. Once green: flip default to zot, retire the Rust `clawdie` relay, drop the
Node-for-agent bundling, and remove Pi from the live CLI set.
## Risks
- Real migration, not a rename; Pi is the documented contract today.
- zot is third-party upstream (`patriceckhart/zot`, MIT) — we must track/sync it
(wire an `upstream` remote; pin a tag for the ISO). MIT lets us fork if needed.
- codex/gpt-5.5 OAuth must be proven in zot before dropping Pi.
- FreeBSD: zot is Go (static binary, easy); confirm `x86_64-freebsd` builds + the
Telegram/long-poll path on the live USB.
## Consequences
- **Positive:** one harness + one control plane; no Node in the agent path; smaller
ISO; one credential model (`auth.json`); zot's skills/extensions absorb clawdie-ai
features; less surface to maintain.
- **Negative:** migration effort; temporary dual-run (Pi + zot) during gates;
dependency on an external upstream we must keep synced.
## References
- `crates/colibri-glasspane/src/lib.rs`: `AgentRuntime { Pi, Zot }`, `apply_zot_event`
- `clawdie/zot`: `packages/agent/config.go` (credential order), `packages/agent/botcmd.go`
(Telegram), `packages/provider/openai_codex.go`, `packages/provider/auth/oauth.go`
- `docs/HERDR-VS-COLIBRI-GRAPH.md` — supervision boundary (Herdr vs Colibri)
- `docs/CLAWDIE-AGENT-WIKI.md` — the (now-superseded) Rust clawdie bundle
- `crates/colibri-glasspane/src/lib.rs`: `AgentRuntime { Pi, Zot }`
- `crates/colibri-daemon/src/spawner.rs` — the agent/pi spawner (+ jail confinement)
- `docs/COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`

@@ -1,185 +1,71 @@
# Colibri jailed agent spawn — design
# Colibri jailed agent spawn
**Status:** proposal · **Date:** 2026-06-13
**Status:** Accepted — implemented · **Date:** 13.jun.2026
How Colibri spawns a child agent (e.g. `pi`) confined inside a **FreeBSD jail**,
and the privilege model for the root-requiring jail step.
How Colibri confines a spawned agent (e.g. `pi`) inside a FreeBSD jail, and how
the unprivileged daemon gets the root that jails require. This describes the
shipped code in `crates/colibri-daemon/src/spawner.rs`.
## Why this lives in Colibri, not zot
zot has a multi-agent `swarm`, but its child is hardwired to `os.Executable()`
(itself) — the binary override is test-only, the public `SpawnRequest` exposes
no command field, and swarm explicitly runs with **"no worktree, no isolation,
cwd == RepoRoot"**. So "zot spawns pi-in-a-jail" would require forking the zot
mirror (binary override + a pi↔swarm protocol shim + jail wrapping we add
anyway). See the agent-harness consolidation notes.
Colibri is the supervisor and already spawns agents — `spawner.rs` runs the
subprocess, captures its JSONL, and feeds glasspane. Confinement is a supervisor
concern, so it lives here, and zot stays a clean upstream mirror. (zot's own
`swarm` only spawns copies of zot and has no isolation, so it was never the right
place for this.)
Colibri is the **supervisor**, already models `AgentRuntime{Pi, Zot}`, and —
critically — **already spawns pi**: `crates/colibri-daemon/src/spawner.rs` runs
agent subprocesses, captures their stdout JSONL, and hands it to glasspane.
`socket.rs:345` even comments _"enables real Pi spawn."_ Confinement is a
supervisor concern and root-adjacent, so it belongs here. zot stays a clean
upstream mirror, untouched.
## How it works
## What already exists (the spawn pipeline)
A spawn can carry an optional `JailConfig`; with none, the agent runs on the host
as before. The field that is set picks the jail lifecycle:
```
SpawnAgent socket cmd (lib.rs:40)
→ cmd_spawn_agent (socket.rs:327)
→ Spawner::spawn (spawner.rs)
→ Command::new(binary).args().envs().stdout(piped()).spawn() ← spawner.rs:341
→ AgentHandle.take_stdout() (spawner.rs:192)
→ glasspane apply_pi_event (AgentRuntime::Pi, socket.rs:410)
```
- **`name`** — enter an already-running **persistent** jail with `jexec`
(created/destroyed out of band by rc.d / the operator). Takes precedence.
- **`path`** — create an **ephemeral** jail with `jail -c … command=<binary>`,
which exists only while the agent runs and is removed when it exits (no teardown
needed).
- optional `ip4` (`inherit` by default) and `user` (in-jail user, `jexec` path).
Everything except _what binary gets exec'd_ is jail-agnostic and stays as-is.
`jail_wrap()` turns `(binary, args)` into the `(program, argv)` to exec. stdio is
untouched — `jexec`, `jail`, and `mdo` all run the child in the foreground and
inherit stdin/stdout — so the agent's JSON stream still reaches glasspane and the
MCP host's stdin/stdout transport still works.
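
The name-vs-path selection can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the shipped `jail_wrap()`: `JailConfigSketch` is a trimmed stand-in for `JailConfig`, the function name `jail_argv_sketch` is hypothetical, privilege escalation and the `user` field are left out, and the exact parameter list the shipped code passes to `jail -c` is not shown in this document.

```rust
/// Trimmed, illustrative stand-in for the spawner's `JailConfig`.
#[derive(Debug, Clone, Default)]
pub struct JailConfigSketch {
    /// Persistent jail to enter with `jexec`; takes precedence over `path`.
    pub name: Option<String>,
    /// Root of an ephemeral jail created with `jail -c`.
    pub path: Option<String>,
    /// `None` or "inherit" shares the host stack; anything else is an address.
    pub ip4: Option<String>,
}

/// Hypothetical helper: turn (binary, args) into the (program, argv) to exec.
pub fn jail_argv_sketch(
    binary: &str,
    args: &[String],
    jail: Option<&JailConfigSketch>,
) -> Result<(String, Vec<String>), String> {
    // No jail requested: run on the host unchanged.
    let Some(j) = jail else {
        return Ok((binary.to_string(), args.to_vec()));
    };

    // Persistent jail: `jexec <name> <binary> <args...>`.
    if let Some(name) = &j.name {
        let mut argv = vec![name.clone(), binary.to_string()];
        argv.extend(args.iter().cloned());
        return Ok(("jexec".to_string(), argv));
    }

    // Ephemeral jail: `jail -c path=... ip4... command=<binary> <args...>`.
    // `command=` must come last; jail(8) treats everything after it as the
    // command's own arguments.
    let Some(path) = &j.path else {
        return Err("jail config needs either `name` or `path`".to_string());
    };
    let ip4 = match j.ip4.as_deref() {
        None | Some("inherit") => "ip4=inherit".to_string(),
        Some(addr) => format!("ip4.addr={addr}"),
    };
    let mut argv = vec![
        "-c".to_string(),
        format!("path={path}"),
        ip4,
        format!("command={binary}"),
    ];
    argv.extend(args.iter().cloned());
    Ok(("jail".to_string(), argv))
}
```

Returning a `Result` instead of unwrapping `path` is a choice made for this sketch: a config with neither field set becomes an error the caller can report, rather than a panic in the daemon.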
## Design
This is wired through the `spawn-agent` socket command (any caller can request a
jail) and reused by the external-MCP host (`colibri-mcp`), which confines
arbitrary third-party MCP servers the same way.
### 1. Config — extend `AgentSpawnConfig` (spawner.rs:84)
## Privilege: how the unprivileged daemon gets root
```rust
pub struct AgentSpawnConfig {
// ...existing: binary, args, env, working_dir, provider, model...
/// Optional FreeBSD jail confinement. None = run on host (today's behavior).
#[serde(default)]
pub jail: Option<JailConfig>,
}
Jail attach (`jexec`) and create (`jail`) are root-only, but `colibri_daemon`
runs unprivileged. The deciding fact: FreeBSD `mac_do` rules are **identity**
mappings (`security.mac.do.rules=gid=0>uid=0` means "wheel may become root"), not
command filters — so granting the daemon `mdo` access grants it _full_ root, not
just `jexec`. We choose the escalation per host via `PrivMode`
(`COLIBRI_JAIL_PRIV_MODE`):
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JailConfig {
pub name: Option<String>, // enter a running persistent jail (jexec)...
pub path: Option<String>, // ...or create an ephemeral jail here (jail -c)
pub ip4: Option<String>, // "inherit" | addr | none (vnet later)
pub user: Option<String>, // in-jail user, default "clawdie"
pub ephemeral: bool, // tear down with `jail -r` on exit
}
```
- **Live operator USB → `mdo` (default).** The single operator already holds
wheel→root, so a trusted local daemon is the same trust domain — `mdo -u root`
reuses the image's existing `mac_do` plumbing, no new privileged binary.
- **Deployed / shared host → setuid helper.** A socket-facing daemon with blanket
root is a real escalation surface, so use a narrow setuid helper
(`/usr/local/libexec/colibri-jail-spawn`) that only performs the jail spawn, and
keep the daemon unprivileged.
- **`none`** — run the jail command directly (already root, or tests).
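
As a rough sketch of how the three modes could wrap an already-built jail command: the code below assumes `COLIBRI_JAIL_PRIV_MODE` takes the literal strings `mdo`, `helper`, and `none`, and that the helper receives the inner command as its arguments. Both are assumptions, as are the names `PrivModeSketch`, `parse_priv_mode`, and `escalate`. The shipped `PrivMode` in `spawner.rs` is authoritative.

```rust
/// Illustrative mirror of the spawner's `PrivMode`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivModeSketch {
    Mdo,
    Helper,
    None,
}

/// Hypothetical parser for the value of `COLIBRI_JAIL_PRIV_MODE`.
/// Unset or empty falls back to `mdo`, the documented default.
pub fn parse_priv_mode(raw: Option<&str>) -> Result<PrivModeSketch, String> {
    match raw.map(str::trim) {
        None | Some("") | Some("mdo") => Ok(PrivModeSketch::Mdo),
        Some("helper") => Ok(PrivModeSketch::Helper),
        Some("none") => Ok(PrivModeSketch::None),
        Some(other) => Err(format!("unknown COLIBRI_JAIL_PRIV_MODE: {other}")),
    }
}

/// Wrap the jail command (`jexec ...` or `jail -c ...`) in the chosen escalation.
pub fn escalate(
    mode: PrivModeSketch,
    program: String,
    argv: Vec<String>,
) -> (String, Vec<String>) {
    match mode {
        // Live USB: `mdo -u root <program> <argv...>`.
        PrivModeSketch::Mdo => {
            let mut wrapped = vec!["-u".to_string(), "root".to_string(), program];
            wrapped.extend(argv);
            ("mdo".to_string(), wrapped)
        }
        // Deployed host: the daemon stays unprivileged and hands the command
        // to the setuid helper (argument convention assumed here).
        PrivModeSketch::Helper => {
            let mut wrapped = vec![program];
            wrapped.extend(argv);
            ("/usr/local/libexec/colibri-jail-spawn".to_string(), wrapped)
        }
        // Already root, or tests: run the jail command directly.
        PrivModeSketch::None => (program, argv),
    }
}
```

A caller would typically read the variable once at startup, for example `parse_priv_mode(std::env::var("COLIBRI_JAIL_PRIV_MODE").ok().as_deref())`. Rejecting unknown values, rather than silently defaulting to `mdo`, keeps a typo from granting more privilege than intended.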
Rides through the existing `SpawnAgent` socket command as one optional field — no
protocol redesign. colibri-tui / a skill / the supervisor can request "spawn pi,
jailed."
## Open items
### 2. The wrap — `(binary, args)` → jail invocation
- **Teardown:** ephemeral `jail -c command=` self-cleans; reaping a deeply nested
in-jail process tree may want a process-group kill (follow-up).
- **mdo env passthrough:** verify on FreeBSD that `mdo` propagates the injected
`COLIBRI_*` / provider env into the jailed process; if not, pass via `jexec`
args or a 0600 env file.
- **Jail filesystem provisioning** (ISO / deploy): the jailed binary needs its
runtime + work dir — a pre-provisioned persistent jail, or nullfs mounts for an
ephemeral one.
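
If the env-file fallback from the second item turns out to be needed, writing it could look roughly like this. It is a sketch only: the function name `write_env_file` and the one-`KEY=value`-per-line format are assumptions, and how the jailed process would consume the file is not specified in this document.

```rust
use std::fs::{self, OpenOptions, Permissions};
use std::io::{self, Write};
use std::os::unix::fs::{OpenOptionsExt, PermissionsExt};
use std::path::Path;

/// Hypothetical helper: write `KEY=value` lines to a new file readable only
/// by its owner (mode 0600).
pub fn write_env_file(path: &Path, vars: &[(&str, &str)]) -> io::Result<()> {
    // Reject entries that would corrupt the line-oriented format.
    for (key, value) in vars {
        if key.is_empty() || key.contains('=') || key.contains('\n') || value.contains('\n') {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "invalid env entry"));
        }
    }

    // `create_new` refuses to reuse an existing file, and `mode` applies the
    // restrictive permissions at creation time, before any secret is written.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)?;

    // A umask can only clear bits, but set the mode explicitly to be certain.
    fs::set_permissions(path, Permissions::from_mode(0o600))?;

    for (key, value) in vars {
        writeln!(file, "{key}={value}")?;
    }
    file.sync_all()
}
```

Creating the file with the final mode, instead of writing it and calling `chmod` afterwards, avoids a window in which provider keys sit in a world-readable file.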
```rust
fn jail_wrap(binary: &str, args: &[String], jail: &Option<JailConfig>,
priv_mode: PrivMode) -> (String, Vec<String>)
{
let Some(j) = jail else { return (binary.into(), args.to_vec()); };
## References
// Inner jail command (jexec persistent, or jail -c ephemeral).
let (mut exe, mut a) = if let Some(name) = &j.name {
// jexec WITHOUT -l so the injected COLIBRI_*/provider env is inherited.
("jexec".to_string(), vec![name.clone(), binary.into()])
} else {
("jail".to_string(), vec![
"-c".into(), format!("path={}", j.path.clone().unwrap()),
"mount.devfs".into(),
j.ip4.as_deref().map(|ip| format!("ip4.addr={ip}"))
.unwrap_or("ip4=inherit".into()),
"command".into(), binary.into(),
])
};
a.extend(args.iter().cloned());
// Privilege escalation for the root-only jail step (see below).
match priv_mode {
PrivMode::Mdo => { // live USB
let mut wrapped = vec!["-u".into(), "root".into(), exe];
wrapped.extend(a);
("mdo".into(), wrapped)
}
PrivMode::Helper => { // deployed/hardened
// colibri stays unprivileged; the setuid helper does the one op.
("/usr/local/libexec/colibri-jail-spawn".into(),
std::iter::once(exe).chain(a).collect())
}
PrivMode::None => (exe, a), // tests / already-root
}
}
```
At spawner.rs:341 the only change is sourcing exe/argv from `jail_wrap`; the
`.envs()`, retry/backoff, and stdout-pipe capture are unchanged. **stdout JSONL
survives** because `jexec`/`jail -c command=`/`mdo` all run the child in the
foreground and inherit stdio → glasspane ingestion is unaffected, and the jailed
pi shows up as `AgentRuntime::Pi` with zero glasspane changes.
### 3. Teardown — `AgentHandle::kill` (spawner.rs:197)
Add a jail-aware branch:
- **jexec:** kill the **process group** (`-pid`); killing the jexec process
alone does not reliably reap the in-jail child.
- **ephemeral jail:** also `jail -r <name>` so jails are not leaked per spawn.
## Privilege model — the decision
Jail _attach_ (`jexec`) and _create_ (`jail`) are **root-only** in base FreeBSD;
there is no unprivileged path. But `colibri_daemon` runs as the unprivileged
`colibri` user (`nologin`), so it cannot attach a jail by itself. Two ways to
cross that line — and we pick **per deployment context**, matching the
live-vs-deployed split.
The deciding fact: the ISO's `mac_do` rules are **identity** mappings, not command
filters — `security.mac.do.rules=gid=0>uid=0` (clawdie-iso `build.sh:1274`) means
"wheel may become root." `mac_do` **cannot** restrict _which_ command runs as root.
| | `mdo -u root` | setuid/Capsicum helper |
| ------------------------------------- | ------------------------ | ---------------------- |
| New privileged binary to write+audit | none (reuses mac_do) | yes |
| Kernel-enforced | yes | yes |
| Non-interactive from daemon | yes (no password prompt) | yes |
| Root blast radius if daemon is popped | **full root** | **just jexec-pi** |
| Extra setup | one mac_do rule | helper + install |
Because `mac_do` is command-blind, **wrapping mdo in a helper does NOT narrow it**:
once `colibri` may `mdo -u root`, a compromise just runs `mdo -u root sh`. The
helper is hygiene, not a boundary. Only a setuid/Capsicum helper (where colibri
is _not_ granted general root) is a true boundary.
### Decision
- **Live operator USB → `mdo -u root`.** Single operator who already holds
wheel→root; the trusted local daemon is the same trust domain, so "daemon can
root" crosses no new boundary. Cheapest path, reuses the mac_do plumbing the
image already ships, no new C to audit. Requires one rule granting the colibri
group→root (grant by gid, not the dynamic uid):
`security.mac.do.rules=gid=0>uid=0,gid=<colibri-gid>>uid=0`.
Set `PrivMode::Mdo`.
- **Deployed disk/server (`service clawdie`) → setuid/Capsicum helper.** A
socket-facing daemon with blanket root is a real escalation surface on a
multi-user / exposed host. Ship `/usr/local/libexec/colibri-jail-spawn` (root,
argv-hardcoded to the pi-jail op) and keep colibri unprivileged.
Set `PrivMode::Helper`.
`PrivMode` is selected at daemon config time (live image stages `Mdo`, deploy
packaging stages `Helper`), so the same spawner serves both.
## Open items / must verify on the FreeBSD builder
1. **env passthrough through `mdo`.** The spawner injects `COLIBRI_*` + provider
keys via `.envs()`; they only reach pi if `mdo` (then `jexec`) propagate them.
`build.sh:1271` notes mdo "changes the primary gid to wheel" but is silent on
env. If mdo sanitizes env, pass critical values as explicit `jexec --env`/argv
or via a 0600 env file.
2. **jail filesystem provisioning** (ISO/deploy plumbing, not Rust): pi needs
Node + its `node_modules` + the work dir inside the jail. Either a
pre-provisioned persistent jail, or an ephemeral jail over a base with nullfs
`ro` mounts of `/usr/local` (pi) + a `rw` work dir.
3. **process-group kill + `jail -r`** teardown semantics under load.
## Scope summary
| Piece | Effort | Where |
| ------------------------------------ | ------------------------ | ------------------------ |
| `JailConfig` + field | trivial | spawner.rs:84, lib.rs:40 |
| `jail_wrap` + call site | small | spawner.rs:341 |
| jail-aware `kill` / `-r` | small | spawner.rs:197 |
| `PrivMode` (mdo vs helper) selection | small | daemon config |
| glasspane observation | none | already works |
| zot changes | none | mirror untouched |
| setuid `colibri-jail-spawn` helper | medium + security review | new (deploy lane) |
| jail FS provisioning | medium (ops) | ISO / deploy packaging |
- `crates/colibri-daemon/src/spawner.rs`: `JailConfig`, `PrivMode`, `jail_wrap`
- `crates/colibri-daemon/src/lib.rs` + `socket.rs`: `jail` on the spawn-agent command
- `crates/colibri-mcp/src/external.rs` — jailed external MCP servers