Merge pull request 'docs: design note for colibri-spawned pi in a FreeBSD jail' (#33) from design/colibri-jailed-agent-spawn into main

Reviewed-on: #33
clawdie 2026-06-13 19:08:38 +02:00
commit c3e68e98f2


@ -0,0 +1,184 @@
# Colibri jailed agent spawn — design
**Status:** proposal · **Date:** 2026-06-13
How Colibri spawns a child agent (e.g. `pi`) confined inside a **FreeBSD jail**,
and the privilege model for the root-requiring jail step.
## Why this lives in Colibri, not zot
zot has a multi-agent `swarm`, but its child is hardwired to `os.Executable()`
(itself) — the binary override is test-only, the public `SpawnRequest` exposes
no command field, and swarm explicitly runs with **"no worktree, no isolation,
cwd == RepoRoot"**. So "zot spawns pi-in-a-jail" would require forking the zot
mirror (binary override + a pi↔swarm protocol shim + jail wrapping we add
anyway). See the agent-harness consolidation notes.
Colibri is the **supervisor**, already models `AgentRuntime{Pi, Zot}`, and —
critically — **already spawns pi**: `crates/colibri-daemon/src/spawner.rs` runs
agent subprocesses, captures their stdout JSONL, and hands it to glasspane.
`socket.rs:345` even comments *"enables real Pi spawn."* Confinement is a
supervisor concern and root-adjacent, so it belongs here. zot stays a clean
upstream mirror, untouched.
## What already exists (the spawn pipeline)
```
SpawnAgent socket cmd (lib.rs:40)
  → cmd_spawn_agent (socket.rs:327)
    → Spawner::spawn (spawner.rs)
      → Command::new(binary).args().envs().stdout(piped()).spawn() ← spawner.rs:341
        → AgentHandle.take_stdout() (spawner.rs:192)
          → glasspane apply_pi_event (AgentRuntime::Pi, socket.rs:410)
```
Everything except *what binary gets exec'd* is jail-agnostic and stays as-is.
## Design
### 1. Config — extend `AgentSpawnConfig` (spawner.rs:84)
```rust
pub struct AgentSpawnConfig {
    // ...existing: binary, args, env, working_dir, provider, model...
    /// Optional FreeBSD jail confinement. None = run on host (today's behavior).
    #[serde(default)]
    pub jail: Option<JailConfig>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JailConfig {
    pub name: Option<String>,  // enter a running persistent jail (jexec)...
    pub path: Option<String>,  // ...or create an ephemeral jail here (jail -c)
    pub ip4: Option<String>,   // "inherit" | addr | none (vnet later)
    pub user: Option<String>,  // in-jail user, default "clawdie"
    pub ephemeral: bool,       // tear down with `jail -r` on exit
}
```
The config rides through the existing `SpawnAgent` socket command as one
optional field, so no protocol redesign is needed. colibri-tui, a skill, or the
supervisor can request "spawn pi, jailed."
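The struct leaves open what happens when both `name` and `path` are set, or neither. A minimal sketch of one way to settle that up front, assuming the rule "exactly one of the two" (an assumption of this example, not something the proposal fixes). `JailMode` and `mode()` are hypothetical names, and the serde derives are omitted so the snippet compiles without external crates:

```rust
/// Standalone mirror of the proposed `JailConfig` (serde derives omitted).
#[derive(Debug, Clone, Default)]
pub struct JailConfig {
    pub name: Option<String>,
    pub path: Option<String>,
    pub ip4: Option<String>,
    pub user: Option<String>,
    pub ephemeral: bool,
}

/// Which of the two jail modes a config selects (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub enum JailMode<'a> {
    /// Enter an already-running jail by name (jexec).
    Attach(&'a str),
    /// Create a jail rooted at this path (jail -c).
    Create(&'a str),
}

impl JailConfig {
    /// Assumed rule: exactly one of `name` / `path` must be present.
    pub fn mode(&self) -> Result<JailMode<'_>, String> {
        match (self.name.as_deref(), self.path.as_deref()) {
            (Some(n), None) => Ok(JailMode::Attach(n)),
            (None, Some(p)) => Ok(JailMode::Create(p)),
            (Some(_), Some(_)) => Err("set either `name` or `path`, not both".into()),
            (None, None) => Err("jail config needs `name` or `path`".into()),
        }
    }
}
```

Rejecting ambiguous configs at the socket boundary would keep the wrap step free of panics on a missing `path`.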
### 2. The wrap — `(binary, args)` → jail invocation
```rust
fn jail_wrap(binary: &str, args: &[String], jail: &Option<JailConfig>,
             priv_mode: PrivMode) -> (String, Vec<String>)
{
    // No jail requested: run on the host exactly as today.
    let Some(j) = jail else { return (binary.into(), args.to_vec()); };

    // Inner jail command (jexec persistent, or jail -c ephemeral).
    let (exe, mut a) = if let Some(name) = &j.name {
        // jexec WITHOUT -l so the injected COLIBRI_*/provider env is inherited.
        ("jexec".to_string(), vec![name.clone(), binary.into()])
    } else {
        // Config validation must guarantee `path` whenever `name` is absent.
        let path = j.path.as_deref().expect("JailConfig needs `name` or `path`");
        // "inherit" (or unset) shares the host stack; anything else is an address.
        let ip4 = match j.ip4.as_deref() {
            None | Some("inherit") => "ip4=inherit".to_string(),
            Some(addr) => format!("ip4.addr={addr}"),
        };
        ("jail".to_string(), vec![
            "-c".into(), format!("path={path}"),
            "mount.devfs".into(),
            ip4,
            // `command=` goes last: jail(8) takes the rest of argv as the command.
            format!("command={binary}"),
        ])
    };
    a.extend(args.iter().cloned());

    // Privilege escalation for the root-only jail step (see below).
    match priv_mode {
        PrivMode::Mdo => {                       // live USB
            let mut wrapped = vec!["-u".into(), "root".into(), exe];
            wrapped.extend(a);
            ("mdo".into(), wrapped)
        }
        PrivMode::Helper => {                    // deployed/hardened
            // colibri stays unprivileged; the setuid helper does the one op.
            ("/usr/local/libexec/colibri-jail-spawn".into(),
             std::iter::once(exe).chain(a).collect())
        }
        PrivMode::None => (exe, a),              // tests / already-root
    }
}
```
At spawner.rs:341 the only change is sourcing exe/argv from `jail_wrap`; the
`.envs()`, retry/backoff, and stdout-pipe capture are unchanged. **stdout JSONL
survives** because `jexec`/`jail -c command=`/`mdo` all run the child in the
foreground and inherit stdio → glasspane ingestion is unaffected, and the jailed
pi shows up as `AgentRuntime::Pi` with zero glasspane changes.
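To make the resulting command line concrete, here is a minimal, self-contained sketch of the argv produced for the persistent-jail case under each privilege mode. The jail name `pi0` and the `--mode json` arguments used in the usage note are placeholders, and `jexec_argv` / `with_privilege` are illustrative helpers rather than functions in the Colibri tree:

```rust
/// Sketch of the privilege modes named in this note.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PrivMode {
    Mdo,
    Helper,
    None,
}

/// Build the argv for entering a persistent jail:
/// `jexec <jail> <binary> <args...>` (no `-l`, so the environment is kept).
fn jexec_argv(jail: &str, binary: &str, args: &[&str]) -> Vec<String> {
    let mut argv = vec!["jexec".to_string(), jail.to_string(), binary.to_string()];
    argv.extend(args.iter().map(|a| a.to_string()));
    argv
}

/// Prefix the inner argv with the privilege step selected by `mode`.
fn with_privilege(inner: Vec<String>, mode: PrivMode) -> Vec<String> {
    let prefix: Vec<String> = match mode {
        PrivMode::Mdo => vec!["mdo".into(), "-u".into(), "root".into()],
        PrivMode::Helper => vec!["/usr/local/libexec/colibri-jail-spawn".into()],
        PrivMode::None => Vec::new(),
    };
    prefix.into_iter().chain(inner).collect()
}
```

With `jexec_argv("pi0", "pi", &["--mode", "json"])` and `PrivMode::Mdo`, the joined argv is `mdo -u root jexec pi0 pi --mode json`. Every layer is a plain exec-style wrapper, which is why the stdout pipe set up by the spawner still ends at pi.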
### 3. Teardown — `AgentHandle::kill` (spawner.rs:197)
Add a jail-aware branch:
- **jexec:** kill the **process group** (`-pid`); killing the jexec process
alone does not reliably reap the in-jail child.
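- **ephemeral jail:** also `jail -r <name>` so jails are not leaked per spawn.
  This needs the create step to assign a `name=` (the wrap above does not yet
  do so), and because `jail -r` is root-only it must go through the same
  `PrivMode` path as the spawn.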
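The process-group half of the teardown can be sketched with the standard library alone. This is an assumption-laden illustration, not the planned `AgentHandle::kill`: it shells out to `kill(1)` instead of calling `killpg(2)` directly (to avoid a `libc` dependency), and it assumes the caller is allowed to signal the group. An unprivileged daemon signalling a child that runs as root would get `EPERM` and would need the privileged path for teardown too.

```rust
use std::io;
use std::os::unix::process::CommandExt;
use std::process::{Child, Command, ExitStatus};

/// Spawn `program` as the leader of a fresh process group so the wrapper
/// and everything it starts can be signalled together.
fn spawn_in_own_group(program: &str, args: &[&str]) -> io::Result<Child> {
    Command::new(program).args(args).process_group(0).spawn()
}

/// Send SIGTERM to every process in the child's group, then reap the leader.
fn kill_group(child: &mut Child) -> io::Result<ExitStatus> {
    // With process_group(0) the group id equals the leader's pid;
    // a negative pid tells kill(1) to target the whole group.
    let pgid = format!("-{}", child.id());
    // `--` stops option parsing so the negative pid is not read as a flag.
    let status = Command::new("kill")
        .args(["-TERM", "--", pgid.as_str()])
        .status()?;
    if !status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            "kill(1) could not signal the process group",
        ));
    }
    child.wait()
}
```

Putting the child in its own group at spawn time is what makes the later group kill safe: the signal cannot reach the daemon itself or unrelated agents.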
## Privilege model — the decision
Jail *attach* (`jexec`) and *create* (`jail`) are **root-only** in base FreeBSD;
there is no unprivileged path. But `colibri_daemon` runs as the unprivileged
`colibri` user (`nologin`), so it cannot attach to or create a jail by itself.
There are two ways to cross that line, and we pick **per deployment context**,
matching the live-vs-deployed split.
The deciding fact: the ISO's mac_do rules are **identity** mappings, not command
filters — `security.mac.do.rules=gid=0>uid=0` (clawdie-iso `build.sh:1274`) means
"wheel may become root." mac_do **cannot** restrict *which* command runs as root.
| | `mdo -u root` | setuid/Capsicum helper |
|---|---|---|
| New privileged binary to write+audit | none (reuses mac_do) | yes |
| Kernel-enforced | yes | yes |
| Non-interactive from daemon | yes (no password prompt) | yes |
| Root blast radius if daemon is popped | **full root** | **just jexec-pi** |
| Extra setup | one mac_do rule | helper + install |
Because mac_do is command-blind, **wrapping `mdo` in a helper script does NOT
narrow it**: once `colibri` may `mdo -u root`, a compromised daemon skips the
wrapper and runs `mdo -u root sh`. Such a wrapper is hygiene, not a boundary.
Only a setuid/Capsicum helper (where colibri is *not* granted general root) is a
true boundary.
### Decision
- **Live operator USB → `mdo -u root`.** Single operator who already holds
wheel→root; the trusted local daemon is the same trust domain, so "daemon can
root" crosses no new boundary. Cheapest path, reuses the mac_do plumbing the
image already ships, no new C to audit. Requires one rule granting the colibri
group→root (grant by gid, not the dynamic uid):
`security.mac.do.rules=gid=0>uid=0;gid=<colibri-gid>>uid=0`.
Set `PrivMode::Mdo`.
- **Deployed disk/server (`service clawdie`) → setuid/Capsicum helper.** A
socket-facing daemon with blanket root is a real escalation surface on a
multi-user / exposed host. Ship `/usr/local/libexec/colibri-jail-spawn` (root,
argv-hardcoded to the pi-jail op) and keep colibri unprivileged.
Set `PrivMode::Helper`.
`PrivMode` is selected at daemon config time (live image stages `Mdo`, deploy
packaging stages `Helper`), so the same spawner serves both.
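What "argv-hardcoded to the pi-jail op" could mean in practice is sketched below as a pure validation function. Everything specific here is an assumption for illustration: the allowed jail name `pi0`, the in-jail binary path `/usr/local/bin/pi`, and the function name `validate_helper_argv`. The real helper may be written in C and may be stricter (for example, also constraining the trailing arguments).

```rust
/// Hypothetical allow-list for the helper: one jail, one in-jail binary.
const ALLOWED_JAIL: &str = "pi0";
const ALLOWED_BINARY: &str = "/usr/local/bin/pi";

/// Accept only `jexec <ALLOWED_JAIL> <ALLOWED_BINARY> [args...]`.
/// On success, returns the argv to exec with an absolute jexec path so the
/// caller's PATH cannot substitute another program; otherwise a refusal reason.
fn validate_helper_argv(argv: &[String]) -> Result<Vec<String>, &'static str> {
    match argv {
        [exe, jail, binary, rest @ ..] => {
            if exe != "jexec" {
                return Err("only jexec is permitted");
            }
            if jail != ALLOWED_JAIL {
                return Err("jail not allowed");
            }
            if binary != ALLOWED_BINARY {
                return Err("binary not allowed");
            }
            let mut out = vec!["/usr/sbin/jexec".to_string(), jail.clone(), binary.clone()];
            // Trailing agent arguments are passed through unchanged in this sketch.
            out.extend(rest.iter().cloned());
            Ok(out)
        }
        _ => Err("expected: jexec <jail> <binary> [args...]"),
    }
}
```

The point of the design is that a compromised daemon can ask the helper for exactly one thing, so a request such as `jexec pi0 /bin/sh` is refused instead of yielding a root shell.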
## Open items / must verify on the FreeBSD builder
1. **env passthrough through `mdo`.** The spawner injects `COLIBRI_*` + provider
keys via `.envs()`; they only reach pi if `mdo` (then `jexec`) propagate them.
`build.sh:1271` notes mdo "changes the primary gid to wheel" but is silent on
env. If mdo sanitizes env, pass critical values explicitly: as an
`env KEY=value` prefix on the in-jail command
(`jexec <jail> env KEY=value pi ...`), on argv, or via a 0600 env file.
2. **jail filesystem provisioning** (ISO/deploy plumbing, not Rust): pi needs
Node + its `node_modules` + the work dir inside the jail. Either a
pre-provisioned persistent jail, or an ephemeral jail over a base with nullfs
`ro` mounts of `/usr/local` (pi) + a `rw` work dir.
3. **process-group kill + `jail -r`** teardown semantics under load.
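For the env-file fallback in item 1, a minimal sketch of writing the file with owner-only permissions from the start. The function name `write_env_file` and the `KEY=value` line format are assumptions of this example, and how the in-jail process would load the file is left open.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Write `KEY=value` lines to a new file readable only by its owner.
/// Entries are validated before the file is created, `create_new` refuses to
/// reuse an existing path, and mode 0600 is requested at creation time, so
/// the secrets are never exposed through a wider initial mode.
fn write_env_file(path: &Path, vars: &[(&str, &str)]) -> io::Result<()> {
    for (key, value) in vars {
        if key.is_empty() || key.contains('=') || value.contains('\n') {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "env entry has an empty key, '=' in the key, or a newline in the value",
            ));
        }
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600)
        .open(path)?;
    for (key, value) in vars {
        writeln!(file, "{key}={value}")?;
    }
    Ok(())
}
```

Compared with passing keys on argv, a file keeps provider secrets out of `ps` output, at the cost of a cleanup step once the agent exits.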
## Scope summary
| Piece | Effort | Where |
|---|---|---|
| `JailConfig` + field | trivial | spawner.rs:84, lib.rs:40 |
| `jail_wrap` + call site | small | spawner.rs:341 |
| jail-aware `kill` / `-r` | small | spawner.rs:197 |
| `PrivMode` (mdo vs helper) selection | small | daemon config |
| glasspane observation | none | already works |
| zot changes | none | mirror untouched |
| setuid `colibri-jail-spawn` helper | medium + security review | new (deploy lane) |
| jail FS provisioning | medium (ops) | ISO / deploy packaging |