Merge pull request 'docs: design note for colibri-spawned pi in a FreeBSD jail' (#33) from design/colibri-jailed-agent-spawn into main
Reviewed-on: #33
commit c3e68e98f2

1 changed file with 184 additions and 0 deletions

docs/COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md (normal file)

@@ -0,0 +1,184 @@

# Colibri jailed agent spawn: design

**Status:** proposal · **Date:** 2026-06-13

How Colibri spawns a child agent (e.g. `pi`) confined inside a **FreeBSD jail**,
and the privilege model for the root-requiring jail step.

## Why this lives in Colibri, not zot

zot has a multi-agent `swarm`, but its child is hardwired to `os.Executable()`
(itself): the binary override is test-only, the public `SpawnRequest` exposes
no command field, and swarm explicitly runs with **"no worktree, no isolation,
cwd == RepoRoot"**. So "zot spawns pi-in-a-jail" would require forking the zot
mirror to add a binary override and a pi↔swarm protocol shim, on top of the
jail wrapping we have to write anyway. See the agent-harness consolidation notes.

Colibri is the **supervisor**, already models `AgentRuntime{Pi, Zot}`, and,
critically, **already spawns pi**: `crates/colibri-daemon/src/spawner.rs` runs
agent subprocesses, captures their stdout JSONL, and hands it to glasspane.
`socket.rs:345` even comments *"enables real Pi spawn."* Confinement is a
supervisor concern and root-adjacent, so it belongs here. zot stays a clean
upstream mirror, untouched.

## What already exists (the spawn pipeline)

```
SpawnAgent socket cmd (lib.rs:40)
  → cmd_spawn_agent (socket.rs:327)
    → Spawner::spawn (spawner.rs)
      → Command::new(binary).args().envs().stdout(piped()).spawn()  ← spawner.rs:341
        → AgentHandle.take_stdout() (spawner.rs:192)
          → glasspane apply_pi_event (AgentRuntime::Pi, socket.rs:410)
```

Everything except *what binary gets exec'd* is jail-agnostic and stays as-is.

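For orientation, the capture step of this pipeline can be sketched with only the standard library. This is a minimal illustration, not the code in `spawner.rs`: `collect_jsonl_lines` and `spawn_and_collect` are hypothetical names, and the real spawner presumably streams events to glasspane rather than collecting them into a `Vec`.

```rust
use std::io::{BufRead, BufReader, Read};
use std::process::{Command, Stdio};

/// Read newline-delimited JSON from any reader, skipping blank lines.
/// Lines are returned as raw strings; parsing is left to the consumer.
fn collect_jsonl_lines<R: Read>(reader: R) -> std::io::Result<Vec<String>> {
    let mut lines = Vec::new();
    for line in BufReader::new(reader).lines() {
        let line = line?;
        if !line.trim().is_empty() {
            lines.push(line);
        }
    }
    Ok(lines)
}

/// Spawn `binary` with piped stdout and collect its JSONL output.
/// Hypothetical helper mirroring the Command::new(..).stdout(piped()) step.
fn spawn_and_collect(binary: &str, args: &[String]) -> std::io::Result<Vec<String>> {
    let mut child = Command::new(binary)
        .args(args)
        .stdout(Stdio::piped())
        .spawn()?;
    // take() moves the pipe out of the child so it can be read to EOF.
    let stdout = child.stdout.take().expect("stdout was configured as piped");
    let lines = collect_jsonl_lines(stdout)?;
    child.wait()?;
    Ok(lines)
}
```

Because the reader is generic, the same function works whether `binary` is `pi` directly or a wrapper such as `jexec` that passes the child's stdout through.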
## Design

### 1. Config: extend `AgentSpawnConfig` (spawner.rs:84)

```rust
pub struct AgentSpawnConfig {
    // ...existing: binary, args, env, working_dir, provider, model...
    /// Optional FreeBSD jail confinement. None = run on host (today's behavior).
    #[serde(default)]
    pub jail: Option<JailConfig>,
}

#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct JailConfig {
    pub name: Option<String>,   // enter a running persistent jail (jexec)...
    pub path: Option<String>,   // ...or create an ephemeral jail here (jail -c)
    pub ip4: Option<String>,    // "inherit" | addr | none (vnet later)
    pub user: Option<String>,   // in-jail user, default "clawdie"
    pub ephemeral: bool,        // tear down with `jail -r` on exit
}
```

This rides through the existing `SpawnAgent` socket command as one optional field, with no
protocol redesign. colibri-tui, a skill, or the supervisor can request "spawn pi,
jailed."

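The two modes implied by `name` and `path` are mutually exclusive, so the config is worth validating before it reaches the spawner. The sketch below is an assumption about how that could look, not existing Colibri code: `JailMode`, `resolve_mode`, and `ip4_param` are hypothetical names, the struct is a serde-free mirror of `JailConfig` so that it compiles standalone, rejecting `ephemeral` on an attached jail is an assumed policy, and mapping the string `"none"` to `ip4=disable` is a guess at what `none` in the field comment means.

```rust
/// Standalone mirror of the JailConfig fields used here (serde derives omitted).
#[derive(Debug, Clone, Default)]
pub struct JailConfig {
    pub name: Option<String>,
    pub path: Option<String>,
    pub ip4: Option<String>,
    pub ephemeral: bool,
}

/// Hypothetical resolved form: exactly one of the two jail modes.
#[derive(Debug, PartialEq)]
pub enum JailMode {
    Attach { name: String }, // jexec into a running jail
    Create { path: String }, // jail -c at this root
}

pub fn resolve_mode(cfg: &JailConfig) -> Result<JailMode, String> {
    match (&cfg.name, &cfg.path) {
        (Some(_), Some(_)) => Err("set either `name` or `path`, not both".into()),
        (None, None) => Err("jail config needs `name` or `path`".into()),
        // Assumed policy: never tear down a persistent jail we only attached to.
        (Some(_), None) if cfg.ephemeral => Err("`ephemeral` requires `path`".into()),
        (Some(n), None) => Ok(JailMode::Attach { name: n.clone() }),
        (None, Some(p)) => Ok(JailMode::Create { path: p.clone() }),
    }
}

/// Map the `ip4` field to a jail(8) parameter.
pub fn ip4_param(ip4: Option<&str>) -> String {
    match ip4 {
        None | Some("inherit") => "ip4=inherit".to_string(),
        Some("none") => "ip4=disable".to_string(), // assumption, see lead-in
        Some(addr) => format!("ip4.addr={addr}"),
    }
}
```

Resolving to an enum up front means the wrapping step never has to unwrap an `Option` or guess which mode was intended.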
### 2. The wrap: `(binary, args)` → jail invocation

```rust
fn jail_wrap(binary: &str, args: &[String], jail: &Option<JailConfig>,
             priv_mode: PrivMode) -> (String, Vec<String>)
{
    let Some(j) = jail else { return (binary.into(), args.to_vec()); };

    // Inner jail command (jexec persistent, or jail -c ephemeral).
    let (exe, mut a) = if let Some(name) = &j.name {
        // jexec WITHOUT -l so the injected COLIBRI_*/provider env is inherited.
        ("jexec".to_string(), vec![name.clone(), binary.into()])
    } else {
        // Config validation must guarantee `path` is set when `name` is not.
        let path = j.path.as_deref().expect("JailConfig needs `name` or `path`");
        ("jail".to_string(), vec![
            "-c".into(), format!("path={path}"),
            "mount.devfs".into(),
            // "inherit" (or unset) shares the host stack; anything else is an address.
            match j.ip4.as_deref() {
                None | Some("inherit") => "ip4=inherit".into(),
                Some(addr) => format!("ip4.addr={addr}"),
            },
            // `command=` goes last: jail(8) hands the rest of argv to the command.
            format!("command={binary}"),
        ])
    };
    a.extend(args.iter().cloned());

    // Privilege escalation for the root-only jail step (see below).
    match priv_mode {
        PrivMode::Mdo => {                       // live USB
            let mut wrapped = vec!["-u".into(), "root".into(), exe];
            wrapped.extend(a);
            ("mdo".into(), wrapped)
        }
        PrivMode::Helper => {                    // deployed/hardened
            // colibri stays unprivileged; the setuid helper does the one op.
            ("/usr/local/libexec/colibri-jail-spawn".into(),
             std::iter::once(exe).chain(a).collect())
        }
        PrivMode::None => (exe, a),              // tests / already-root
    }
}
```

At spawner.rs:341 the only change is sourcing exe/argv from `jail_wrap`; the
`.envs()`, retry/backoff, and stdout-pipe capture are unchanged. **stdout JSONL
survives** because `jexec`, `jail -c command=`, and `mdo` all run the child in the
foreground and inherit stdio, so glasspane ingestion is unaffected and the jailed
pi shows up as `AgentRuntime::Pi` with zero glasspane changes.

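To make the layering concrete, the sketch below builds the final argv for the persistent-jail case under `PrivMode::Mdo`. It is a reduced illustration of `jail_wrap`, not a replacement for it: `jexec_argv` and `with_mdo` are hypothetical helpers, and the jail name `agents` and the `--example-flag` argument used afterwards are placeholders.

```rust
/// argv for entering a running jail: jexec <jail> <binary> <args...>
/// (no -l, so the caller's environment is inherited).
fn jexec_argv(jail: &str, binary: &str, args: &[&str]) -> Vec<String> {
    let mut argv = vec!["jexec".to_string(), jail.to_string(), binary.to_string()];
    argv.extend(args.iter().map(|a| a.to_string()));
    argv
}

/// Prefix an argv with `mdo -u root` so the jail step runs as root.
fn with_mdo(inner: Vec<String>) -> Vec<String> {
    let mut argv = vec!["mdo".to_string(), "-u".to_string(), "root".to_string()];
    argv.extend(inner);
    argv
}
```

With these, `with_mdo(jexec_argv("agents", "pi", &["--example-flag"]))` yields the argv `mdo -u root jexec agents pi --example-flag`. The first element becomes the program passed to `Command::new` and the rest go to `.args()`, which is why the rest of the spawn pipeline does not need to know a jail is involved.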
### 3. Teardown: `AgentHandle::kill` (spawner.rs:197)

Add a jail-aware branch:

- **jexec:** kill the **process group** (signal `-pid`); killing the jexec process
  alone does not reliably reap the in-jail child.
- **ephemeral jail:** also run `jail -r <name>` so jails are not leaked per spawn.

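The two teardown branches can be expressed as a small plan that `kill` would execute in order. This is a sketch under assumptions: `TeardownStep` and `teardown_plan` are hypothetical names, the ephemeral jail is assumed to have been given a name at creation (the `jail_wrap` code in this note does not pass one), and the plan only describes the steps rather than performing them.

```rust
/// One teardown action, in the order it should run.
#[derive(Debug, PartialEq)]
pub enum TeardownStep {
    /// Signal the whole process group; kill(2) addresses a group via a negative pid.
    SignalGroup { target: i32 },
    /// Run `jail -r <name>` to remove an ephemeral jail.
    RemoveJail { argv: Vec<String> },
}

/// Build the teardown plan for a child whose process-group id is `pgid`.
/// `ephemeral_jail` is Some(name) only for jails this spawn created.
pub fn teardown_plan(pgid: u32, ephemeral_jail: Option<&str>) -> Vec<TeardownStep> {
    let mut plan = vec![TeardownStep::SignalGroup { target: -(pgid as i32) }];
    if let Some(name) = ephemeral_jail {
        plan.push(TeardownStep::RemoveJail {
            argv: vec!["jail".into(), "-r".into(), name.to_string()],
        });
    }
    plan
}
```

For the group signal to reach only the agent, the child would need its own process group at spawn time, for example via `std::os::unix::process::CommandExt::process_group(0)`. Whether the unprivileged daemon is allowed to signal a child that was started through `mdo -u root` is worth checking alongside the teardown open item.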
## Privilege model: the decision

Jail *attach* (`jexec`) and *create* (`jail`) are **root-only** in base FreeBSD;
there is no unprivileged path. But `colibri_daemon` runs as the unprivileged
`colibri` user (`nologin`), so it cannot attach to a jail by itself. There are two
ways to cross that line, and we pick **per deployment context**, matching the
live-vs-deployed split.

The deciding fact: the ISO's mac_do rules are **identity** mappings, not command
filters. `security.mac.do.rules=gid=0>uid=0` (clawdie-iso `build.sh:1274`) means
"wheel may become root." mac_do **cannot** restrict *which* command runs as root.

| Criterion | `mdo -u root` | setuid/Capsicum helper |
|---|---|---|
| New privileged binary to write+audit | none (reuses mac_do) | yes |
| Kernel-enforced | yes | yes |
| Non-interactive from daemon | yes (no password prompt) | yes |
| Root blast radius if daemon is popped | **full root** | **just jexec-pi** |
| Extra setup | one mac_do rule | helper + install |

Because mac_do is command-blind, **wrapping mdo in a helper does NOT narrow it**:
once `colibri` may `mdo -u root`, a compromise just runs `mdo -u root sh`. A
helper that merely wraps mdo is hygiene, not a boundary. Only a setuid/Capsicum
helper (where colibri is *not* granted general root) is a true boundary.

### Decision

- **Live operator USB → `mdo -u root`.** Single operator who already holds
  wheel→root; the trusted local daemon is in the same trust domain, so "daemon can
  become root" crosses no new boundary. Cheapest path, reuses the mac_do plumbing
  the image already ships, no new C to audit. Requires one rule granting the
  colibri group→root (grant by gid, not the dynamic uid):
  `security.mac.do.rules=gid=0>uid=0,gid=<colibri-gid>>uid=0`.
  Set `PrivMode::Mdo`.

- **Deployed disk/server (`service clawdie`) → setuid/Capsicum helper.** A
  socket-facing daemon with blanket root is a real escalation surface on a
  multi-user / exposed host. Ship `/usr/local/libexec/colibri-jail-spawn` (setuid
  root, argv-hardcoded to the pi-jail op) and keep colibri unprivileged.
  Set `PrivMode::Helper`.

`PrivMode` is selected at daemon config time (live image stages `Mdo`, deploy
packaging stages `Helper`), so the same spawner serves both.

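A sketch of how that selection could be parsed from daemon config. The `PrivMode` variants come from this note; the `FromStr` implementation and the string values `mdo`, `helper`, and `none` are assumptions for illustration, not an existing config schema.

```rust
use std::str::FromStr;

/// How the root-only jail step is escalated.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivMode {
    Mdo,    // live USB: mdo -u root
    Helper, // deployed: setuid/Capsicum helper
    None,   // tests / already root
}

impl FromStr for PrivMode {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.trim().to_ascii_lowercase().as_str() {
            "mdo" => Ok(PrivMode::Mdo),
            "helper" => Ok(PrivMode::Helper),
            "none" => Ok(PrivMode::None),
            // Reject unknown values instead of silently picking a mode:
            // a typo should never fall back to the broader privilege.
            other => Err(format!("unknown priv mode: {other}")),
        }
    }
}
```

Failing closed on an unrecognized value keeps a misconfigured deployed host from quietly running in `Mdo` mode.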
## Open items / must verify on the FreeBSD builder

1. **env passthrough through `mdo`.** The spawner injects `COLIBRI_*` + provider
   keys via `.envs()`; they only reach pi if `mdo` (then `jexec`) propagate them.
   `build.sh:1271` notes mdo "changes the primary gid to wheel" but is silent on
   env. If mdo sanitizes env, pass critical values as explicit `jexec --env`/argv
   or via a 0600 env file.
2. **jail filesystem provisioning** (ISO/deploy plumbing, not Rust): pi needs
   Node + its `node_modules` + the work dir inside the jail. Either a
   pre-provisioned persistent jail, or an ephemeral jail over a base with nullfs
   `ro` mounts of `/usr/local` (pi) + a `rw` work dir.
3. **process-group kill + `jail -r`** teardown semantics under load.

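The 0600 env file fallback from item 1 could look like the following on the Rust side. This is a sketch under assumptions: `write_env_file` is a hypothetical helper, the `KEY=value` line format is a guess at what the in-jail side would read, and values containing newlines are not handled.

```rust
use std::fs::OpenOptions;
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::Path;

/// Write `vars` as KEY=value lines to a file created with mode 0600.
/// `create_new` refuses to reuse an existing file, so a pre-planted file
/// with looser permissions is never written to.
pub fn write_env_file(path: &Path, vars: &[(&str, &str)]) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(0o600) // owner read/write only
        .open(path)?;
    for (key, value) in vars {
        writeln!(file, "{key}={value}")?;
    }
    file.flush()
}
```

Unlike passing secrets in argv, a file with owner-only permissions keeps provider keys out of process listings; the file would still need to live somewhere the jailed process can read and be removed on teardown.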
## Scope summary

| Piece | Effort | Where |
|---|---|---|
| `JailConfig` + field | trivial | spawner.rs:84, lib.rs:40 |
| `jail_wrap` + call site | small | spawner.rs:341 |
| jail-aware `kill` / `-r` | small | spawner.rs:197 |
| `PrivMode` (mdo vs helper) selection | small | daemon config |
| glasspane observation | none | already works |
| zot changes | none | mirror untouched |
| setuid `colibri-jail-spawn` helper | medium + security review | new (deploy lane) |
| jail FS provisioning | medium (ops) | ISO / deploy packaging |