# Colibri jailed agent spawn

**Status:** Accepted, implemented · **Date:** 2026-06-13

How Colibri confines a spawned agent (e.g. `pi`) inside a FreeBSD jail, and how
the unprivileged daemon gets the root that jails require. This describes the
shipped code in `crates/colibri-daemon/src/spawner.rs`.

## Why this lives in Colibri, not zot

Colibri is the supervisor and already spawns agents — `spawner.rs` runs the
subprocess, captures its JSONL, and feeds glasspane. Confinement is a supervisor
concern, so it lives here, and zot stays a clean upstream mirror. (zot's own
`swarm` only spawns copies of zot and has no isolation, so it was never the right
place for this.)

## How it works

A spawn can carry an optional `JailConfig`; with none, the agent runs on the host
as before. Whichever of `name` or `path` is set picks the jail lifecycle:

- **`name`**: enter an already-running **persistent** jail with `jexec`
  (created/destroyed out of band by rc.d or the operator). Takes precedence over
  `path` when both are set.
- **`path`**: create an **ephemeral** jail with `jail -c … command=<binary>`,
  which exists only while the agent runs and is removed when it exits (no teardown
  needed).

The remaining fields tune the jail rather than select it:

- **`root_path`**: host-visible root path of a named jail; required when staged
  env/working-dir payloads must be delivered into it. Ephemeral jails fall back
  to `path`.
- **`ip4`** (optional, `inherit` by default) and **`user`** (optional in-jail
  user, `jexec` path only).
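
The field rules above can be sketched as a small Rust type. This is an illustrative shape only, not the definition in `spawner.rs`: the field types, the `Lifecycle` enum, and the `lifecycle()` / `staging_root()` helpers are assumptions made for the example.

```rust
/// Illustrative only: the shipped `JailConfig` may use different types.
#[derive(Debug, Default, Clone)]
pub struct JailConfig {
    pub name: Option<String>,      // persistent jail, entered with jexec
    pub path: Option<String>,      // ephemeral jail root, created with jail -c
    pub root_path: Option<String>, // host-visible root of a named jail
    pub ip4: Option<String>,       // treated as "inherit" when unset
    pub user: Option<String>,      // in-jail user, jexec path only
}

/// Hypothetical helper enum naming the two lifecycles.
#[derive(Debug, PartialEq)]
pub enum Lifecycle<'a> {
    Persistent { name: &'a str },
    Ephemeral { path: &'a str },
}

impl JailConfig {
    /// `name` wins over `path`; `None` means "run on the host".
    pub fn lifecycle(&self) -> Option<Lifecycle<'_>> {
        if let Some(name) = self.name.as_deref() {
            return Some(Lifecycle::Persistent { name });
        }
        self.path.as_deref().map(|path| Lifecycle::Ephemeral { path })
    }

    /// Host-visible root for staged payloads: `root_path`, else `path`.
    pub fn staging_root(&self) -> Option<&str> {
        self.root_path.as_deref().or(self.path.as_deref())
    }

    /// Effective ip4 setting, defaulting to "inherit".
    pub fn ip4_or_default(&self) -> &str {
        self.ip4.as_deref().unwrap_or("inherit")
    }
}
```

Modelling the selection as a method keeps the precedence rule in one place, so callers never have to compare `name` and `path` themselves.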

`jail_wrap()` turns `(binary, args)` into the `(program, argv)` to exec. stdio is
untouched — `jexec`, `jail`, and `mdo` all run the child in the foreground and
inherit stdin/stdout — so the agent's JSON stream still reaches glasspane and the
MCP host's stdin/stdout transport still works.
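
A minimal sketch of the wrapping step, assuming a simplified `Jail` enum in place of the real `JailConfig`. The function name `jail_wrap_sketch`, the use of `jexec -U` for the in-jail user, and the exact parameter order are assumptions; the shipped `jail_wrap()` may build its argv differently.

```rust
/// Hypothetical, simplified stand-in for the jail part of a spawn request.
pub enum Jail {
    Persistent { name: String, user: Option<String> },
    Ephemeral { path: String, ip4: String },
}

/// Turn `(binary, args)` into the `(program, argv)` that is actually exec'd.
pub fn jail_wrap_sketch(jail: &Jail, binary: &str, args: &[&str]) -> (String, Vec<String>) {
    match jail {
        // jexec [-U user] <jail> <command> [args...]
        Jail::Persistent { name, user } => {
            let mut argv = Vec::new();
            if let Some(u) = user {
                // Assumption: -U resolves the user inside the jail.
                argv.push("-U".to_string());
                argv.push(u.clone());
            }
            argv.push(name.clone());
            argv.push(binary.to_string());
            argv.extend(args.iter().map(|a| a.to_string()));
            ("jexec".to_string(), argv)
        }
        // jail -c path=... ip4=... command=<binary> [args...]
        // `command=` consumes the rest of the command line as arguments.
        Jail::Ephemeral { path, ip4 } => {
            let mut argv = vec![
                "-c".to_string(),
                format!("path={path}"),
                format!("ip4={ip4}"),
                format!("command={binary}"),
            ];
            argv.extend(args.iter().map(|a| a.to_string()));
            ("jail".to_string(), argv)
        }
    }
}
```

Because the result is just a program and an argv, the caller can hand it to the same spawn site it already uses for unjailed agents, which is why stdio handling does not change.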

This is wired through the `spawn-agent` socket command (any caller can request a
jail) and reused by the external-MCP host (`colibri-mcp`), which confines
arbitrary third-party MCP servers the same way.

## Privilege: how the unprivileged daemon gets root

Jail attach (`jexec`) and create (`jail`) are root-only, but `colibri_daemon`
runs unprivileged. The deciding fact: FreeBSD `mac_do` rules are **identity**
mappings (`security.mac.do.rules=gid=0>uid=0` means "wheel may become root"), not
command filters — so granting the daemon `mdo` access grants it _full_ root, not
just `jexec`. We choose the escalation per host via `PrivMode`
(`COLIBRI_JAIL_PRIV_MODE`):

- **Live operator USB → `mdo` (default).** The single operator already holds
  wheel→root, so a trusted local daemon is the same trust domain — `mdo -u root`
  reuses the image's existing `mac_do` plumbing, no new privileged binary.
- **Deployed / shared host → setuid helper.** A socket-facing daemon with blanket
  root is a real escalation surface, so use a narrow setuid helper
  (`/usr/local/libexec/colibri-jail-spawn`) that only performs the jail spawn, and
  keep the daemon unprivileged.
- **Validated hosts with existing sudo policy → `sudo`.** `sudo -n` can be used
  as an interim proof/ops mode when a narrow sudoers rule already permits the
  daemon user to run the jail command without prompting. Prefer the setuid helper
  for long-lived production hosts once packaged.
- **`none`** — run the jail command directly (already root, or tests).
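
The mode selection can be sketched as an enum plus a prefixing step. The accepted strings for `COLIBRI_JAIL_PRIV_MODE` (in particular `helper`), the `parse` and `wrap` method names, and the helper's calling convention (taking the jail command as its arguments) are assumptions for illustration, not the shipped interface.

```rust
/// Path named in this document; how the helper takes arguments is assumed.
const HELPER: &str = "/usr/local/libexec/colibri-jail-spawn";

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivMode {
    Mdo,
    Helper,
    Sudo,
    None,
}

impl PrivMode {
    /// Parse the env var value; unset or empty falls back to the `mdo` default.
    /// The spelling "helper" is a guess at the configured value.
    pub fn parse(value: Option<&str>) -> Result<PrivMode, String> {
        match value.map(str::trim) {
            Option::None | Some("") | Some("mdo") => Ok(PrivMode::Mdo),
            Some("helper") => Ok(PrivMode::Helper),
            Some("sudo") => Ok(PrivMode::Sudo),
            Some("none") => Ok(PrivMode::None),
            Some(other) => Err(format!("unknown priv mode: {other}")),
        }
    }

    /// Prefix an already jail-wrapped command with the escalation step.
    pub fn wrap(self, program: &str, argv: &[String]) -> (String, Vec<String>) {
        let mut full: Vec<String> = match self {
            PrivMode::Mdo => vec!["mdo".into(), "-u".into(), "root".into()],
            PrivMode::Sudo => vec!["sudo".into(), "-n".into()],
            PrivMode::Helper => vec![HELPER.into()],
            PrivMode::None => Vec::new(),
        };
        full.push(program.to_string());
        full.extend(argv.iter().cloned());
        // `full` always holds at least `program`, so remove(0) cannot panic.
        let program = full.remove(0);
        (program, full)
    }
}
```

In the daemon the value would come from something like `std::env::var("COLIBRI_JAIL_PRIV_MODE").ok()`; taking an `Option<&str>` here keeps the parsing testable without touching the process environment.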

## Staged env payloads

When a jailed spawn needs env vars or a working dir, `prepare_spawn_command()`
writes a 0600 `env.sh` (sorted, single-quoted exports) and a `launch.sh` wrapper
into a staged directory under the jail's `root_path` at
`/var/run/colibri-stage/<id>/`. The jail command runs `/bin/sh launch.sh`, which
sources the env file and `cd`s to the working dir before `exec`-ing the agent
binary. This sidesteps env passthrough entirely: nothing relies on `jexec` or
`mdo` forwarding environment variables into the jail.
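
As a rough illustration of what the two staged files might contain, the sketch below renders them as strings. The function names, the exact script layout, and the rejection of non-identifier keys are assumptions; only the sorted, single-quoted exports and the source / `cd` / `exec` sequence come from the description above.

```rust
use std::collections::BTreeMap;

/// Single-quote a value for /bin/sh; an embedded ' becomes '\''.
fn sh_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// True if `s` is a plain shell variable name (assumed validation rule).
fn is_shell_name(s: &str) -> bool {
    let mut chars = s.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Render env.sh; BTreeMap iteration gives the sorted order.
pub fn render_env_sh(env: &BTreeMap<String, String>) -> Result<String, String> {
    let mut out = String::new();
    for (key, value) in env {
        if !is_shell_name(key) {
            return Err(format!("invalid env var name: {key:?}"));
        }
        out.push_str(&format!("export {}={}\n", key, sh_quote(value)));
    }
    Ok(out)
}

/// Render launch.sh: source env.sh, cd to the working dir, exec the agent.
/// `stage_dir` is the in-jail path, e.g. /var/run/colibri-stage/<id>.
pub fn render_launch_sh(
    stage_dir: &str,
    workdir: Option<&str>,
    binary: &str,
    args: &[&str],
) -> String {
    let mut out = String::from("#!/bin/sh\n");
    out.push_str(&format!(". {}\n", sh_quote(&format!("{stage_dir}/env.sh"))));
    if let Some(dir) = workdir {
        out.push_str(&format!("cd {} || exit 1\n", sh_quote(dir)));
    }
    out.push_str(&format!("exec {}", sh_quote(binary)));
    for arg in args {
        out.push(' ');
        out.push_str(&sh_quote(arg));
    }
    out.push('\n');
    out
}
```

Quoting every value and argument means a payload containing spaces, `$`, or quotes reaches the agent unchanged instead of being reinterpreted by the shell.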

The staged directory is cleaned up when the agent stops, fails, exits early, or
encounters a poll error. The same mechanism is used by the external-MCP host for
jailed MCP servers.
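
One way to get cleanup on every exit path is an RAII guard that removes the directory when dropped. This is a sketch of that technique, not a description of how `spawner.rs` does it: the `StagedDir` type and its methods are hypothetical, and whether the shipped code uses `Drop` or explicit calls is not stated here.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};

/// Hypothetical guard owning <jail root>/var/run/colibri-stage/<id>/ on the host.
pub struct StagedDir {
    host_path: PathBuf,
}

impl StagedDir {
    /// Create the staged directory under the jail's host-visible root.
    pub fn create(jail_root: &Path, id: &str) -> std::io::Result<StagedDir> {
        let host_path = jail_root.join("var/run/colibri-stage").join(id);
        fs::create_dir_all(&host_path)?;
        Ok(StagedDir { host_path })
    }

    /// Write a file with mode 0600, refusing to overwrite an existing one.
    pub fn write_private(&self, name: &str, contents: &str) -> std::io::Result<PathBuf> {
        let path = self.host_path.join(name);
        let mut file = OpenOptions::new()
            .write(true)
            .create_new(true)
            .mode(0o600)
            .open(&path)?;
        file.write_all(contents.as_bytes())?;
        Ok(path)
    }

    pub fn host_path(&self) -> &Path {
        &self.host_path
    }
}

impl Drop for StagedDir {
    /// Best-effort removal; errors are ignored because drop cannot fail.
    fn drop(&mut self) {
        let _ = fs::remove_dir_all(&self.host_path);
    }
}
```

Holding the guard alongside the child process handle ties the directory's lifetime to the agent's, so a stop, a failed spawn, or an early return all release it without separate cleanup calls.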

## Open items

- **Teardown:** ephemeral `jail -c command=` self-cleans; reaping a deeply nested
  in-jail process tree may want a process-group kill (follow-up).
- **Jail filesystem provisioning** (ISO / deploy): the jailed binary needs its
  runtime + work dir — a pre-provisioned persistent jail, or nullfs mounts for an
  ephemeral one.

## References

- `crates/colibri-daemon/src/spawner.rs` — `JailConfig`, `PrivMode`, `jail_wrap`,
  `prepare_spawn_command`, `PreparedSpawnCommand`
- `crates/colibri-daemon/src/lib.rs` + `socket.rs` — `jail` on the spawn-agent command
- `crates/colibri-mcp/src/external.rs` — jailed external MCP servers