docs(wiki): add 9 subsystem pages (rebuilt on current main)

Brings the wiki-expansion pages onto current main WITHOUT the stale baggage the
original feature/wiki-expansion branch carried (it predated the rename + date
PRs and would have reverted them). Cherry-picked only the 9 genuinely-new pages:
contracts, store-schema, external-mcp, operator-cli, tui, runtime-inventory,
skills-catalog, vault-provision, deployment. Added them to index.md.

Fixed on the way in: vault-provision referenced the pre-rename
VAULT-PROVISION-FIRST-PROOF → repointed to VAULT-PROVISION-RUNBOOK. (No US dates
in these pages.)

Gates: wiki-lint --strict clean (131 pass); markdown format clean.
Sam & Claude 2026-06-24 16:48:49 +02:00
parent 5d646b1f2c
commit f581433b29
10 changed files with 1161 additions and 10 deletions

docs/wiki/contracts.md Normal file

@ -0,0 +1,49 @@
# Stable JSON contracts
← [index](./index.md)
`colibri-contracts` holds the stable, language-agnostic wire shapes shared
between Colibri (Rust) and Clawdie agents (TypeScript). It owns _schemas and
(de)serialization_, not business logic.
## Why a separate contracts crate
- Prevent duplicated definitions between Rust and TypeScript lanes.
- Keep committed manifests in `manifests/` parseable by both sides.
- Centralize schema strings, field renaming aliases, and backward-compat
defaults.
## Active schemas
| Schema | Rust struct | Purpose |
| -------------------------------------- | --------------------- | -------------------------------------------------------------- |
| `clawdie.interagent.run-manifest.v1` | `RunManifest` | Records a build/test run — role, agent, artifacts, summary. |
| `clawdie.runtime-version-inventory.v1` | `RuntimeInventory` | Host runtime snapshot — OS, package versions, npm/node/zot/pi. |
| `clawdie.provider-smoke.result.v1` | `ProviderSmokeResult` | DeepSeek cache-hit probe result and token accounting. |
Schema constants and structs live in `crates/colibri-contracts/src/lib.rs`.
## Evolution rules
- The crate carries **no logic** — only `serde` structs and schema constants.
- New fields are normally optional with `#[serde(default)]` so old manifests
still parse.
- `RuntimeInventory.pi` is optional because not every host installs `pi` or
`zot`.
- `HostStatus.raw` is a catch-all `serde_json::Value` so hostile collector
output can be captured without forcing a schema bump.
## Golden tests
`crates/colibri-contracts/tests/golden.rs` parses every committed manifest in
`manifests/` and asserts round-trip equality. The fixtures are intended to be
**cross-platform** — if a manifest produced on Linux differs from one produced
on FreeBSD 15, the difference must be understood and documented before it is
merged.
## See also
- [cost-model](./cost-model.md) — how the provider-smoke result feeds cache-hit
metering.
- [runtime-inventory](./runtime-inventory.md) — where the runtime inventory is
produced.

docs/wiki/deployment.md Normal file

@ -0,0 +1,175 @@
# Deployment
← [index](./index.md)
The `clawdie` crate is Colibri's host installer. It discovers a machine's ZFS
layout and provisions the `clawdie` service. On FreeBSD this means an rc.d
service, ZFS datasets, and an unprivileged user. On Linux it can use systemd
and either ZFS or plain directories.
`crates/clawdie/src/main.rs`
`crates/clawdie/src/plan.rs`
`docs/ISO-SERVICE-LAYOUT.md`
`docs/CLAWDIE-INSTALLER-HANDOFF.md`
## Decisions
### ZFS is required on FreeBSD, preferred on Linux
FreeBSD does not support a plain-directory layout. If ZFS userland is missing,
the plan errors immediately. Linux can fall back to plain directories if no
pool is named and ZFS is unavailable, and it can create a fresh pool on a spare
disk when asked.
This matches the production target: bare-metal FreeBSD on a ZFS RAID1 mirror.
Linux support makes development and CI possible without a ZFS host.
### Storage is resolved, not configured
`clawdie plan` resolves storage in this order:
1. If `--pool NAME --create-pool DEVICE` is given, create that pool.
2. If `--pool NAME` is given, use that existing pool.
3. If no pool is given and exactly one pool exists, use it.
4. If multiple pools exist and none is named, error.
5. On Linux with no ZFS, fall back to plain directories.
This removes the need for a hand-written topology file on typical single-pool
hosts, while still allowing explicit control when needed.
`crates/clawdie/src/main.rs` (`pick_pool`, `validate_storage`)
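The resolution order above can be sketched as a pure function. This is a minimal illustration, not the real `pick_pool`: the `Storage` enum, the `resolve_storage` name, and its parameters are hypothetical, and the handling of "pool requested but ZFS missing" and "ZFS present but no pools" is an assumption the source does not spell out.

```rust
/// Where state will live once the plan is resolved (hypothetical type).
#[derive(Debug, PartialEq)]
enum Storage {
    CreatePool { name: String, device: String },
    ExistingPool(String),
    PlainDirs,
}

/// Sketch of the documented resolution order; not the real `pick_pool`.
fn resolve_storage(
    pool: Option<&str>,
    create_device: Option<&str>,
    existing_pools: &[&str],
    zfs_available: bool,
    is_linux: bool,
) -> Result<Storage, String> {
    if !zfs_available {
        // FreeBSD has no plain-directory layout, so missing ZFS userland is fatal.
        if !is_linux {
            return Err("ZFS userland is required on FreeBSD".to_string());
        }
        // Assumption: asking for a pool without ZFS is an error, not a silent fallback.
        if pool.is_some() || create_device.is_some() {
            return Err("a pool was requested but ZFS is unavailable".to_string());
        }
        // Rule 5: Linux with no ZFS falls back to plain directories.
        return Ok(Storage::PlainDirs);
    }
    match (pool, create_device) {
        // Rule 1: --pool NAME --create-pool DEVICE
        (Some(name), Some(device)) => Ok(Storage::CreatePool {
            name: name.to_string(),
            device: device.to_string(),
        }),
        // Rule 2: --pool NAME must refer to an existing pool.
        (Some(name), None) if existing_pools.contains(&name) => {
            Ok(Storage::ExistingPool(name.to_string()))
        }
        (Some(name), None) => Err(format!("pool {name} does not exist")),
        (None, Some(_)) => Err("--create-pool requires --pool".to_string()),
        (None, None) => match existing_pools {
            // Rule 3: exactly one pool, use it.
            [only] => Ok(Storage::ExistingPool((*only).to_string())),
            // Assumption: ZFS present but no pools is reported, not guessed around.
            [] => Err("no pool found; pass --pool with --create-pool".to_string()),
            // Rule 4: several pools and none named.
            _ => Err("multiple pools exist; name one with --pool".to_string()),
        },
    }
}
```

Keeping the decision free of I/O means each of the five rules can be checked in a unit test without a ZFS host.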
### Datasets separate state from logs
When ZFS is used, the installer creates:
- `<pool>/clawdie` as a container dataset with `canmount=off`
- `<pool>/clawdie/db` mounted at `/var/db/clawdie`
- `<pool>/clawdie/log` mounted at `/var/log/clawdie`
Keeping database and logs in separate datasets lets snapshots, quotas, and
log-rotation policies apply independently.
`crates/clawdie/src/plan.rs` (`zfs_dataset_steps`)
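As a rough illustration of the layout above, the three datasets map onto three `zfs create` invocations. The function name is hypothetical and the exact commands issued by `zfs_dataset_steps` may differ; only the dataset names, mountpoints, and `canmount=off` come from this page.

```rust
/// Builds the `zfs create` command lines for the documented dataset layout.
/// Hypothetical helper; the real plan may carry extra properties.
fn zfs_dataset_commands(pool: &str) -> Vec<String> {
    vec![
        // Container dataset: never mounted itself, only groups the children.
        format!("zfs create -o canmount=off {pool}/clawdie"),
        // State and logs get their own datasets and mountpoints.
        format!("zfs create -o mountpoint=/var/db/clawdie {pool}/clawdie/db"),
        format!("zfs create -o mountpoint=/var/log/clawdie {pool}/clawdie/log"),
    ]
}
```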
### Dry-run by default
`clawdie apply` prints the plan and exits unless `--yes` is given. `discover`
and `plan` are read-only. This protects production hosts from accidental
provisioning.
`crates/clawdie/src/main.rs` (`Cmd::Apply`)
### Pool creation is guarded against busy disks
`--create-pool` on a non-empty disk is refused unless `--force` is also given.
The installer uses `lsblk` on Linux to detect partitions, filesystems, mount
points, and the root disk. The guard is conservative: if a disk is ambiguous,
it must be explicitly forced.
`crates/clawdie/src/disk.rs`
`crates/clawdie/src/main.rs` (`validate_create_device`)
### Single unprivileged service user
The service runs as `_clawdie` on both platforms. On FreeBSD the user is created
with `pw useradd -s /usr/sbin/nologin -d /var/db/clawdie` and exit code `65`
(already exists) is treated as a skip. On Linux `useradd --system` is used. The
state directories are then chowned to that user.
`crates/clawdie/src/platform.rs`
### Platform-specific service managers, same spec
`Platform` is an internal trait. The two implementations differ only in how
they install and enable the unit:
- FreeBSD: writes `/usr/local/etc/rc.d/clawdie`, uses `sysrc clawdie_enable=YES`.
- Linux: writes `/etc/systemd/system/clawdie.service`, runs `systemctl enable --now
clawdie`.
Both use the same `ServiceSpec` (binary, user, data dir, service name).
Running `apply` across platforms therefore produces the same filesystem layout
and differs only in the service-manager wrapper.
`crates/clawdie/src/platform.rs` (`FreeBsd`, `Linux`)
### Daemon runs through the platform supervisor
The generated FreeBSD rc.d script execs `/usr/local/bin/colibri-daemon` through
`/usr/sbin/daemon -u _clawdie` so the supervisor restarts on crash and the
process drops to the unprivileged user. The systemd unit is a simple service
with `Restart=on-failure`.
The installer itself does not start the daemon or stage the binary; it only
creates the environment. The operator or package build stages
`colibri-daemon` and then runs `service clawdie start`.
`docs/ISO-SERVICE-LAYOUT.md` (rc.d through daemon(8))
### Secrets are not written by the installer
The installer does not touch provider API keys. A separate provider environment
file, conventionally kept under `/usr/local/etc/colibri/`, holds secrets and is
sourced by rc.d before the daemon starts. This keeps the installer's blast
radius limited to ZFS, directories, users, and service files.
→ [vault-provision](./vault-provision.md)
### Steps are executed sequentially and stop on failure
`deploy::apply` runs each `Step` in order. `Run` steps shell out and fail on a
non-zero exit unless the step declares allowed exit codes. `WriteFile` steps
create parent directories, write the file, and chmod it. If any step fails,
apply stops immediately and reports the failing command and stderr.
`crates/clawdie/src/deploy.rs`
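A minimal sketch of that execution model follows. The `Step` fields and function names here are assumptions for illustration, not the actual `deploy::apply` signature; the behavior shown (allowed exit codes, parent directory creation, chmod, stop on first failure) is what the paragraph above describes.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use std::process::Command;

/// Hypothetical shape of a plan step.
enum Step {
    Run { program: String, args: Vec<String>, allowed_exit_codes: Vec<i32> },
    WriteFile { path: PathBuf, contents: String, mode: u32 },
}

fn run_step(step: &Step) -> Result<(), String> {
    match step {
        Step::Run { program, args, allowed_exit_codes } => {
            let output = Command::new(program)
                .args(args)
                .output()
                .map_err(|e| format!("{program}: failed to start: {e}"))?;
            // A signal-terminated process has no exit code; treat it as a failure.
            let code = output.status.code().unwrap_or(-1);
            if code == 0 || allowed_exit_codes.contains(&code) {
                Ok(())
            } else {
                let stderr = String::from_utf8_lossy(&output.stderr);
                Err(format!("{program} exited with {code}: {}", stderr.trim()))
            }
        }
        Step::WriteFile { path, contents, mode } => {
            let fail = |e: std::io::Error| format!("{}: {e}", path.display());
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent).map_err(&fail)?;
            }
            fs::write(path, contents).map_err(&fail)?;
            fs::set_permissions(path, fs::Permissions::from_mode(*mode)).map_err(&fail)
        }
    }
}

/// Runs steps in order and stops at the first failure.
fn apply(steps: &[Step]) -> Result<(), String> {
    for (index, step) in steps.iter().enumerate() {
        run_step(step).map_err(|e| format!("step {} failed: {e}", index + 1))?;
    }
    Ok(())
}
```

With this shape, the FreeBSD `pw useradd` step would carry `allowed_exit_codes: vec![65]`, so an already-existing user is skipped rather than treated as an error.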
## Plan shape
```text
clawdie plan
├── ZFS layout (or plain dirs)
│   ├── create <pool>/clawdie container
│   ├── create <pool>/clawdie/db -> /var/db/clawdie
│   └── create <pool>/clawdie/log -> /var/log/clawdie
└── service install
    ├── create user _clawdie
    ├── chown state dirs
    ├── write service unit (rc.d / systemd)
    ├── enable service (sysrc / systemctl)
    └── [systemd] daemon-reload + start
```
## Typical FreeBSD install
```sh
# discover
clawdie discover
# preview
clawdie plan
# provision datasets, user, and rc.d service
sudo clawdie apply --yes
# start once the colibri-daemon binary is staged
sudo service clawdie start
```
## Cross-link to runtime paths
After deployment, the service owns these paths:
- `/var/db/clawdie/colibri.sqlite` — SQLite coordination store
- `/var/run/clawdie/clawdie.sock` — daemon Unix socket
- `/var/log/clawdie/daemon.log` — stdout/stderr log
- `/usr/local/etc/colibri/` — configuration and provider secrets
→ [store-schema](./store-schema.md)
→ [operator-cli](./operator-cli.md)

docs/wiki/external-mcp.md Normal file

@ -0,0 +1,138 @@
# External MCP bridge
← [index](./index.md)
`colibri-mcp` is the Model Context Protocol bridge between Colibri and
MCP-capable editors (Zed, Cursor, Windsurf, Claude Code). It exposes the
current daemon state as MCP tools today and acts as a small MCP host for
arbitrary external stdio MCP servers as a prototype.
## Why MCP?
The daemon already exposes a typed Unix-socket API through
`crates/colibri-client`. MCP wraps that API into the standard JSON-RPC tool
protocol that editors already speak. This avoids the maintenance cost and
political risk of forking or embedding an editor, keeps Colibri headless-safe,
and lets any MCP-compatible client access the same surface.
For the longer-term product framing, see `../CLAWDIE-STUDIO-PROPOSAL.md`.
## Two roles in one binary
`colibri-mcp` serves as both:
1. **MCP server for Colibri** — presents tools such as `colibri_status`,
`colibri_snapshot`, `colibri_list_tasks`, `colibri_create_task`, etc.
2. **MCP host for external servers** — reads a registry file, spawns configured
server processes, and proxies `tools/list` and `tools/call` to them.
Separating these roles would create a second binary for little gain; hosting
external servers is gated so the default surface stays read-only.
## Daemon socket resolution
The MCP server must reach the daemon. The socket path is resolved in order:
1. `--socket` CLI flag
2. `COLIBRI_MCP_SOCKET`
3. `COLIBRI_DAEMON_SOCKET`
4. `DaemonConfig::from_env().socket_path` (env-driven defaults)
This mirrors how the operator CLI and TUI resolve the same socket.
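The lookup order can be sketched as a small function that takes the environment as a closure, which keeps it testable. The function name, the closure parameter, and the decision to skip empty variables are assumptions; `config_default` stands in for `DaemonConfig::from_env().socket_path`.

```rust
use std::path::PathBuf;

/// Resolves the daemon socket: CLI flag, then the two environment variables,
/// then the configured default. Hypothetical helper for illustration.
fn resolve_socket(
    cli_flag: Option<&str>,
    env: impl Fn(&str) -> Option<String>,
    config_default: &str,
) -> PathBuf {
    if let Some(path) = cli_flag {
        return PathBuf::from(path);
    }
    for key in ["COLIBRI_MCP_SOCKET", "COLIBRI_DAEMON_SOCKET"] {
        // Assumption: an empty variable counts as unset.
        if let Some(value) = env(key).filter(|v| !v.is_empty()) {
            return PathBuf::from(value);
        }
    }
    PathBuf::from(config_default)
}
```

A real caller would pass `|key| std::env::var(key).ok()` as the `env` argument.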
## Colibri tools and gates
| Tool | Default | Gate |
| ----------------------- | ----------- | --------------------------------- |
| `colibri_status` | read-only | none |
| `colibri_snapshot` | read-only | none |
| `colibri_list_tasks` | read-only | none |
| `colibri_list_skills` | read-only | none |
| `colibri_create_task` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` |
| `colibri_intake_task` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` |
| `colibri_set_cost_mode` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` |
The default ISO posture is read-only. Mutating commands require the operator to
opt in explicitly, which prevents an assistant from creating tasks or switching
cost mode by accident.
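One way to picture the gate is a lookup that runs before any tool call is dispatched. The sketch below uses the tool names from the table; the function names and error strings are hypothetical, and the real bridge may structure this differently.

```rust
const READ_TOOLS: [&str; 4] = [
    "colibri_status",
    "colibri_snapshot",
    "colibri_list_tasks",
    "colibri_list_skills",
];
const WRITE_TOOLS: [&str; 3] = [
    "colibri_create_task",
    "colibri_intake_task",
    "colibri_set_cost_mode",
];

/// The write gate opens via `--write` or `COLIBRI_MCP_WRITE=1`.
fn write_gate_open(cli_flag: bool, env_value: Option<&str>) -> bool {
    cli_flag || env_value == Some("1")
}

/// Decides whether a tool may run under the current gate setting.
fn authorize(tool: &str, gate_open: bool) -> Result<(), String> {
    if READ_TOOLS.contains(&tool) {
        Ok(())
    } else if WRITE_TOOLS.contains(&tool) {
        if gate_open {
            Ok(())
        } else {
            Err(format!("{tool} requires COLIBRI_MCP_WRITE=1 or --write"))
        }
    } else {
        Err(format!("unknown tool: {tool}"))
    }
}
```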
## External MCP host
The prototype external-host tools are always exposed but only allow calling an
external tool when the separate `COLIBRI_MCP_EXTERNAL_CALL=1` / `--external-call`
flag is set.
### Registry
External servers are configured from a JSON registry. Default path:
`/usr/local/etc/colibri/external-mcp.json`. Override with
`COLIBRI_MCP_EXTERNAL_CONFIG` or `--external-config`.
Each entry declares a command, args, optional env, and optional jail
confinement:
```json
{
  "servers": {
    "demo": {
      "command": "/usr/local/bin/demo-mcp-server",
      "args": ["--stdio"],
      "env": { "DEMO_MODE": "1" },
      "jail": { "name": "mcp0", "root_path": "/usr/local/bastille/jails/mcp0/root" }
    }
  }
}
```
### Confinement
External MCP servers execute arbitrary code on the operator machine, so they
reuse the same jail primitive as agent spawning:
`colibri_daemon::spawner::{prepare_spawn_command, jail_wrap, JailConfig, PrivMode}`.
- `jail.name` enters an existing persistent jail via `jexec`.
- `jail.root_path` creates an ephemeral jail for the duration of the call.
- Omitting `jail` runs the server on the host, but stdin/stdout framing is the
same either way.
The root-only jail step honors the shared `COLIBRI_JAIL_PRIV_MODE` policy (`mdo`
on the operator USB, `helper` on deployed hosts). See [jail-confinement](./jail-confinement.md).
### Request lifecycle
Every external `tools/list` or `tools/call` request:
1. Spawns a fresh process (`ExternalMcpSession::start`) using the shared spawner.
2. Runs the MCP `initialize` handshake with protocol version `2024-11-05`.
3. Sends `tools/list` or `tools/call`, reads the response over newline-delimited
JSON, and returns the result.
4. Kills the child and removes the staged cleanup directory.
This is intentionally simple: one process per request, no connection pool, no
streaming, no long-lived state. It is good enough for prototyping; a production
host should add policy, audit logging, secret management, and per-tool
permissions.
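Steps 2 and 3 of the lifecycle can be sketched over any reader/writer pair. This is an illustration under assumptions, not `ExternalMcpSession`: the helper names and the `clientInfo` values are made up, process spawning and cleanup are omitted, and the `notifications/initialized` message is included because the MCP lifecycle expects it after `initialize`.

```rust
use std::io::{self, BufRead, Write};

const PROTOCOL_VERSION: &str = "2024-11-05";

/// Builds the JSON-RPC `initialize` request as a single line of JSON.
fn initialize_request(id: u64) -> String {
    let params = format!(
        "{{\"protocolVersion\":\"{}\",\"capabilities\":{{}},\"clientInfo\":{{\"name\":\"sketch-host\",\"version\":\"0.0.0\"}}}}",
        PROTOCOL_VERSION
    );
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"initialize\",\"params\":{}}}",
        id, params
    )
}

/// Writes one message followed by the newline that frames it.
fn send_line<W: Write>(writer: &mut W, message: &str) -> io::Result<()> {
    writer.write_all(message.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()
}

/// Reads one newline-terminated message; EOF before any data is an error.
fn read_line<R: BufRead>(reader: &mut R) -> io::Result<String> {
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "server closed the stream"));
    }
    Ok(line.trim_end_matches(|c| c == '\n' || c == '\r').to_string())
}

/// Handshake, then a single request, returning the raw response line.
fn handshake_then_request<R: BufRead, W: Write>(
    reader: &mut R,
    writer: &mut W,
    request: &str,
) -> io::Result<String> {
    send_line(writer, &initialize_request(1))?;
    let _initialize_response = read_line(reader)?;
    send_line(writer, "{\"jsonrpc\":\"2.0\",\"method\":\"notifications/initialized\"}")?;
    send_line(writer, request)?;
    read_line(reader)
}
```

In a real host the reader and writer would be the child's stdout and stdin; using generic traits here lets the framing be exercised with in-memory buffers.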
## Why separate `COLIBRI_MCP_WRITE` and `COLIBRI_MCP_EXTERNAL_CALL`
`COLIBRI_MCP_WRITE` gates mutations against the local Colibri daemon. External
tool calls execute arbitrary third-party binaries and therefore live on a
different trust surface. Requiring two separate opt-ins makes accidental
privilege escalation harder.
## Limits and open questions
- stdio transport only
- one external process per request
- no server/tool allowlist beyond the registry file
- no streaming tool results
- no production secret manager integration
Those limits are recorded as explicitly accepted for now; if the prototype is
promoted to default ISO behavior, each limit should be addressed.
## See also
- [jail-confinement](./jail-confinement.md) — jail policy reused for external MCP servers
- [cost-model](./cost-model.md) — cost mode and the write-gated `colibri_set_cost_mode`
- [skills-catalog](./skills-catalog.md) — read-only skill catalog exposed via `colibri_list_skills`


@ -43,13 +43,22 @@ warning.
## Pages
| Page | What it covers |
| ----------------------------------------- | --------------------------------------------------------------------------------------------- |
| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction |
| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API |
| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers |
| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed |
| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing |
| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |
| Page | What it covers |
| ------------------------------------------- | --------------------------------------------------------------------------------------------- |
| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction |
| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API |
| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers |
| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed |
| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing |
| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |
| [contracts](./contracts.md) | Stable JSON schemas (run-manifest, runtime-inventory, provider-smoke), golden tests |
| [store-schema](./store-schema.md) | SQLite coordination schema and migration discipline |
| [external-mcp](./external-mcp.md) | MCP bridge for editors + external stdio MCP host; read/write/external-call gates |
| [operator-cli](./operator-cli.md) | The `colibri` CLI as a thin typed Unix-socket client over the daemon API |
| [tui](./tui.md) | Terminal dashboard client (colibri-tui) vs the colibri-glasspane state machine |
| [runtime-inventory](./runtime-inventory.md) | Host runtime inventory + watchdog status reader; additive, read-only integrations |
| [skills-catalog](./skills-catalog.md) | Read-only runtime consumer for reviewed Clawdie-AI skill artifacts |
| [vault-provision](./vault-provision.md) | Vaultwarden-driven env-file provisioning into jails after agent spawn |
| [deployment](./deployment.md) | Host installer (clawdie): ZFS layout, rc.d/systemd service, dry-run safety |

docs/wiki/operator-cli.md Normal file

@ -0,0 +1,124 @@
# Operator CLI (`colibri`)
← [index](./index.md)
The `colibri` binary is the operator's command-line interface to the daemon.
It wraps a typed Unix-socket client (`DaemonClient`) and turns typed commands
into newline-delimited JSON messages on the control-plane socket. It is not
where policy lives — policy lives in the daemon behind the socket.
## Job of the CLI
The CLI has two responsibilities:
1. **Parse shell input** into strongly-typed commands.
2. **Send those commands** to the daemon and print the JSON response.
It does not contain business logic about session compaction, task scheduling,
or jail confinement. That keeps the CLI small and lets any other client (TUI,
MCP bridge, web dashboard, tests) perform the same operations with the same
protocol.
`crates/colibri-client/src/bin/colibri.rs` (argument parsing and `run` dispatch)
`crates/colibri-client/src/lib.rs` (`DaemonClient` request/response wrapper)
## Decisions
### One binary, one socket, one protocol
Every command — `status`, `snapshot`, `spawn-agent`, `create-task`,
`register-tenant` — goes over the same Unix socket. The CLI builds a
`DaemonClient`, serializes a `ColibriCommand`, writes one line ending in `\n`,
and reads one `ColibriResponse` line back.
Because the protocol is newline-delimited JSON, operators can still debug with
`nc -U` or similar when the CLI is not enough. The socket is the stable API;
the CLI is a polished client.
`crates/colibri-daemon/src/lib.rs` (`ColibriCommand`, `ColibriResponse`)
`crates/colibri-daemon/src/socket.rs` (dispatch table)
### Socket resolution order matches other clients
The CLI resolves the daemon socket the same way the TUI and MCP bridge do:
1. `--socket PATH`
2. `COLIBRI_DAEMON_SOCKET`
3. `DaemonConfig::from_env().socket_path`
Sharing the resolution order means documentation, environment setup scripts,
and operator muscle memory apply to every client.
`crates/colibri-client/src/bin/colibri.rs` (`default_socket_path`)
### No write-gating inside the CLI itself
Commands that mutate state (`create-task`, `kill-agent`, `set-cost-mode`,
`register-tenant`) are not blocked by CLI flags. The gate is the Unix socket
itself: the daemon listens on a Unix socket with operator-only permissions and
validates each command. This avoids two parallel permission layers that could
drift out of sync.
This is an intentional contrast with `colibri-mcp`, which exposes the daemon to
editor assistants and therefore uses `COLIBRI_MCP_WRITE=1` as an explicit trust
switch. An operator at the shell already has that trust by virtue of the socket.
→ [external-mcp](./external-mcp.md)
### Commands return JSON, not human prose
All successful CLI commands print pretty-printed JSON. This keeps the output
scriptable (`colibri snapshot | jq '.panes[] | select(.state == "working")'`)
and consistent with the socket protocol. If a command fails, the CLI prints the
daemon's error message to stderr and exits non-zero.
`crates/colibri-client/src/lib.rs` (`request`, error handling)
### `spawn-agent` accepts jail confinement directly
The `--jail-name` and `--jail-root` flags on `spawn-local` and `spawn-agent`
build a `JailConfig` that is sent to the daemon. The same type is re-exported
from `colibri-daemon::spawner` so the CLI crate does not have to depend on the
daemon crate just to build a config.
Pairing `--jail-name` with `--jail-root` is the only path that triggers vault
provisioning after a spawn, because the daemon needs both the jail identity
and the host-visible jail root.
`crates/colibri-client/src/lib.rs` (`JailConfig` re-export)
`crates/colibri-daemon/src/spawner.rs`
### Local sample agent lives next door
The same crate also ships `colibri-test-agent`, a tiny sample binary used by
tests and the TUI's spawn shortcut. Keeping it in `colibri-client` keeps the
sample close to its primary caller without adding a new crate.
`crates/colibri-client/src/bin/colibri_test_agent.rs`
## Notable commands
| Command | Purpose |
| ---------------------------------------------------------------- | --------------------------------- |
| `status` | daemon health, paths, cost mode |
| `snapshot` / `glasspane-snapshot` | current pane radar view |
| `list-sessions` | active agent sessions |
| `spawn-local` / `spawn-agent` | start an agent, optionally jailed |
| `kill AGENT_ID` | terminate a pane/agent |
| `create-task` / `intake-task` / `claim-task` / `transition-task` | task-board workflow |
| `set-cost-mode MODE` | acknowledge/toggle cost mode |
| `register-tenant` / `list-tenants` | vault provisioning bookkeeping |
| `register-skill` / `list-skills` | skill catalog maintenance |
| `register-agent` / `list-agents` | agent capability registration |
## See also
- [tui](./tui.md) — the live terminal dashboard that uses the same `DaemonClient`
- [glasspane](./glasspane.md) — the pane state machine behind `snapshot`
- [task-board](./task-board.md) — commands that manipulate the task board
- [store-schema](./store-schema.md) — SQLite entities queried by the CLI
- [vault-provision](./vault-provision.md) — why `register-tenant` carries a jail root path
- [external-mcp](./external-mcp.md) — another daemon client with write-gating


@ -0,0 +1,88 @@
# Runtime inventory and host status
← [index](./index.md)
Colibri discovers the host in two complementary ways:
1. **Runtime inventory** — a one-shot probe that reports versions installed on
the machine (`node`, `npm`, `pi`, `zot`, package manager, OS, etc.).
2. **Watchdog host status** — a read-only, newline-framed Unix-socket call to
the Clawdie watchdog that returns live health metrics.
Both are intentionally additive: they read from Clawdie, they do not change
it. This page records the design of those read-only integrations.
## Runtime inventory probe
The `colibri-runtime-inventory` binary (`src/bin/runtime_inventory.rs`) emits a
single JSON object matching the `clawdie.runtime-version-inventory.v1` schema
from `crates/colibri-contracts/src/lib.rs`.
### Detection strategy
| Field             | How it is detected                                                                                     |
| ----------------- | ------------------------------------------------------------------------------------------------------ |
| `host`            | `COLIBRI_HOST` → `HOSTNAME` → `hostname` command → `"unknown"`                                         |
| `os`              | `uname -sr` + target architecture; falls back to `std::env::consts`                                    |
| `node`            | `node --version`                                                                                       |
| `npm`             | `npm --version`                                                                                        |
| `npm_prefix`      | `npm config get prefix`                                                                                |
| `package_manager` | `pkg` on FreeBSD, otherwise `apt` / `dnf` / `brew`                                                     |
| `pi`              | `PI_BIN` → `~/.npm-global/bin/pi` → `pi --version` → package.json of `@earendil-works/pi-coding-agent` |
| `zot`             | `ZOT_BIN` → `zot --version` across PATH and candidate locations                                        |
### Why this shape
- `pi` is an npm package installed in `node_modules`, so version detection must
fall back to reading its package manifest when `--version` is missing.
- `zot` is a single Go binary, so a plain `--version` probe is correct.
- `pi`/`zot` are optional; a host that only runs one agent runtime should
still produce a valid inventory.
## Watchdog host status
`crates/colibri-runtime/src/lib.rs` implements the watchdog reader. It connects
over a Unix domain socket, sends `{"cmd":"status"}\n`, reads back one
newline-terminated JSON line, and normalizes the response into `HostStatus`.
### Socket path resolution
The search order lets operators, services, and test harnesses override the
socket location without recompiling:
1. `COLIBRI_WATCHDOG_SOCKET` (explicit override)
2. `COLIBRI_SERVICE_NAME` (default `clawdie`) → `{service}-watchdog.sock`
3. `TMP_IPC_DIR/{service}-watchdog.sock`
4. `AGENT_TMP_DIR/ipc/{service}-watchdog.sock` or
`CLAWDIE_TMP_DIR/ipc/{service}-watchdog.sock`
5. `$HOME/clawdie-ai/tmp/ipc/{service}-watchdog.sock`
6. `tmp/ipc/{service}-watchdog.sock` (final fallback)
### Wire protocol
- Framing: one line, newline-terminated.
- Request: `{"cmd":"status"}\n`.
- Expected response: `{"ok": true, "data": { ... watchdog host fields ... }}`.
- Timeout: 2 seconds by default, overridable in `WatchdogReadOptions`.
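The exchange is small enough to sketch with the standard library alone. The function name is hypothetical and the response is returned as a raw line; parsing and normalization into `HostStatus` are left out.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Sends the status request and returns the single response line, unparsed.
/// Hypothetical helper; the real reader lives in `colibri-runtime`.
fn read_status_line(mut stream: UnixStream, timeout: Duration) -> io::Result<String> {
    // Bound both directions so a stuck watchdog cannot hang the caller.
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(b"{\"cmd\":\"status\"}\n")?;
    let mut line = String::new();
    if BufReader::new(stream).read_line(&mut line)? == 0 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "watchdog closed the socket"));
    }
    Ok(line.trim_end().to_string())
}
```

A caller would obtain the stream with `UnixStream::connect(path)` and pass `Duration::from_secs(2)` to match the documented default.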
### Normalization rules
`normalize_watchdog_status()` in `crates/colibri-runtime/src/lib.rs` is defensive:
- Missing fields default to `"unknown"` for strings, `0` for counters, and
`false` for booleans.
- `controlplane_status` is lifted from `controlplane.overallStatus`.
- The original raw object is preserved under `HostStatus.raw` so callers can
access fields Colibri does not yet model.
## Golden fixtures
`crates/colibri-contracts/tests/golden.rs` parses committed inventory and
host-status manifests in `manifests/` and round-trips them through the Rust
structs. Those fixtures come from real hosts (`osa`, `domedog`, `debby`, the
operator USB) and are treated as cross-platform source material.
## See also
- [contracts](./contracts.md) — stable schemas for inventory and host-status.
- [cost-model](./cost-model.md) — how runtime inventory feeds cost decisions.

docs/wiki/skills-catalog.md Normal file

@ -0,0 +1,161 @@
# Skills catalog
← [index](./index.md)
`colibri-skills` is Colibri's read-only runtime consumer for Clawdie-AI skill
artifacts. Clawdie-AI authors and reviews the skillpacks; Colibri indexes
them, validates checksums, chunks searchable text, and exposes typed structs to
the daemon, CLI, and TUI. This crate does not author skills.
`crates/colibri-skills/src/lib.rs`
`docs/COLIBRI-SKILLS-PLAN.md`
## Decisions
### Source of truth stays in Clawdie-AI
Skill artifacts live in the `clawdie-ai` repository, not in `colibri`. They are
committed reviewed directories containing prose, screenshots, transcripts,
scripts, a manifest, and a checksum file. `colibri-skills` imports these
artifacts into Colibri's SQLite store at runtime.
This split preserves review discipline: a skill changes through a PR in its
home repo, then Colibri re-indexes the checkout.
### Read-only, not authoring
The crate deliberately lacks "create skill" or "edit skill" operations. Those
belong in Clawdie-AI where human review and media pipelines run. Putting
authoring here would duplicate state and split review authority.
The import path is the target for Phase 1: scan the configured Clawdie-AI checkout,
parse manifests, verify checksums, and upsert into SQLite. The type scaffold
exists today; the importer, chunker, and FTS5 index are planned.
`docs/COLIBRI-SKILLS-PLAN.md` (Phases 1-7)
### Manifest-driven identity
Each skill directory contains a run manifest file. From it the importer derives:
- `skill_id`
- `display_name`
- `source_path` within the Clawdie-AI checkout
- pipeline stages and models used
- source media metadata
Any file not listed in the manifest can still be classified and indexed as an
artifact, but the manifest is the canonical identity document.
### Artifact classification by extension and filename
`ArtifactType::from_path` classifies files without relying on a sidecar:
- Python or shell files → `Script`
- paths containing `contact_sheet` → `ContactSheet`
- paths containing `run_manifest` and ending in `.json` → `Manifest`
- paths containing `sha256` or `checksum` → `Checksum`
- paths containing `report` and ending in `.json` → `Report`
- `.md` → `Document`
- `.jpg` / `.png` / `.webp` → `Image`
- `.txt` transcript files → `Transcript`
- anything else → `Other`
This heuristic keeps classification local and fast. Misclassified files can be
fixed by renaming within Clawdie-AI.
`crates/colibri-skills/src/lib.rs` (`ArtifactType::from_path`)
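The rules above read naturally as an ordered chain where the first match wins. The sketch below is an approximation, not the real `ArtifactType::from_path`: the `ArtifactKind` and `classify_artifact` names are hypothetical, and it assumes case-insensitive matching, `.py`/`.sh` as the script extensions, and "transcript" appearing in the path of transcript `.txt` files.

```rust
use std::path::Path;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ArtifactKind {
    Script,
    ContactSheet,
    Manifest,
    Checksum,
    Report,
    Document,
    Image,
    Transcript,
    Other,
}

/// Classifies a file from its path alone; earlier rules take precedence.
fn classify_artifact(path: &Path) -> ArtifactKind {
    let lower = path.to_string_lossy().to_lowercase();
    let ext_owned = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_lowercase())
        .unwrap_or_default();
    let ext = ext_owned.as_str();

    if matches!(ext, "py" | "sh") {
        ArtifactKind::Script
    } else if lower.contains("contact_sheet") {
        // Checked before the image rule so contact sheets are not plain images.
        ArtifactKind::ContactSheet
    } else if lower.contains("run_manifest") && ext == "json" {
        ArtifactKind::Manifest
    } else if lower.contains("sha256") || lower.contains("checksum") {
        ArtifactKind::Checksum
    } else if lower.contains("report") && ext == "json" {
        ArtifactKind::Report
    } else if ext == "md" {
        ArtifactKind::Document
    } else if matches!(ext, "jpg" | "png" | "webp") {
        ArtifactKind::Image
    } else if ext == "txt" && lower.contains("transcript") {
        ArtifactKind::Transcript
    } else {
        ArtifactKind::Other
    }
}
```

Because the rules are ordered, a file such as `contact_sheet.jpg` lands in `ContactSheet` rather than `Image`.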
### Checksums are validated, then stored
The run manifest is accompanied by a checksum file. At import time the runtime
computes SHA-256 of each artifact and compares it to the committed checksum.
Failures are reported in `ImportSummary::checksum_failures` and prevent
`success()`.
Only the hash is stored in SQLite; image and media blobs stay on disk. The
catalog stores relative paths and hashes, not the binary content.
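The comparison step can be sketched independently of the hashing itself. In the sketch below the struct mirrors the `ImportSummary` fields named above, but the `artifacts_checked` field, the `verify_checksums` function, and the map-based inputs are assumptions; computing SHA-256 needs a crate outside the standard library and is deliberately left to the caller.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for the import summary described above.
#[derive(Debug, Default)]
struct ImportSummary {
    artifacts_checked: usize,
    checksum_failures: Vec<String>,
}

impl ImportSummary {
    /// Any checksum failure makes the import unsuccessful.
    fn success(&self) -> bool {
        self.checksum_failures.is_empty()
    }
}

/// Compares committed hashes (relative path -> hex digest) with digests
/// computed at import time. Assumption: a missing artifact is a failure.
fn verify_checksums(
    committed: &BTreeMap<String, String>,
    computed: &BTreeMap<String, String>,
) -> ImportSummary {
    let mut summary = ImportSummary::default();
    for (path, expected) in committed {
        summary.artifacts_checked += 1;
        let matches = computed
            .get(path)
            .map_or(false, |actual| actual.eq_ignore_ascii_case(expected));
        if !matches {
            summary.checksum_failures.push(path.clone());
        }
    }
    summary
}
```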
### Content is chunked into searchable units
The planned chunker turns skill content into `SkillChunk` rows:
- Markdown sections by heading
- Command blocks
- Code blocks
- Tables
- Transcript segments
Chunks are the unit of search and the unit shown in TUI or CLI results.
`SkillChunk` carries `line_start`/`line_end` so a hit can point back to the
source artifact.
`crates/colibri-skills/src/lib.rs` (`SkillChunk`, `ChunkType`)
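Since the chunker is still planned, the following is only a sketch of the "Markdown sections by heading" case, showing how `line_start` and `line_end` could be tracked. The `Chunk` struct and `chunk_by_heading` function are hypothetical and cover a subset of the `SkillChunk` fields.

```rust
/// Minimal stand-in for a searchable chunk (1-based, inclusive line range).
#[derive(Debug, PartialEq)]
struct Chunk {
    heading: Option<String>,
    content: String,
    line_start: usize,
    line_end: usize,
}

/// Splits Markdown into one chunk per ATX heading (`#`, `##`, ...).
/// Text before the first heading becomes a chunk with no heading.
fn chunk_by_heading(markdown: &str) -> Vec<Chunk> {
    let mut chunks = Vec::new();
    let mut current: Option<Chunk> = None;

    for (index, line) in markdown.lines().enumerate() {
        let line_no = index + 1;
        let after_hashes = line.trim_start_matches('#');
        let is_heading = line.starts_with('#') && after_hashes.starts_with(' ');

        if is_heading {
            // A new heading closes the chunk in progress.
            if let Some(done) = current.take() {
                chunks.push(done);
            }
            current = Some(Chunk {
                heading: Some(after_hashes.trim().to_string()),
                content: String::new(),
                line_start: line_no,
                line_end: line_no,
            });
        } else {
            let chunk = current.get_or_insert_with(|| Chunk {
                heading: None,
                content: String::new(),
                line_start: line_no,
                line_end: line_no,
            });
            if !chunk.content.is_empty() {
                chunk.content.push('\n');
            }
            chunk.content.push_str(line);
            chunk.line_end = line_no;
        }
    }
    if let Some(done) = current {
        chunks.push(done);
    }
    chunks
}
```

This sketch does not track fenced code blocks, so a `# comment` line inside a shell snippet would be mistaken for a heading; a real chunker would need to handle that, along with the command, table, and transcript chunk types.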
### SQLite + FTS5 as the runtime search backend
The target schema keeps three tables:
- `system_skills` — one row per skill
- `system_skill_artifacts` — one row per file
- `system_skill_chunks` — one row per searchable chunk, plus a virtual FTS5
table for ranked text search
This matches the store's pragmatic relational model. If skill volumes grow
beyond tens of thousands of chunks, we can move the FTS index to PostgreSQL
pgvector; until then, SQLite keeps the control-plane self-contained.
→ [store-schema](./store-schema.md)
`docs/COLIBRI-SKILLS-PLAN.md` (SQLite schema target)
### Status is a lifecycle marker, not a state machine
`SkillStatus` is `active`, `archived`, or `superseded`. There is no pending
review state because review happens in Clawdie-AI before import. Colibri simply
stops returning archived skills in default searches but keeps them in the store
for audit and explicit lookups.
### Natural-language verification question
Each skill can carry a `verification` field like "can the user create and run
an Astro project?". This is not an executable test; it is the acceptance
criterion used during skill review and later during agent self-verification.
### Runtime commands are read-only
The CLI surface is planned as:
- `colibri list-skills`
- `colibri show-skill <id>`
- `colibri search-skills <query>`
- `colibri index-skills`
- `colibri verify-skill <id>`
`index-skills` refreshes the catalog from disk. The remaining commands query the
runtime store. None mutate the Clawdie-AI checkout.
→ [operator-cli](./operator-cli.md)
## Entity shape
```text
Skill
├─ skill_id, display_name, source_path, status, verification
├─ SkillManifest
│  ├─ run_id, created, notes
│  ├─ ManifestSource
│  ├─ [PipelineStage]
│  └─ [ModelUsage]
└─ [SkillArtifact]
   ├─ artifact_type, relative_path, file_name, mime_type, size_bytes, sha256_hash
   └─ [SkillChunk]
      └─ chunk_type, heading, content, line_start, line_end, tokens_estimate
```
## See also
- [store-schema](./store-schema.md) — coordination and planned skill catalog tables
- [operator-cli](./operator-cli.md) — planned skill catalog CLI commands
- [task-board](./task-board.md) — agents will match claimed tasks to skills by capability

140
docs/wiki/store-schema.md Normal file

@ -0,0 +1,140 @@
# Store schema
← [index](./index.md)
Colibri's coordination store is a single SQLite database owned by the `colibri`
service. It holds the task board, the registry of agents and skills, and the
vault tenant map. It is not a cache — it is durable state. Most writes happen
through the daemon's socket API, but the schema belongs to `colibri-store`.
`crates/colibri-store/src/schema.rs`
`crates/colibri-store/src/lib.rs`
## Decisions
### SQLite, not PostgreSQL, for the control-plane store
The store is SQLite because the control plane needs a single-file database that
is easy to back up, snapshot, inspect, and ship. PostgreSQL with pgvector is
planned for retrieval/long-term memory, but the task board and agent registry do
not need a server process.
The daemon batches related writes and relies on SQLite's WAL mode for concurrent
readers. This keeps the operator stack self-contained on a small bare-metal host.
### WAL + foreign keys by default
`Store::open` runs three pragmas on every startup:
- `journal_mode=WAL` — readers don't block writers.
- `synchronous=NORMAL` — a safe middle ground between full-synchronous and OFF.
- `foreign_keys=ON` — the task/agent FK is enforced.
These are not configurable at runtime. If we ever need different durability or
concurrency guarantees, we should make it explicit rather than letting the
connection inherit defaults.
`crates/colibri-store/src/lib.rs` (`Store::open`)
### Idempotent migrations only
Migrations run on every `Store::open`. They use `IF NOT EXISTS` tables and
indexes, so repeated runs are safe. We do not ship downward migrations; schema
evolution is additive tables and columns. If a destructive migration is ever
needed, it must be a deliberate manual step documented in a handoff.
`crates/colibri-store/src/schema.rs`
### Four tables for four concerns
| Table | Concern | Key entity |
| --------- | ----------------------- | ---------- |
| `tasks` | Task board | `Task` |
| `agents` | Registered teammates | `Agent` |
| `skills` | Team skill catalog | `Skill` |
| `tenants` | Vault/secret tenant map | `Tenant` |
Tasks carry an `agent_id` foreign key into `agents`. Every other relationship is
loose — skills are not linked to agents, and tenants are referenced by their
`tenant_id` in socket commands and provisioning hooks.
`crates/colibri-store/src/schema.rs`
### Task-status CHECK constraint is the source of truth
`tasks.status` is constrained to `('queued','claimed','started','done','failed')`.
The Rust `TaskStatus` enum mirrors it, but the database is the final gate. A
command that tries to insert an unknown status fails at write time.
`crates/colibri-store/src/schema.rs`
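A minimal sketch of how a Rust enum can mirror the CHECK constraint. The variant names and the `as_str` / `FromStr` pairing are assumptions for illustration; only the five status strings come from the schema described above.

```rust
use std::str::FromStr;

/// Mirrors the five values allowed by the `tasks.status` CHECK constraint.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
    Started,
    Done,
    Failed,
}

impl TaskStatus {
    /// The exact string stored in the `status` column.
    pub fn as_str(self) -> &'static str {
        match self {
            TaskStatus::Queued => "queued",
            TaskStatus::Claimed => "claimed",
            TaskStatus::Started => "started",
            TaskStatus::Done => "done",
            TaskStatus::Failed => "failed",
        }
    }
}

impl FromStr for TaskStatus {
    type Err = String;

    /// Rejects anything the database would also reject.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "queued" => Ok(TaskStatus::Queued),
            "claimed" => Ok(TaskStatus::Claimed),
            "started" => Ok(TaskStatus::Started),
            "done" => Ok(TaskStatus::Done),
            "failed" => Ok(TaskStatus::Failed),
            other => Err(format!("unknown task status: {other}")),
        }
    }
}
```

Parsing in Rust gives an early, typed error, while the CHECK constraint still catches any writer that bypasses the enum.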
### Agent capabilities stored as JSON, not normalized
`agents.capabilities` is a JSON blob like `["code","rust","freebsd"]`. We
avoided a separate capabilities table because capability tags are just
strings, and the team registry is small. Normalized joins would add schema
complexity without improving query power.
If capability metadata grows (weights, versions, required skills), we can split
it later; the current schema intentionally stays pragmatic.
`crates/colibri-store/src/lib.rs` (`register_agent`)
### Tenants encode the 1:1:1 jail/vault/collection map
`tenants` stores `tenant_id`, `jail_root_path`, and `collection_id` as UNIQUE
columns. The rule is `tenant_id = jail name = Vaultwarden collection`. This
lets `colibri-vault` look up a jail by name and know exactly which host path and
Vaultwarden collection to use when writing the environment file.
The tenant `status` column tracks the lifecycle:
`provisioned → active → stopped → destroyed`. It is independent of whether the
jail process is running; lifecycle management is a separate concern.
`crates/colibri-store/src/schema.rs` (comments on `tenants`)
### Default database path is platform-specific
The default path is resolved in this order:
- `COLIBRI_DB_PATH` if set.
- FreeBSD: `/var/db/colibri/colibri.sqlite`.
- Linux/macOS: `$XDG_DATA_HOME/colibri/colibri.sqlite`, falling back to
`$HOME/.local/share/colibri/colibri.sqlite`, then `/tmp`.
FreeBSD defaults to `/var/db` because that is the conventional local-state
directory for services. The Linux fallback respects XDG, so development on a
workstation feels normal.
`crates/colibri-store/src/lib.rs` (`default_db_path`)
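The resolution order can be sketched like this. `resolve_db_path` is a hypothetical stand-in for `default_db_path`: it takes the platform flag and an environment lookup as parameters so the order is easy to test, and the file name used for the final `/tmp` fallback is an assumption, since the text only says `/tmp`.

```rust
use std::path::PathBuf;

/// Hypothetical sketch of the lookup order; `lookup` stands in for `std::env::var`.
pub fn resolve_db_path(is_freebsd: bool, lookup: impl Fn(&str) -> Option<String>) -> PathBuf {
    // 1. An explicit override always wins.
    if let Some(path) = lookup("COLIBRI_DB_PATH") {
        return PathBuf::from(path);
    }
    // 2. FreeBSD uses the conventional local-state directory for services.
    if is_freebsd {
        return PathBuf::from("/var/db/colibri/colibri.sqlite");
    }
    // 3. Linux/macOS: XDG data directory first.
    if let Some(xdg) = lookup("XDG_DATA_HOME") {
        return PathBuf::from(xdg).join("colibri").join("colibri.sqlite");
    }
    // 4. Then the XDG default location under $HOME.
    if let Some(home) = lookup("HOME") {
        return PathBuf::from(home).join(".local/share/colibri/colibri.sqlite");
    }
    // 5. Last resort. The exact file name under /tmp is assumed here.
    PathBuf::from("/tmp/colibri.sqlite")
}
```

A caller would pass something like `resolve_db_path(cfg!(target_os = "freebsd"), |key| std::env::var(key).ok())`.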
### JSON export for backups and parity tests
`Store::export_json()` dumps all four tables into one JSON object. It exists
for dual-run parity diffs, ad-hoc backups, and debugging. It is not the primary
query API; most readers should use the typed methods.
## Entity relationships
```text
tasks.agent_id ----------> agents.id
tasks agents skills tenants
----- ------ ------ -------
id id id tenant_id
agent_id FK name name jail_root_path
status capabilities description collection_id
title status category status
description created_at created_at created_at
created_at updated_at
updated_at
```
## See also
- [task-board](./task-board.md) — task lifecycle and capability matching
- [operator-cli](./operator-cli.md) — socket commands that write to these tables
- [vault-provision](./vault-provision.md) — how the tenants table drives env-file provisioning
- [jail-confinement](./jail-confinement.md) — jail names map to tenant rows
- [skills-catalog](./skills-catalog.md) — the read-only skills consumer

104
docs/wiki/tui.md Normal file

@ -0,0 +1,104 @@
# Terminal dashboard (colibri-tui)
← [index](./index.md)
The TUI is Colibri's live terminal dashboard. It connects to the daemon's Unix
socket, pulls the `GlasspaneSnapshot`, and renders a color-coded table of
supervised panes. It is a **display client**, not part of the daemon, and not
the same thing as `colibri-glasspane`.
## Why it is not `colibri-glasspane`
`colibri-glasspane` is the **state machine** that decides what state an agent
is in from its JSONL events. `colibri-tui` is the **screen** that asks the
daemon "what does the radar look like right now?" and draws it.
| Artifact | Role | Resident crate |
| ------------------- | -------------------------------------------------------- | ------------------------------------------------------- |
| `colibri-glasspane` | Pane state machine, event ingestor, snapshot builder | `crates/colibri-glasspane` |
| `colibri-tui` | Terminal dashboard client with rows, colors, keybindings | `crates/colibri-glasspane-tui` (binary = `colibri-tui`) |
The split matters because the daemon, the MCP bridge, the CLI, and tests all
use `colibri-glasspane`. The TUI is just one consumer. If the TUI is not
installed, or crashes, agents keep running.
## Decisions
### Keep the daemon separate from any terminal UI
`colibri-tui` is a standalone process. It resolves the daemon socket the same
way the CLI does (`DaemonConfig::from_env().socket_path`), then calls
`client.glasspane_snapshot()` every two seconds. The daemon has no awareness of
crossterm or ratatui.
This is the same "service owns state, clients render it" pattern as the MCP
bridge and the CLI. It keeps Colibri headless-safe, which is required for an
`rc.d` service that must boot before any operator logs in.
`crates/colibri-glasspane-tui/src/main.rs` (socket resolution, refresh loop)
### TUI gets spawn/kill keys, not just read-only status
You can spawn a local test agent (`s`) and kill the selected pane (`x`) from
the dashboard. That overlaps with what the `colibri` CLI can already do,
but the experience is different: a CLI command is one-shot; the TUI is a live
supervision surface with a selected row and an immediate status bar.
We kept the action keys because the dashboard's job is to let an operator
notice and react — spot a stalled pane and kill it without leaving the
terminal.
`crates/colibri-glasspane-tui/src/main.rs` (`spawn_agent`, `kill_selected`)
### One taxonomy from one snapshot
The TUI does not parse agent stdout. It only reads the already-folded
`GlasspaneSnapshot`, so Pi, zot, and local test agents are rendered with the
same columns, colors, and state icons. The rendering code concerns itself only
with layout and keybindings; all semantic decisions live in
`colibri-glasspane`.
`crates/colibri-glasspane/src/lib.rs` (`AgentState`, `GlasspaneSnapshot`)
### Naming: the binary is `colibri-tui`, the crate is `colibri-glasspane-tui`
The crate directory is `colibri-glasspane-tui` because the package implements
"a TUI for the glasspane." The installed binary is named `colibri-tui`
because that is what an operator types. `CLAWDIE-STUDIO-PROPOSAL.md` and other
docs refer to `colibri-tui` as shorthand; there is no separate `colibri-tui`
crate.
This duality is currently accepted. If we ever add a second TUI surface (e.g.
a `colibri-tui-web` or `colibri-tui-gui`), the naming becomes confusing and
should be revisited.
## Current keybindings
| Key | Action |
| ---------------------- | ----------------------------------------------- |
| `q` / `Esc` | Quit, or close detail pane if open |
| `r` | Refresh snapshot now |
| `s` | Spawn a local `colibri-test-agent` |
| `x` | Kill the selected pane |
| `Enter` | Open/close the detail pane for the selected row |
| `Tab` / `Shift-Tab` | Cycle through distinct sessions |
| `j` / `k` or `↓` / `↑` | Navigate the pane table |
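The keybinding table can be read as a pure key-to-action mapping. The sketch below is an illustration only: `Key` is a hypothetical stand-in for the terminal library's key type, `Action` and `action_for` are invented names, and the direction of `Tab` versus `Shift-Tab` is an assumption.

```rust
/// Hypothetical key type; the real TUI would use its terminal library's key codes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Key {
    Char(char),
    Esc,
    Enter,
    Tab,
    BackTab,
    Up,
    Down,
}

/// One variant per row of the keybinding table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    QuitOrCloseDetail,
    Refresh,
    SpawnTestAgent,
    KillSelected,
    ToggleDetail,
    NextSession,
    PrevSession,
    SelectNext,
    SelectPrev,
}

/// Maps a key press to an action; unbound keys return `None`.
pub fn action_for(key: Key) -> Option<Action> {
    match key {
        Key::Char('q') | Key::Esc => Some(Action::QuitOrCloseDetail),
        Key::Char('r') => Some(Action::Refresh),
        Key::Char('s') => Some(Action::SpawnTestAgent),
        Key::Char('x') => Some(Action::KillSelected),
        Key::Enter => Some(Action::ToggleDetail),
        Key::Tab => Some(Action::NextSession), // direction assumed
        Key::BackTab => Some(Action::PrevSession), // direction assumed
        Key::Char('j') | Key::Down => Some(Action::SelectNext),
        Key::Char('k') | Key::Up => Some(Action::SelectPrev),
        _ => None,
    }
}
```

Keeping the mapping free of rendering code makes it testable without a terminal.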
## When to use the TUI vs the CLI
Use the TUI when:
- You want a live, auto-refreshing view of all panes.
- You are picking a pane to inspect or kill visually.
- You are on an SSH session with only a terminal.
Use the `colibri` CLI when:
- You are scripting or piping output (`colibri snapshot | jq`).
- You need a command not bound to a key (e.g. `claim-task`, `set-cost-mode`).
- You want a one-shot answer without entering an alternate screen.
## See also
- [glasspane](./glasspane.md) — the pane state machine the TUI renders
- [operator-cli](./operator-cli.md) — the `colibri` CLI that shares the same socket client


@ -0,0 +1,163 @@
# Vault provision
← [index](./index.md)
`colibri-vault` fetches secrets from a Vaultwarden collection and writes them
into a freshly created jail as a `0600` env file. It is invoked as a post-spawn
hook from the daemon, not by a human operator at provision time. The human step
is registering a tenant mapping; the daemon does the secret fetch.
`crates/colibri-vault/src/lib.rs`
`crates/colibri-daemon/src/daemon.rs` (`provision_tenant_env`)
`docs/VAULT-PROVISION-RUNBOOK.md`
## Decisions
### Tenant = jail name = Vaultwarden collection
The `tenants` table stores a 1:1:1 map:
- `tenant_id` — the jail name.
- `jail_root_path` — the host-visible root of the jail.
- `collection_id` — the Vaultwarden collection name (kept equal to the jail name).
This means `colibri-vault` does not need a separate lookup table or configuration
file. It finds the collection by the jail name and knows the destination path
from the tenant row.
→ [store-schema](./store-schema.md)
### Provisioning is a post-spawn hook, not a separate command
When the daemon spawns an agent with both `--jail-name` and `--jail-root`, it
calls `provision_tenant_env` after the jail is up. If the jail name has no
matching tenant row, the hook no-ops. If provisioning fails, the daemon logs the
failure and the agent still starts: a missing secret file should not abort the
spawn and leave a stale, half-created jail on the host.
`crates/colibri-daemon/src/socket.rs` (`jail_provision_target`)
### Fail-soft on missing tenant or vault error
The hook returns early, without failing the spawn, when:
- no tenant row matches the jail name;
- the stored `jail_root_path` does not match the spawned root; or
- the vault call fails.
A missing tenant row is a quiet no-op; a root mismatch or a vault failure is
logged as a warning rather than raised as a hard error. In every case the spawn
itself succeeds. This reflects the operational reality that secret tooling may
be unavailable during boot or experimental spawns, while the agent process
should still be observable.
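The early-return rules above can be sketched as a function that never returns an error to the spawn path. Everything here is hypothetical scaffolding: `Tenant` is trimmed to two fields, `HookOutcome` and `provision_hook` are invented names, and `fetch` stands in for the real vault call.

```rust
/// Trimmed-down tenant row, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Tenant {
    pub tenant_id: String,
    pub jail_root_path: String,
}

/// What the hook did. No variant aborts the spawn.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum HookOutcome {
    NoTenant,
    RootMismatch,
    VaultFailed(String),
    Provisioned,
}

/// Fail-soft hook: each problem becomes an outcome to log, not an error to propagate.
pub fn provision_hook(
    tenant: Option<&Tenant>,
    spawned_root: &str,
    fetch: impl FnOnce(&Tenant) -> Result<(), String>,
) -> HookOutcome {
    let tenant = match tenant {
        Some(tenant) => tenant,
        None => return HookOutcome::NoTenant, // no row for this jail name
    };
    if tenant.jail_root_path != spawned_root {
        return HookOutcome::RootMismatch; // stored root differs from spawned root
    }
    match fetch(tenant) {
        Ok(()) => HookOutcome::Provisioned,
        Err(reason) => HookOutcome::VaultFailed(reason),
    }
}
```

The caller would log any outcome other than `Provisioned` and continue starting the agent.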
### Path containment before any write
`colibri-vault::provision` canonicalizes the target directory and asserts it is
strictly under the configured jail-root base (`/usr/local/bastille/jails` by
default, overridable with `COLIBRI_JAIL_ROOT_BASE`). The check runs before
`create_dir_all`, so a symlink or `..` path that escapes the jails tree results
in `TargetEscapesRoot` before any file is created.
This is the same filesystem containment primitive reused by the external MCP
server spawner.
→ [jail-confinement](./jail-confinement.md)
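A containment check of this kind might look like the sketch below. It is not the crate's implementation: `contained_target` and `ContainError` are hypothetical names, and rejecting every `..` component outright is a simplification chosen because the target may not exist yet and so cannot be canonicalized as a whole.

```rust
use std::io;
use std::path::{Component, Path, PathBuf};

#[derive(Debug)]
pub enum ContainError {
    TargetEscapesRoot,
    Io(io::Error),
}

/// Resolves `target` and checks it is strictly under `base`, without creating anything.
pub fn contained_target(base: &Path, target: &Path) -> Result<PathBuf, ContainError> {
    let base = base.canonicalize().map_err(ContainError::Io)?;

    // Simplification: refuse `..` so the not-yet-created tail cannot climb out.
    if target.components().any(|c| matches!(c, Component::ParentDir)) {
        return Err(ContainError::TargetEscapesRoot);
    }

    // Walk up to the deepest ancestor that already exists on disk.
    let mut existing = target;
    let mut tail = Vec::new();
    while !existing.exists() {
        match (existing.parent(), existing.file_name()) {
            (Some(parent), Some(name)) => {
                tail.push(name.to_os_string());
                existing = parent;
            }
            _ => return Err(ContainError::TargetEscapesRoot),
        }
    }

    // Canonicalizing the existing part resolves any symlinks in it.
    let mut resolved = existing.canonicalize().map_err(ContainError::Io)?;
    for name in tail.iter().rev() {
        resolved.push(name);
    }

    // `starts_with` compares whole components; equality with base is not "under".
    if resolved != base && resolved.starts_with(&base) {
        Ok(resolved)
    } else {
        Err(ContainError::TargetEscapesRoot)
    }
}
```

Only after this returns `Ok` would the caller go on to `create_dir_all` and write the file.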
### Wrap the official `bw` CLI
We do not speak the Vaultwarden REST protocol directly. `colibri-vault` shells
out to the official `bw` CLI. This keeps authentication, session management, and
crypto off our plate.
The `bw` lifecycle is serialized across the process with a static `Mutex` because
`bw` keeps global state (one configured server and one session token per
process). Concurrent provisions would otherwise race on `bw config server` or
tear down each other's session.
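The serialization described above can be approximated with a process-wide lock. This is a sketch under assumptions: `BW_LOCK` and `with_bw_session` are hypothetical names, and the closure stands in for the whole configure, login, fetch, and logout sequence.

```rust
use std::sync::Mutex;

/// One lock for the whole process; only its holder may drive `bw`.
static BW_LOCK: Mutex<()> = Mutex::new(());

/// Runs one complete `bw` lifecycle while holding the lock.
pub fn with_bw_session<T>(lifecycle: impl FnOnce() -> T) -> T {
    // A poisoned lock only means an earlier lifecycle panicked; the unit value
    // it guards cannot be corrupted, so recover the guard and carry on.
    let _guard = BW_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    lifecycle()
}
```

Holding the guard for the full lifecycle, rather than per command, is what prevents one provision from logging out under another.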
### Bootstrap creds come from the daemon environment
The daemon is expected to receive three variables from the operator-provided
provider environment file:
- `BW_CLIENTID`
- `BW_CLIENTSECRET`
- `BW_PASSWORD`
Optional:
- `BW_SERVER` — the Vaultwarden host.
- `COLIBRI_JAIL_ROOT_BASE` — base path used for containment checks.
The CLI never sees these values; it only registers the tenant row that triggers
the hook.
→ [operator-cli](./operator-cli.md)
### Server-mismatch is fail-closed
If `BW_SERVER` is set and `bw` is already logged in to a different server,
`provision` returns `ServerMismatch`. We do not wipe state automatically because
cross-server confusion could leak credentials. An operator must `bw logout` if
they want to switch servers.
### Env-file content from login items and secure notes
Each Vaultwarden collection item becomes one or more `KEY=VALUE` lines:
- **Login item**: `item.name` becomes the key, `login.password` becomes the value.
- **Secure note**: each line is parsed as `KEY=VALUE` from the note body.
Keys are validated to `[A-Z0-9_]` after normalizing spaces, dashes, and dots
to underscores. Invalid keys are skipped with a warning.
Note: a key collision between two items produces a duplicate line. Either the
consumer must tolerate duplicates, or the collection must avoid items whose
names normalize to the same key.
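The key rule can be sketched as below. `normalize_key` and `render_env` are hypothetical names, and the sketch assumes keys are not uppercased, since the text mentions only the separator normalization and the `[A-Z0-9_]` check.

```rust
/// Normalizes separators, then accepts the key only if it matches `[A-Z0-9_]+`.
pub fn normalize_key(raw: &str) -> Option<String> {
    let key: String = raw
        .chars()
        .map(|c| match c {
            ' ' | '-' | '.' => '_',
            other => other,
        })
        .collect();
    let valid = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
    if valid {
        Some(key)
    } else {
        None
    }
}

/// Renders `KEY=VALUE` lines, skipping invalid keys and keeping duplicates.
pub fn render_env(pairs: &[(&str, &str)]) -> String {
    let mut out = String::new();
    for (raw_key, value) in pairs {
        // Invalid keys are dropped here; the real code is described as warning.
        if let Some(key) = normalize_key(raw_key) {
            out.push_str(&key);
            out.push('=');
            out.push_str(value);
            out.push('\n');
        }
    }
    out
}
```

Two items named `DB PASSWORD` would both survive as `DB_PASSWORD=...` lines, which is the duplicate case described above.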
### File mode and atomic-ish placement
The env file is written into the target directory and set to mode `0600`. The
target directory is created if it does not exist, but it must already resolve
under the jail-root base. The write is a single `std::fs::write` followed by a
permission change; it is not an atomic swap. Between the write and the `chmod`
the file can briefly carry looser permissions, and if the daemon crashes in that
window they remain loose. For now we accept this because the daemon creates the
directory immediately before the write and the target sits inside the jail.
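The write-then-chmod sequence can be sketched with the standard library alone. `write_env_file` is a hypothetical name, the sketch is Unix-only, and it omits the containment check that must run first.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Writes the env file, then tightens it to 0600 (two steps, not atomic).
pub fn write_env_file(dir: &Path, file_name: &str, contents: &str) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    let path = dir.join(file_name);
    // Step 1: the file is created with whatever mode the process umask allows.
    fs::write(&path, contents)?;
    // Step 2: restrict to owner read/write. A crash before this line leaves
    // the looser mode from step 1 in place.
    fs::set_permissions(&path, fs::Permissions::from_mode(0o600))?;
    Ok(())
}
```

If the window ever needs closing, `std::os::unix::fs::OpenOptionsExt::mode` can set the mode when the file is created.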
### Tenant status follows the provision state
`register_tenant` inserts the row with `status = provisioned`. After a successful
vault provision, the hook flips it to `active`. A stopped or destroyed jail may
later be moved to `stopped` or `destroyed` by the operator or a teardown flow.
Strictly, `provisioned` means the row is created; `active` means the secrets
have been materialized at least once.
## Flow
```text
register-tenant tenant_id jail_root collection_id
|
v
spawn-agent --jail-name tenant_id --jail-root jail_root
|
v
provision_tenant_env(tenant_id, jail_root)
|-- no tenant row -> no-op
|-- root mismatch -> warn, no-op
|-- else
v
bw login -> unlock -> list collection -> list items -> write env file @ 0600
|
v
set tenant status = active
agent starts running
```
## See also
- [store-schema](./store-schema.md) — how the tenant row is stored
- [jail-confinement](./jail-confinement.md) — how jails are created and confined
- [operator-cli](./operator-cli.md) — `register-tenant` and `spawn-agent` verbs
- [mother-hive](./mother-hive.md) — a related Vaultwarden-backed pubkey exchange
used to authorize agents to call mother