diff --git a/docs/wiki/contracts.md b/docs/wiki/contracts.md new file mode 100644 index 0000000..ed5ded6 --- /dev/null +++ b/docs/wiki/contracts.md @@ -0,0 +1,49 @@ +# Stable JSON contracts + +← [index](./index.md) + +`colibri-contracts` holds the stable, language-agnostic wire shapes shared +between Colibri (Rust) and Clawdie agents (TypeScript). It owns _schemas and +(De)serialize_, not business logic. + +## Why a separate contracts crate + +- Prevent duplicated definitions between Rust and TypeScript lanes. +- Keep committed manifests in `manifests/` parseable by both sides. +- Centralize schema strings, field renaming aliases, and backward-compat + defaults. + +## Active schemas + +| Schema | Rust struct | Purpose | +| -------------------------------------- | --------------------- | -------------------------------------------------------------- | +| `clawdie.interagent.run-manifest.v1` | `RunManifest` | Records a build/test run — role, agent, artifacts, summary. | +| `clawdie.runtime-version-inventory.v1` | `RuntimeInventory` | Host runtime snapshot — OS, package versions, npm/node/zot/pi. | +| `clawdie.provider-smoke.result.v1` | `ProviderSmokeResult` | DeepSeek cache-hit probe result and token accounting. | + +Schema constants and structs live in `crates/colibri-contracts/src/lib.rs`. + +## Evolution rules + +- The crate carries **no logic** — only `serde` structs and schema constants. +- New fields are normally optional with `#[serde(default)]` so old manifests + still parse. +- `RuntimeInventory.pi` is optional because not every host installs `pi` or + `zot`. +- `HostStatus.raw` is a catch-all `serde_json::Value` so hostile collector + output can be captured without forcing a schema bump. + +## Golden tests + +`crates/colibri-contracts/tests/golden.rs` parses every committed manifest in +`manifests/` and asserts round-trip equality. 
The fixtures are intended to be +**cross-platform** — if a manifest produced on Linux differs from one produced +on FreeBSD 15, the difference must be understood and documented before it is +merged. + +## See also + +- [cost-model](./cost-model.md) — how the provider-smoke result feeds cache-hit + metering. +- [runtime-inventory](./runtime-inventory.md) — where the runtime inventory is + produced. diff --git a/docs/wiki/deployment.md b/docs/wiki/deployment.md new file mode 100644 index 0000000..4eb280c --- /dev/null +++ b/docs/wiki/deployment.md @@ -0,0 +1,175 @@ +# Deployment + +← [index](./index.md) + +The `clawdie` crate is Colibri's host installer. It discovers a machine's ZFS +layout and provisions the `clawdie` service. On FreeBSD this means an rc.d +service, ZFS datasets, and an unprivileged user. On Linux it can use systemd +and either ZFS or plain directories. + +→ `crates/clawdie/src/main.rs` + +→ `crates/clawdie/src/plan.rs` + +→ `docs/ISO-SERVICE-LAYOUT.md` + +→ `docs/CLAWDIE-INSTALLER-HANDOFF.md` + +## Decisions + +### ZFS is required on FreeBSD, preferred on Linux + +FreeBSD does not support a plain-directory layout. If ZFS userland is missing, +the plan errors immediately. Linux can fall back to plain directories if no +pool is named and ZFS is unavailable, and it can create a fresh pool on a spare +disk when asked. + +This matches the production target: bare-metal FreeBSD on a ZFS RAID1 mirror. +Linux support makes development and CI possible without a ZFS host. + +### Storage is resolved, not configured + +`clawdie plan` resolves storage in this order: + +1. If `--pool NAME --create-pool DEVICE` is given, create that pool. +2. If `--pool NAME` is given, use that existing pool. +3. If no pool is given and exactly one pool exists, use it. +4. If multiple pools exist and none is named, error. +5. On Linux with no ZFS, fall back to plain directories. 
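
The resolution order above can be sketched as a pure function. This is a minimal illustration, not the code in `pick_pool` or `validate_storage`: the `Storage` and `ResolveError` types, the function signature, and the error returned when ZFS is present but no pool exists are all assumptions made for the example.

```rust
/// Where state will live. Illustrative names, not the types used in `crates/clawdie`.
#[derive(Debug, PartialEq)]
enum Storage {
    CreatePool { name: String, device: String },
    ExistingPool(String),
    PlainDirs,
}

/// Reasons resolution can fail. `NoPoolFound` is an assumed case the text does not cover.
#[derive(Debug, PartialEq)]
enum ResolveError {
    ZfsRequired,
    NoPoolFound,
    AmbiguousPools(Vec<String>),
}

fn resolve_storage(
    pool: Option<&str>,          // value of --pool, if given
    create_device: Option<&str>, // value of --create-pool, if given
    existing_pools: &[&str],     // pools discovered on the host
    zfs_available: bool,
    is_linux: bool,
) -> Result<Storage, ResolveError> {
    if !zfs_available {
        // Rule 5: only Linux, and only when no pool was named, falls back to plain directories.
        return if is_linux && pool.is_none() {
            Ok(Storage::PlainDirs)
        } else {
            Err(ResolveError::ZfsRequired)
        };
    }
    match (pool, create_device) {
        // Rule 1: a named pool plus a device means "create it".
        (Some(name), Some(device)) => Ok(Storage::CreatePool {
            name: name.to_string(),
            device: device.to_string(),
        }),
        // Rule 2: a named pool alone means "use the existing one".
        (Some(name), None) => Ok(Storage::ExistingPool(name.to_string())),
        // Rules 3 and 4: nothing named, so the answer depends on what exists.
        (None, _) => match existing_pools {
            [only] => Ok(Storage::ExistingPool(only.to_string())),
            [] => Err(ResolveError::NoPoolFound),
            many => Err(ResolveError::AmbiguousPools(
                many.iter().map(|p| p.to_string()).collect(),
            )),
        },
    }
}
```

Keeping the decision free of I/O means each rule can be checked with plain inputs; the discovery of `existing_pools` and `zfs_available` would happen before this function is called.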
+ +This removes the need for a hand-written topology file on typical single-pool +hosts, while still allowing explicit control when needed. + +→ `crates/clawdie/src/main.rs` (`pick_pool`, `validate_storage`) + +### Datasets separate state from logs + +When ZFS is used, the installer creates: + +- `/clawdie` as a container dataset with `canmount=off` +- `/clawdie/db` mounted at `/var/db/clawdie` +- `/clawdie/log` mounted at `/var/log/clawdie` + +Keeping database and logs in separate datasets lets snapshots, quotas, and +log-rotation policies apply independently. + +→ `crates/clawdie/src/plan.rs` (`zfs_dataset_steps`) + +### Dry-run by default + +`clawdie apply` prints the plan and exits unless `--yes` is given. `discover` +and `plan` are read-only. This protects production hosts from accidental +provisioning. + +→ `crates/clawdie/src/main.rs` (`Cmd::Apply`) + +### Pool creation is guarded against busy disks + +`--create-pool` on a non-empty disk is refused unless `--force` is also given. +The installer uses `lsblk` on Linux to detect partitions, filesystems, mount +points, and the root disk. The guard is conservative: if a disk is ambiguous, +it must be explicitly forced. + +→ `crates/clawdie/src/disk.rs` + +→ `crates/clawdie/src/main.rs` (`validate_create_device`) + +### Single unprivileged service user + +The service runs as `_clawdie` on both platforms. On FreeBSD the user is created +with `pw useradd -s /usr/sbin/nologin -d /var/db/clawdie` and exit code `65` +(already exists) is treated as a skip. On Linux `useradd --system` is used. The +state directories are then chowned to that user. + +→ `crates/clawdie/src/platform.rs` + +### Platform-specific service managers, same spec + +`Platform` is an internal trait. The two implementations differ only in how +they install and enable the unit: + +- FreeBSD: writes `/usr/local/etc/rc.d/clawdie`, uses `sysrc clawdie_enable=YES`. 
+- Linux: writes `/etc/systemd/system/clawdie.service`, runs `systemctl enable --now +clawdie`. + +Both use the same `ServiceSpec` (binary, user, data dir, service name). +Running `apply` across platforms therefore produces the same filesystem layout +and differs only in the service-manager wrapper. + +→ `crates/clawdie/src/platform.rs` (`FreeBsd`, `Linux`) + +### Daemon runs through the platform supervisor + +The generated FreeBSD rc.d script execs `/usr/local/bin/colibri-daemon` through +`/usr/sbin/daemon -u _clawdie` so the supervisor restarts it on crash and the +process drops to the unprivileged user. The systemd unit is a simple service +with `Restart=on-failure`. + +The installer itself does not start the daemon or stage the binary; it only +creates the environment. The operator or package build stages +`colibri-daemon` and then runs `service clawdie start`. + +→ `docs/ISO-SERVICE-LAYOUT.md` (rc.d through daemon(8)) + +### Secrets are not written by the installer + +The installer does not touch provider API keys. A separate provider environment +file — conventionally under `/usr/local/etc/colibri/` — holds secrets and is sourced by rc.d +before the daemon starts. This keeps the installer's blast radius limited to +ZFS, directories, users, and service files. + +→ [vault-provision](./vault-provision.md) + +### Steps are executed sequentially and stop on failure + +`deploy::apply` runs each `Step` in order. `Run` steps shell out and fail on a +non-zero exit unless the step declares allowed exit codes. `WriteFile` steps +create parent directories, write the file, and chmod it. If any step fails, +apply stops immediately and reports the failing command and stderr. 
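
The stop-on-failure behaviour described above can be sketched with the standard library. This is an illustration under assumptions, not the contents of `deploy.rs`: the field names on `Step`, the `StepFailure` type, and the helper `run_step` are hypothetical, and the sketch targets Unix because it sets file modes.

```rust
use std::fs;
use std::os::unix::fs::PermissionsExt;
use std::path::PathBuf;
use std::process::Command;

/// Illustrative step type; the real `Step` may carry different fields.
enum Step {
    Run { program: String, args: Vec<String>, allowed_exit_codes: Vec<i32> },
    WriteFile { path: PathBuf, contents: String, mode: u32 },
}

/// Which step failed and why (hypothetical error shape).
#[derive(Debug)]
struct StepFailure {
    index: usize,
    detail: String,
}

/// Run steps in order; the first failure ends the loop and is returned.
fn apply(steps: &[Step]) -> Result<(), StepFailure> {
    for (index, step) in steps.iter().enumerate() {
        run_step(step).map_err(|detail| StepFailure { index, detail })?;
    }
    Ok(())
}

fn run_step(step: &Step) -> Result<(), String> {
    match step {
        Step::Run { program, args, allowed_exit_codes } => {
            let out = Command::new(program)
                .args(args)
                .output()
                .map_err(|e| format!("{}: {}", program, e))?;
            // A signal-terminated child has no exit code; treat it as a failure.
            let code = out.status.code().unwrap_or(-1);
            if code == 0 || allowed_exit_codes.contains(&code) {
                Ok(())
            } else {
                let stderr = String::from_utf8_lossy(&out.stderr);
                Err(format!("{} exited {}: {}", program, code, stderr.trim()))
            }
        }
        Step::WriteFile { path, contents, mode } => {
            // Create parent directories, write the file, then set its mode.
            if let Some(parent) = path.parent() {
                fs::create_dir_all(parent).map_err(|e| e.to_string())?;
            }
            fs::write(path, contents).map_err(|e| e.to_string())?;
            fs::set_permissions(path, fs::Permissions::from_mode(*mode))
                .map_err(|e| e.to_string())
        }
    }
}
```

An allowed exit code such as `65` from `pw useradd` would be listed in `allowed_exit_codes`, so that step counts as a skip rather than a failure.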
+ +→ `crates/clawdie/src/deploy.rs` + +## Plan shape + +```text +clawdie plan + ├── ZFS layout (or plain dirs) + │ ├── create /clawdie container + │ ├── create /clawdie/db -> /var/db/clawdie + │ └── create /clawdie/log -> /var/log/clawdie + └── service install + ├── create user _clawdie + ├── chown state dirs + ├── write service unit (rc.d / systemd) + ├── enable service (sysrc / systemctl) + └── [systemd] daemon-reload + start +``` + +## Typical FreeBSD install + +```sh +# discover +clawdie discover + +# preview +clawdie plan + +# provision datasets, user, and rc.d service +sudo clawdie apply --yes + +# start once the colibri-daemon binary is staged +sudo service clawdie start +``` + +## Cross-link to runtime paths + +After deployment, the service owns these paths: + +- `/var/db/clawdie/colibri.sqlite` — SQLite coordination store +- `/var/run/clawdie/clawdie.sock` — daemon Unix socket +- `/var/log/clawdie/daemon.log` — stdout/stderr log +- `/usr/local/etc/colibri/` — configuration and provider secrets + +→ [store-schema](./store-schema.md) + +→ [operator-cli](./operator-cli.md) diff --git a/docs/wiki/external-mcp.md b/docs/wiki/external-mcp.md new file mode 100644 index 0000000..c84b27a --- /dev/null +++ b/docs/wiki/external-mcp.md @@ -0,0 +1,138 @@ +# External MCP bridge + +← [index](./index.md) + +`colibri-mcp` is the Model Context Protocol bridge between Colibri and +MCP-capable editors (Zed, Cursor, Windsurf, Claude Code). It exposes the +current daemon state as MCP tools today and acts as a small MCP host for +arbitrary external stdio MCP servers as a prototype. + +## Why MCP? + +The daemon already exposes a typed Unix-socket API through +`crates/colibri-client`. MCP wraps that API into the standard JSON-RPC tool +protocol that editors already speak. This avoids the maintenance cost and +political risk of forking or embedding an editor, keeps Colibri headless-safe, +and lets any MCP-compatible client access the same surface. 
+ +For the longer-term product framing, see `../CLAWDIE-STUDIO-PROPOSAL.md`. + +## Two roles in one binary + +`colibri-mcp` serves as both: + +1. **MCP server for Colibri** — presents tools such as `colibri_status`, + `colibri_snapshot`, `colibri_list_tasks`, `colibri_create_task`, etc. +2. **MCP host for external servers** — reads a registry file, spawns configured + process servers, and proxies `tools/list` and `tools/call` to them. + +Separating these roles would create a second binary for little gain; hosting +external servers is gated so the default surface stays read-only. + +## Daemon socket resolution + +The MCP server must reach the daemon. The socket path is resolved in order: + +1. `--socket` CLI flag +2. `COLIBRI_MCP_SOCKET` +3. `COLIBRI_DAEMON_SOCKET` +4. `DaemonConfig::from_env().socket_path` (env-driven defaults) + +This mirrors how the operator CLI and TUI resolve the same socket. + +## Colibri tools and gates + +| Tool | Default | Gate | +| ----------------------- | ----------- | --------------------------------- | +| `colibri_status` | read-only | none | +| `colibri_snapshot` | read-only | none | +| `colibri_list_tasks` | read-only | none | +| `colibri_list_skills` | read-only | none | +| `colibri_create_task` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` | +| `colibri_intake_task` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` | +| `colibri_set_cost_mode` | write-gated | `COLIBRI_MCP_WRITE=1` / `--write` | + +The default ISO posture is read-only. Mutating commands require the operator to +opt in explicitly, which prevents an assistant from creating tasks or switching +cost mode by accident. + +## External MCP host + +The prototype external-host tools are always exposed but only allow calling an +external tool when the separate `COLIBRI_MCP_EXTERNAL_CALL=1` / `--external-call` +flag is set. + +### Registry + +External servers are configured from a JSON registry. Default path: +`/usr/local/etc/colibri/external-mcp.json`. 
Override with +`COLIBRI_MCP_EXTERNAL_CONFIG` or `--external-config`. + +Each entry declares a command, args, optional env, and optional jail +confinement: + +```json +{ + "servers": { + "demo": { + "command": "/usr/local/bin/demo-mcp-server", + "args": ["--stdio"], + "env": { "DEMO_MODE": "1" }, + "jail": { "name": "mcp0", "root_path": "/usr/local/bastille/jails/mcp0/root" } + } + } +} +``` + +### Confinement + +External MCP servers execute arbitrary code on the operator machine, so they +reuse the same jail primitive as agent spawning: +`colibri_daemon::spawner::{prepare_spawn_command, jail_wrap, JailConfig, PrivMode}`. + +- `jail.name` enters an existing persistent jail via `jexec`. +- `jail.root_path` creates an ephemeral jail for the duration of the call. +- Omitting `jail` runs the server on the host, but stdin/stdout framing is the + same either way. + +The root-only jail step honors the shared `COLIBRI_JAIL_PRIV_MODE` policy (`mdo` +on the operator USB, `helper` on deployed hosts). See [jail-confinement](./jail-confinement.md). + +### Request lifecycle + +Every external `tools/list` or `tools/call` request: + +1. Spawns a fresh process (`ExternalMcpSession::start`) using the shared spawner. +2. Runs the MCP `initialize` handshake with protocol version `2024-11-05`. +3. Sends `tools/list` or `tools/call`, reads the response over newline-delimited + JSON, and returns the result. +4. Kills the child and removes the staged cleanup directory. + +This is intentionally simple: one process per request, no connection pool, no +streaming, no long-lived state. It is good enough for prototyping; a production +host should add policy, audit logging, secret management, and per-tool +permissions. + +## Why separate `COLIBRI_MCP_WRITE` and `COLIBRI_MCP_EXTERNAL_CALL` + +`COLIBRI_MCP_WRITE` gates mutations against the local Colibri daemon. External +tool calls execute arbitrary third-party binaries and therefore live on a +different trust surface. 
Requiring two separate opt-ins makes accidental +privilege escalation harder. + +## Limits and open questions + +- stdio transport only +- one external process per request +- no server/tool allowlist beyond the registry file +- no streaming tool results +- no production secret manager integration + +Those limits are recorded as explicitly accepted for now; if the prototype is +promoted to default ISO behavior, each limit should be addressed. + +## See also + +- [jail-confinement](./jail-confinement.md) — jail policy reused for external MCP servers +- [cost-model](./cost-model.md) — cost mode and the write-gated `colibri_set_cost_mode` +- [skills-catalog](./skills-catalog.md) — read-only skill catalog exposed via `colibri_list_skills` diff --git a/docs/wiki/index.md b/docs/wiki/index.md index 78c653a..41941e9 100644 --- a/docs/wiki/index.md +++ b/docs/wiki/index.md @@ -43,13 +43,22 @@ warning. ## Pages -| Page | What it covers | -| ----------------------------------------- | --------------------------------------------------------------------------------------------- | -| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver | -| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction | -| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API | -| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers | -| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed | -| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight | -| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing | -| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge 
gate; why drift reached `main` before | +| Page | What it covers | +| ------------------------------------------- | --------------------------------------------------------------------------------------------- | +| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver | +| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction | +| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API | +| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers | +| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed | +| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight | +| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing | +| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before | +| [contracts](./contracts.md) | Stable JSON schemas (run-manifest, runtime-inventory, provider-smoke), golden tests | +| [store-schema](./store-schema.md) | SQLite coordination schema and migration discipline | +| [external-mcp](./external-mcp.md) | MCP bridge for editors + external stdio MCP host; read/write/external-call gates | +| [operator-cli](./operator-cli.md) | The `colibri` CLI as a thin typed Unix-socket client over the daemon API | +| [tui](./tui.md) | Terminal dashboard client (colibri-tui) vs the colibri-glasspane state machine | +| [runtime-inventory](./runtime-inventory.md) | Host runtime inventory + watchdog status reader; additive, read-only integrations | +| [skills-catalog](./skills-catalog.md) | Read-only runtime consumer for reviewed Clawdie-AI skill artifacts | +| 
[vault-provision](./vault-provision.md) | Vaultwarden-driven env-file provisioning into jails after agent spawn | +| [deployment](./deployment.md) | Host installer (clawdie): ZFS layout, rc.d/systemd service, dry-run safety | diff --git a/docs/wiki/operator-cli.md b/docs/wiki/operator-cli.md new file mode 100644 index 0000000..78b33a7 --- /dev/null +++ b/docs/wiki/operator-cli.md @@ -0,0 +1,124 @@ +# Operator CLI (`colibri`) + +← [index](./index.md) + +The `colibri` binary is the operator's command-line interface to the daemon. +It wraps a typed Unix-socket client (`DaemonClient`) and turns typed commands +into newline-delimited JSON messages on the control-plane socket. It is not +where policy lives — policy lives in the daemon behind the socket. + +## Job of the CLI + +The CLI has two responsibilities: + +1. **Parse shell input** into strongly-typed commands. +2. **Send those commands** to the daemon and print the JSON response. + +It does not contain business logic about session compaction, task scheduling, +or jail confinement. That keeps the CLI small and lets any other client (TUI, +MCP bridge, web dashboard, tests) perform the same operations with the same +protocol. + +→ `crates/colibri-client/src/bin/colibri.rs` (argument parsing and `run` dispatch) + +→ `crates/colibri-client/src/lib.rs` (`DaemonClient` request/response wrapper) + +## Decisions + +### One binary, one socket, one protocol + +Every command — `status`, `snapshot`, `spawn-agent`, `create-task`, +`register-tenant` — goes over the same Unix socket. The CLI builds a +`DaemonClient`, serializes a `ColibriCommand`, writes one line ending in `\n`, +and reads one `ColibriResponse` line back. + +Because the protocol is newline-delimited JSON, operators can still debug with +`nc -U` or similar when the CLI is not enough. The socket is the stable API; +the CLI is a polished client. 
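
The one-line request and one-line reply exchange can be sketched with the standard library alone. The helper name `request_line` and any JSON payload passed to it are hypothetical; the real `ColibriCommand` and `ColibriResponse` shapes are defined in the daemon crate and are not reproduced here.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Write one JSON document terminated by `\n`, then read one line back.
/// The caller supplies the JSON text; this sketch does not validate it.
fn request_line(stream: &mut UnixStream, json: &str) -> io::Result<String> {
    stream.write_all(json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read from a cloned handle so the caller keeps the writable stream.
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    if reader.read_line(&mut line)? == 0 {
        // The peer closed the socket before sending a reply.
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "daemon closed the connection without a response",
        ));
    }
    Ok(line.trim_end_matches('\n').to_string())
}
```

A caller would obtain the stream with `UnixStream::connect(path)` using whichever socket path the resolution order yields, then hand the reply line to a JSON parser.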
+ +→ `crates/colibri-daemon/src/lib.rs` (`ColibriCommand`, `ColibriResponse`) + +→ `crates/colibri-daemon/src/socket.rs` (dispatch table) + +### Socket resolution order matches other clients + +The CLI resolves the daemon socket the same way the TUI and MCP bridge do: + +1. `--socket PATH` +2. `COLIBRI_DAEMON_SOCKET` +3. `DaemonConfig::from_env().socket_path` + +Sharing the resolution order means documentation, environment setup scripts, +and operator muscle memory apply to every client. + +→ `crates/colibri-client/src/bin/colibri.rs` (`default_socket_path`) + +### No write-gating inside the CLI itself + +Commands that mutate state (`create-task`, `kill-agent`, `set-cost-mode`, +`register-tenant`) are not blocked by CLI flags. The gate is the Unix socket +itself: the daemon is configured to listen on a unix socket with operator-only +permissions, and the daemon validates each command. This avoids two parallel +permission layers that could drift out of sync. + +This is an intentional contrast with `colibri-mcp`, which exposes the daemon to +editor assistants and therefore uses `COLIBRI_MCP_WRITE=1` as an explicit trust +switch. An operator at the shell already has that trust by virtue of the socket. + +→ [external-mcp](./external-mcp.md) + +### Commands return JSON, not human prose + +All successful CLI commands print pretty-printed JSON. This keeps the output +scriptable (`colibri snapshot | jq '.panes[] | select(.state == "working")'`) +and consistent with the socket protocol. If a command fails, the CLI prints the +daemon's error message to stderr and exits non-zero. + +→ `crates/colibri-client/src/lib.rs` (`request`, error handling) + +### `spawn-agent` accepts jail confinement directly + +The `--jail-name` and `--jail-root` flags on `spawn-local` and `spawn-agent` +build a `JailConfig` that is sent to the daemon. The same type is re-exported +from `colibri-daemon::spawner` so the CLI crate does not have to depend on the +daemon crate just to build a config. 
+ +Pairing `--jail-name` with `--jail-root` is the only path that triggers vault +provisioning after a spawn, because the daemon needs both the jail identity +and the host-visible jail root. + +→ `crates/colibri-client/src/lib.rs` (`JailConfig` re-export) + +→ `crates/colibri-daemon/src/spawner.rs` + +### Local sample agent lives next door + +The same crate also ships `colibri-test-agent`, a tiny sample binary used by +tests and the TUI's spawn shortcut. Keeping it in `colibri-client` keeps the +sample close to its primary caller without adding a new crate. + +→ `crates/colibri-client/src/bin/colibri_test_agent.rs` + +## Notable commands + +| Command | Purpose | +| ---------------------------------------------------------------- | --------------------------------- | +| `status` | daemon health, paths, cost mode | +| `snapshot` / `glasspane-snapshot` | current pane radar view | +| `list-sessions` | active agent sessions | +| `spawn-local` / `spawn-agent` | start an agent, optionally jailed | +| `kill AGENT_ID` | terminate a pane/agent | +| `create-task` / `intake-task` / `claim-task` / `transition-task` | task-board workflow | +| `set-cost-mode MODE` | acknowledge/toggle cost mode | +| `register-tenant` / `list-tenants` | vault provisioning bookkeeping | +| `register-skill` / `list-skills` | skill catalog maintenance | +| `register-agent` / `list-agents` | agent capability registration | + +## See also + +- [tui](./tui.md) — the live terminal dashboard that uses the same `DaemonClient` +- [glasspane](./glasspane.md) — the pane state machine behind `snapshot` +- [task-board](./task-board.md) — commands that manipulate the task board +- [store-schema](./store-schema.md) — SQLite entities queried by the CLI +- [vault-provision](./vault-provision.md) — why `register-tenant` carries a jail root path +- [external-mcp](./external-mcp.md) — another daemon client with write-gating diff --git a/docs/wiki/runtime-inventory.md b/docs/wiki/runtime-inventory.md new file mode 
100644 index 0000000..654c77a --- /dev/null +++ b/docs/wiki/runtime-inventory.md @@ -0,0 +1,88 @@ +# Runtime inventory and host status + +← [index](./index.md) + +Colibri discovers the host in two complementary ways: + +1. **Runtime inventory** — a one-shot probe that reports versions installed on + the machine (`node`, `npm`, `pi`, `zot`, package manager, OS, etc.). +2. **Watchdog host status** — a read-only, newline-framed Unix-socket call to + the Clawdie watchdog that returns live health metrics. + +Both are intentionally additive: they read from Clawdie, they do not change +it. This page records the design of those read-only integrations. + +## Runtime inventory probe + +The `colibri-runtime-inventory` binary (`src/bin/runtime_inventory.rs`) emits a +single JSON object matching the `clawdie.runtime-version-inventory.v1` schema +from `crates/colibri-contracts/src/lib.rs`. + +### Detection strategy + +| Field | How it is detected | +| ----------------- | ------------------------------------------------------------------------------------------------------ | +| `host` | `COLIBRI_HOST` → `HOSTNAME` → `hostname` command → `"unknown"` | +| `os` | `uname -sr` + target architecture; falls back to `std::env::consts` | +| `node` | `node --version` | +| `npm` | `npm --version` | +| `npm_prefix` | `npm config get prefix` | +| `package_manager` | `pkg` on FreeBSD, otherwise `apt` / `dnf` / `brew` | +| `pi` | `PI_BIN` → `~/.npm-global/bin/pi` → `pi --version` → package.json of `@earendil-works/pi-coding-agent` | +| `zot` | `ZOT_BIN` → `zot --version` across PATH and candidate locations | + +### Why this shape + +- `pi` is an npm package installed in `node_modules`, so version detection must + fall back to reading its package manifest when `--version` is missing. +- `zot` is a single Go binary, so a plain `--version` probe is correct. +- `pi`/`zot` are optional; a host that only runs one agent runtime should + still produce a valid inventory. 
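
The `host` row of the detection table (`COLIBRI_HOST`, then `HOSTNAME`, then the `hostname` command, then `"unknown"`) can be sketched as a fallback chain. The function names and the closure-based injection are assumptions made so the chain can be exercised without touching the real environment; treating empty values as missing is also an assumption, not something the probe is documented to do.

```rust
use std::process::Command;

/// Walk the fallback chain. `env` looks up a variable; `hostname_cmd` is only
/// invoked when neither variable yields a non-empty value.
fn detect_host<E, C>(env: E, hostname_cmd: C) -> String
where
    E: Fn(&str) -> Option<String>,
    C: FnOnce() -> Option<String>,
{
    ["COLIBRI_HOST", "HOSTNAME"]
        .iter()
        .filter_map(|key| env(*key))
        .map(|value| value.trim().to_string())
        .find(|value| !value.is_empty())
        .or_else(|| {
            hostname_cmd()
                .map(|value| value.trim().to_string())
                .filter(|value| !value.is_empty())
        })
        .unwrap_or_else(|| "unknown".to_string())
}

/// Wire the chain to the process environment and the `hostname` binary.
fn detect_host_from_system() -> String {
    detect_host(
        |key| std::env::var(key).ok(),
        || {
            Command::new("hostname")
                .output()
                .ok()
                .filter(|out| out.status.success())
                .map(|out| String::from_utf8_lossy(&out.stdout).into_owned())
        },
    )
}
```

The same pattern would extend to `pi` and `zot`, where an explicit `*_BIN` override is tried before any `--version` probe.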
+ +## Watchdog host status + +`crates/colibri-runtime/src/lib.rs` implements the watchdog reader. It connects +over a Unix domain socket, sends `{"cmd":"status"}\n`, reads back one +newline-terminated JSON line, and normalizes the response into `HostStatus`. + +### Socket path resolution + +The search order lets operators, services, and test harnesses override the +socket location without recompiling: + +1. `COLIBRI_WATCHDOG_SOCKET` (explicit override) +2. `COLIBRI_SERVICE_NAME` (default `clawdie`) → `{service}-watchdog.sock` +3. `TMP_IPC_DIR/{service}-watchdog.sock` +4. `AGENT_TMP_DIR/ipc/{service}-watchdog.sock` or + `CLAWDIE_TMP_DIR/ipc/{service}-watchdog.sock` +5. `$HOME/clawdie-ai/tmp/ipc/{service}-watchdog.sock` +6. `tmp/ipc/{service}-watchdog.sock` (final fallback) + +### Wire protocol + +- Framing: one line, newline-terminated. +- Request: `{"cmd":"status"}\n`. +- Expected response: `{"ok": true, "data": { ... watchful host fields ... }}`. +- Timeout: 2 seconds by default, overridable in `WatchdogReadOptions`. + +### Normalization rules + +`normalize_watchdog_status()` in `crates/colibri-runtime/src/lib.rs` is defensive: + +- Missing fields default to `"unknown"` for strings, `0` for counters, and + `false` for booleans. +- `controlplane_status` is lifted from `controlplane.overallStatus`. +- The original raw object is preserved under `HostStatus.raw` so callers can + access fields Colibri does not yet model. + +## Golden fixtures + +`crates/colibri-contracts/tests/golden.rs` parses committed inventory and +host-status manifests in `manifests/` and round-trips them through the Rust +structs. Those fixtures come from real hosts (`osa`, `domedog`, `debby`, the +operator USB) and are treated as cross-platform source material. + +## See also + +- [contracts](./contracts.md) — stable schemas for inventory and host-status. +- [cost-model](./cost-model.md) — how runtime inventory feeds cost decisions. 
diff --git a/docs/wiki/skills-catalog.md b/docs/wiki/skills-catalog.md new file mode 100644 index 0000000..ddcbb08 --- /dev/null +++ b/docs/wiki/skills-catalog.md @@ -0,0 +1,161 @@ +# Skills catalog + +← [index](./index.md) + +`colibri-skills` is Colibri's read-only runtime consumer for Clawdie-AI skill +artifacts. Clawdie-AI authors and reviews the skillpacks; Colibri indexes +them, validates checksums, chunks searchable text, and exposes typed structs to +the daemon, CLI, and TUI. This crate does not author skills. + +→ `crates/colibri-skills/src/lib.rs` + +→ `docs/COLIBRI-SKILLS-PLAN.md` + +## Decisions + +### Source of truth stays in Clawdie-AI + +Skill artifacts live in the `clawdie-ai` repository, not in `colibri`. They are +committed, reviewed directories containing prose, screenshots, transcripts, +scripts, a manifest, and a checksum file. `colibri-skills` imports these +artifacts into Colibri's SQLite store at runtime. + +This split preserves review discipline: a skill changes through a PR in its +home repo, then Colibri re-indexes the checkout. + +### Read-only, not authoring + +The crate deliberately lacks "create skill" or "edit skill" operations. Those +belong in Clawdie-AI where human review and media pipelines run. Putting +authoring here would duplicate state and split review authority. + +The import path is the target for Phase 1: scan the configured Clawdie-AI checkout, +parse manifests, verify checksums, and upsert into SQLite. The type scaffold +exists today; the importer, chunker, and FTS5 index are planned. + +→ `docs/COLIBRI-SKILLS-PLAN.md` (Phases 1-7) + +### Manifest-driven identity + +Each skill directory contains a run manifest file. 
From it the importer derives: + +- `skill_id` +- `display_name` +- `source_path` within the Clawdie-AI checkout +- pipeline stages and models used +- source media metadata + +Any file not listed in the manifest can still be classified and indexed as an +artifact, but the manifest is the canonical identity document. + +### Artifact classification by extension and filename + +`ArtifactType::from_path` classifies files without relying on a sidecar: + +- Python or shell files → Script +- paths containing contact_sheet → ContactSheet +- paths containing run_manifest and ending in .json → Manifest +- paths containing sha256 or checksum → Checksum +- paths containing report and ending in .json → Report +- .md → Document +- .jpg / .png / .webp → Image +- .txt transcript files → Transcript +- anything else → Other + +This heuristic keeps classification local and fast. Misclassified files can be +fixed by renaming within Clawdie-AI. + +→ `crates/colibri-skills/src/lib.rs` (`ArtifactType::from_path`) + +### Checksums are validated, then stored + +The run manifest is accompanied by a checksum file. At import time the runtime +computes SHA-256 of each artifact and compares it to the committed checksum. +Failures are reported in `ImportSummary::checksum_failures` and prevent +`success()`. + +Only the hash is stored in SQLite; image and media blobs stay on disk. The +catalog stores relative paths and hashes, not the binary content. + +### Content is chunked into searchable units + +The planned chunker turns skill content into `SkillChunk` rows: + +- Markdown sections by heading +- Command blocks +- Code blocks +- Tables +- Transcript segments + +Chunks are the unit of search and the unit shown in TUI or CLI results. +`SkillChunk` carries `line_start`/`line_end` so a hit can point back to the +source artifact. 
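
Since the chunker is still planned, the following is only a sketch of how "Markdown sections by heading" with `line_start`/`line_end` might work. The `Chunk` struct, `heading_text`, and `chunk_by_heading` are hypothetical names, not the `SkillChunk` type in `colibri-skills`, and the handling of fenced code is an assumption.

```rust
/// Simplified stand-in for a chunk row; 1-based, inclusive line numbers.
#[derive(Debug, PartialEq)]
struct Chunk {
    heading: Option<String>,
    content: String,
    line_start: usize,
    line_end: usize,
}

/// Return the heading text for lines like `## Title`, otherwise `None`.
fn heading_text(line: &str) -> Option<&str> {
    let hashes = line.chars().take_while(|c| *c == '#').count();
    if (1..=6).contains(&hashes) && line[hashes..].starts_with(' ') {
        Some(line[hashes..].trim())
    } else {
        None
    }
}

/// Split Markdown into one chunk per heading; text before the first heading
/// becomes a chunk with no heading. Lines inside code fences never start a chunk.
fn chunk_by_heading(markdown: &str) -> Vec<Chunk> {
    let mut chunks: Vec<Chunk> = Vec::new();
    let mut in_fence = false;
    for (idx, line) in markdown.lines().enumerate() {
        let line_no = idx + 1;
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let heading = if in_fence { None } else { heading_text(line) };
        if heading.is_some() || chunks.is_empty() {
            chunks.push(Chunk {
                heading: heading.map(str::to_string),
                content: line.to_string(),
                line_start: line_no,
                line_end: line_no,
            });
        } else if let Some(open) = chunks.last_mut() {
            // Extend the currently open chunk with this line.
            open.content.push('\n');
            open.content.push_str(line);
            open.line_end = line_no;
        }
    }
    chunks
}
```

Carrying the line range on every chunk is what lets a search hit point back to the exact span of the source artifact.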
+ +→ `crates/colibri-skills/src/lib.rs` (`SkillChunk`, `ChunkType`) + +### SQLite + FTS5 as the runtime search backend + +The target schema keeps three tables: + +- `system_skills` — one row per skill +- `system_skill_artifacts` — one row per file +- `system_skill_chunks` — one row per searchable chunk, plus a virtual FTS5 + table for ranked text search + +This matches the store's pragmatic relational model. If skill volumes grow +beyond tens of thousands of chunks, we can move the FTS index to PostgreSQL +pgvector; until then, SQLite keeps the control-plane self-contained. + +→ [store-schema](./store-schema.md) + +→ `docs/COLIBRI-SKILLS-PLAN.md` (SQLite schema target) + +### Status is a lifecycle marker, not a state machine + +`SkillStatus` is `active`, `archived`, or `superseded`. There is no pending +review state because review happens in Clawdie-AI before import. Colibri simply +stops returning archived skills in default searches but keeps them in the store +for audit and explicit lookups. + +### Natural-language verification question + +Each skill can carry a `verification` field like "can the user create and run +an Astro project?". This is not an executable test; it is the acceptance +criterion used during skill review and later during agent self-verification. + +### Runtime commands are read-only + +The CLI surface is planned as: + +- `colibri list-skills` +- `colibri show-skill ` +- `colibri search-skills ` +- `colibri index-skills` +- `colibri verify-skill ` + +`index-skills` refreshes the catalog from disk. The remaining commands query the +runtime store. None mutate the Clawdie-AI checkout. 
+ +→ [operator-cli](./operator-cli.md) + +## Entity shape + +```text +Skill + ├─ skill_id, display_name, source_path, status, verification + ├─ SkillManifest + │ ├─ run_id, created, notes + │ ├─ ManifestSource + │ ├─ [PipelineStage] + │ └─ [ModelUsage] + └─ [SkillArtifact] + ├─ artifact_type, relative_path, file_name, mime_type, size_bytes, sha256_hash + └─ [SkillChunk] + ├─ chunk_type, heading, content, line_start, line_end, tokens_estimate +``` + +## See also + +- [store-schema](./store-schema.md) — coordination and planned skill catalog tables +- [operator-cli](./operator-cli.md) — planned skill catalog CLI commands +- [task-board](./task-board.md) — agents will match claimed tasks to skills by capability diff --git a/docs/wiki/store-schema.md b/docs/wiki/store-schema.md new file mode 100644 index 0000000..2906026 --- /dev/null +++ b/docs/wiki/store-schema.md @@ -0,0 +1,140 @@ +# Store schema + +← [index](./index.md) + +Colibri's coordination store is a single SQLite database owned by the `colibri` +service. It holds the task board, the registry of agents and skills, and the +vault tenant map. It is not a cache — it is durable state. Most writes happen +through the daemon's socket API, but the schema belongs to `colibri-store`. + +→ `crates/colibri-store/src/schema.rs` + +→ `crates/colibri-store/src/lib.rs` + +## Decisions + +### SQLite, not PostgreSQL, for the control-plane store + +The store is SQLite because the control plane needs a single-file database that +is easy to back up, snapshot, inspect, and ship. PostgreSQL with pgvector is +planned for retrieval/long-term memory, but the task board and agent registry do +not need a server process. + +The daemon batches related writes and relies on SQLite's WAL mode for concurrent +readers. This keeps the operator stack self-contained on a small bare-metal host. + +### WAL + foreign keys by default + +`Store::open` runs three pragmas on every startup: + +- `journal_mode=WAL` — readers don't block writers. 
+- `synchronous=NORMAL` — a safe middle ground between full-synchronous and OFF. +- `foreign_keys=ON` — the task/agent FK is enforced. + +These are not configurable at runtime. If we ever need different durability or +concurrency guarantees, we should make it explicit rather than letting the +connection inherit defaults. + +→ `crates/colibri-store/src/lib.rs` (`Store::open`) + +### Idempotent migrations only + +Migrations run on every `Store::open`. They use `IF NOT EXISTS` tables and +indexes, so repeated runs are safe. We do not ship downward migrations; schema +evolution is additive tables and columns. If a destructive migration is ever +needed, it must be a deliberate manual step documented in a handoff. + +→ `crates/colibri-store/src/schema.rs` + +### Four tables for four concerns + +| Table | Concern | Key entity | +| --------- | ----------------------- | ---------- | +| `tasks` | Task board | `Task` | +| `agents` | Registered teammates | `Agent` | +| `skills` | Team skill catalog | `Skill` | +| `tenants` | Vault/secret tenant map | `Tenant` | + +Tasks carry an `agent_id` foreign key into `agents`. Every other relationship is +loose — skills are not linked to agents, and tenants are referenced by their +`tenant_id` in socket commands and provisioning hooks. + +→ `crates/colibri-store/src/schema.rs` + +### Task-status CHECK constraint is the source of truth + +`tasks.status` is constrained to `('queued','claimed','started','done','failed')`. +The Rust `TaskStatus` enum mirrors it, but the database is the final gate. A +command that tries to insert an unknown status fails at write time. + +→ `crates/colibri-store/src/schema.rs` + +### Agent capabilities stored as JSON, not normalized + +`agents.capabilities` is a JSON blob like `["code","rust","freebsd"]`. We +avoided a separate capabilities table because capability tags are just +strings, and the team registry is small. Normalized joins would add schema +complexity without improving query power. 
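
To make the trade-off concrete, here is a minimal sketch of capability filtering done in application code rather than with a join. It assumes the `agents.capabilities` JSON array has already been decoded into a `Vec<String>` (for example with a JSON library); `AgentRow` and `agents_with_all` are hypothetical names, not functions from `colibri-store`.

```rust
use std::collections::HashSet;

/// Hypothetical view of an `agents` row after the JSON blob is decoded.
pub struct AgentRow {
    pub name: String,
    pub capabilities: Vec<String>,
}

/// Return the agents that carry every required capability tag.
/// An empty `required` list matches all agents.
pub fn agents_with_all<'a>(agents: &'a [AgentRow], required: &[&str]) -> Vec<&'a AgentRow> {
    agents
        .iter()
        .filter(|agent| {
            let have: HashSet<&str> = agent.capabilities.iter().map(String::as_str).collect();
            required.iter().all(|tag| have.contains(tag))
        })
        .collect()
}
```

With a registry of a handful of agents, a linear scan like this is cheap, which is consistent with the decision to skip a normalized capabilities table.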
+ +If capability metadata grows (weights, versions, required skills), we can split +it later; the current schema intentionally stays pragmatic. + +→ `crates/colibri-store/src/lib.rs` (`register_agent`) + +### Tenants encode the 1:1:1 jail/vault/collection map + +`tenants` stores `tenant_id`, `jail_root_path`, and `collection_id` as UNIQUE +columns. The rule is `tenant_id = jail name = Vaultwarden collection`. This +lets `colibri-vault` look up a jail by name and know exactly which host path and +Vaultwarden collection to use when writing the environment file. + +The tenant `status` column tracks the lifecycle: +`provisioned → active → stopped → destroyed`. It is independent of whether the +jail process is running; lifecycle management is a separate concern. + +→ `crates/colibri-store/src/schema.rs` (comments on `tenants`) + +### Default database path is platform-specific + +The store default is: + +- `COLIBRI_DB_PATH` if set. +- FreeBSD: `/var/db/colibri/colibri.sqlite`. +- Linux/macOS: `$XDG_DATA_HOME/colibri/colibri.sqlite`, falling back to + `$HOME/.local/share/colibri/colibri.sqlite`, then `/tmp`. + +FreeBSD defaults to `/var/db` because that is the conventional local-state +directory for services. The Linux fallback respects XDG, so development on a +workstation feels normal. + +→ `crates/colibri-store/src/lib.rs` (`default_db_path`) + +### JSON export for backups and parity tests + +`Store::export_json()` dumps all four tables into one JSON object. It exists +for dual-run parity diffs, ad-hoc backups, and debugging. It is not the primary +query API; most readers should use the typed methods. 
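
The default-path order from the previous section can be sketched as a pure function. This is not the body of `default_db_path`: `resolve_db_path` is a hypothetical name, the environment is passed in as a closure so the logic is testable, treating empty variables as unset is an assumption, and the exact file name used under `/tmp` is a guess because the text only says "then `/tmp`".

```rust
use std::path::PathBuf;

/// Resolve the store path in the documented order.
/// `env` abstracts environment lookup; `is_freebsd` stands in for a
/// compile-time platform check such as `cfg!(target_os = "freebsd")`.
pub fn resolve_db_path(is_freebsd: bool, env: impl Fn(&str) -> Option<String>) -> PathBuf {
    // Assumption: an empty variable is treated the same as an unset one.
    let get = |key: &str| env(key).filter(|value| !value.is_empty());

    // 1. An explicit override always wins.
    if let Some(path) = get("COLIBRI_DB_PATH") {
        return PathBuf::from(path);
    }
    // 2. FreeBSD uses the conventional local-state directory.
    if is_freebsd {
        return PathBuf::from("/var/db/colibri/colibri.sqlite");
    }
    // 3. Linux/macOS: XDG data home, then the XDG default under $HOME.
    if let Some(xdg) = get("XDG_DATA_HOME") {
        return PathBuf::from(xdg).join("colibri").join("colibri.sqlite");
    }
    if let Some(home) = get("HOME") {
        return PathBuf::from(home).join(".local/share/colibri/colibri.sqlite");
    }
    // 4. Last resort. The file name under /tmp is an assumption.
    PathBuf::from("/tmp/colibri.sqlite")
}
```

A caller on a real host would pass `|key| std::env::var(key).ok()` as the lookup closure.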
+ +## Entity relationships + +```text +tasks.agent_id ----------> agents.id + + tasks agents skills tenants + ----- ------ ------ ------- + id id id tenant_id + agent_id FK name name jail_root_path + status capabilities description collection_id +title status category status +description created_at created_at created_at +created_at updated_at +updated_at +``` + +## See also + +- [task-board](./task-board.md) — task lifecycle and capability matching +- [operator-cli](./operator-cli.md) — socket commands that write to these tables +- [vault-provision](./vault-provision.md) — how the tenants table drives env-file provisioning +- [jail-confinement](./jail-confinement.md) — jail names map to tenant rows +- [skills-catalog](./skills-catalog.md) — the read-only skills consumer diff --git a/docs/wiki/tui.md b/docs/wiki/tui.md new file mode 100644 index 0000000..23f55dd --- /dev/null +++ b/docs/wiki/tui.md @@ -0,0 +1,104 @@ +# Terminal dashboard (colibri-tui) + +← [index](./index.md) + +The TUI is Colibri's live terminal dashboard. It connects to the daemon's Unix +socket, pulls the `GlasspaneSnapshot`, and renders a color-coded table of +supervised panes. It is a **display client**, not part of the daemon, and not +the same thing as `colibri-glasspane`. + +## Why it is not `colibri-glasspane` + +`colibri-glasspane` is the **state machine** that decides what state an agent +is in from its JSONL events. `colibri-tui` is the **screen** that asks the +daemon "what does the radar look like right now?" and draws it. 
+ +| Artifact | Role | Resident crate | +| ------------------- | -------------------------------------------------------- | ------------------------------------------------------- | +| `colibri-glasspane` | Pane state machine, event ingestor, snapshot builder | `crates/colibri-glasspane` | +| `colibri-tui` | Terminal dashboard client with rows, colors, keybindings | `crates/colibri-glasspane-tui` (binary = `colibri-tui`) | + +The split matters because the daemon, the MCP bridge, the CLI, and tests all +use `colibri-glasspane`. The TUI is just one consumer. If the TUI is not +installed, or crashes, agents keep running. + +## Decisions + +### Keep the daemon separate from any terminal UI + +`colibri-tui` is a standalone process. It resolves the daemon socket the same +way the CLI does (`DaemonConfig::from_env().socket_path`), then calls +`client.glasspane_snapshot()` every two seconds. The daemon has no awareness of +crossterm or ratatui. + +This is the same "service owns state, clients render it" pattern as the MCP +bridge and the CLI. It keeps Colibri headless-safe, which is required for an +`rc.d` service that must boot before any operator logs in. + +→ `crates/colibri-glasspane-tui/src/main.rs` (socket resolution, refresh loop) + +### TUI gets spawn/kill keys, not just read-only status + +You can spawn a local test agent (`s`) and kill the selected pane (`x`) from +the dashboard. That overlaps with commands the `colibri` CLI can already do, +but the experience is different: a CLI command is one-shot; the TUI is a live +supervision surface with a selected row and an immediate status bar. + +We kept the action keys because the dashboard's job is to let an operator +notice and react — spot a stalled pane and kill it without leaving the +terminal. + +→ `crates/colibri-glasspane-tui/src/main.rs` (`spawn_agent`, `kill_selected`) + +### One taxonomy from one snapshot + +The TUI does not parse agent stdout. 
It only reads the already-folded +`GlasspaneSnapshot`, so Pi, zot, and local test agents are rendered with the +same columns, colors, and state icons. The rendering code concerns itself only +with layout and keybindings; all semantic decisions live in +`colibri-glasspane`. + +→ `crates/colibri-glasspane/src/lib.rs` (`AgentState`, `GlasspaneSnapshot`) + +### Naming: the binary is `colibri-tui`, the crate is `colibri-glasspane-tui` + +The crate directory is `colibri-glasspane-tui` because the package implements +"a TUI for the glasspane." The installed binary is named `colibri-tui` +because that is what an operator types. `CLAWDIE-STUDIO-PROPOSAL.md` and other +docs refer to `colibri-tui` as shorthand; there is no separate `colibri-tui` +crate. + +This duality is currently accepted. If we ever add a second TUI surface (e.g. +a `colibri-tui-web` or `colibri-tui-gui`), the naming becomes confusing and +should be revisited. + +## Current keybindings + +| Key | Action | +| ---------------------- | ----------------------------------------------- | +| `q` / `Esc` | Quit, or close detail pane if open | +| `r` | Refresh snapshot now | +| `s` | Spawn a local `colibri-test-agent` | +| `x` | Kill the selected pane | +| `Enter` | Open/close the detail pane for the selected row | +| `Tab` / `Shift-Tab` | Cycle through distinct sessions | +| `j` / `k` or `↓` / `↑` | Navigate the pane table | + +## When to use the TUI vs the CLI + +Use the TUI when: + +- You want a live, auto-refreshing view of all panes. +- You are picking a pane to inspect or kill visually. +- You are on an SSH session with only a terminal. + +Use the `colibri` CLI when: + +- You are scripting or piping output (`colibri snapshot | jq`). +- You need a command not bound to a key (e.g. `claim-task`, `set-cost-mode`). +- You want a one-shot answer without entering an alternate screen. 
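
The keybinding table above maps cleanly onto a pure function, which is one way to keep input handling testable without a terminal. The sketch below is illustrative only: `Key`, `Action`, and `action_for` are hypothetical names rather than types from `colibri-glasspane-tui` or any terminal library, and the assumption that `Tab` moves forward and `Shift-Tab` backward through sessions is mine.

```rust
/// Hypothetical key type; a real TUI would translate its terminal
/// library's key events into something like this.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Key {
    Char(char),
    Esc,
    Enter,
    Tab,
    BackTab, // Shift-Tab
    Up,
    Down,
}

#[derive(Debug, PartialEq)]
pub enum Action {
    Quit,
    CloseDetail,
    Refresh,
    Spawn,
    Kill,
    ToggleDetail,
    NextSession,
    PrevSession,
    RowDown,
    RowUp,
    Ignore,
}

/// Map a key press to an action, following the keybinding table.
/// `detail_open` matters because `q`/`Esc` close the detail pane first.
pub fn action_for(key: Key, detail_open: bool) -> Action {
    match key {
        Key::Char('q') | Key::Esc => {
            if detail_open {
                Action::CloseDetail
            } else {
                Action::Quit
            }
        }
        Key::Char('r') => Action::Refresh,
        Key::Char('s') => Action::Spawn,
        Key::Char('x') => Action::Kill,
        Key::Enter => Action::ToggleDetail,
        Key::Tab => Action::NextSession,     // direction is an assumption
        Key::BackTab => Action::PrevSession, // direction is an assumption
        Key::Char('j') | Key::Down => Action::RowDown,
        Key::Char('k') | Key::Up => Action::RowUp,
        _ => Action::Ignore,
    }
}
```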
+ +## See also + +- [glasspane](./glasspane.md) — the pane state machine the TUI renders +- [operator-cli](./operator-cli.md) — the `colibri` CLI that shares the same socket client diff --git a/docs/wiki/vault-provision.md b/docs/wiki/vault-provision.md new file mode 100644 index 0000000..7720f52 --- /dev/null +++ b/docs/wiki/vault-provision.md @@ -0,0 +1,163 @@ +# Vault provision + +← [index](./index.md) + +`colibri-vault` fetches secrets from a Vaultwarden collection and writes them +into a freshly created jail as a `0600` env file. It is invoked as a post-spawn +hook from the daemon, not by a human operator at provision time. The human step +is registering a tenant mapping; the daemon does the secret fetch. + +→ `crates/colibri-vault/src/lib.rs` + +→ `crates/colibri-daemon/src/daemon.rs` (`provision_tenant_env`) + +→ `docs/VAULT-PROVISION-RUNBOOK.md` + +## Decisions + +### Tenant = jail name = Vaultwarden collection + +The `tenants` table stores a 1:1:1 map: + +- `tenant_id` — the jail name. +- `jail_root_path` — the host-visible root of the jail. +- `collection_id` — the Vaultwarden collection name (kept equal to the jail name). + +This means `colibri-vault` does not need a separate lookup table or configuration +file. It finds the collection by the jail name and knows the destination path +from the tenant row. + +→ [store-schema](./store-schema.md) + +### Provisioning is a post-spawn hook, not a separate command + +When the daemon spawns an agent with both `--jail-name` and `--jail-root`, it +calls `provision_tenant_env` after the jail is up. If the jail name has no +matching tenant row, the hook no-ops. If provisioning fails, the agent still +starts, because a missing secret file should not leave the host with stale +partial jails. The daemon logs the failure.
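
The hook's decision logic can be sketched as a function that always returns an outcome and never blocks the spawn. This is a simplified model, not the body of `provision_tenant_env`: `Tenant`, `HookOutcome`, and `provision_hook` are hypothetical names, the vault call is passed in as a closure, and the root-mismatch case is taken from the fail-soft rules described in the next section.

```rust
/// Hypothetical subset of a `tenants` row.
pub struct Tenant {
    pub tenant_id: String,
    pub jail_root_path: String,
}

#[derive(Debug, PartialEq)]
pub enum HookOutcome {
    /// No tenant row matches the jail name: nothing to do.
    NoTenant,
    /// The stored root differs from the spawned root: skip provisioning.
    RootMismatch,
    /// The vault call failed; the message would be logged by the daemon.
    VaultFailed(String),
    /// Secrets were written.
    Provisioned,
}

/// Decide what the post-spawn hook does. `fetch_and_write` stands in for
/// the real vault call and is only invoked when the tenant row matches.
pub fn provision_hook<F>(tenant: Option<&Tenant>, spawned_root: &str, fetch_and_write: F) -> HookOutcome
where
    F: FnOnce(&Tenant) -> Result<(), String>,
{
    let Some(tenant) = tenant else {
        return HookOutcome::NoTenant;
    };
    if tenant.jail_root_path != spawned_root {
        return HookOutcome::RootMismatch;
    }
    match fetch_and_write(tenant) {
        Ok(()) => HookOutcome::Provisioned,
        Err(reason) => HookOutcome::VaultFailed(reason),
    }
}

/// Fail-soft: every outcome lets the agent start.
pub fn agent_may_start(_outcome: &HookOutcome) -> bool {
    true
}
```

Returning an enum rather than a `Result` reflects that none of these cases is an error from the spawn's point of view.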
+ +→ `crates/colibri-daemon/src/socket.rs` (`jail_provision_target`) + +### Fail-soft on missing tenant or vault error + +The hook returns early (and silently) when: + +- no tenant row matches the jail name; +- the stored `jail_root_path` does not match the spawned root; or +- the vault call fails. + +These are warnings, not hard errors. The spawn itself succeeds. This reflects the +operational reality that secret tooling may be unavailable during boot or +experimental spawns, while the agent process should still be observable. + +### Path containment before any write + +`colibri-vault::provision` canonicalizes the target directory and asserts it is +strictly under the configured jail-root base (`/usr/local/bastille/jails` by +default, overridable with `COLIBRI_JAIL_ROOT_BASE`). The check runs before +`create_dir_all`, so a symlink or `..` path that escapes the jails tree results +in `TargetEscapesRoot` before any file is created. + +This is the same filesystem containment primitive reused by the external MCP +server spawner. + +→ [jail-confinement](./jail-confinement.md) + +### Wrap the official `bw` CLI + +We do not speak the Vaultwarden REST protocol directly. `colibri-vault` shells +out to the official `bw` CLI. This keeps authentication, session management, and +crypto off our plate. + +The `bw` lifecycle is serialized across the process with a static `Mutex` because +`bw` keeps global state (one configured server and one session token per +process). Concurrent provisions would otherwise race on `bw config server` or +tear down each other's session. + +### Bootstrap creds come from the daemon environment + +The daemon is expected to receive three variables from the operator-provided +provider environment file: + +- `BW_CLIENTID` +- `BW_CLIENTSECRET` +- `BW_PASSWORD` + +Optional: + +- `BW_SERVER` — the Vaultwarden host. +- `COLIBRI_JAIL_ROOT_BASE` — base path used for containment checks. 
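
A minimal sketch of how the daemon side might collect these variables, reporting every missing required name at once rather than failing on the first. `BootstrapCreds` and `load_bootstrap_creds` are hypothetical names, the lookup is injected as a closure for testability, and treating empty values as missing is an assumption.

```rust
/// Hypothetical holder for the bootstrap credentials.
pub struct BootstrapCreds {
    pub client_id: String,
    pub client_secret: String,
    pub password: String,
    /// Optional Vaultwarden host from `BW_SERVER`.
    pub server: Option<String>,
}

/// Read the three required variables plus the optional server.
/// On failure, returns the names of all missing required variables.
pub fn load_bootstrap_creds(
    env: impl Fn(&str) -> Option<String>,
) -> Result<BootstrapCreds, Vec<&'static str>> {
    // Assumption: an empty value counts as missing.
    let get = |key: &str| env(key).filter(|value| !value.is_empty());
    let mut missing: Vec<&'static str> = Vec::new();
    let mut require = |key: &'static str| {
        let value = get(key);
        if value.is_none() {
            missing.push(key);
        }
        value
    };

    let client_id = require("BW_CLIENTID");
    let client_secret = require("BW_CLIENTSECRET");
    let password = require("BW_PASSWORD");

    match (client_id, client_secret, password) {
        (Some(client_id), Some(client_secret), Some(password)) => Ok(BootstrapCreds {
            client_id,
            client_secret,
            password,
            server: get("BW_SERVER"),
        }),
        _ => Err(missing),
    }
}
```

The struct deliberately does not derive `Debug`, so the secrets cannot end up in a log line by accident.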
+ +The CLI never sees these values; it only registers the tenant row that triggers +the hook. + +→ [operator-cli](./operator-cli.md) + +### Server-mismatch is fail-closed + +If `BW_SERVER` is set and `bw` is already logged in to a different server, +`provision` returns `ServerMismatch`. We do not wipe state automatically because +cross-server confusion could leak credentials. An operator must `bw logout` if +they want to switch servers. + +### Env-file content from login items and secure notes + +Each Vaultwarden collection item becomes one or more `KEY=VALUE` lines: + +- **Login item**: `item.name` becomes the key, `login.password` becomes the value. +- **Secure note**: each line is parsed as `KEY=VALUE` from the note body. + +Keys are validated against `[A-Z0-9_]` after normalizing spaces, dashes, and dots +to underscores. Invalid keys are skipped with a warning. + +Note: a key collision between two items produces a duplicate line. The consumer +is expected to ignore duplicates, or items must be named so that keys do not collide. + +### File mode and atomic-ish placement + +The env file is written into the target directory and set to mode `0600`. The +target directory is created if it does not exist, but it must already resolve +under the jail-root base. The write is a single `std::fs::write`, then a +permission change; it is not an atomic swap. If the daemon crashes between the +write and the `chmod`, the file could be left with looser permissions. For +now, we accept this because the daemon creates the directory immediately +before the write and the target is inside the jail. + +### Tenant status follows the provision state + +`register_tenant` inserts the row with `status = provisioned`. After a successful +vault provision, the hook flips it to `active`. A stopped or destroyed jail may +later be moved to `stopped` or `destroyed` by the operator or a teardown flow. + +Strictly, `provisioned` means the row is created; `active` means the secrets +have been materialized at least once.
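
The key normalization rule from the env-file section can be sketched as follows. This is not the crate's code: `normalize_key` and `env_lines` are hypothetical names, trimming surrounding whitespace is an assumption, and the sketch does not uppercase keys because the text does not say whether normalization does.

```rust
/// Normalize an item name into an env key, or return `None` if the
/// result contains anything outside `[A-Z0-9_]`.
pub fn normalize_key(raw: &str) -> Option<String> {
    // Spaces, dashes, and dots become underscores. Trimming is an assumption.
    let key: String = raw
        .trim()
        .chars()
        .map(|c| match c {
            ' ' | '-' | '.' => '_',
            other => other,
        })
        .collect();
    let valid = !key.is_empty()
        && key
            .chars()
            .all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_');
    valid.then_some(key)
}

/// Build `KEY=VALUE` lines from `(name, value)` pairs, skipping invalid
/// keys. Colliding keys are kept as duplicate lines, as noted above.
pub fn env_lines(items: &[(&str, &str)]) -> Vec<String> {
    items
        .iter()
        .filter_map(|(name, value)| normalize_key(name).map(|key| format!("{key}={value}")))
        .collect()
}
```

Under these assumptions a lowercase item name is rejected rather than silently rewritten, so an operator would see the skip warning instead of an unexpected key.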
+ +## Flow + +```text +register-tenant tenant_id jail_root collection_id + | + v +spawn-agent --jail-name tenant_id --jail-root jail_root + | + v +provision_tenant_env(tenant_id, jail_root) + |-- no tenant row -> no-op + |-- root mismatch -> warn, no-op + |-- else + v + bw login -> unlock -> list collection -> list items -> write env file @ 0600 + | + v + set tenant status = active + agent starts running +``` + +## See also + +- [store-schema](./store-schema.md) — how the tenant row is stored +- [jail-confinement](./jail-confinement.md) — how jails are created and confined +- [operator-cli](./operator-cli.md) — `register-tenant` and `spawn-agent` verbs +- [mother-hive](./mother-hive.md) — a related Vaultwarden-backed pubkey exchange + used to authorize agents to call mother