diff --git a/docs/guide/architecture/colibri.md b/docs/guide/architecture/colibri.md
index c6e8824..ec51f77 100644
--- a/docs/guide/architecture/colibri.md
+++ b/docs/guide/architecture/colibri.md
@@ -1,193 +1,84 @@
 ---
 title: Colibri
-description: The Clawdie-AI event and control fabric — Pi-centric, ingesting Pi events, watchdog host status, run manifests, and runtime inventories.
+description: Cross-platform Rust control plane — agent supervision, task scheduling, cost tracking, and MCP bridge for the Clawdie operator USB and bare-metal deployment.
 ---
 
-Colibri is the Clawdie-AI **event and control fabric**. It normalizes what the
-agent runtime and the host already emit — Pi events, watchdog status, cross-host
-run manifests, runtime inventories — into structured inputs that update
-Clawdie's task and control state.
+Colibri is the Clawdie **control plane**: a small, cross-platform (FreeBSD +
+Linux) Rust daemon that supervises agents, manages the task board, tracks cost,
+and provides a live dashboard for the operator.
 
-Colibri is a coordination layer, not a new runtime and not a dashboard. Agent
-reasoning happens in **Pi**; host safety stays in the **watchdog**; privileged
-operations stay in **hostd**. Colibri reads their outputs and gives the operator
-one coherent control state instead of many ad-hoc surfaces.
+It replaces the TypeScript control plane previously in clawdie-ai. The v0.12
+release is a complete Rust rewrite.
 
-:::note
-The ingestion modules described here are implemented and tested
-(`src/colibri-*.ts`). The wider simplification — collapsing legacy multi-runner
-orchestration onto Pi — now continues on `main` behind the
-[proof gates](#proof-gates-before-removing-legacy-paths). Source plan:
-`doc/COLIBRI-PI-CONTROL-PLAN.md`.
-:::
+## Architecture
 
-## What Colibri is — and isn't
-
-Colibri grew out of evaluating a Linux terminal multiplexer (Herdr) as a control
-surface. The conclusion: Herdr cannot be the cross-host bus or the FreeBSD
-runtime (it is Linux/macOS-only and AGPL-licensed). So Colibri is **our own**
-fabric, and the agent runtime is **Pi only**.
-
-- **It is** a normalizer and aggregator of structured feeds — Pi JSONL events,
-  watchdog status, run manifests, runtime inventories — into Clawdie control
-  state.
-- **It is not** a second agent runtime, a replacement for the watchdog or
-  `hostd`, or a bundle of third-party operator tools.
-- **Herdr** stays an _optional, Linux-side_ operator dashboard / terminal
-  surface — never a FreeBSD dependency, and never vendored into Clawdie.
-
-## Core decisions
-
-- Use **Pi** as the only agent runtime.
-- Keep **Herdr** as an optional Linux-side surface — not a cross-host message bus
-  and not the FreeBSD runtime.
-- **Colibri** is the Clawdie-AI control/event fabric name.
-- Do not vendor or copy Herdr code; do not port Herdr to FreeBSD.
-- Do not delete existing orchestration until the replacement loop is proven.
-
-## Component map
-
-```text
-Pi                  — agent reasoning/runtime; emits structured JSON/SDK events
-Colibri             — Clawdie-AI event/control fabric; consumes Pi events + host feeds,
-                      updates Clawdie task/control state
-watchdog            — FreeBSD runtime governor (keep); its status socket is a Colibri input
-hostd               — privileged host operations: bastille, zfs, pf, services, packages (keep)
-Herdr               — optional Linux operator dashboard/terminal surface
-doctor / pi-profile — existing watchdog-status consumers; must keep working
+```
+colibri-daemon — Unix-socket server (the always-on supervisor)
+  ├── glasspane       agent state machine + JSONL streaming (the "radar")
+  ├── store           SQLite coordination (tasks, agents, tenants, skills)
+  ├── scheduler       cron/interval/once job execution + capability matching
+  ├── cost tracker    cache-hit metering, auto-escalation, budget enforcement
+  ├── spawner         agent subprocess lifecycle (zot, pi, jail confinement)
+  ├── mcp bridge      editor integration (stdio) + external MCP host (jailed)
+  ├── TUI dashboard   ratatui terminal supervision surface
+  └── CLI client      typed Unix-socket client for operator commands
 ```
 
-## Ingestion modules
+**Key crates** (`cargo build --workspace --release`):
 
-Colibri's implemented surface is a set of pure parsers/normalizers plus one
-read-only socket client. With the exception of the host-status reader's socket,
-they have no FreeBSD dependency, so Linux agents can develop and test them.
+| Crate                   | Role                                                      |
+| ----------------------- | --------------------------------------------------------- |
+| `colibri-daemon`        | Socket server, agent lifecycle, scheduler loop            |
+| `colibri-store`         | Embedded SQLite — tasks, agents, tenants, skills          |
+| `colibri-glasspane`     | Agent state machine (Idle → Working → Done/Error/Stalled) |
+| `colibri-glasspane-tui` | ratatui dashboard — live pane supervision                 |
+| `colibri-client`        | CLI (`colibri status`, `colibri spawn`, `colibri tasks`)  |
+| `colibri-skills`        | Read-only skills catalog                                  |
+| `colibri-mcp`           | MCP bridge — editor integration + external MCP host       |
+| `colibri-deepseek`      | Cache-hit probe + prefix metering for DeepSeek            |
+| `colibri-contracts`     | Manifest/capability/event schemas (golden tests)          |
+| `colibri-runtime`       | Host status ingestion, runtime inventory                  |
+| `clawdie`               | Host installer/deployer (ZFS layout, rc.d service)        |
 
-| Module                         | Purpose                                            | Schema / source                        |
-| ------------------------------ | -------------------------------------------------- | -------------------------------------- |
-| `colibri-pi-events.ts`         | Normalize Pi `--mode json` JSONL into typed events | flat top-level `type` records          |
-| `colibri-pi-run.ts`            | Summarize a whole Pi run (counts, tools, usage)    | `pi-jsonl`                             |
-| `colibri-host-status.ts`       | Read the watchdog IPC socket into a status record  | `watchdog-socket`                      |
-| `colibri-run-manifest.ts`      | Validate cross-host inter-agent run manifests      | `clawdie.interagent.run-manifest.v1`   |
-| `colibri-runtime-inventory.ts` | Validate runtime inventories; compute drift        | `clawdie.runtime-version-inventory.v1` |
+## Agent model
 
-### Pi event ingestion
+Colibri supervises agents — it does not contain them. Two harnesses, one
+taxonomy:
 
-`colibri-pi-events.ts` turns Pi's `--mode json` output — flat, newline-framed
-JSON records — into a normalized `ColibriPiEvent` union. The first stdout line is
-the session header (`{"type":"session","id":…,"cwd":…}`); subsequent lines are
-events. `parsePiJsonLine` / `parsePiJsonLines` are total and never throw —
-malformed lines surface as parse errors rather than crashing.
+| Harness | Role                                                           | Default since |
+| ------- | -------------------------------------------------------------- | ------------- |
+| **zot** | Go binary, RPC mode, ~25 providers, Telegram bot, skill-driven | v0.12         |
+| **pi**  | Original agent, kept as spawnable fallback                     | —             |
 
-Normalized event kinds:
+Agents emit JSONL events on stdout. Glasspane reads the stream line-by-line,
+folds events into a 5-state machine, and exposes a snapshot API for the daemon,
+TUI, and CLI. The mapping is harness-specific — zot events (`turn_start`/`done`)
+and pi events (`agent_start`/`turn_end`) both resolve to the same `AgentState`
+enum via `zot_event_type()`.
 
-```text
-pi.session_started   pi.message_text_delta   pi.tool_finished
-pi.agent_started     pi.message_finished     pi.queue_updated
-pi.agent_finished    pi.tool_started         pi.compaction_started/finished
-pi.turn_started      pi.tool_updated         pi.retry_started/finished
-pi.turn_finished     pi.message_started      pi.unknown
+## How it works with mother
+
+USB nodes connect to the mother node via MCP over SSH:
+
+```
+USB node                         Mother (OSA)
+  colibri_daemon                 PostgreSQL
+  └── external MCP host ──SSH──→ colibri-mcp-ssh (forced-command)
+                                   └── colibri-mcp
+                                         └── node-register-mcp
+                                               └── hive_nodes
 ```
 
-`colibri-pi-run.ts` rolls a full run up with `summarizeColibriPiRun(raw)`, which
-returns a `ColibriPiRunSummary`: `sessionId`, `cwd`, per-kind `eventCounts`,
-`toolNames`, `parseErrorCount`, `textDeltaChars`, `finalAssistantText`, and
-`runtimeUsage`. `buildColibriPiTaskResult` maps that to the values Clawdie
-records per run — `tokensUsed`, `output`, `actualProvider`, `actualModel`,
-`costTotalUsd` — which line up with the control plane's `actual_*`
-[runtime observability](../controlplane/#runtime-observability) fields.
+The mother maintains a registry of all nodes with hardware profiles, derived
+capabilities (GPU, RAM, WiFi), and OS version. The daemon's autospawned zot
+reads `CLAWDIE_HW_PROFILE` and calls `node_register` on first boot.
 
-### Host-status ingestion
+## See also
 
-`colibri-host-status.ts` connects to the watchdog Unix socket, sends
-`{"cmd":"status"}`, reads the newline-framed `{ok,data}` reply, and normalizes it
-into a `ColibriHostStatus` record:
-
-```text
-source: 'watchdog-socket'
-mode, throttled, freeMemoryMB, activeJails, queuedGroups, controlplaneStatus
-```
-
-It mirrors the wire protocol already used by `doctor.ts`, is **additive and
-read-only** (a new consumer alongside `doctor`, not a change to the watchdog),
-and never throws — every failure resolves to `{ ok: false, error }`.
-
-### Inter-agent run manifests
-
-Cross-host coordination (for example, the network-throughput playground) is
-exchanged as structured manifests, not free-form chat or raw captures.
-`colibri-run-manifest.ts` validates the `clawdie.interagent.run-manifest.v1`
-schema with `parseColibriRunManifest` / `parseColibriRunManifestJson` and renders
-a compact `` text block via `summarizeColibriRunManifest`.
-
-Fields: `test_id`, `role`, `host`, `agent?`, `started_at`, `ended_at?`,
-`protocols`, `network`, `artifacts`, `summary`, `raw_transfer_required`,
-`notes`. Raw pcaps stay out of git; the manifest is the structured handoff.
-
-### Runtime version inventory and drift
-
-`colibri-runtime-inventory.ts` validates the
-`clawdie.runtime-version-inventory.v1` schema (`host`, `os`, `node?`, `npm?`,
-`pi?`, `npm_prefix?`, `package_manager?`, `iso_npm_globals_pin`, `notes`). Each
-host emits its own inventory; `buildRuntimeDriftReport` compares them against a
-target (default Node major **24**, optional pinned Pi version) and returns which
-hosts drift on Node, Pi, a missing Pi, or an ISO npm-globals pin.
-`summarizeRuntimeDriftReport` renders a `` block.
-
-## Relationship to the watchdog
-
-The watchdog is **load-bearing runtime safety**, not a dashboard. Colibri reads
-its socket as an input; it does not own, replace, or merge it. `src/watchdog.ts`:
-
-- reads FreeBSD free memory via `sysctl vm.stats.vm.v_free_count`
-- throttles jail-queue concurrency under memory pressure
-- exposes run modes (`auto`, `slow`, `fast`, `permanent`)
-- answers structured IPC: `{"cmd":"status"}` and `{"cmd":"mode","value":…}`
-
-`doctor` and `pi-profile` are existing consumers of that status and must keep
-working — they are hard gates, not deletion targets.
-
-## Proof gates before removing legacy paths
-
-No legacy runner or status code is removed until all of these hold:
-
-1. Pi runs an end-to-end task on Linux.
-2. Pi runs an end-to-end task on FreeBSD.
-3. Pi JSON/SDK events map into Clawdie task/activity state.
-4. A DeepSeek lane works through Pi on Linux using `--mode json`.
-5. Colibri consumes watchdog status without breaking `doctor` or `pi-profile`.
-6. Herdr can display/launch the Linux operator workflow without becoming a
-   required FreeBSD dependency.
-7. The network-throughput coordination test has produced structured manifests
-   from at least two hosts.
-
-## Runtime drift and version sync
-
-Colibri is also the coordination layer for runtime hygiene:
-
-```text
-Node:            24.x on Linux and FreeBSD
-Pi:              pinned per host inventory, not assumed from PATH
-ISO npm globals: pinned in the ISO repo, not fetched from moving latest tags
-```
-
-Each host emits a small inventory manifest; the coordinator compares manifests
-before upgrades; FreeBSD package actions stay locally authorized and
-rollback-aware. The supporting skills are `colibri-provider-verify` (validates a
-provider lane through Pi `--mode json`) and `runtime-version-sync` (inventories
-and aligns Node, Pi, npm globals, and ISO pins across hosts).
-
-## Non-goals
-
-- No Herdr FreeBSD port; no Herdr vendoring.
-- No deletion of the watchdog or `hostd`.
-- No replacement of FreeBSD local safety with Linux-side orchestration.
-- No broad agent-backend deletion before caller inventory and proof gates.
-
-## References
-
-- [Control Plane](../controlplane/) — the orchestration layer Colibri feeds.
-- [Provider Fallback](../operate/provider-fallback/) — provider switching that
-  produces the `effective_*` vs `actual_*` divergence Colibri records.
-- `doc/COLIBRI-PI-CONTROL-PLAN.md` — the source plan and phase breakdown.
-- `doc/INTERAGENT-RUN-CONTRACT.md` — the inter-agent run-manifest contract.
+- [Control plane](../controlplane/) — the pre-v0.12 TypeScript architecture (historical)
+- [Agent harness](../../wiki/agent-harness/) — wiki: zot + Colibri split, autospawn
+- [Cost model](../../wiki/cost-model/) — wiki: cache-hit metering, auto-escalation
+- [Glasspane](../../wiki/glasspane/) — wiki: state machine, JSONL streaming
+- [Task board](../../wiki/task-board/) — wiki: capability matching, scheduling
+- [Jail confinement](../../wiki/jail-confinement/) — wiki: persistent vs ephemeral jails
+- [Mother hive](../../wiki/mother-hive/) — wiki: MCP architecture, peer auth
diff --git a/docs/guide/architecture/control-plane-bridge.md b/docs/guide/architecture/control-plane-bridge.md
index 8372ba4..a6fc328 100644
--- a/docs/guide/architecture/control-plane-bridge.md
+++ b/docs/guide/architecture/control-plane-bridge.md
@@ -25,10 +25,10 @@ operator / peer host bridged host
 The bridge is a thin `socat` front-end, supervised by the host's service manager.
 Both sides are shipped in the repo:
 
-| Host | Service | Packaging |
-| --- | --- | --- |
-| FreeBSD | rc.d `colibri_bridge` | `packaging/freebsd/colibri_bridge.in` |
-| Linux | systemd `colibri-bridge.service` | `packaging/linux/` (unit + env + nft + README) |
+| Host    | Service                          | Packaging                                      |
+| ------- | -------------------------------- | ---------------------------------------------- |
+| FreeBSD | rc.d `colibri_bridge`            | `packaging/freebsd/colibri_bridge.in`          |
+| Linux   | systemd `colibri-bridge.service` | `packaging/linux/` (unit + env + nft + README) |
 
 Both run effectively:
 
@@ -51,7 +51,7 @@ native firewall:
 - **Linux (ufw):** `ufw allow in on tailscale0 to any port 9190 proto tcp`
 
 On a default-deny host (e.g. ufw), the public side is already blocked, so only
-the interface-scoped *allow* is needed. The `packaging/linux/colibri-bridge.nft`
+the interface-scoped _allow_ is needed. The `packaging/linux/colibri-bridge.nft`
 ruleset is provided for Linux hosts that do **not** run ufw (a default-accept
 input chain); under ufw it is redundant.
 
diff --git a/docs/guide/install/install.md b/docs/guide/install/install.md
index f018c51..b528646 100644
--- a/docs/guide/install/install.md
+++ b/docs/guide/install/install.md
@@ -272,7 +272,7 @@ but can be disabled), or **optional** (skipped unless explicitly enabled).
 | service | required | — |
 | hostd | required | — |
 | identity-restore | optional | `SUPABASE_URL` not set |
-| verify | optional | warn on most check failures; fail on broken runtime integrity |
+| verify | optional | warn on most check failures; fail on compromised runtime integrity |
 
 A required step failure stops the install immediately and prints the resume command.
 Default steps ship enabled (`FEATURE_GITEA=YES`, artifact.sql bundled)
diff --git a/docs/guide/operate/docs-publishing.md b/docs/guide/operate/docs-publishing.md
index 934252b..0b5030e 100644
--- a/docs/guide/operate/docs-publishing.md
+++ b/docs/guide/operate/docs-publishing.md
@@ -157,7 +157,7 @@ verification output:
   Every enabled site currently has served output where the platform expects it.
 - `inconsistent`
   The live output and the publish manifest disagree. This is the state to treat
-  as broken.
+  as a failure.
 
 ## What verify checks
 
diff --git a/docs/guide/operate/terminal-capture.md b/docs/guide/operate/terminal-capture.md
index 9eb4dcb..eabdbd2 100644
--- a/docs/guide/operate/terminal-capture.md
+++ b/docs/guide/operate/terminal-capture.md
@@ -6,7 +6,7 @@ description: Deduplicated tmux pane history with edge-triggered failure alerts.
 Terminal capture is the screen-scraping half of Glasspane. Where the rest of
 Glasspane derives agent state from structured JSONL events, this layer records
 the **actual terminal text** of a pane and triages it against known patterns —
-so Colibri can both *remember* what a terminal showed and *speak up* the moment
+so Colibri can both _remember_ what a terminal showed and _speak up_ the moment
 something it recognises goes wrong.
 
 It lives in `colibri-glasspane` (`terminal.rs`, `signatures.rs`) and is driven
@@ -18,14 +18,14 @@ by the `colibri-daemon` poll loop.
   Identical screens produce identical ids.
 - **Deduplicated history.** The recorder drops any frame whose hash equals the
   previous one, so polling a near-static pane every few seconds collapses into a
-  compact log of *actual* state transitions, not thousands of duplicates. The
+  compact log of _actual_ state transitions, not thousands of duplicates. The
   history is a bounded ring buffer per pane.
 - **Signature triage.** Each captured frame is scanned by a `SignatureSet`.
   A signature carries a severity (`error`/`warn`/`info`/`ok`), a plain-language
   `next_action`, and an optional `invoke` (a skill to run to remediate). Matches
   are classified into `failures` / `warnings` / `info` / `healthy`.
 - **Edge-triggered alerts.** A failure/warning is reported only on the frame
-  where it *first appears* — not on every subsequent frame that still shows it.
+  where it _first appears_ — not on every subsequent frame that still shows it.
   When the condition clears and later recurs, it fires again. This is what keeps
   a persistent error from spamming alerts.
 
@@ -42,12 +42,12 @@ different set; the matcher is shared.
 
 Set on the daemon's environment (off by default):
 
-| Variable | Purpose | Default |
-| --- | --- | --- |
-| `COLIBRI_TERMINAL_CAPTURE` | Enable the poll loop (`1`/`true`/`yes`/`on`) | off |
-| `COLIBRI_TERMINAL_CAPTURE_INTERVAL_SECS` | Seconds between captures of each watched pane | `5` |
-| `COLIBRI_TERMINAL_WATCH` | Comma-separated tmux targets to watch from startup | _(none)_ |
-| `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` | Route edge-triggered alerts to Telegram | _(unset → log only)_ |
+| Variable                                  | Purpose                                            | Default              |
+| ----------------------------------------- | -------------------------------------------------- | -------------------- |
+| `COLIBRI_TERMINAL_CAPTURE`                | Enable the poll loop (`1`/`true`/`yes`/`on`)       | off                  |
+| `COLIBRI_TERMINAL_CAPTURE_INTERVAL_SECS`  | Seconds between captures of each watched pane      | `5`                  |
+| `COLIBRI_TERMINAL_WATCH`                  | Comma-separated tmux targets to watch from startup | _(none)_             |
+| `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` | Route edge-triggered alerts to Telegram            | _(unset → log only)_ |
 
 When the bot token/chat id are unset, alerts degrade cleanly to a daemon log
 line — the feature is safe to leave enabled without Telegram configured.
@@ -56,13 +56,13 @@ line — the feature is safe to leave enabled without Telegram configured.
 
 Over the Colibri socket (newline-delimited JSON):
 
-| Command | Effect |
-| --- | --- |
-| `{"cmd":"terminal-watch","target":"clawdie:0"}` | Start recording a tmux target (session / `session:window` / `%pane`) |
-| `{"cmd":"terminal-unwatch","target":"clawdie:0"}` | Stop recording and drop the pane's history |
-| `{"cmd":"terminal-list"}` | Watched panes with frame counts and currently-firing alerts |
-| `{"cmd":"terminal-history","target":"clawdie:0","limit":20}` | Recent recorded frames (text + detection) for a pane |
-| `{"cmd":"terminal-poll","target":"clawdie:0"}` | Capture now instead of waiting for the tick (`target` optional → all) |
+| Command                                                      | Effect                                                                |
+| ------------------------------------------------------------ | --------------------------------------------------------------------- |
+| `{"cmd":"terminal-watch","target":"clawdie:0"}`              | Start recording a tmux target (session / `session:window` / `%pane`)  |
+| `{"cmd":"terminal-unwatch","target":"clawdie:0"}`            | Stop recording and drop the pane's history                            |
+| `{"cmd":"terminal-list"}`                                    | Watched panes with frame counts and currently-firing alerts           |
+| `{"cmd":"terminal-history","target":"clawdie:0","limit":20}` | Recent recorded frames (text + detection) for a pane                  |
+| `{"cmd":"terminal-poll","target":"clawdie:0"}`               | Capture now instead of waiting for the tick (`target` optional → all) |
 
diff --git a/docs/guide/reference/sdk-deep-dive.md b/docs/guide/reference/sdk-deep-dive.md index 5f1d813..f8842f8 100644 --- a/docs/guide/reference/sdk-deep-dive.md +++ b/docs/guide/reference/sdk-deep-dive.md @@ -382,7 +382,7 @@ With V1 `query()` + string prompt + agent teams: Instead of passing a string prompt (which sets `isSingleUserTurn = true`), pass an `AsyncIterable`: ```typescript -// Before (broken for agent teams): +// Before (not suitable for agent teams): query({ prompt: "do something" }) // After (keeps CLI alive):