docs/guide-port #211
6 changed files with 89 additions and 198 deletions
|
|
@ -1,193 +1,84 @@
|
|||
---
|
||||
title: Colibri
|
||||
description: The Clawdie-AI event and control fabric — Pi-centric, ingesting Pi events, watchdog host status, run manifests, and runtime inventories.
|
||||
description: Cross-platform Rust control plane — agent supervision, task scheduling, cost tracking, and MCP bridge for the Clawdie operator USB and bare-metal deployment.
|
||||
---
|
||||
|
||||
Colibri is the Clawdie-AI **event and control fabric**. It normalizes what the
|
||||
agent runtime and the host already emit — Pi events, watchdog status, cross-host
|
||||
run manifests, runtime inventories — into structured inputs that update
|
||||
Clawdie's task and control state.
|
||||
Colibri is the Clawdie **control plane**: a small, cross-platform (FreeBSD +
|
||||
Linux) Rust daemon that supervises agents, manages the task board, tracks cost,
|
||||
and provides a live dashboard for the operator.
|
||||
|
||||
Colibri is a coordination layer, not a new runtime and not a dashboard. Agent
|
||||
reasoning happens in **Pi**; host safety stays in the **watchdog**; privileged
|
||||
operations stay in **hostd**. Colibri reads their outputs and gives the operator
|
||||
one coherent control state instead of many ad-hoc surfaces.
|
||||
It replaces the TypeScript control plane previously in clawdie-ai. The v0.12
|
||||
release is a complete Rust rewrite.
|
||||
|
||||
:::note
|
||||
The ingestion modules described here are implemented and tested
|
||||
(`src/colibri-*.ts`). The wider simplification — collapsing legacy multi-runner
|
||||
orchestration onto Pi — now continues on `main` behind the
|
||||
[proof gates](#proof-gates-before-removing-legacy-paths). Source plan:
|
||||
`doc/COLIBRI-PI-CONTROL-PLAN.md`.
|
||||
:::
|
||||
## Architecture
|
||||
|
||||
## What Colibri is — and isn't
|
||||
|
||||
Colibri grew out of evaluating a Linux terminal multiplexer (Herdr) as a control
|
||||
surface. The conclusion: Herdr cannot be the cross-host bus or the FreeBSD
|
||||
runtime (it is Linux/macOS-only and AGPL-licensed). So Colibri is **our own**
|
||||
fabric, and the agent runtime is **Pi only**.
|
||||
|
||||
- **It is** a normalizer and aggregator of structured feeds — Pi JSONL events,
|
||||
watchdog status, run manifests, runtime inventories — into Clawdie control
|
||||
state.
|
||||
- **It is not** a second agent runtime, a replacement for the watchdog or
|
||||
`hostd`, or a bundle of third-party operator tools.
|
||||
- **Herdr** stays an _optional, Linux-side_ operator dashboard / terminal
|
||||
surface — never a FreeBSD dependency, and never vendored into Clawdie.
|
||||
|
||||
## Core decisions
|
||||
|
||||
- Use **Pi** as the only agent runtime.
|
||||
- Keep **Herdr** as an optional Linux-side surface — not a cross-host message bus
|
||||
and not the FreeBSD runtime.
|
||||
- **Colibri** is the Clawdie-AI control/event fabric name.
|
||||
- Do not vendor or copy Herdr code; do not port Herdr to FreeBSD.
|
||||
- Do not delete existing orchestration until the replacement loop is proven.
|
||||
|
||||
## Component map
|
||||
|
||||
```text
|
||||
Pi — agent reasoning/runtime; emits structured JSON/SDK events
|
||||
Colibri — Clawdie-AI event/control fabric; consumes Pi events + host feeds,
|
||||
updates Clawdie task/control state
|
||||
watchdog — FreeBSD runtime governor (keep); its status socket is a Colibri input
|
||||
hostd — privileged host operations: bastille, zfs, pf, services, packages (keep)
|
||||
Herdr — optional Linux operator dashboard/terminal surface
|
||||
doctor / pi-profile — existing watchdog-status consumers; must keep working
|
||||
```
|
||||
colibri-daemon — Unix-socket server (the always-on supervisor)
|
||||
├── glasspane agent state machine + JSONL streaming (the "radar")
|
||||
├── store SQLite coordination (tasks, agents, tenants, skills)
|
||||
├── scheduler cron/interval/once job execution + capability matching
|
||||
├── cost tracker cache-hit metering, auto-escalation, budget enforcement
|
||||
├── spawner agent subprocess lifecycle (zot, pi, jail confinement)
|
||||
├── mcp bridge editor integration (stdio) + external MCP host (jailed)
|
||||
├── TUI dashboard ratatui terminal supervision surface
|
||||
└── CLI client typed Unix-socket client for operator commands
|
||||
```
|
||||
|
||||
## Ingestion modules
|
||||
**Key crates** (`cargo build --workspace --release`):
|
||||
|
||||
Colibri's implemented surface is a set of pure parsers/normalizers plus one
|
||||
read-only socket client. With the exception of the host-status reader's socket,
|
||||
they have no FreeBSD dependency, so Linux agents can develop and test them.
|
||||
| Crate | Role |
|
||||
| ----------------------- | --------------------------------------------------------- |
|
||||
| `colibri-daemon` | Socket server, agent lifecycle, scheduler loop |
|
||||
| `colibri-store` | Embedded SQLite — tasks, agents, tenants, skills |
|
||||
| `colibri-glasspane` | Agent state machine (Idle → Working → Done/Error/Stalled) |
|
||||
| `colibri-glasspane-tui` | ratatui dashboard — live pane supervision |
|
||||
| `colibri-client` | CLI (`colibri status`, `colibri spawn`, `colibri tasks`) |
|
||||
| `colibri-skills` | Read-only skills catalog |
|
||||
| `colibri-mcp` | MCP bridge — editor integration + external MCP host |
|
||||
| `colibri-deepseek` | Cache-hit probe + prefix metering for DeepSeek |
|
||||
| `colibri-contracts` | Manifest/capability/event schemas (golden tests) |
|
||||
| `colibri-runtime` | Host status ingestion, runtime inventory |
|
||||
| `clawdie` | Host installer/deployer (ZFS layout, rc.d service) |
|
||||
|
||||
| Module | Purpose | Schema / source |
|
||||
| ------------------------------ | -------------------------------------------------- | -------------------------------------- |
|
||||
| `colibri-pi-events.ts` | Normalize Pi `--mode json` JSONL into typed events | flat top-level `type` records |
|
||||
| `colibri-pi-run.ts` | Summarize a whole Pi run (counts, tools, usage) | `pi-jsonl` |
|
||||
| `colibri-host-status.ts` | Read the watchdog IPC socket into a status record | `watchdog-socket` |
|
||||
| `colibri-run-manifest.ts` | Validate cross-host inter-agent run manifests | `clawdie.interagent.run-manifest.v1` |
|
||||
| `colibri-runtime-inventory.ts` | Validate runtime inventories; compute drift | `clawdie.runtime-version-inventory.v1` |
|
||||
## Agent model
|
||||
|
||||
### Pi event ingestion
|
||||
Colibri supervises agents — it does not contain them. Two harnesses, one
|
||||
taxonomy:
|
||||
|
||||
`colibri-pi-events.ts` turns Pi's `--mode json` output — flat, newline-framed
|
||||
JSON records — into a normalized `ColibriPiEvent` union. The first stdout line is
|
||||
the session header (`{"type":"session","id":…,"cwd":…}`); subsequent lines are
|
||||
events. `parsePiJsonLine` / `parsePiJsonLines` are total and never throw —
|
||||
malformed lines surface as parse errors rather than crashing.
|
||||
| Harness | Role | Default since |
|
||||
| ------- | -------------------------------------------------------------- | ------------- |
|
||||
| **zot** | Go binary, RPC mode, ~25 providers, Telegram bot, skill-driven | v0.12 |
|
||||
| **pi** | Original agent, kept as spawnable fallback | — |
|
||||
|
||||
Normalized event kinds:
|
||||
Agents emit JSONL events on stdout. Glasspane reads the stream line-by-line,
|
||||
folds events into a 5-state machine, and exposes a snapshot API for the daemon,
|
||||
TUI, and CLI. The mapping is harness-specific — zot events (`turn_start`/`done`)
|
||||
and pi events (`agent_start`/`turn_end`) both resolve to the same `AgentState`
|
||||
enum via `zot_event_type()`.
|
||||
|
||||
```text
|
||||
pi.session_started pi.message_text_delta pi.tool_finished
|
||||
pi.agent_started pi.message_finished pi.queue_updated
|
||||
pi.agent_finished pi.tool_started pi.compaction_started/finished
|
||||
pi.turn_started pi.tool_updated pi.retry_started/finished
|
||||
pi.turn_finished pi.message_started pi.unknown
|
||||
## How it works with mother
|
||||
|
||||
USB nodes connect to the mother node via MCP over SSH:
|
||||
|
||||
```
|
||||
USB node Mother (OSA)
|
||||
colibri_daemon PostgreSQL
|
||||
└── external MCP host ──SSH──→ colibri-mcp-ssh (forced-command)
|
||||
└── colibri-mcp
|
||||
└── node-register-mcp
|
||||
└── hive_nodes
|
||||
```
|
||||
|
||||
`colibri-pi-run.ts` rolls a full run up with `summarizeColibriPiRun(raw)`, which
|
||||
returns a `ColibriPiRunSummary`: `sessionId`, `cwd`, per-kind `eventCounts`,
|
||||
`toolNames`, `parseErrorCount`, `textDeltaChars`, `finalAssistantText`, and
|
||||
`runtimeUsage`. `buildColibriPiTaskResult` maps that to the values Clawdie
|
||||
records per run — `tokensUsed`, `output`, `actualProvider`, `actualModel`,
|
||||
`costTotalUsd` — which line up with the control plane's `actual_*`
|
||||
[runtime observability](../controlplane/#runtime-observability) fields.
|
||||
The mother maintains a registry of all nodes with hardware profiles, derived
|
||||
capabilities (GPU, RAM, WiFi), and OS version. The daemon's autospawned zot
|
||||
reads `CLAWDIE_HW_PROFILE` and calls `node_register` on first boot.
|
||||
|
||||
### Host-status ingestion
|
||||
## See also
|
||||
|
||||
`colibri-host-status.ts` connects to the watchdog Unix socket, sends
|
||||
`{"cmd":"status"}`, reads the newline-framed `{ok,data}` reply, and normalizes it
|
||||
into a `ColibriHostStatus` record:
|
||||
|
||||
```text
|
||||
source: 'watchdog-socket'
|
||||
mode, throttled, freeMemoryMB, activeJails, queuedGroups, controlplaneStatus
|
||||
```
|
||||
|
||||
It mirrors the wire protocol already used by `doctor.ts`, is **additive and
|
||||
read-only** (a new consumer alongside `doctor`, not a change to the watchdog),
|
||||
and never throws — every failure resolves to `{ ok: false, error }`.
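The request/reply exchange described here can be sketched in Rust, the language of the v0.12 rewrite (the module described above is TypeScript). This is a minimal sketch under stated assumptions: the function name `request_status`, the `Result<String, String>` return shape, and the textual check for `"ok":true` are illustrative and not taken from the Colibri source. It is generic over a writer and a buffered reader so it can be exercised without a live watchdog socket.

```rust
use std::io::{BufRead, Write};

/// Sends the watchdog status command and returns the raw reply line.
/// Hypothetical helper: every failure becomes `Err(String)` instead of a
/// panic, mirroring the "never throws" contract described above.
pub fn request_status<W: Write, R: BufRead>(mut writer: W, mut reader: R) -> Result<String, String> {
    // The request is a single newline-terminated JSON object.
    writer
        .write_all(b"{\"cmd\":\"status\"}\n")
        .map_err(|e| format!("write failed: {e}"))?;
    writer.flush().map_err(|e| format!("flush failed: {e}"))?;

    // The reply is newline-framed, so one read_line call yields one message.
    let mut line = String::new();
    let n = reader
        .read_line(&mut line)
        .map_err(|e| format!("read failed: {e}"))?;
    if n == 0 {
        return Err("connection closed before a reply arrived".to_string());
    }
    let reply = line.trim_end().to_string();

    // Assumption: a real client would parse the JSON; this sketch only checks
    // the `ok` flag textually to stay dependency-free.
    if reply.contains("\"ok\":true") {
        Ok(reply)
    } else {
        Err(format!("watchdog reported failure: {reply}"))
    }
}
```

Against a real socket, the two arguments would typically be a `std::os::unix::net::UnixStream` and a `BufReader` over a clone of it. Returning `Result` keeps the caller in control of how a missing or unhealthy watchdog is reported.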
|
||||
|
||||
### Inter-agent run manifests
|
||||
|
||||
Cross-host coordination (for example, the network-throughput playground) is
|
||||
exchanged as structured manifests, not free-form chat or raw captures.
|
||||
`colibri-run-manifest.ts` validates the `clawdie.interagent.run-manifest.v1`
|
||||
schema with `parseColibriRunManifest` / `parseColibriRunManifestJson` and renders
|
||||
a compact `<colibri-run-manifest>` text block via `summarizeColibriRunManifest`.
|
||||
|
||||
Fields: `test_id`, `role`, `host`, `agent?`, `started_at`, `ended_at?`,
|
||||
`protocols`, `network`, `artifacts`, `summary`, `raw_transfer_required`,
|
||||
`notes`. Raw pcaps stay out of git; the manifest is the structured handoff.
|
||||
|
||||
### Runtime version inventory and drift
|
||||
|
||||
`colibri-runtime-inventory.ts` validates the
|
||||
`clawdie.runtime-version-inventory.v1` schema (`host`, `os`, `node?`, `npm?`,
|
||||
`pi?`, `npm_prefix?`, `package_manager?`, `iso_npm_globals_pin`, `notes`). Each
|
||||
host emits its own inventory; `buildRuntimeDriftReport` compares them against a
|
||||
target (default Node major **24**, optional pinned Pi version) and returns which
|
||||
hosts drift on Node, Pi, a missing Pi, or an ISO npm-globals pin.
|
||||
`summarizeRuntimeDriftReport` renders a `<colibri-runtime-drift>` block.
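The drift comparison above can be illustrated with a short Rust sketch. Everything named here (`Inventory`, `Drift`, `drift_report`, `node_major`) is hypothetical and only loosely modelled on the schema fields; the ISO npm-globals pin check is left out, and treating a missing Node version as Node drift is an assumption of this sketch.

```rust
/// One host's inventory, reduced to the fields the drift check needs.
/// Field names follow the schema above; the struct itself is hypothetical.
#[derive(Debug, Clone)]
pub struct Inventory {
    pub host: String,
    pub node: Option<String>, // e.g. "v24.1.0" or "24.1.0"
    pub pi: Option<String>,
}

/// One reason a host differs from the target.
#[derive(Debug, PartialEq, Eq)]
pub enum Drift {
    NodeMajor { host: String, found: Option<u32> },
    PiMissing { host: String },
    PiVersion { host: String, found: String },
}

/// Extracts the major version from strings such as "v24.1.0" or "24.1.0".
fn node_major(version: &str) -> Option<u32> {
    version.trim_start_matches('v').split('.').next()?.parse().ok()
}

/// Compares every inventory against a target Node major and an optional Pi pin.
pub fn drift_report(inventories: &[Inventory], target_node_major: u32, pinned_pi: Option<&str>) -> Vec<Drift> {
    let mut drifts = Vec::new();
    for inv in inventories {
        // Assumption: an absent or unparsable Node version counts as drift.
        let major = inv.node.as_deref().and_then(node_major);
        if major != Some(target_node_major) {
            drifts.push(Drift::NodeMajor { host: inv.host.clone(), found: major });
        }
        match (&inv.pi, pinned_pi) {
            (None, _) => drifts.push(Drift::PiMissing { host: inv.host.clone() }),
            (Some(found), Some(pin)) if found.as_str() != pin => {
                drifts.push(Drift::PiVersion { host: inv.host.clone(), found: found.clone() })
            }
            // Pi present and either unpinned or matching the pin: no drift.
            _ => {}
        }
    }
    drifts
}
```

Calling `drift_report(&inventories, 24, None)` corresponds to the default target described above: Node major 24 with no pinned Pi version.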
|
||||
|
||||
## Relationship to the watchdog
|
||||
|
||||
The watchdog is **load-bearing runtime safety**, not a dashboard. Colibri reads
|
||||
its socket as an input; it does not own, replace, or merge it. `src/watchdog.ts`:
|
||||
|
||||
- reads FreeBSD free memory via `sysctl vm.stats.vm.v_free_count`
|
||||
- throttles jail-queue concurrency under memory pressure
|
||||
- exposes run modes (`auto`, `slow`, `fast`, `permanent`)
|
||||
- answers structured IPC: `{"cmd":"status"}` and `{"cmd":"mode","value":…}`
|
||||
|
||||
`doctor` and `pi-profile` are existing consumers of that status and must keep
|
||||
working — they are hard gates, not deletion targets.
|
||||
|
||||
## Proof gates before removing legacy paths
|
||||
|
||||
No legacy runner or status code is removed until all of these hold:
|
||||
|
||||
1. Pi runs an end-to-end task on Linux.
|
||||
2. Pi runs an end-to-end task on FreeBSD.
|
||||
3. Pi JSON/SDK events map into Clawdie task/activity state.
|
||||
4. A DeepSeek lane works through Pi on Linux using `--mode json`.
|
||||
5. Colibri consumes watchdog status without breaking `doctor` or `pi-profile`.
|
||||
6. Herdr can display/launch the Linux operator workflow without becoming a
|
||||
required FreeBSD dependency.
|
||||
7. The network-throughput coordination test has produced structured manifests
|
||||
from at least two hosts.
|
||||
|
||||
## Runtime drift and version sync
|
||||
|
||||
Colibri is also the coordination layer for runtime hygiene:
|
||||
|
||||
```text
|
||||
Node: 24.x on Linux and FreeBSD
|
||||
Pi: pinned per host inventory, not assumed from PATH
|
||||
ISO npm globals: pinned in the ISO repo, not fetched from moving latest tags
|
||||
```
|
||||
|
||||
Each host emits a small inventory manifest; the coordinator compares manifests
|
||||
before upgrades; FreeBSD package actions stay locally authorized and
|
||||
rollback-aware. The supporting skills are `colibri-provider-verify` (validates a
|
||||
provider lane through Pi `--mode json`) and `runtime-version-sync` (inventories
|
||||
and aligns Node, Pi, npm globals, and ISO pins across hosts).
|
||||
|
||||
## Non-goals
|
||||
|
||||
- No Herdr FreeBSD port; no Herdr vendoring.
|
||||
- No deletion of the watchdog or `hostd`.
|
||||
- No replacement of FreeBSD local safety with Linux-side orchestration.
|
||||
- No broad agent-backend deletion before caller inventory and proof gates.
|
||||
|
||||
## References
|
||||
|
||||
- [Control Plane](../controlplane/) — the orchestration layer Colibri feeds.
|
||||
- [Provider Fallback](../operate/provider-fallback/) — provider switching that
|
||||
produces the `effective_*` vs `actual_*` divergence Colibri records.
|
||||
- `doc/COLIBRI-PI-CONTROL-PLAN.md` — the source plan and phase breakdown.
|
||||
- `doc/INTERAGENT-RUN-CONTRACT.md` — the inter-agent run-manifest contract.
|
||||
- [Control plane](../controlplane/) — the pre-v0.12 TypeScript architecture (historical)
|
||||
- [Agent harness](../../wiki/agent-harness/) — wiki: zot + Colibri split, autospawn
|
||||
- [Cost model](../../wiki/cost-model/) — wiki: cache-hit metering, auto-escalation
|
||||
- [Glasspane](../../wiki/glasspane/) — wiki: state machine, JSONL streaming
|
||||
- [Task board](../../wiki/task-board/) — wiki: capability matching, scheduling
|
||||
- [Jail confinement](../../wiki/jail-confinement/) — wiki: persistent vs ephemeral jails
|
||||
- [Mother hive](../../wiki/mother-hive/) — wiki: MCP architecture, peer auth
|
||||
|
|
|
|||
|
|
@ -25,10 +25,10 @@ operator / peer host bridged host
|
|||
The bridge is a thin `socat` front-end, supervised by the host's service
|
||||
manager. Both sides are shipped in the repo:
|
||||
|
||||
| Host | Service | Packaging |
|
||||
| --- | --- | --- |
|
||||
| FreeBSD | rc.d `colibri_bridge` | `packaging/freebsd/colibri_bridge.in` |
|
||||
| Linux | systemd `colibri-bridge.service` | `packaging/linux/` (unit + env + nft + README) |
|
||||
| Host | Service | Packaging |
|
||||
| ------- | -------------------------------- | ---------------------------------------------- |
|
||||
| FreeBSD | rc.d `colibri_bridge` | `packaging/freebsd/colibri_bridge.in` |
|
||||
| Linux | systemd `colibri-bridge.service` | `packaging/linux/` (unit + env + nft + README) |
|
||||
|
||||
Both effectively run:
|
||||
|
||||
|
|
@ -51,7 +51,7 @@ native firewall:
|
|||
- **Linux (ufw):** `ufw allow in on tailscale0 to any port 9190 proto tcp`
|
||||
|
||||
On a default-deny host (e.g. ufw), the public side is already blocked, so only
|
||||
the interface-scoped *allow* is needed. The `packaging/linux/colibri-bridge.nft`
|
||||
the interface-scoped _allow_ is needed. The `packaging/linux/colibri-bridge.nft`
|
||||
ruleset is provided for Linux hosts that do **not** run ufw (a default-accept
|
||||
input chain); under ufw it is redundant.
|
||||
|
||||
|
|
|
|||
|
|
@ -272,7 +272,7 @@ but can be disabled), or **optional** (skipped unless explicitly enabled).
|
|||
| service | required | — |
|
||||
| hostd | required | — |
|
||||
| identity-restore | optional | `SUPABASE_URL` not set |
|
||||
| verify | optional | warn on most check failures; fail on broken runtime integrity |
|
||||
| verify | optional | warn on most check failures; fail on compromised runtime integrity |
|
||||
|
||||
A required step failure stops the install immediately and prints the resume
|
||||
command. Default steps ship enabled (`FEATURE_GITEA=YES`, artifact.sql bundled)
|
||||
|
|
|
|||
|
|
@ -157,7 +157,7 @@ verification output:
|
|||
Every enabled site currently has served output where the platform expects it.
|
||||
- `inconsistent`
|
||||
The live output and the publish manifest disagree. This is the state to treat
|
||||
as broken.
|
||||
as inconsistent.
|
||||
|
||||
## What verify checks
|
||||
|
||||
|
|
|
|||
|
|
@ -6,7 +6,7 @@ description: Deduplicated tmux pane history with edge-triggered failure alerts.
|
|||
Terminal capture is the screen-scraping half of Glasspane. Where the rest of
|
||||
Glasspane derives agent state from structured JSONL events, this layer records
|
||||
the **actual terminal text** of a pane and triages it against known patterns —
|
||||
so Colibri can both *remember* what a terminal showed and *speak up* the moment
|
||||
so Colibri can both _remember_ what a terminal showed and _speak up_ the moment
|
||||
something it recognises goes wrong.
|
||||
|
||||
It lives in `colibri-glasspane` (`terminal.rs`, `signatures.rs`) and is driven
|
||||
|
|
@ -18,14 +18,14 @@ by the `colibri-daemon` poll loop.
|
|||
Identical screens produce identical ids.
|
||||
- **Deduplicated history.** The recorder drops any frame whose hash equals the
|
||||
previous one, so polling a near-static pane every few seconds collapses into a
|
||||
compact log of *actual* state transitions, not thousands of duplicates. The
|
||||
compact log of _actual_ state transitions, not thousands of duplicates. The
|
||||
history is a bounded ring buffer per pane.
|
||||
- **Signature triage.** Each captured frame is scanned by a `SignatureSet`.
|
||||
A signature carries a severity (`error`/`warn`/`info`/`ok`), a plain-language
|
||||
`next_action`, and an optional `invoke` (a skill to run to remediate). Matches
|
||||
are classified into `failures` / `warnings` / `info` / `healthy`.
|
||||
- **Edge-triggered alerts.** A failure/warning is reported only on the frame
|
||||
where it *first appears* — not on every subsequent frame that still shows it.
|
||||
where it _first appears_ — not on every subsequent frame that still shows it.
|
||||
When the condition clears and later recurs, it fires again. This is what keeps
|
||||
a persistent error from spamming alerts.
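The interaction between deduplication and edge-triggered alerts can be sketched as follows. This is a minimal illustration, not the code in `terminal.rs` or `signatures.rs`: `PaneRecorder`, `Capture`, and `frame_hash` are hypothetical names, signatures are reduced to plain substring matches, and `DefaultHasher` stands in for whatever content hash the recorder actually uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeSet, VecDeque};
use std::hash::{Hash, Hasher};

/// Outcome of offering one captured frame to the recorder.
#[derive(Debug, PartialEq, Eq)]
pub enum Capture {
    /// Same hash as the previous frame: nothing stored, nothing alerted.
    Unchanged,
    /// Frame stored; `new_alerts` lists signatures that just started matching.
    Recorded { new_alerts: Vec<String> },
}

/// Hypothetical per-pane recorder: bounded history plus the set of
/// signatures that matched on the previous recorded frame.
pub struct PaneRecorder {
    capacity: usize,
    history: VecDeque<(u64, String)>,
    firing: BTreeSet<String>,
    signatures: Vec<(String, String)>, // (name, substring to look for)
}

fn frame_hash(text: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    text.hash(&mut hasher);
    hasher.finish()
}

impl PaneRecorder {
    pub fn new(capacity: usize, signatures: Vec<(String, String)>) -> Self {
        Self { capacity: capacity.max(1), history: VecDeque::new(), firing: BTreeSet::new(), signatures }
    }

    pub fn offer(&mut self, text: &str) -> Capture {
        let hash = frame_hash(text);
        // Deduplication: drop a frame whose hash equals the previous one.
        if self.history.back().map(|(h, _)| *h) == Some(hash) {
            return Capture::Unchanged;
        }
        // Bounded ring buffer: evict the oldest frame when full.
        if self.history.len() == self.capacity {
            self.history.pop_front();
        }
        self.history.push_back((hash, text.to_string()));

        // Signature triage, reduced here to plain substring matching.
        let matched: BTreeSet<String> = self
            .signatures
            .iter()
            .filter(|(_, needle)| text.contains(needle.as_str()))
            .map(|(name, _)| name.clone())
            .collect();
        // Edge trigger: report only matches absent from the previous frame.
        let new_alerts: Vec<String> = matched.difference(&self.firing).cloned().collect();
        self.firing = matched;
        Capture::Recorded { new_alerts }
    }

    /// Number of frames currently held in the ring buffer.
    pub fn len(&self) -> usize {
        self.history.len()
    }
}
```

Because `firing` is replaced on every recorded frame, a condition that clears and later reappears is reported again, while a condition that persists across changing frames is reported only once.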
|
||||
|
||||
|
|
@ -42,12 +42,12 @@ different set; the matcher is shared.
|
|||
|
||||
Set on the daemon's environment (off by default):
|
||||
|
||||
| Variable | Purpose | Default |
|
||||
| --- | --- | --- |
|
||||
| `COLIBRI_TERMINAL_CAPTURE` | Enable the poll loop (`1`/`true`/`yes`/`on`) | off |
|
||||
| `COLIBRI_TERMINAL_CAPTURE_INTERVAL_SECS` | Seconds between captures of each watched pane | `5` |
|
||||
| `COLIBRI_TERMINAL_WATCH` | Comma-separated tmux targets to watch from startup | _(none)_ |
|
||||
| `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` | Route edge-triggered alerts to Telegram | _(unset → log only)_ |
|
||||
| Variable | Purpose | Default |
|
||||
| ----------------------------------------- | -------------------------------------------------- | -------------------- |
|
||||
| `COLIBRI_TERMINAL_CAPTURE` | Enable the poll loop (`1`/`true`/`yes`/`on`) | off |
|
||||
| `COLIBRI_TERMINAL_CAPTURE_INTERVAL_SECS` | Seconds between captures of each watched pane | `5` |
|
||||
| `COLIBRI_TERMINAL_WATCH` | Comma-separated tmux targets to watch from startup | _(none)_ |
|
||||
| `TELEGRAM_BOT_TOKEN` / `TELEGRAM_CHAT_ID` | Route edge-triggered alerts to Telegram | _(unset → log only)_ |
|
||||
|
||||
When the bot token/chat id are unset, alerts degrade cleanly to a daemon log
|
||||
line — the feature is safe to leave enabled without Telegram configured.
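As an illustration of how these variables might be interpreted, the sketch below parses the three capture settings from raw values. `CaptureConfig` and `parse_capture_config` are hypothetical names; case-insensitive matching of the truthy spellings and falling back to `5` on an invalid or zero interval are assumptions, not documented behaviour.

```rust
use std::time::Duration;

/// Hypothetical settings struct for the capture poll loop.
#[derive(Debug, PartialEq, Eq)]
pub struct CaptureConfig {
    pub enabled: bool,
    pub interval: Duration,
    pub watch: Vec<String>,
}

/// Builds the config from raw variable values (`None` means the variable is unset).
pub fn parse_capture_config(
    capture: Option<&str>,
    interval_secs: Option<&str>,
    watch: Option<&str>,
) -> CaptureConfig {
    // Truthy spellings from the table; case-insensitivity is an assumption.
    let enabled = capture
        .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "yes" | "on"))
        .unwrap_or(false);
    // Fall back to the documented default of 5 seconds on missing or invalid input.
    let secs = interval_secs
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|s| *s > 0)
        .unwrap_or(5);
    // Comma-separated tmux targets; empty entries are ignored.
    let watch = watch
        .unwrap_or("")
        .split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect();
    CaptureConfig { enabled, interval: Duration::from_secs(secs), watch }
}
```

In a daemon, each argument would come from something like `std::env::var("COLIBRI_TERMINAL_CAPTURE").ok().as_deref()`. Keeping the parser free of direct environment access makes the defaults easy to check in isolation.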
|
||||
|
|
@ -56,13 +56,13 @@ line — the feature is safe to leave enabled without Telegram configured.
|
|||
|
||||
Over the Colibri socket (newline-delimited JSON):
|
||||
|
||||
| Command | Effect |
|
||||
| --- | --- |
|
||||
| `{"cmd":"terminal-watch","target":"clawdie:0"}` | Start recording a tmux target (session / `session:window` / `%pane`) |
|
||||
| `{"cmd":"terminal-unwatch","target":"clawdie:0"}` | Stop recording and drop the pane's history |
|
||||
| `{"cmd":"terminal-list"}` | Watched panes with frame counts and currently-firing alerts |
|
||||
| `{"cmd":"terminal-history","target":"clawdie:0","limit":20}` | Recent recorded frames (text + detection) for a pane |
|
||||
| `{"cmd":"terminal-poll","target":"clawdie:0"}` | Capture now instead of waiting for the tick (`target` optional → all) |
|
||||
| Command | Effect |
|
||||
| ------------------------------------------------------------ | --------------------------------------------------------------------- |
|
||||
| `{"cmd":"terminal-watch","target":"clawdie:0"}` | Start recording a tmux target (session / `session:window` / `%pane`) |
|
||||
| `{"cmd":"terminal-unwatch","target":"clawdie:0"}` | Stop recording and drop the pane's history |
|
||||
| `{"cmd":"terminal-list"}` | Watched panes with frame counts and currently-firing alerts |
|
||||
| `{"cmd":"terminal-history","target":"clawdie:0","limit":20}` | Recent recorded frames (text + detection) for a pane |
|
||||
| `{"cmd":"terminal-poll","target":"clawdie:0"}` | Capture now instead of waiting for the tick (`target` optional → all) |
|
||||
|
||||
`terminal-poll` returns, per pane, whether the frame was `recorded` or
|
||||
`unchanged` (deduped) and any `new_alerts` that fired on this capture.
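The commands in the table can be built as newline-delimited JSON lines with a small typed wrapper. This is a sketch only: `TerminalCommand`, `json_string`, and `to_line` are hypothetical names, not the typed client in `colibri-client`, and the JSON is assembled by hand to avoid assuming any particular serialization crate.

```rust
/// Socket commands from the table above, as a hypothetical typed wrapper.
pub enum TerminalCommand<'a> {
    Watch(&'a str),
    Unwatch(&'a str),
    List,
    History { target: &'a str, limit: u32 },
    /// `None` omits `target`, which polls all watched panes.
    Poll(Option<&'a str>),
}

/// Minimal JSON string escaping for quotes, backslashes, and control characters.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl TerminalCommand<'_> {
    /// Renders the command as one newline-terminated JSON line.
    pub fn to_line(&self) -> String {
        let body = match self {
            Self::Watch(t) => format!("{{\"cmd\":\"terminal-watch\",\"target\":{}}}", json_string(t)),
            Self::Unwatch(t) => format!("{{\"cmd\":\"terminal-unwatch\",\"target\":{}}}", json_string(t)),
            Self::List => "{\"cmd\":\"terminal-list\"}".to_string(),
            Self::History { target, limit } => format!(
                "{{\"cmd\":\"terminal-history\",\"target\":{},\"limit\":{}}}",
                json_string(target),
                limit
            ),
            Self::Poll(Some(t)) => format!("{{\"cmd\":\"terminal-poll\",\"target\":{}}}", json_string(t)),
            Self::Poll(None) => "{\"cmd\":\"terminal-poll\"}".to_string(),
        };
        body + "\n"
    }
}
```

Each rendered line can be written to the Colibri socket as-is; the trailing newline is the message frame.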
|
||||
|
|
|
|||
|
|
@ -382,7 +382,7 @@ With V1 `query()` + string prompt + agent teams:
|
|||
Instead of passing a string prompt (which sets `isSingleUserTurn = true`), pass an `AsyncIterable<SDKUserMessage>`:
|
||||
|
||||
```typescript
|
||||
// Before (broken for agent teams):
|
||||
// Before (not suitable for agent teams):
|
||||
query({ prompt: "do something" })
|
||||
|
||||
// After (keeps CLI alive):
|
||||
|
|
|
|||