From ace863d3eb24fe442ff1378cbc711c769eee7c99 Mon Sep 17 00:00:00 2001 From: Sam & Claude Date: Wed, 24 Jun 2026 13:37:31 +0200 Subject: [PATCH] =?UTF-8?q?feat(wiki):=20expand=20to=20full=20coverage=20?= =?UTF-8?q?=E2=80=94=20cost-model,=20glasspane,=20task-board,=20jail-confi?= =?UTF-8?q?nement?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds four wiki pages, one per major architectural subsystem: - cost-model: byte-stable prefixes, cache-hit metering, three cost modes, auto-escalation, T14 compaction, DeepSeek cache-hit probe - glasspane: agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API, pane reader loop - task-board: capability match scoring, cron/interval/once schedule types, intake drain, SQLite backing - jail-confinement: persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers Updates index.md: removes "pilot" framing, updates lint section to reflect the shipped wiki-lint, adds all four pages to the table. wiki-lint --strict: clean (70 PASS, 0 FAIL). --- docs/wiki/cost-model.md | 90 ++++++++++++++++++++++++++++++++ docs/wiki/glasspane.md | 97 +++++++++++++++++++++++++++++++++++ docs/wiki/index.md | 42 +++++++-------- docs/wiki/jail-confinement.md | 92 +++++++++++++++++++++++++++++++++ docs/wiki/task-board.md | 93 +++++++++++++++++++++++++++++++++ 5 files changed, 393 insertions(+), 21 deletions(-) create mode 100644 docs/wiki/cost-model.md create mode 100644 docs/wiki/glasspane.md create mode 100644 docs/wiki/jail-confinement.md create mode 100644 docs/wiki/task-board.md diff --git a/docs/wiki/cost-model.md b/docs/wiki/cost-model.md new file mode 100644 index 0000000..e8f4abd --- /dev/null +++ b/docs/wiki/cost-model.md @@ -0,0 +1,90 @@ +# Cost model + +← [index](./index.md) + +## What this is + +Colibri tracks every token that passes through an agent session and meters cost +against a configurable budget. 
The key insight: **cache-hit tokens cost 10× +less** than fresh tokens on DeepSeek — so the prompt prefix is engineered to be +byte-stable across requests, maximizing cache hits. Three cost modes (fast, +smart, max) represent different points on the speed/cost trade-off, and the +model auto-escalates when a cheaper mode can't keep up. + +## Decisions + +### Byte-stable prompt prefix → cache-hit metering + +The system prompt and early context blocks are **byte-for-byte identical** +across consecutive requests to the same DeepSeek endpoint. DeepSeek's cache-hit +pricing discounts these by ~90%. Colibri's `colibri-deepseek` probe determines +the exact token-count split between cached and fresh tokens per request, and +the cost tracker records both so the session budget reflects the **actual** +discounted cost, not the nominal token count. + +**Why not just count tokens**: token counting with an offline tokenizer gives +you an upper bound but not the real cost. DeepSeek's API sometimes re-caches and +sometimes doesn't — the probe measures what actually happened. The discount is +too large (10×) to leave unmeasured. + +→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md), +[`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md), +[`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs) + +### Three cost modes (fast → smart → max) + +| Mode | Budget (tokens) | Behavior | +| ----- | --------------- | ------------------------------------------------------------------------- | +| Fast | 16K | Maximum cache-hits, minimum fresh tokens. Rejects large expansions early. | +| Smart | 64K | Default. Balances cache reuse with room for follow-up turns. | +| Max | 256K | Almost never hits budget. For one-shot deep tasks where cost is secondary. | + +The daemon **auto-escalates** when a session exhausts its budget in a lower +mode: fast → smart → max. Escalation is one-way (never downgrades mid-session). 
+ +**Why three modes, not a continuous slider**: simplicity wins here. Three +well-understood points cover the space — operators pick by risk appetite, not +by fine-tuning a number. The escalation chain means "start cheap, pay more only +when the cheaper mode runs out." + +→ [`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md), +[`crates/colibri-daemon/src/cost.rs`](../../crates/colibri-daemon/src/cost.rs) + +### T14 compaction (budget trim, not truncate) + +When a session is about to exceed its budget, Colibri compacts the tool results +in the volatile region — it sends them through the headroom sidecar for +summarization, then trims the oldest volatile blocks until the prompt fits +within budget. The **prefix** (system prompt, static context) is never trimmed +— only the volatile suffix. + +If compaction is insufficient and auto-escalation is enabled, the mode steps up +before truncating. + +**Why not just truncate**: truncating mid-conversation loses context the agent +needs to continue. Compaction preserves the semantic content at lower token cost. +The headroom sidecar is optional (off by default); without it, the fallback is +simple truncation. + +→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md), +[`crates/colibri-daemon/src/session.rs`](../../crates/colibri-daemon/src/session.rs) + +### Cache-hit probe (DeepSeek-specific) + +The `colibri-deepseek` crate sends a preflight request with a known prompt to +the DeepSeek API and parses the response's `usage` object to determine the cache-hit +split (`prompt_cache_hit_tokens` / `prompt_cache_miss_tokens`). This is +provider-specific — DeepSeek is the only provider that exposes this granularity. +The probe runs once per session configuration change, not per request. + +**Why a probe and not a hook**: middleware that intercepts every API response +would couple cost tracking to the HTTP layer. A probe decouples it — the cost +tracker asks "what was the cache ratio?" and the probe answers, independently of +how the request was made. 
+ +→ [`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs) + +## See also + +- [mother-hive](./mother-hive.md) — MCP architecture (different cost domain) +- [quality-gates](./quality-gates.md) — the gate that validates cost-mode parsing diff --git a/docs/wiki/glasspane.md b/docs/wiki/glasspane.md new file mode 100644 index 0000000..6960166 --- /dev/null +++ b/docs/wiki/glasspane.md @@ -0,0 +1,97 @@ +# Glasspane — agent state supervision + +← [index](./index.md) + +## What this is + +Glasspane is Colibri's agent observation layer. It watches agent subprocesses +via their JSONL stdout, folds the stream into a semantic state machine +(`Idle → Working → Done`), and exposes a snapshot API for dashboards and +daemon coordination. Every spawned agent — Pi, zot, or a local sample — feeds +through the same ingestor and ends up in the same taxonomy. + +## Decisions + +### Agent state as a state machine, not raw event log + +Glasspane doesn't just relay raw agent events. It ingests JSONL lines and +transitions a **named pane** through a finite set of states: + +``` +Idle → Working → Done + ↳ Error + ↳ Stalled (no events within a timeout window) +``` + +The `AgentState` enum (`Idle, Working, Done, Error, Stalled`) is deliberately +small. It captures what a supervisor needs to know — "is the agent working? +stuck? finished?" — without encoding agent-specific semantics. Events that don't +change the state (e.g. a usage report from zot) are recorded in the pane's +metadata but don't affect the state machine. + +**Why not just tail the log**: raw event logs are agent-specific and change over +time (zot adds new event types). The state machine is a stable contract that the +daemon, TUI, and client CLI can all rely on. + +→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs) + +### JSONL streaming (one line = one event) + +Agents emit structured events as newline-delimited JSON on stdout. 
Glasspane +reads line-by-line with `BufReader`, deserializes each line, and feeds it into +the `PiJsonlIngestor` (the name is legacy — it handles zot events too). + +The reader runs in a **single background task per pane** (`pane_reader_loop`). +It never blocks the daemon's main loop — the ingestor is a synchronous fold +that updates the pane's in-memory state, and the snapshot API reads from +`Arc<RwLock<...>>` with no contention on the reader hot path. + +Malformed lines are **skipped** with a counter increment, not an error — +dropouts in an agent's JSONL shouldn't crash the observer. + +**Why JSONL, not a socket or gRPC**: the agent is a subprocess, not a service. +stdout is the universal interface — every language, every harness, zero setup. +JSONL is trivial to write from bash, Go, Python, or Rust. A structured wire format +would add a dependency and a handshake to every agent. + +→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs) +(`PiJsonlIngestor`, `pane_reader_loop`) + +### `AgentRuntime { Pi, Zot, Local }` — one taxonomy for two harnesses + +Pi and zot emit **different** raw event types: Pi uses `agent_start` / +`turn_end`, zot uses `turn_start` / `done`. Glasspane maps both into the same +`AgentState` transitions via `zot_event_type()`. The `AgentRuntime` enum tags +each pane with its harness so the mapping function knows which event vocabulary +to parse. + +The `Pane` struct's `session_id` field uses `#[serde(alias = "pi_session_id")]` +for backward compatibility with pre-neutrality serialized snapshots. + +**Why not have two separate state machines**: the TUI, daemon scheduler, and +client CLI all need to ask "what state is this agent in?" — they don't care +whether it's zot or Pi. One taxonomy, one API. The mapping is a ~50-line +function, not a subsystem. 
+ +→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs) +(`zot_event_type`, `AgentRuntime`) + +### Snapshot API (read-heavy, not write-heavy) + +Glasspane exposes a snapshot object (the full set of panes with their current +state, session ID, timestamp, and metadata) through `Arc<RwLock<...>>`. The +daemon serves this over its Unix socket to client readers. Writes happen once +per event; reads are frequent (TUI polls, CLI status checks). + +**Why RwLock, not channels**: the write path is low-frequency (agent JSONL at +human-reading speed), and readers share the lock without blocking each other. A +channel-based design would add buffering and delivery semantics for a problem +that's fundamentally about current state, not event delivery. + +→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs) +(`Supervisor`, `snapshot`) + +## See also + +- [agent-harness](./agent-harness.md) — the zot/Colibri split that Glasspane observes +- [naming-decisions](./naming-decisions.md) — `pi_session_id → session_id`, `pi_type → event_type` diff --git a/docs/wiki/index.md b/docs/wiki/index.md index cba5e28..78c653a 100644 --- a/docs/wiki/index.md +++ b/docs/wiki/index.md @@ -1,11 +1,12 @@ # Colibri Wiki -A small, agent-maintained knowledge base for Colibri's **decisions and -architecture** — based on Andrej Karpathy's +A knowledge base for Colibri's **decisions and architecture** — based on +Andrej Karpathy's [LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f). -This is a **pilot**. It deliberately covers a few decision-dense areas, not the -whole repo. +Every major subsystem has a page recording **why** it was built the way it +was — the rationale the code can't express. Implementation docs in `docs/` +cover the _how_; these pages cover the _why_. ## Why this exists @@ -32,24 +33,23 @@ These rules keep the wiki a maintainable artifact, not a second source of truth: 5. 
**Lint, don't trust.** A page is a claim to be checked against code, not a guarantee. -## Lint workflow (the point of the pilot) +## Lint workflow -`lint` = an agent pass that reads each page and checks it against the current -code: stale names, dangling references, contradictions, decisions that shipped -but whose page still says "planned." Output is a **report**, not auto-edits — -advisory first, until the signal is trusted. (Tool TBD — pilot step 2.) - -Open drift already noted by hand: - -- `stage-colibri-iso.sh` (clawdie-iso) and a guardrail comment reference - `ADR-agent-harness-consolidation.md`, which **does not exist** in either repo. - The real architecture statement is `AGENTS.md`. → see [agent-harness](./agent-harness.md). +The [`wiki-lint`](../../scripts/wiki-lint) script checks every page against the +current code: dangling references, resurrected old names (from the naming +ledger), and orphan pages. It runs as part of `ci-checks.sh --strict` and is +gated by the pre-push hook — a drift failure blocks a push, same as a clippy +warning. 
## Pages -| Page | What it covers | -| ----------------------------------------- | ------------------------------------------------------------------------------- | -| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver | -| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth | -| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight | -| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before | +| Page | What it covers | +| ----------------------------------------- | --------------------------------------------------------------------------------------------- | +| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver | +| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction | +| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API | +| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers | +| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed | +| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight | +| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing | +| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before | diff --git a/docs/wiki/jail-confinement.md b/docs/wiki/jail-confinement.md new file mode 100644 index 0000000..1859279 --- /dev/null +++ b/docs/wiki/jail-confinement.md @@ -0,0 +1,92 @@ +# Jail confinement + +← [index](./index.md) + +## What 
this is + +Colibri can confine spawned agents and external MCP servers inside FreeBSD +jails. The spawner wraps the subprocess command through `jexec` (persistent +jails) or `jail -c` (ephemeral jails), so the agent's entire filesystem view +and network are isolated. stdio passes through unchanged — the agent's JSONL +still reaches Glasspane, and the MCP host's stdin/stdout transport still works. + +## Decisions + +### Reuse the spawner's confinement primitive (don't build a parallel one) + +The agent spawner and the external-MCP host both need to confine untrusted +subprocesses. Instead of building a second confinement layer, the MCP host +reuses the agent spawner's `jail_wrap()` function directly — the same +`JailConfig` struct, the same `PrivMode` policy, the same `prepare_spawn_command` +pipeline. + +**Why reuse**: two confinement paths → one can drift. The spawner is tested +(20+ unit tests in `spawner.rs` covering named, ephemeral, staged, priv-mode +variants). The MCP host gets a battle-tested implementation for free. + +→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md), +[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs) +(`jail_wrap`, `JailConfig`), +[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs) + +### Persistent vs ephemeral jails + +| Type | How | When to use | +| ---------- | ----------------------------------------- | ------------------------------------------- | +| Persistent | `jexec ` into an existing jail | Operator-managed jails with preconfigured environments | +| Ephemeral | `jail -c command=` auto-destroyed | One-shot confinement, no state between runs | + +The `JailConfig` fields pick the lifecycle: `name` set means `jexec`; `path` set +means ephemeral. Only one applies per spawn; if both are set, `name` wins. + +**Why both**: persistent jails are operator-managed infrastructure (a build jail, +a worker jail that persists between agent runs). 
Ephemeral jails are for +untrusted one-shot work — like an external MCP server from a third-party +registry. The caller picks the lifecycle. + +→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md) + +### Priv-mode policy (`mdo` on live USB, `helper` on deployed) + +The daemon is unprivileged but jail creation requires root. The priv-mode +policy resolves this without granting the daemon blanket sudo: + +- **`mdo`** — the live USB's operator tool (`mdo -u root jail -c ...`). Used + on the operator image where `mdo` is configured. +- **`helper`** — a setuid helper binary on deployed hosts (not yet shipped; + falls back to `sudo`). The daemon never runs as root. + +The policy is configurable via `COLIBRI_JAIL_PRIV_MODE` and is resolved once +at daemon startup. The same policy applies to agents and MCP servers. + +**Why not the daemon as root**: the daemon spawns arbitrary subprocesses +(potentially attacker-controlled, via the MCP registry or task intake). +Running as unprivileged `colibri` limits the blast radius; the priv-mode +helper grants only the specific operations needed (jail creation). + +→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md), +[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs) +(`PrivMode`) + +### MCP servers are jailed by default (same threat model as agents) + +External MCP servers registered in the external MCP registry accept an optional +`jail` field with the same shape as agent spawn configs. The MCP host applies +the jail wrapper before spawning the server. Servers without a `jail` field +run on the host (backward compatible). + +The MCP host's registry entry supports per-server jail configuration — +different servers can run in different jails. This is a property of the +registry, not a global daemon setting. 
+ +**Why jailed by default**: external MCP servers are arbitrary third-party +binaries — at least as untrusted as the agents Colibri already jails. The +threat model is identical. + +→ [`COLIBRI-EXTERNAL-MCP-PROTOTYPE.md`](../COLIBRI-EXTERNAL-MCP-PROTOTYPE.md), +[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs) + +## See also + +- [mother-hive](./mother-hive.md) — the SSH forced-command boundary (a different confinement model) +- [agent-harness](./agent-harness.md) — the spawner that jails agents diff --git a/docs/wiki/task-board.md b/docs/wiki/task-board.md new file mode 100644 index 0000000..56bd961 --- /dev/null +++ b/docs/wiki/task-board.md @@ -0,0 +1,93 @@ +# Task board + scheduler + +← [index](./index.md) + +## What this is + +Colibri's task board holds operator-submitted work items, and the scheduler +assigns them to the best-fit agent on each tick. Tasks flow in via the +daemon's Unix socket (`create-task`, `intake-task`) and are drained by the +scheduler loop running inside the daemon every ~30 seconds. + +## Decisions + +### Capability match scoring (best-fit, not first-fit) + +When the scheduler picks an agent for a task, it scores every available agent +against the task's **required capabilities** using a simple intersection count: +`|required ∩ agent_caps| / |required|`. The agent with the highest score wins; +ties are broken by agent name (deterministic, so repeated runs don't thrash). + +A task with `["freebsd", "zfs"]` will match an agent with both capabilities +over one with only `freebsd`. A task with no required capabilities matches +any agent. Offline agents and agents whose capabilities don't intersect at all +are skipped. + +**Why not round-robin or FIFO**: capability-based matching means the right agent +gets the right work without operator hand-assignment. The scoring is trivial +(set intersection) and transparent — no machine learning, no weights to tune. 
+ +→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs) +(`capability_match_score`, `pick_agent`) + +### Three schedule types (cron, interval, once) + +| Type | Behavior | +| -------- | ----------------------------------------------------------------- | +| Cron | Fires at specific wall-clock times (e.g. `0 0 * * *` = midnight). | +| Interval | Fires after a fixed duration since last run (e.g. 3600s). | +| Once | Fires exactly once, at the specified future time. | + +Cron patterns are simple 5-field expressions (minute, hour, day, month, +weekday) with wildcards — no second granularity, no `/step` syntax. The +matching uses prefix comparison: a cron pattern matches if each field of the +current time begins with the pattern string, so `0` matches `00`, `1` matches +`10-19`, etc. This is intentionally simple — cron is a convenience for periodic +housekeeping, not a general-purpose job engine. + +**Why not use a real cron library**: the scheduler's job is dispatching tasks to +agents, not calendar management. The simple prefix-match cron covers 90% of use +cases (daily builds, hourly reports) without pulling in a parsing dependency. + +→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs) +(`should_fire`) + +### Intake drain (queue → task board → agent) + +The `intake-task` socket command pushes a task onto the intake queue. On each +scheduler tick (~30s), the loop drains the intake queue into the task board's +SQLite store, then checks for due scheduled jobs. This two-phase drain +decouples submission from execution: the operator submits at any time, the +scheduler processes in batches. + +Tasks in the intake queue carry a **capability string** (not an agent ID). The +scheduler picks the best agent at execution time, so a task submitted when no +matching agent is online will be picked up when one connects. + +**Why an intake queue, not direct assignment**: agents come and go. 
If submission +required picking an agent, the operator would need to know which agents are +available — a coupling the task board deliberately avoids. + +→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs) +(`Scheduler`, `add_job`, `submit`), +[`crates/colibri-daemon/tests/intake_scheduler_loop.rs`](../../crates/colibri-daemon/tests/intake_scheduler_loop.rs) + +### SQLite backing (embedded, not a service) + +The task board stores tasks, agent registrations, tenant info, and the skills +catalog in an embedded SQLite database at `/var/db/colibri/colibri.sqlite`. No +separate database process — the daemon opens the file directly. + +**Why SQLite, not PostgreSQL**: the daemon runs on the operator USB and on +deployed hosts. A full PostgreSQL service is heavyweight for a single daemon's +coordination state. SQLite is zero-config, zero-admin, and survives daemon +restarts without a separate lifecycle. The mother node uses PostgreSQL for the +hive registry because it's multi-tenant; the local daemon is single-tenant. + +→ [`crates/colibri-store/src/lib.rs`](../../crates/colibri-store/src/lib.rs) + +## See also + +- [mother-hive](./mother-hive.md) — the mother node's PostgreSQL-based hive registry +- [cost-model](./cost-model.md) — cost tracking per session +- [agent-harness](./agent-harness.md) — autospawn -- 2.45.3