From ace863d3eb24fe442ff1378cbc711c769eee7c99 Mon Sep 17 00:00:00 2001
From: Sam & Claude <hello@clawdie.si>
Date: Wed, 24 Jun 2026 13:37:31 +0200
Subject: [PATCH] =?UTF-8?q?feat(wiki):=20expand=20to=20full=20coverage=20?=
 =?UTF-8?q?=E2=80=94=20cost-model,=20glasspane,=20task-board,=20jail-confi?=
 =?UTF-8?q?nement?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Adds four wiki pages, one per major architectural subsystem:

- cost-model: byte-stable prefixes, cache-hit metering, three cost modes,
  auto-escalation, T14 compaction, DeepSeek cache-hit probe
- glasspane: agent state machine, JSONL streaming, AgentRuntime taxonomy,
  snapshot API, pane reader loop
- task-board: capability match scoring, cron/interval/once schedule types,
  intake drain, SQLite backing
- jail-confinement: persistent vs ephemeral jails, priv-mode policy,
  reuse of spawner confinement for MCP servers

Updates index.md: removes "pilot" framing, updates lint section to reflect
the shipped wiki-lint, adds all four pages to the table.

wiki-lint --strict: clean (70 PASS, 0 FAIL).
---
 docs/wiki/cost-model.md       | 90 ++++++++++++++++++++++++++++++++
 docs/wiki/glasspane.md        | 97 +++++++++++++++++++++++++++++++++++
 docs/wiki/index.md            | 42 +++++++--------
 docs/wiki/jail-confinement.md | 92 +++++++++++++++++++++++++++++++++
 docs/wiki/task-board.md       | 93 +++++++++++++++++++++++++++++++++
 5 files changed, 393 insertions(+), 21 deletions(-)
 create mode 100644 docs/wiki/cost-model.md
 create mode 100644 docs/wiki/glasspane.md
 create mode 100644 docs/wiki/jail-confinement.md
 create mode 100644 docs/wiki/task-board.md

diff --git a/docs/wiki/cost-model.md b/docs/wiki/cost-model.md
new file mode 100644
index 0000000..e8f4abd
--- /dev/null
+++ b/docs/wiki/cost-model.md
@@ -0,0 +1,90 @@
+# Cost model
+
+← [index](./index.md)
+
+## What this is
+
+Colibri tracks every token that passes through an agent session and meters cost
+against a configurable budget. The key insight: on DeepSeek, **cache-hit tokens
+cost about a tenth** of what fresh tokens do, so the prompt prefix is engineered
+to be byte-stable across requests, maximizing cache hits. Three cost modes (fast,
+smart, max) mark different points on the speed/cost trade-off, and the daemon
+auto-escalates when a session exhausts the budget of a cheaper mode.
+
+## Decisions
+
+### Byte-stable prompt prefix → cache-hit metering
+
+The system prompt and early context blocks are **byte-for-byte identical**
+across consecutive requests to the same DeepSeek endpoint. DeepSeek's cache-hit
+pricing discounts these by ~90%. Colibri's `colibri-deepseek` probe determines
+the exact token-count split between cached and fresh tokens per request, and
+the cost tracker records both so the session budget reflects the **actual**
+discounted cost, not the nominal token count.
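+
+To make the metering concrete, here is a minimal sketch of discounted-cost
+accounting. The `Usage` struct, both function names, and the fixed 10% weight
+are illustrative assumptions, not the actual cost-tracker API; the real
+discount depends on DeepSeek's current pricing.
+
+```rust
+/// Hypothetical usage record: prompt tokens split by cache outcome.
+#[derive(Debug, Clone, Copy)]
+pub struct Usage {
+    pub cache_hit_tokens: u64,
+    pub cache_miss_tokens: u64,
+}
+
+/// Assumed weight of a cache-hit token relative to a fresh one (~90% discount).
+pub const CACHE_HIT_WEIGHT_PERCENT: u64 = 10;
+
+/// Cost in fresh-token equivalents: hits are charged at the discounted weight.
+pub fn effective_tokens(u: Usage) -> u64 {
+    u.cache_miss_tokens + u.cache_hit_tokens * CACHE_HIT_WEIGHT_PERCENT / 100
+}
+
+/// Nominal count, as an offline tokenizer would report it (an upper bound).
+pub fn nominal_tokens(u: Usage) -> u64 {
+    u.cache_hit_tokens + u.cache_miss_tokens
+}
+```
+
+With 9,000 cached and 1,000 fresh prompt tokens, the nominal count is 10,000
+but the metered cost is 1,900 fresh-token equivalents.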
+
+**Why not just count tokens**: token counting with an offline tokenizer gives
+you an upper bound but not the real cost. DeepSeek's API sometimes re-caches and
+sometimes doesn't — the probe measures what actually happened. The discount is
+too large (10×) to leave unmeasured.
+
+→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md),
+[`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md),
+[`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs)
+
+### Three cost modes (fast → smart → max)
+
+| Mode  | Budget (tokens) | Behavior                                                                  |
+| ----- | --------------- | ------------------------------------------------------------------------- |
+| Fast  | 16K             | Maximum cache-hits, minimum fresh tokens. Rejects large expansions early.  |
+| Smart | 64K             | Default. Balances cache reuse with room for follow-up turns.               |
+| Max   | 256K            | Almost never hits budget. For one-shot deep tasks where cost is secondary. |
+
+The daemon **auto-escalates** when a session exhausts its budget in a lower
+mode: fast → smart → max. Escalation is one-way (never downgrades mid-session).
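+
+The one-way chain can be sketched as an enum with a single "next" step. The
+names below are assumptions rather than the contents of `cost.rs`, and reading
+16K as 16,000 tokens (not 16,384) is also an assumption.
+
+```rust
+/// Cost modes from the table above; budgets are in tokens.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum CostMode {
+    Fast,
+    Smart,
+    Max,
+}
+
+impl CostMode {
+    /// Token budget for this mode (values taken from the table).
+    pub fn budget(self) -> u64 {
+        match self {
+            CostMode::Fast => 16_000,
+            CostMode::Smart => 64_000,
+            CostMode::Max => 256_000,
+        }
+    }
+
+    /// Next mode up the chain, or `None` at the top. There is no downgrade.
+    pub fn escalate(self) -> Option<CostMode> {
+        match self {
+            CostMode::Fast => Some(CostMode::Smart),
+            CostMode::Smart => Some(CostMode::Max),
+            CostMode::Max => None,
+        }
+    }
+}
+
+/// Hypothetical helper: step up until `used` fits, or report exhaustion.
+pub fn mode_for_usage(start: CostMode, used: u64) -> Option<CostMode> {
+    let mut mode = start;
+    while used > mode.budget() {
+        mode = mode.escalate()?;
+    }
+    Some(mode)
+}
+```
+
+Because `escalate` only ever moves up, a session that started in fast mode can
+end in smart or max, but never the other way round.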
+
+**Why three modes, not a continuous slider**: simplicity wins here. Three
+well-understood points cover the space: operators pick by risk appetite, not
+by fine-tuning a number. The escalation chain means "start cheap, pay more only
+if the cheap mode runs out."
+
+→ [`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md),
+[`crates/colibri-daemon/src/cost.rs`](../../crates/colibri-daemon/src/cost.rs)
+
+### T14 compaction (budget trim, not truncate)
+
+When a session is about to exceed its budget, Colibri compacts the tool results
+in the volatile region — it sends them through the headroom sidecar for
+summarization, then trims the oldest volatile blocks until the prompt fits
+within budget. The **prefix** (system prompt, static context) is never trimmed
+— only the volatile suffix.
+
+If compaction is insufficient and auto-escalation is enabled, the mode steps up
+before truncating.
+
+**Why not just truncate**: truncating mid-conversation loses context the agent
+needs to continue. Compaction preserves the semantic content at lower token cost.
+The headroom sidecar is optional (off by default); without it, the fallback is
+simple truncation.
+
+→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md),
+[`crates/colibri-daemon/src/session.rs`](../../crates/colibri-daemon/src/session.rs)
+
+### Cache-hit probe (DeepSeek-specific)
+
+The `colibri-deepseek` crate sends a preflight request with a known prompt to
+the DeepSeek API and reads the cache-hit split from the response's usage fields
+(`prompt_cache_hit_tokens` / `prompt_cache_miss_tokens`). This is
+provider-specific: the field names and the hit/miss accounting are DeepSeek's.
+The probe runs once per session configuration change, not per request.
+
+**Why a probe and not a hook**: middleware that intercepts every API response
+would couple cost tracking to the HTTP layer. A probe decouples it — the cost
+tracker asks "what was the cache ratio?" and the probe answers, independently of
+how the request was made.
+
+→ [`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs)
+
+## See also
+
+- [mother-hive](./mother-hive.md) — MCP architecture (different cost domain)
+- [quality-gates](./quality-gates.md) — the gate that validates cost-mode parsing
diff --git a/docs/wiki/glasspane.md b/docs/wiki/glasspane.md
new file mode 100644
index 0000000..6960166
--- /dev/null
+++ b/docs/wiki/glasspane.md
@@ -0,0 +1,97 @@
+# Glasspane — agent state supervision
+
+← [index](./index.md)
+
+## What this is
+
+Glasspane is Colibri's agent observation layer. It watches agent subprocesses
+via their JSONL stdout, folds the stream into a semantic state machine
+(`Idle → Working → Done`), and exposes a snapshot API for dashboards and
+daemon coordination. Every spawned agent — Pi, zot, or a local sample — feeds
+through the same ingestor and ends up in the same taxonomy.
+
+## Decisions
+
+### Agent state as a state machine, not raw event log
+
+Glasspane doesn't just relay raw agent events. It ingests JSONL lines and
+transitions a **named pane** through a finite set of states:
+
+```
+Idle → Working → Done
+         ↳ Error
+         ↳ Stalled (no events within a timeout window)
+```
+
+The `AgentState` enum (`Idle, Working, Done, Error, Stalled`) is deliberately
+small. It captures what a supervisor needs to know — "is the agent working?
+stuck? finished?" — without encoding agent-specific semantics. Events that don't
+change the state (e.g. a usage report from zot) are recorded in the pane's
+metadata but don't affect the state machine.
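+
+A minimal sketch of the fold, assuming the ingestor first reduces each JSONL
+line to a small harness-neutral signal. The `Signal` enum and `step` function
+are hypothetical; only the five `AgentState` variants and the edges in the
+diagram come from this page.
+
+```rust
+/// The five states named above.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum AgentState {
+    Idle,
+    Working,
+    Done,
+    Error,
+    Stalled,
+}
+
+/// Hypothetical signals derived from raw agent events.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Signal {
+    Started,
+    Finished,
+    Failed,
+    TimedOut,
+    /// Metadata-only events (e.g. a usage report): recorded, no transition.
+    Info,
+}
+
+/// One fold step: current state plus signal gives the next state.
+pub fn step(state: AgentState, signal: Signal) -> AgentState {
+    match (state, signal) {
+        (AgentState::Idle, Signal::Started) => AgentState::Working,
+        (AgentState::Working, Signal::Finished) => AgentState::Done,
+        (AgentState::Working, Signal::Failed) => AgentState::Error,
+        (AgentState::Working, Signal::TimedOut) => AgentState::Stalled,
+        // Not an edge in the diagram (including `Info`): keep the state.
+        _ => state,
+    }
+}
+```
+
+Folding `[Started, Info, Finished]` from `Idle` ends in `Done`; the `Info`
+signal passes through without moving the pane.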
+
+**Why not just tail the log**: raw event logs are agent-specific and change over
+time (zot adds new event types). The state machine is a stable contract that the
+daemon, TUI, and client CLI can all rely on.
+
+→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
+
+### JSONL streaming (one line = one event)
+
+Agents emit structured events as newline-delimited JSON on stdout. Glasspane
+reads line-by-line with `BufReader`, deserializes each line, and feeds it into
+the `PiJsonlIngestor` (the name is legacy — it handles zot events too).
+
+The reader runs in a **single background task per pane** (`pane_reader_loop`).
+It never blocks the daemon's main loop: the ingestor is a synchronous fold
+that updates the pane's in-memory state, and the snapshot API reads from
+`Arc<RwLock<...>>` with little contention on the reader hot path.
+
+Malformed lines are **skipped** with a counter increment, not an error —
+dropouts in an agent's JSONL shouldn't crash the observer.
+
+**Why JSONL, not a socket or gRPC**: the agent is a subprocess, not a service.
+stdout is the universal interface — every language, every harness, zero setup.
+JSONL is trivial to write from bash, Go, Python, Rust. A structured wire format
+would add a dep and a handshake to every agent.
+
+→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
+(`PiJsonlIngestor`, `pane_reader_loop`)
+
+### `AgentRuntime { Pi, Zot, Local }` — one taxonomy for two harnesses
+
+Pi and zot emit **different** raw event types: Pi uses `agent_start` /
+`turn_end`, zot uses `turn_start` / `done`. Glasspane maps both into the same
+`AgentState` transitions via `zot_event_type()`. The `AgentRuntime` enum tags
+each pane with its harness so the mapping function knows which event vocabulary
+to parse.
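+
+As an illustration of one taxonomy over two vocabularies, the sketch below
+normalizes the four raw event names mentioned above. The `Neutral` enum, the
+`normalize` function, and the pairing of `agent_start` with `turn_start` (and
+`turn_end` with `done`) are assumptions; the real mapping lives in
+`zot_event_type()` and may differ.
+
+```rust
+/// Harness tag carried by each pane.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum AgentRuntime {
+    Pi,
+    Zot,
+    Local,
+}
+
+/// Hypothetical harness-neutral event kinds.
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum Neutral {
+    TurnStarted,
+    TurnFinished,
+}
+
+/// Map a raw event type to a neutral one; unknown events yield `None`.
+pub fn normalize(runtime: AgentRuntime, raw_type: &str) -> Option<Neutral> {
+    match (runtime, raw_type) {
+        (AgentRuntime::Pi, "agent_start") => Some(Neutral::TurnStarted),
+        (AgentRuntime::Pi, "turn_end") => Some(Neutral::TurnFinished),
+        (AgentRuntime::Zot, "turn_start") => Some(Neutral::TurnStarted),
+        (AgentRuntime::Zot, "done") => Some(Neutral::TurnFinished),
+        // `Local` and unrecognized names are left unmapped in this sketch.
+        _ => None,
+    }
+}
+```
+
+Downstream consumers only ever see `Neutral` values, so a new raw event type
+needs one new match arm, not a change to the TUI or scheduler.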
+
+The `Pane` struct's `session_id` field uses `#[serde(alias = "pi_session_id")]`
+for backward compatibility with pre-neutrality serialized snapshots.
+
+**Why not have two separate state machines**: the TUI, daemon scheduler, and
+client CLI all need to ask "what state is this agent in?" — they don't care
+whether it's zot or Pi. One taxonomy, one API. The mapping is a ~50-line
+function, not a subsystem.
+
+→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
+(`zot_event_type`, `AgentRuntime`)
+
+### Snapshot API (read-heavy, not write-heavy)
+
+Glasspane exposes a snapshot object (the full set of panes with their current
+state, session ID, timestamp, and metadata) through `Arc<RwLock<...>>`. The
+daemon serves this over its Unix socket to client readers. Writes happen once
+per event; reads are frequent (TUI polls, CLI status checks).
+
+**Why RwLock, not channels**: the write path is low-frequency (agent JSONL at
+human-reading speed), and readers share the lock, so they rarely wait. A
+channel-based design would add buffering and delivery semantics for a problem
+that's fundamentally about current state, not event delivery.
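+
+A minimal sketch of the current-state pattern using only the standard library.
+The `PaneView` record and the method bodies are assumptions; the real
+`Supervisor` and `snapshot` carry more fields and may handle lock poisoning
+differently.
+
+```rust
+use std::collections::BTreeMap;
+use std::sync::{Arc, RwLock};
+
+/// Hypothetical pane record; the real `Pane` has more fields.
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct PaneView {
+    pub state: String,
+    pub session_id: String,
+}
+
+/// Shared current-state table keyed by pane name.
+#[derive(Clone, Default)]
+pub struct Supervisor {
+    panes: Arc<RwLock<BTreeMap<String, PaneView>>>,
+}
+
+impl Supervisor {
+    /// Writer side: called once per ingested event.
+    pub fn update(&self, name: &str, view: PaneView) {
+        self.panes.write().unwrap().insert(name.to_string(), view);
+    }
+
+    /// Reader side: clone the table under the shared read lock.
+    pub fn snapshot(&self) -> BTreeMap<String, PaneView> {
+        self.panes.read().unwrap().clone()
+    }
+}
+```
+
+Cloning a `Supervisor` clones the `Arc`, so the reader task and the socket
+handler see the same table; a reader gets the latest state, not a backlog of
+events.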
+
+→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
+(`Supervisor`, `snapshot`)
+
+## See also
+
+- [agent-harness](./agent-harness.md) — the zot/Colibri split that Glasspane observes
+- [naming-decisions](./naming-decisions.md) — `pi_session_id → session_id`, `pi_type → event_type`
diff --git a/docs/wiki/index.md b/docs/wiki/index.md
index cba5e28..78c653a 100644
--- a/docs/wiki/index.md
+++ b/docs/wiki/index.md
@@ -1,11 +1,12 @@
 # Colibri Wiki
 
-A small, agent-maintained knowledge base for Colibri's **decisions and
-architecture** — based on Andrej Karpathy's
+A knowledge base for Colibri's **decisions and architecture** — based on
+Andrej Karpathy's
 [LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).
 
-This is a **pilot**. It deliberately covers a few decision-dense areas, not the
-whole repo.
+Every major subsystem has a page recording **why** it was built the way it
+was — the rationale the code can't express. Implementation docs in `docs/`
+cover the _how_; these pages cover the _why_.
 
 ## Why this exists
 
@@ -32,24 +33,23 @@ These rules keep the wiki a maintainable artifact, not a second source of truth:
 5. **Lint, don't trust.** A page is a claim to be checked against code, not a
    guarantee.
 
-## Lint workflow (the point of the pilot)
+## Lint workflow
 
-`lint` = an agent pass that reads each page and checks it against the current
-code: stale names, dangling references, contradictions, decisions that shipped
-but whose page still says "planned." Output is a **report**, not auto-edits —
-advisory first, until the signal is trusted. (Tool TBD — pilot step 2.)
-
-Open drift already noted by hand:
-
-- `stage-colibri-iso.sh` (clawdie-iso) and a guardrail comment reference
-  `ADR-agent-harness-consolidation.md`, which **does not exist** in either repo.
-  The real architecture statement is `AGENTS.md`. → see [agent-harness](./agent-harness.md).
+The [`wiki-lint`](../../scripts/wiki-lint) script checks every page against the
+current code: dangling references, resurrected old names (from the naming
+ledger), and orphan pages. It runs as part of `ci-checks.sh --strict`, which
+the pre-push hook enforces: a drift failure blocks a push, same as a clippy
+warning.
 
 ## Pages
 
-| Page                                      | What it covers                                                                  |
-| ----------------------------------------- | ------------------------------------------------------------------------------- |
-| [agent-harness](./agent-harness.md)       | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver         |
-| [mother-hive](./mother-hive.md)           | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth |
-| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight        |
-| [quality-gates](./quality-gates.md)       | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before           |
+| Page                                      | What it covers                                                                                |
+| ----------------------------------------- | --------------------------------------------------------------------------------------------- |
+| [agent-harness](./agent-harness.md)       | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver                       |
+| [cost-model](./cost-model.md)             | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction                     |
+| [glasspane](./glasspane.md)               | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API                     |
+| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers |
+| [mother-hive](./mother-hive.md)           | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed  |
+| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight                      |
+| [quality-gates](./quality-gates.md)       | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before                         |
+| [task-board](./task-board.md)             | Capability match scoring, cron scheduling, intake drain, SQLite backing                       |
diff --git a/docs/wiki/jail-confinement.md b/docs/wiki/jail-confinement.md
new file mode 100644
index 0000000..1859279
--- /dev/null
+++ b/docs/wiki/jail-confinement.md
@@ -0,0 +1,92 @@
+# Jail confinement
+
+← [index](./index.md)
+
+## What this is
+
+Colibri can confine spawned agents and external MCP servers inside FreeBSD
+jails. The spawner wraps the subprocess command through `jexec` (persistent
+jails) or `jail -c` (ephemeral jails), so the agent's entire filesystem view
+and network are isolated. stdio passes through unchanged — the agent's JSONL
+still reaches Glasspane, and the MCP host's stdin/stdout transport still works.
+
+## Decisions
+
+### Reuse the spawner's confinement primitive (don't build a parallel one)
+
+The agent spawner and the external-MCP host both need to confine untrusted
+subprocesses. Instead of building a second confinement layer, the MCP host
+reuses the agent spawner's `jail_wrap()` function directly — the same
+`JailConfig` struct, the same `PrivMode` policy, the same `prepare_spawn_command`
+pipeline.
+
+**Why reuse**: two confinement paths → one can drift. The spawner is tested
+(20+ unit tests in `spawner.rs` covering named, ephemeral, staged, priv-mode
+variants). The MCP host gets a battle-tested implementation for free.
+
+→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md),
+[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs)
+(`jail_wrap`, `JailConfig`),
+[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs)
+
+### Persistent vs ephemeral jails
+
+| Type       | How                                       | When to use                                            |
+| ---------- | ----------------------------------------- | ------------------------------------------------------ |
+| Persistent | `jexec <name>` into an existing jail      | Operator-managed jails with preconfigured environments |
+| Ephemeral  | `jail -c command=<binary>` auto-destroyed | One-shot confinement, no state between runs            |
+
+`JailConfig` picks the lifecycle from which field is set: `name` selects
+`jexec`, `path` selects an ephemeral jail. Set one; if both are set, `name` wins.
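+
+The wrapping step can be pictured as a pure function from a jail choice and a
+command to an argv. This is a sketch under assumptions: the `Jail` enum and
+this `jail_wrap` signature are illustrative, not the code in `spawner.rs`, and
+the priv-mode prefix is left out.
+
+```rust
+/// Hypothetical lifecycle selector; the real `JailConfig` may differ.
+pub enum Jail {
+    /// Existing jail, entered with `jexec <name>`.
+    Persistent { name: String },
+    /// Throwaway jail rooted at `path`, created with `jail -c`.
+    Ephemeral { path: String },
+}
+
+/// Build the argv that runs `cmd` inside the jail; `None` if `cmd` is empty.
+pub fn jail_wrap(jail: &Jail, cmd: &[&str]) -> Option<Vec<String>> {
+    let (program, args) = cmd.split_first()?;
+    let mut argv: Vec<String> = match jail {
+        Jail::Persistent { name } => {
+            vec!["jexec".to_string(), name.clone(), program.to_string()]
+        }
+        Jail::Ephemeral { path } => vec![
+            "jail".to_string(),
+            "-c".to_string(),
+            format!("path={path}"),
+            // `command=` goes last: the rest of the line is its arguments.
+            format!("command={program}"),
+        ],
+    };
+    argv.extend(args.iter().map(|a| a.to_string()));
+    Some(argv)
+}
+```
+
+Only the argv changes; stdin and stdout are inherited as before, which is why
+JSONL and MCP stdio transports keep working through the wrapper.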
+
+**Why both**: persistent jails are operator-managed infrastructure (a build jail,
+a worker jail that persists between agent runs). Ephemeral jails are for
+untrusted one-shot work — like an external MCP server from a third-party
+registry. The caller picks the lifecycle.
+
+→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md)
+
+### Priv-mode policy (`mdo` on live USB, `helper` on deployed)
+
+The daemon is unprivileged but jail creation requires root. The priv-mode
+policy resolves this without granting the daemon blanket sudo:
+
+- **`mdo`** — the live USB's operator tool (`mdo -u root jail -c ...`). Used
+  on the operator image where `mdo` is configured.
+- **`helper`** — a setuid helper binary on deployed hosts (not yet shipped;
+  falls back to `sudo`). The daemon never runs as root.
+
+The policy is configurable via `COLIBRI_JAIL_PRIV_MODE` and is resolved once
+at daemon startup. The same policy applies to agents and MCP servers.
+
+**Why not the daemon as root**: the daemon spawns arbitrary subprocesses
+(potentially attacker-controlled, via the MCP registry or task intake).
+Running as unprivileged `colibri` limits the blast radius; the priv-mode
+helper grants only the specific operations needed (jail creation).
+
+→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md),
+[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs)
+(`PrivMode`)
+
+### MCP servers are jailable per registry entry (same threat model as agents)
+
+External MCP servers registered in the external MCP registry accept an optional
+`jail` field with the same shape as agent spawn configs. The MCP host applies
+the jail wrapper before spawning the server. Servers without a `jail` field
+run on the host (backward compatible).
+
+The MCP host's registry entry supports per-server jail configuration, so
+different servers can run in different jails. This is a property of the
+registry, not a global daemon setting.
+
+**Why jail them**: external MCP servers are arbitrary third-party
+binaries, at least as untrusted as the agents Colibri already jails. The
+threat model is identical.
+
+→ [`COLIBRI-EXTERNAL-MCP-PROTOTYPE.md`](../COLIBRI-EXTERNAL-MCP-PROTOTYPE.md),
+[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs)
+
+## See also
+
+- [mother-hive](./mother-hive.md) — the SSH forced-command boundary (a different confinement model)
+- [agent-harness](./agent-harness.md) — the spawner that jails agents
diff --git a/docs/wiki/task-board.md b/docs/wiki/task-board.md
new file mode 100644
index 0000000..56bd961
--- /dev/null
+++ b/docs/wiki/task-board.md
@@ -0,0 +1,93 @@
+# Task board + scheduler
+
+← [index](./index.md)
+
+## What this is
+
+Colibri's task board holds operator-submitted work items, and the scheduler
+assigns them to the best-fit agent on each tick. Tasks flow in via the
+daemon's Unix socket (`create-task`, `intake-task`) and are drained by the
+scheduler loop running inside the daemon every ~30 seconds.
+
+## Decisions
+
+### Capability match scoring (best-fit, not first-fit)
+
+When the scheduler picks an agent for a task, it scores every available agent
+against the task's **required capabilities** using a simple coverage ratio:
+`|required ∩ agent_caps| / |required|`. The agent with the highest score wins;
+ties are broken by agent name (deterministic, so repeated runs don't thrash).
+
+A task with `["freebsd", "zfs"]` will match an agent with both capabilities
+over one with only `freebsd`. A task with no required capabilities matches
+any agent. Offline agents and agents whose capabilities don't intersect at all
+are skipped.
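+
+The scoring rule is small enough to sketch in full. The function names echo
+the ones referenced below, but the signatures, the `Agent` struct, the
+`cap_set` helper, and the choice that the alphabetically first name wins a tie
+are assumptions made for illustration.
+
+```rust
+use std::collections::BTreeSet;
+
+/// Hypothetical agent record for the sketch.
+pub struct Agent {
+    pub name: String,
+    pub online: bool,
+    pub caps: BTreeSet<String>,
+}
+
+/// Convenience constructor for capability sets.
+pub fn cap_set(items: &[&str]) -> BTreeSet<String> {
+    items.iter().map(|s| s.to_string()).collect()
+}
+
+/// |required ∩ caps| / |required|; an empty requirement matches fully.
+pub fn capability_match_score(required: &BTreeSet<String>, caps: &BTreeSet<String>) -> f64 {
+    if required.is_empty() {
+        return 1.0;
+    }
+    required.intersection(caps).count() as f64 / required.len() as f64
+}
+
+/// Highest score wins; ties go to the alphabetically first name.
+pub fn pick_agent<'a>(required: &BTreeSet<String>, agents: &'a [Agent]) -> Option<&'a Agent> {
+    agents
+        .iter()
+        .filter(|a| a.online)
+        .map(|a| (capability_match_score(required, &a.caps), a))
+        .filter(|(score, _)| *score > 0.0)
+        .max_by(|(sa, a), (sb, b)| {
+            // Reverse the name comparison so the smaller name ranks higher.
+            sa.partial_cmp(sb).unwrap().then_with(|| b.name.cmp(&a.name))
+        })
+        .map(|(_, a)| a)
+}
+```
+
+Treating the empty requirement as a score of 1.0 avoids dividing by zero and
+matches the rule that such a task fits any agent.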
+
+**Why not round-robin or FIFO**: capability-based matching means the right agent
+gets the right work without operator hand-assignment. The scoring is trivial
+(set intersection) and transparent — no machine learning, no weights to tune.
+
+→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
+(`capability_match_score`, `pick_agent`)
+
+### Three schedule types (cron, interval, once)
+
+| Type     | Behavior                                                          |
+| -------- | ----------------------------------------------------------------- |
+| Cron     | Fires at specific wall-clock times (e.g. `0 0 * * *` = midnight). |
+| Interval | Fires after a fixed duration since last run (e.g. 3600s).         |
+| Once     | Fires exactly once, at the specified future time.                 |
+
+Cron patterns are simple 5-field expressions (minute, hour, day, month,
+weekday) with wildcards: no second granularity, no `/step` syntax. The
+matching uses prefix comparison: a pattern matches if each field of the
+current time begins with the corresponding pattern field, so `0` matches `00`
+and `1` matches `10`–`19`. This is intentionally simple: cron is a convenience
+for periodic housekeeping, not a general-purpose job engine.
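+
+A minimal sketch of prefix matching as described, assuming each time field is
+rendered as a zero-padded string (weekday as a single digit) and `*` matches
+anything. `field_matches` and `cron_matches` are hypothetical names; the real
+check is `should_fire`.
+
+```rust
+/// One pattern field against one rendered time field.
+pub fn field_matches(pattern: &str, value: &str) -> bool {
+    pattern == "*" || value.starts_with(pattern)
+}
+
+/// `now` holds [minute, hour, day, month, weekday] as strings.
+pub fn cron_matches(pattern: &str, now: [&str; 5]) -> bool {
+    let fields: Vec<&str> = pattern.split_whitespace().collect();
+    // Reject anything that is not exactly five fields.
+    fields.len() == 5
+        && fields
+            .iter()
+            .zip(now.iter())
+            .all(|(p, v)| field_matches(p, v))
+}
+```
+
+Under these assumptions a one-character pattern is a prefix, so `0` also
+accepts `05`; whether the real matcher behaves that way is not stated here.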
+
+**Why not use a real cron library**: the scheduler's job is dispatching tasks to
+agents, not calendar management. The simple prefix-match cron covers 90% of use
+cases (daily builds, hourly reports) without pulling in a parsing dependency.
+
+→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
+(`should_fire`)
+
+### Intake drain (queue → task board → agent)
+
+The `intake-task` socket command pushes a task onto the intake queue. On each
+scheduler tick (~30s), the loop drains the intake queue into the task board's
+SQLite store, then checks for due scheduled jobs. This two-phase drain
+decouples submission from execution: the operator submits at any time, the
+scheduler processes in batches.
+
+Tasks in the intake queue carry a **capability string** (not an agent ID). The
+scheduler picks the best agent at execution time, so a task submitted when no
+matching agent is online will be picked up when one connects.
+
+**Why an intake queue, not direct assignment**: agents come and go. If submission
+required picking an agent, the operator would need to know which agents are
+available — a coupling the task board deliberately avoids.
+
+→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
+(`Scheduler`, `add_job`, `submit`),
+[`crates/colibri-daemon/tests/intake_scheduler_loop.rs`](../../crates/colibri-daemon/tests/intake_scheduler_loop.rs)
+
+### SQLite backing (embedded, not a service)
+
+The task board stores tasks, agent registrations, tenant info, and the skills
+catalog in an embedded SQLite database at `/var/db/colibri/colibri.sqlite`. No
+separate database process — the daemon opens the file directly.
+
+**Why SQLite, not PostgreSQL**: the daemon runs on the operator USB and on
+deployed hosts. A full PostgreSQL service is heavyweight for a single daemon's
+coordination state. SQLite is zero-config, zero-admin, and survives daemon
+restarts without a separate lifecycle. The mother node uses PostgreSQL for the
+hive registry because it's multi-tenant; the local daemon is single-tenant.
+
+→ [`crates/colibri-store/src/lib.rs`](../../crates/colibri-store/src/lib.rs)
+
+## See also
+
+- [mother-hive](./mother-hive.md) — the mother node's PostgreSQL-based hive registry
+- [cost-model](./cost-model.md) — cost tracking per session
+- [agent-harness](./agent-harness.md) — autospawn
-- 
2.45.3