feat(wiki): expand to full coverage — cost-model, glasspane, task-board, jail-confinement #168
5 changed files with 393 additions and 21 deletions
90
docs/wiki/cost-model.md
Normal file
@@ -0,0 +1,90 @@
# Cost model

← [index](./index.md)

## What this is

Colibri tracks every token that passes through an agent session and meters cost
against a configurable budget. The key insight: on DeepSeek, **cache-hit tokens
cost roughly one-tenth as much** as fresh tokens, so the prompt prefix is
engineered to be byte-stable across requests, maximizing cache hits. Three cost
modes (fast, smart, max) represent different points on the speed/cost trade-off,
and the daemon auto-escalates when a cheaper mode can't keep up.

## Decisions

### Byte-stable prompt prefix → cache-hit metering

The system prompt and early context blocks are **byte-for-byte identical**
across consecutive requests to the same DeepSeek endpoint. DeepSeek's cache-hit
pricing discounts these by ~90%. Colibri's `colibri-deepseek` probe determines
the exact token-count split between cached and fresh tokens per request, and
the cost tracker records both so the session budget reflects the **actual**
discounted cost, not the nominal token count.

**Why not just count tokens**: token counting with an offline tokenizer gives
you an upper bound but not the real cost. DeepSeek's API sometimes re-caches and
sometimes doesn't — the probe measures what actually happened. The discount is
too large (10×) to leave unmeasured.

→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md),
[`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md),
[`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs)

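
The gap between nominal and discounted cost can be sketched as follows. This is a minimal illustration, not the code in `colibri-deepseek`: the `PromptUsage` and `Pricing` types, the price units, and the exact one-tenth ratio are assumptions made for the example.

```rust
/// Token split for one request. The two fields mirror DeepSeek's
/// `prompt_cache_hit_tokens` / `prompt_cache_miss_tokens` counters.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PromptUsage {
    pub cache_hit_tokens: u64,
    pub cache_miss_tokens: u64,
}

/// Hypothetical price table, in arbitrary integer units per token.
#[derive(Debug, Clone, Copy)]
pub struct Pricing {
    pub miss_price: u64,
    pub hit_price: u64,
}

impl Pricing {
    /// Assumption: a cache hit is billed at one tenth of the miss price.
    pub fn with_tenfold_discount(miss_price: u64) -> Self {
        Pricing { miss_price, hit_price: miss_price / 10 }
    }
}

/// Cost if every prompt token were billed at the full (cache-miss) rate.
pub fn nominal_cost(usage: PromptUsage, pricing: Pricing) -> u64 {
    (usage.cache_hit_tokens + usage.cache_miss_tokens) * pricing.miss_price
}

/// Cost after the cache-hit discount is applied to the cached share.
pub fn effective_cost(usage: PromptUsage, pricing: Pricing) -> u64 {
    usage.cache_hit_tokens * pricing.hit_price
        + usage.cache_miss_tokens * pricing.miss_price
}
```

With 9,000 cached and 1,000 fresh tokens at a miss price of 10 units, the nominal figure is 100,000 units while the discounted figure is 19,000, which is why the budget tracks the measured split rather than a raw token count.
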
### Three cost modes (fast → smart → max)

| Mode  | Budget (tokens) | Behavior                                                                   |
| ----- | --------------- | -------------------------------------------------------------------------- |
| Fast  | 16K             | Maximum cache-hits, minimum fresh tokens. Rejects large expansions early.  |
| Smart | 64K             | Default. Balances cache reuse with room for follow-up turns.               |
| Max   | 256K            | Almost never hits budget. For one-shot deep tasks where cost is secondary. |

The daemon **auto-escalates** when a session exhausts its budget in a lower
mode: fast → smart → max. Escalation is one-way (never downgrades mid-session).

**Why three modes, not a continuous slider**: simplicity wins here. Three
well-understood points cover the space — operators pick by risk appetite, not
by fine-tuning a number. The escalation chain means "start cheap, pay more only
when the cheaper budget runs out."

→ [`COLIBRI-TOKENOMICS-TRIFECTA.md`](../COLIBRI-TOKENOMICS-TRIFECTA.md),
[`crates/colibri-daemon/src/cost.rs`](../../crates/colibri-daemon/src/cost.rs)

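
The one-way escalation chain above could look roughly like this. The sketch assumes "K" means 1,000 tokens; `CostMode`, `budget_tokens`, `escalate`, and `mode_for` are illustrative names, not confirmed identifiers from `cost.rs`.

```rust
/// The three cost modes, ordered from cheapest to most expensive.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CostMode {
    Fast,
    Smart,
    Max,
}

impl CostMode {
    /// Token budget per mode (assumes K = 1,000).
    pub fn budget_tokens(self) -> u64 {
        match self {
            CostMode::Fast => 16_000,
            CostMode::Smart => 64_000,
            CostMode::Max => 256_000,
        }
    }

    /// Next mode up the chain; `None` once the top mode is reached.
    pub fn escalate(self) -> Option<CostMode> {
        match self {
            CostMode::Fast => Some(CostMode::Smart),
            CostMode::Smart => Some(CostMode::Max),
            CostMode::Max => None,
        }
    }
}

/// Cheapest mode at or above `start` whose budget covers `needed_tokens`.
/// It never returns a mode below `start`, matching the one-way rule.
pub fn mode_for(start: CostMode, needed_tokens: u64) -> Option<CostMode> {
    let mut mode = start;
    loop {
        if needed_tokens <= mode.budget_tokens() {
            return Some(mode);
        }
        mode = mode.escalate()?;
    }
}
```

A session that started in `Smart` stays in `Smart` even when its usage would fit in `Fast`, because the search only ever moves up from the starting mode.
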
### T14 compaction (budget trim, not truncate)

When a session is about to exceed its budget, Colibri compacts the tool results
in the volatile region — it sends them through the headroom sidecar for
summarization, then trims the oldest volatile blocks until the prompt fits
within budget. The **prefix** (system prompt, static context) is never trimmed
— only the volatile suffix.

If compaction is insufficient and auto-escalation is enabled, the mode steps up
before truncating.

**Why not just truncate**: truncating mid-conversation loses context the agent
needs to continue. Compaction preserves the semantic content at lower token cost.
The headroom sidecar is optional (off by default); without it, the fallback is
simple truncation.

→ [`HEADROOM-SIDECAR.md`](../HEADROOM-SIDECAR.md),
[`crates/colibri-daemon/src/session.rs`](../../crates/colibri-daemon/src/session.rs)

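
The trimming step (oldest volatile blocks first, prefix untouched) can be sketched like this. The summarization pass through the sidecar is left out, and `Block`, `Prompt`, and `trim_to_budget` are hypothetical names chosen for the example.

```rust
use std::collections::VecDeque;

/// One volatile block (for example a tool result) with its token count.
#[derive(Debug, Clone, PartialEq)]
pub struct Block {
    pub label: String,
    pub tokens: u64,
}

/// A prompt split into a fixed prefix and a volatile suffix (oldest first).
pub struct Prompt {
    pub prefix_tokens: u64,
    pub volatile: VecDeque<Block>,
}

impl Prompt {
    pub fn total_tokens(&self) -> u64 {
        self.prefix_tokens + self.volatile.iter().map(|b| b.tokens).sum::<u64>()
    }

    /// Drops the oldest volatile blocks until the prompt fits `budget`.
    /// The prefix is never touched, so the result can still exceed the
    /// budget when the prefix alone is too large.
    pub fn trim_to_budget(&mut self, budget: u64) -> Vec<Block> {
        let mut dropped = Vec::new();
        while self.total_tokens() > budget {
            match self.volatile.pop_front() {
                Some(block) => dropped.push(block),
                None => break,
            }
        }
        dropped
    }
}
```

Returning the dropped blocks lets a caller log or summarize what was removed, and an over-budget result after trimming is the signal that escalation would be the next step.
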
### Cache-hit probe (DeepSeek-specific)

The `colibri-deepseek` crate sends a preflight request with a known prompt to
the DeepSeek API and reads the cache-hit split from the usage fields of the
response (`prompt_cache_hit_tokens` / `prompt_cache_miss_tokens`). The parsing
is provider-specific: these field names are DeepSeek's, and other providers
report cache usage under different names, if at all.
The probe runs once per session configuration change, not per request.

**Why a probe and not a hook**: middleware that intercepts every API response
would couple cost tracking to the HTTP layer. A probe decouples it — the cost
tracker asks "what was the cache ratio?" and the probe answers, independently of
how the request was made.

→ [`crates/colibri-deepseek/src/lib.rs`](../../crates/colibri-deepseek/src/lib.rs)

## See also

- [mother-hive](./mother-hive.md) — MCP architecture (different cost domain)
- [quality-gates](./quality-gates.md) — the gate that validates cost-mode parsing

97
docs/wiki/glasspane.md
Normal file
@@ -0,0 +1,97 @@
# Glasspane — agent state supervision

← [index](./index.md)

## What this is

Glasspane is Colibri's agent observation layer. It watches agent subprocesses
via their JSONL stdout, folds the stream into a semantic state machine
(`Idle → Working → Done`), and exposes a snapshot API for dashboards and
daemon coordination. Every spawned agent — Pi, zot, or a local sample — feeds
through the same ingestor and ends up in the same taxonomy.

## Decisions

### Agent state as a state machine, not raw event log

Glasspane doesn't just relay raw agent events. It ingests JSONL lines and
transitions a **named pane** through a finite set of states:

```
Idle → Working → Done
↳ Error
↳ Stalled (no events within a timeout window)
```

The `AgentState` enum (`Idle, Working, Done, Error, Stalled`) is deliberately
small. It captures what a supervisor needs to know — "is the agent working?
stuck? finished?" — without encoding agent-specific semantics. Events that don't
change the state (e.g. a usage report from zot) are recorded in the pane's
metadata but don't affect the state machine.

**Why not just tail the log**: raw event logs are agent-specific and change over
time (zot adds new event types). The state machine is a stable contract that the
daemon, TUI, and client CLI can all rely on.

→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)

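
The fold from events to a pane state can be sketched as below. `AgentState` and its five variants come from the text; the normalized `Event` enum and the exact transition rules (for instance, that only a working agent can stall) are assumptions for illustration.

```rust
/// The five supervisor-visible states named in the text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentState {
    Idle,
    Working,
    Done,
    Error,
    Stalled,
}

/// Hypothetical normalized events; real harness events are richer.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Started,
    Finished,
    Failed,
    Usage,
    Timeout,
}

/// One transition of the state machine.
pub fn step(state: AgentState, event: Event) -> AgentState {
    match (state, event) {
        // Usage reports are metadata only and never change the state.
        (_, Event::Usage) => state,
        (_, Event::Started) => AgentState::Working,
        (_, Event::Failed) => AgentState::Error,
        (AgentState::Working, Event::Finished) => AgentState::Done,
        (AgentState::Working, Event::Timeout) => AgentState::Stalled,
        // Anything else (for example a timeout while idle) is ignored.
        _ => state,
    }
}

/// Folds a whole event stream, starting from `Idle`.
pub fn fold_events(events: &[Event]) -> AgentState {
    events.iter().fold(AgentState::Idle, |state, &event| step(state, event))
}
```

Because the fold only keeps the current state, a supervisor can answer "working, stuck, or finished?" without replaying or interpreting the raw log.
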
### JSONL streaming (one line = one event)

Agents emit structured events as newline-delimited JSON on stdout. Glasspane
reads line-by-line with `BufReader`, deserializes each line, and feeds it into
the `PiJsonlIngestor` (the name is legacy — it handles zot events too).

The reader runs in a **single background task per pane** (`pane_reader_loop`).
It never blocks the daemon's main loop — the ingestor is a synchronous fold
that updates the pane's in-memory state, and the snapshot API reads from
`Arc<RwLock<...>>` with no contention on the reader hot path.

Malformed lines are **skipped** with a counter increment, not an error —
dropouts in an agent's JSONL shouldn't crash the observer.

**Why JSONL, not a socket or gRPC**: the agent is a subprocess, not a service.
stdout is the universal interface — every language, every harness, zero setup.
JSONL is trivial to write from bash, Go, Python, Rust. A structured wire format
would add a dep and a handshake to every agent.

→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
(`PiJsonlIngestor`, `pane_reader_loop`)

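
The skip-and-count behavior for malformed lines can be sketched with the standard library alone. To stay dependency-free, a crude shape check (`looks_like_json_object`) stands in for real JSON deserialization; that helper, `IngestStats`, and `ingest_lines` are all hypothetical names, not the `PiJsonlIngestor` API.

```rust
use std::io::BufRead;

/// Counters kept by the reader loop.
#[derive(Debug, Default, PartialEq)]
pub struct IngestStats {
    pub accepted: usize,
    pub skipped: usize,
}

/// Stand-in for real JSON parsing: accepts anything shaped like `{...}`.
fn looks_like_json_object(line: &str) -> bool {
    line.len() >= 2 && line.starts_with('{') && line.ends_with('}')
}

/// Reads newline-delimited events, passing each accepted line to `on_event`.
/// Malformed lines bump a counter and are skipped; they never abort the loop.
pub fn ingest_lines<R: BufRead>(reader: R, mut on_event: impl FnMut(&str)) -> IngestStats {
    let mut stats = IngestStats::default();
    for line in reader.lines() {
        // Stop on a read error (for example a closed pipe) instead of retrying.
        let line = match line {
            Ok(l) => l,
            Err(_) => break,
        };
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // blank lines are neither events nor errors
        }
        if looks_like_json_object(trimmed) {
            stats.accepted += 1;
            on_event(trimmed);
        } else {
            stats.skipped += 1;
        }
    }
    stats
}
```

Anything implementing `BufRead` works as input, so the same loop can be exercised against a byte slice in tests and against a child process's stdout wrapped in `BufReader` in practice.
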
### `AgentRuntime { Pi, Zot, Local }` — one taxonomy for two harnesses

Pi and zot emit **different** raw event types: Pi uses `agent_start` /
`turn_end`, zot uses `turn_start` / `done`. Glasspane maps both into the same
`AgentState` transitions via `zot_event_type()`. The `AgentRuntime` enum tags
each pane with its harness so the mapping function knows which event vocabulary
to parse.

The `Pane` struct's `session_id` field uses `#[serde(alias = "pi_session_id")]`
for backward compatibility with pre-neutrality serialized snapshots.

**Why not have two separate state machines**: the TUI, daemon scheduler, and
client CLI all need to ask "what state is this agent in?" — they don't care
whether it's zot or Pi. One taxonomy, one API. The mapping is a ~50-line
function, not a subsystem.

→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
(`zot_event_type`, `AgentRuntime`)

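
A vocabulary mapping of this kind might be sketched as follows. The four raw event names are the ones quoted above; pairing the start events with "working" and the end events with "done", the `Transition` type, and the treatment of `Local` as unmapped are assumptions, and this is not the body of `zot_event_type()`.

```rust
/// Harness tag carried by each pane.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AgentRuntime {
    Pi,
    Zot,
    Local,
}

/// Hypothetical harness-neutral transition produced by the mapping.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Transition {
    ToWorking,
    ToDone,
}

/// Maps a raw, harness-specific event type onto the shared taxonomy.
/// Unknown event types map to `None` and leave the state untouched.
pub fn transition_for(runtime: AgentRuntime, raw_type: &str) -> Option<Transition> {
    match (runtime, raw_type) {
        (AgentRuntime::Pi, "agent_start") | (AgentRuntime::Zot, "turn_start") => {
            Some(Transition::ToWorking)
        }
        (AgentRuntime::Pi, "turn_end") | (AgentRuntime::Zot, "done") => {
            Some(Transition::ToDone)
        }
        // Assumption: no vocabulary is defined here for `Local` panes.
        _ => None,
    }
}
```

Keying the match on the runtime tag means a zot event name arriving on a Pi pane is ignored rather than misread.
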
### Snapshot API (read-heavy, not write-heavy)

Glasspane exposes a snapshot object (the full set of panes with their current
state, session ID, timestamp, and metadata) through `Arc<RwLock<...>>`. The
daemon serves this over its Unix socket to client readers. Writes happen once
per event; reads are frequent (TUI polls, CLI status checks).

**Why RwLock, not channels**: the write path is low-frequency (agent JSONL at
human-reading speed), and on the read path concurrent readers share the lock
without blocking one another. A channel-based design would add buffering and
delivery semantics for a problem that's fundamentally about current state, not
event delivery.

→ [`crates/colibri-glasspane/src/lib.rs`](../../crates/colibri-glasspane/src/lib.rs)
(`Supervisor`, `snapshot`)

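
The shared-state shape described above can be sketched with `std::sync::RwLock`. The simplified `Pane` fields and the `update` method are assumptions; only the `Supervisor` and `snapshot` names come from the reference above, and the real types carry more data.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Simplified pane record (the real one also carries timestamps and metadata).
#[derive(Debug, Clone, PartialEq)]
pub struct Pane {
    pub session_id: String,
    pub state: String,
}

/// Cheap-to-clone handle; all clones share the same pane table.
#[derive(Clone, Default)]
pub struct Supervisor {
    panes: Arc<RwLock<HashMap<String, Pane>>>,
}

impl Supervisor {
    /// Write path: called once per ingested event, holds the lock briefly.
    pub fn update(&self, name: &str, pane: Pane) {
        let mut guard = self.panes.write().expect("pane lock poisoned");
        guard.insert(name.to_string(), pane);
    }

    /// Read path: returns a copy of the current state. Readers hold a shared
    /// lock, so several of them can take snapshots at the same time.
    pub fn snapshot(&self) -> HashMap<String, Pane> {
        self.panes.read().expect("pane lock poisoned").clone()
    }
}
```

Returning an owned copy keeps the lock held only for the duration of the clone, so a slow dashboard client cannot stall the writer.
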
## See also

- [agent-harness](./agent-harness.md) — the zot/Colibri split that Glasspane observes
- [naming-decisions](./naming-decisions.md) — `pi_session_id → session_id`, `pi_type → event_type`

@@ -1,11 +1,12 @@
 # Colibri Wiki

-A small, agent-maintained knowledge base for Colibri's **decisions and
-architecture** — based on Andrej Karpathy's
+A knowledge base for Colibri's **decisions and architecture** — based on
+Andrej Karpathy's
 [LLM Wiki pattern](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f).

-This is a **pilot**. It deliberately covers a few decision-dense areas, not the
-whole repo.
+Every major subsystem has a page recording **why** it was built the way it
+was — the rationale the code can't express. Implementation docs in `docs/`
+cover the _how_; these pages cover the _why_.

 ## Why this exists

@@ -32,24 +33,23 @@ These rules keep the wiki a maintainable artifact, not a second source of truth:
 5. **Lint, don't trust.** A page is a claim to be checked against code, not a
    guarantee.

-## Lint workflow (the point of the pilot)
+## Lint workflow

-`lint` = an agent pass that reads each page and checks it against the current
-code: stale names, dangling references, contradictions, decisions that shipped
-but whose page still says "planned." Output is a **report**, not auto-edits —
-advisory first, until the signal is trusted. (Tool TBD — pilot step 2.)
-
-Open drift already noted by hand:
-
-- `stage-colibri-iso.sh` (clawdie-iso) and a guardrail comment reference
-  `ADR-agent-harness-consolidation.md`, which **does not exist** in either repo.
-  The real architecture statement is `AGENTS.md`. → see [agent-harness](./agent-harness.md).
+The [`wiki-lint`](../../scripts/wiki-lint) script checks every page against the
+current code: dangling references, resurrected old names (from the naming
+ledger), and orphan pages. It runs as part of `ci-checks.sh --strict` and is
+gated by the pre-push hook — a drift failure blocks a push, same as a clippy
+warning.

 ## Pages

 | Page | What it covers |
-| ----------------------------------------- | ------------------------------------------------------------------------------- |
+| ----------------------------------------- | --------------------------------------------------------------------------------------------- |
 | [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
-| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth |
-| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
-| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |
+| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction |
+| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API |
+| [jail-confinement](./jail-confinement.md) | Persistent vs ephemeral jails, priv-mode policy, reuse of spawner confinement for MCP servers |
+| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth, key-on-seed |
+| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
+| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing |
+| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |

92
docs/wiki/jail-confinement.md
Normal file
@@ -0,0 +1,92 @@
# Jail confinement

← [index](./index.md)

## What this is

Colibri can confine spawned agents and external MCP servers inside FreeBSD
jails. The spawner wraps the subprocess command through `jexec` (persistent
jails) or `jail -c` (ephemeral jails), so the agent's entire filesystem view
and network are isolated. stdio passes through unchanged — the agent's JSONL
still reaches Glasspane, and the MCP host's stdin/stdout transport still works.

## Decisions

### Reuse the spawner's confinement primitive (don't build a parallel one)

The agent spawner and the external-MCP host both need to confine untrusted
subprocesses. Instead of building a second confinement layer, the MCP host
reuses the agent spawner's `jail_wrap()` function directly — the same
`JailConfig` struct, the same `PrivMode` policy, the same `prepare_spawn_command`
pipeline.

**Why reuse**: two confinement paths → one can drift. The spawner is tested
(20+ unit tests in `spawner.rs` covering named, ephemeral, staged, priv-mode
variants). The MCP host gets a battle-tested implementation for free.

→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md),
[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs)
(`jail_wrap`, `JailConfig`),
[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs)

### Persistent vs ephemeral jails

| Type       | How                                       | When to use                                            |
| ---------- | ----------------------------------------- | ------------------------------------------------------ |
| Persistent | `jexec <name>` into an existing jail      | Operator-managed jails with preconfigured environments |
| Ephemeral  | `jail -c command=<binary>` auto-destroyed | One-shot confinement, no state between runs            |

The `JailConfig` struct selects the type from its fields: if `name` is set, the
spawner uses `jexec`; if `path` is set, it creates an ephemeral jail. The two
are meant to be mutually exclusive; if both are set, `name` takes precedence.

**Why both**: persistent jails are operator-managed infrastructure (a build jail,
a worker jail that persists between agent runs). Ephemeral jails are for
untrusted one-shot work — like an external MCP server from a third-party
registry. The caller picks the lifecycle.

→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md)

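
The two wrapping styles from the table can be sketched as plain argument-vector construction. This is an illustration, not the real `jail_wrap()`: the `Jail` enum and `wrap` function are hypothetical, and a real ephemeral jail would likely set more parameters than `path` and `command`.

```rust
/// Hypothetical two-variant view of a jail target.
#[derive(Debug, Clone, PartialEq)]
pub enum Jail {
    /// An existing, operator-managed jail entered with `jexec`.
    Persistent { name: String },
    /// A jail created for one command and removed when it exits.
    Ephemeral { path: String },
}

/// Builds the argv that runs `command` inside the given jail.
pub fn wrap(jail: &Jail, command: &[&str]) -> Vec<String> {
    let mut argv: Vec<String> = Vec::new();
    match jail {
        Jail::Persistent { name } => {
            argv.push("jexec".to_string());
            argv.push(name.clone());
            argv.extend(command.iter().map(|s| s.to_string()));
        }
        Jail::Ephemeral { path } => {
            argv.push("jail".to_string());
            argv.push("-c".to_string());
            argv.push(format!("path={}", path));
            // On the jail(8) command line, `command=` comes last and the
            // remaining arguments belong to the command itself.
            if let Some((program, args)) = command.split_first() {
                argv.push(format!("command={}", program));
                argv.extend(args.iter().map(|s| s.to_string()));
            }
        }
    }
    argv
}
```

Building an argument vector rather than a shell string avoids quoting problems when the wrapped command carries its own flags.
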
### Priv-mode policy (`mdo` on live USB, `helper` on deployed)

The daemon is unprivileged but jail creation requires root. The priv-mode
policy resolves this without granting the daemon blanket sudo:

- **`mdo`** — the live USB's operator tool (`mdo -u root jail -c ...`). Used
  on the operator image where `mdo` is configured.
- **`helper`** — a setuid helper binary on deployed hosts (not yet shipped;
  falls back to `sudo`). The daemon never runs as root.

The policy is configurable via `COLIBRI_JAIL_PRIV_MODE` and is resolved once
at daemon startup. The same policy applies to agents and MCP servers.

**Why not the daemon as root**: the daemon spawns arbitrary subprocesses
(potentially attacker-controlled, via the MCP registry or task intake).
Running as unprivileged `colibri` limits the blast radius; the priv-mode
helper grants only the specific operations needed (jail creation).

→ [`COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md`](../COLIBRI-JAILED-AGENT-SPAWN-DESIGN.md),
[`crates/colibri-daemon/src/spawner.rs`](../../crates/colibri-daemon/src/spawner.rs)
(`PrivMode`)

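
How a resolved policy turns into a command prefix can be sketched as below. The accepted strings (`mdo`, `helper`) for `COLIBRI_JAIL_PRIV_MODE` and the `parse` / `elevate` methods are assumptions; the `sudo` branch reflects the fallback described above, since the helper binary has not shipped.

```rust
/// Privilege-escalation policy, resolved once at startup.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PrivMode {
    Mdo,
    Helper,
}

impl PrivMode {
    /// Parses a configuration value such as the content of
    /// `COLIBRI_JAIL_PRIV_MODE` (accepted spellings are assumed here).
    pub fn parse(value: &str) -> Option<PrivMode> {
        match value.trim().to_ascii_lowercase().as_str() {
            "mdo" => Some(PrivMode::Mdo),
            "helper" => Some(PrivMode::Helper),
            _ => None,
        }
    }

    /// Prefixes a privileged command with the escalation tool for this mode.
    pub fn elevate(self, argv: &[&str]) -> Vec<String> {
        let prefix: &[&str] = match self {
            PrivMode::Mdo => &["mdo", "-u", "root"],
            // The setuid helper is not shipped yet, so this uses `sudo`.
            PrivMode::Helper => &["sudo"],
        };
        prefix.iter().chain(argv.iter()).map(|s| s.to_string()).collect()
    }
}
```

Only the jail-creation command is passed through `elevate`; the daemon process itself keeps running unprivileged.
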
### MCP servers are jailed by default (same threat model as agents)

External MCP servers registered in the external MCP registry accept an optional
`jail` field with the same shape as agent spawn configs. The MCP host applies
the jail wrapper before spawning the server. Servers without a `jail` field
run on the host (backward compatible).

The MCP host's registry entry supports per-server jail configuration —
different servers can run in different jails. This is a property of the
registry, not a global daemon setting.

**Why jailed by default**: external MCP servers are arbitrary third-party
binaries — at least as untrusted as the agents Colibri already jails. The
threat model is identical.

→ [`COLIBRI-EXTERNAL-MCP-PROTOTYPE.md`](../COLIBRI-EXTERNAL-MCP-PROTOTYPE.md),
[`crates/colibri-mcp/src/external.rs`](../../crates/colibri-mcp/src/external.rs)

## See also

- [mother-hive](./mother-hive.md) — the SSH forced-command boundary (a different confinement model)
- [agent-harness](./agent-harness.md) — the spawner that jails agents

93
docs/wiki/task-board.md
Normal file
@@ -0,0 +1,93 @@
# Task board + scheduler

← [index](./index.md)

## What this is

Colibri's task board holds operator-submitted work items, and the scheduler
assigns them to the best-fit agent on each tick. Tasks flow in via the
daemon's Unix socket (`create-task`, `intake-task`) and are drained by the
scheduler loop running inside the daemon every ~30 seconds.

## Decisions

### Capability match scoring (best-fit, not first-fit)

When the scheduler picks an agent for a task, it scores every available agent
against the task's **required capabilities** using a simple intersection count:
`|required ∩ agent_caps| / |required|`. The agent with the highest score wins;
ties are broken by agent name (deterministic, so repeated runs don't thrash).

A task with `["freebsd", "zfs"]` will match an agent with both capabilities
over one with only `freebsd`. A task with no required capabilities matches
any agent. Offline agents and agents whose capabilities don't intersect at all
are skipped.

**Why not round-robin or FIFO**: capability-based matching means the right agent
gets the right work without operator hand-assignment. The scoring is trivial
(set intersection) and transparent — no machine learning, no weights to tune.

→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
(`capability_match_score`, `pick_agent`)

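
The scoring rule and tie-break can be written out in a few lines. The function names match those cited above, but the bodies are a sketch under assumptions: the `Agent` struct, the `caps` helper, a score of 1.0 for an empty requirement set, and "smallest name wins" as the tie-break direction are all illustrative choices.

```rust
use std::collections::BTreeSet;

/// Simplified agent record for the example.
pub struct Agent {
    pub name: String,
    pub online: bool,
    pub caps: BTreeSet<String>,
}

/// Convenience constructor for a capability set.
pub fn caps(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

/// `|required ∩ agent_caps| / |required|`; an empty requirement matches fully.
pub fn capability_match_score(required: &BTreeSet<String>, agent_caps: &BTreeSet<String>) -> f64 {
    if required.is_empty() {
        return 1.0;
    }
    required.intersection(agent_caps).count() as f64 / required.len() as f64
}

/// Highest score wins; ties go to the lexicographically smallest name.
/// Offline agents and agents with a zero score are skipped.
pub fn pick_agent<'a>(required: &BTreeSet<String>, agents: &'a [Agent]) -> Option<&'a Agent> {
    let mut best: Option<(&'a Agent, f64)> = None;
    for agent in agents.iter().filter(|a| a.online) {
        let score = capability_match_score(required, &agent.caps);
        if score <= 0.0 {
            continue;
        }
        let better = match best {
            None => true,
            Some((current, current_score)) => {
                score > current_score || (score == current_score && agent.name < current.name)
            }
        };
        if better {
            best = Some((agent, score));
        }
    }
    best.map(|(agent, _)| agent)
}
```

For a task requiring `freebsd` and `zfs`, an agent with only `freebsd` scores 0.5 and loses to an online agent holding both, regardless of the order in which agents are listed.
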
### Three schedule types (cron, interval, once)

| Type     | Behavior                                                          |
| -------- | ----------------------------------------------------------------- |
| Cron     | Fires at specific wall-clock times (e.g. `0 0 * * *` = midnight). |
| Interval | Fires after a fixed duration since last run (e.g. 3600s).         |
| Once     | Fires exactly once, at the specified future time.                 |

Cron patterns are simple 5-field expressions (minute, hour, day, month,
weekday) with wildcards — no second granularity, no `/step` syntax. The
matching uses prefix comparison: a pattern matches if each field of the
current time begins with the corresponding pattern field, so with zero-padded
fields `0` matches `00` (and, by the same rule, `01`–`09`), `1` matches
`10`–`19`, etc. This is intentionally simple — cron is a convenience for periodic
housekeeping, not a general-purpose job engine.

**Why not use a real cron library**: the scheduler's job is dispatching tasks to
agents, not calendar management. The simple prefix-match cron covers 90% of use
cases (daily builds, hourly reports) without pulling in a parsing dependency.

→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
(`should_fire`)

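
The prefix-comparison rule is easy to see in code. This sketch assumes the current time is supplied as five strings (minute, hour, day, month, weekday) with two-digit zero padding on the first four; `field_matches` and `cron_matches` are hypothetical names, not the body of `should_fire`.

```rust
/// A field matches when the pattern is `*` or the value starts with it.
pub fn field_matches(pattern: &str, value: &str) -> bool {
    pattern == "*" || value.starts_with(pattern)
}

/// Checks a 5-field expression against the current time fields
/// (minute, hour, day, month, weekday), all given as strings.
pub fn cron_matches(expr: &str, now: [&str; 5]) -> bool {
    let fields: Vec<&str> = expr.split_whitespace().collect();
    fields.len() == 5
        && fields
            .iter()
            .zip(now.iter())
            .all(|(pattern, value)| field_matches(pattern, value))
}
```

Under this rule a pattern field of `0` accepts any zero-padded value from `00` to `09`, so a prefix matcher is looser than standard cron, where `0` means exactly zero. Patterns that need an exact match can spell out both digits, for example `00`.
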
### Intake drain (queue → task board → agent)

The `intake-task` socket command pushes a task onto the intake queue. On each
scheduler tick (~30s), the loop drains the intake queue into the task board's
SQLite store, then checks for due scheduled jobs. This two-phase drain
decouples submission from execution: the operator submits at any time, the
scheduler processes in batches.

Tasks in the intake queue carry a **capability string** (not an agent ID). The
scheduler picks the best agent at execution time, so a task submitted when no
matching agent is online will be picked up when one connects.

**Why an intake queue, not direct assignment**: agents come and go. If submission
required picking an agent, the operator would need to know which agents are
available — a coupling the task board deliberately avoids.

→ [`crates/colibri-daemon/src/scheduler.rs`](../../crates/colibri-daemon/src/scheduler.rs)
(`Scheduler`, `add_job`, `submit`),
[`crates/colibri-daemon/tests/intake_scheduler_loop.rs`](../../crates/colibri-daemon/tests/intake_scheduler_loop.rs)

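
The decoupling of submission from processing can be sketched with an in-memory queue. Here a `Vec` stands in for the SQLite-backed board, and `Task`, `Daemon`, `intake_task`, and `drain_intake` are hypothetical names rather than the scheduler's real API.

```rust
use std::collections::VecDeque;

/// A submitted task: it names a capability, not an agent.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub title: String,
    pub capability: String,
}

/// Minimal stand-in for the daemon's intake queue and task board.
#[derive(Default)]
pub struct Daemon {
    intake: VecDeque<Task>,
    board: Vec<Task>, // the real board is persisted in SQLite
}

impl Daemon {
    /// Submission path: may be called at any time, only enqueues.
    pub fn intake_task(&mut self, task: Task) {
        self.intake.push_back(task);
    }

    /// First phase of a scheduler tick: move everything queued so far
    /// onto the board, in submission order. Returns how many were moved.
    pub fn drain_intake(&mut self) -> usize {
        let moved = self.intake.len();
        self.board.extend(self.intake.drain(..));
        moved
    }

    pub fn queued(&self) -> usize {
        self.intake.len()
    }

    pub fn board(&self) -> &[Task] {
        &self.board
    }
}
```

Agent selection would happen in a later phase against the board, which is what lets a task wait until a matching agent connects.
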
### SQLite backing (embedded, not a service)

The task board stores tasks, agent registrations, tenant info, and the skills
catalog in an embedded SQLite database at `/var/db/colibri/colibri.sqlite`. No
separate database process — the daemon opens the file directly.

**Why SQLite, not PostgreSQL**: the daemon runs on the operator USB and on
deployed hosts. A full PostgreSQL service is heavyweight for a single daemon's
coordination state. SQLite is zero-config, zero-admin, and survives daemon
restarts without a separate lifecycle. The mother node uses PostgreSQL for the
hive registry because it's multi-tenant; the local daemon is single-tenant.

→ [`crates/colibri-store/src/lib.rs`](../../crates/colibri-store/src/lib.rs)

## See also

- [mother-hive](./mother-hive.md) — the mother node's PostgreSQL-based hive registry
- [cost-model](./cost-model.md) — cost tracking per session
- [agent-harness](./agent-harness.md) — autospawn
