Schema + store + daemon hook for the eval harness (Phase 1 of T2.x). Per docs/wiki/t2x-eval-harness.md, the eval harness records multi-dimensional success measurement per task — beyond the boolean 'did it exit 0?' that T1.5 already captures. Phase 1 uses agent self-report (exit code → quality 1.0 or 0.0). Phases 2/3/4 will layer on local-llm eval, cloud-llm eval, and model-selection routing. Schema (colibri-store): - New task_evals table: task_id, agent_id, eval_mode, completion_status, quality_score, correctness_check, eval_provider, eval_latency_ms, eval_cost_usd, evaluated_at. CHECK constraints enforce the enum fields. Intentionally no FK to tasks — we don't want DELETE CASCADE to destroy eval history and we don't want a missing task row to block eval writes. - task_costs gets quality_score and eval_mode columns for dashboard display. - Migrations use IF NOT EXISTS / try-block pattern for idempotent reopens. Store API: - write_task_eval: INSERT OR REPLACE — same task_id can be upgraded (e.g. skip → agent → local-llm → cloud-llm) - read_task_eval - list_task_evals_by_agent - list_all_task_evals - eval_summary(window_hours): aggregated rollup for Phase 3 routing Daemon integration: - New TaskCompletion struct consolidates what used to be 8 args to an inline cost-capture closure. The struct is a stable API that future eval modes (local-llm, cloud-llm) can populate with eval_provider, eval_latency_ms, eval_cost_usd without touching the hook signature. - record_task_completion(state, &TaskCompletion): single atomic hook now writes both task_costs AND task_evals. Called from heartbeat's poll_exit path; designed so RPC-completion and periodic-snapshot paths (the gap flagged in feat/rpc-task-dispatch for persistent RPC agents) can call the same function. - Hardcoded eval_mode='agent' in Phase 1 — future phases pass different values; the function itself is mode-agnostic. MCP tool: - colibri_get_task_eval(task_id): returns the eval record for a task. Client: - Client::get_task_eval() async method. Tests: - 6 new store tests: roundtrip, insert-or-replace upgrade path, list-by-agent filter, eval_summary aggregation, CHECK constraint enforcement, export_json integration. - tool_dispatch test updated for new tool count (20 → 21). All gates green: cargo fmt, clippy -D warnings, cargo test workspace, wiki-lint --strict (187/0). Sam & Claude |
||
|---|---|---|
| .agent/skills | ||
| .forgejo/workflows | ||
| astro/wiki | ||
| crates | ||
| docs | ||
| manifests | ||
| packaging | ||
| scripts | ||
| src | ||
| tests | ||
| .env.example | ||
| .gitignore | ||
| .prettierignore | ||
| .prettierrc | ||
| AGENTS.md | ||
| Cargo.lock | ||
| Cargo.toml | ||
| LICENSE | ||
| README.md | ||
| rust-toolchain.toml | ||
Colibri
The Clawdie control plane core — a small, cross-platform (FreeBSD + Linux) Rust daemon. Developed from an operator USB environment; deploys as the Clawdie service on bare FreeBSD hardware (ZFS RAID1, PostgreSQL + pgvector, bhyve VMs, Bastille jails). Unifies coordination (task board, agent registry, skills catalog) with cache-first cost discipline (byte-stable prompt prefixes, cache-hit metering).
Status: workspace gates are fmt/clippy/test/release green. Round 2 audit is closed. Current priorities: ISO boot/runtime validation, Pi spawn end-to-end, and cost-mode enforcement (see docs/MULTI-AGENT-HOST-PLAN.md). Always query live state: see the crate table below and run the gate commands for current counts.
FreeBSD build lane handoff: docs/FREEBSD-BUILD-LANE-HANDOFF.md.
ISO acceptance runbook: docs/ISO-ACCEPTANCE-RUNBOOK.md.
Clawdie Studio/Zed proposal: docs/CLAWDIE-STUDIO-PROPOSAL.md.
External MCP host prototype: docs/COLIBRI-EXTERNAL-MCP-PROTOTYPE.md.
Optional Headroom compression sidecar: docs/wiki/headroom-sidecar.md.
Workspace
| Crate | Role |
|---|---|
colibri (root) |
Workspace root + probe binaries (colibri-probe, runtime-inventory) |
colibri-mcp |
MCP bridge for editor integration (Zed, Claude Code) via stdio JSON-RPC |
colibri-contracts |
JSON schema contracts (golden tests) |
colibri-deepseek |
DeepSeek cache-hit probe, prefix metering |
colibri-runtime |
Host status ingestion, runtime inventory |
colibri-glasspane |
Agent 5-state machine (zot/pi JSONL events → state) |
colibri-daemon |
Always-on Unix socket server, session lifecycle |
colibri-client |
Typed Unix-socket client + operator CLI |
colibri-glasspane-tui |
ratatui live dashboard (FreeBSD-native) |
colibri-store |
Embedded SQLite coordination (task board, agents, skills) |
colibri-skills |
Skills catalog crate |
clawdie |
Host installer/deployer: ZFS layout + clawdie service (FreeBSD/Linux) |
Build
cargo build --release
Test
cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings
Architecture
colibri-daemon (always-on Unix socket server)
├── glasspane — agent state machine (zot/pi JSONL → idle/working/blocked/done)
├── store — SQLite coordination (tasks, agents, skills)
├── socket — newline-JSON socket API
├── session — append-only JSONL sessions, 3-region prompt assembly
└── spawner — agent subprocess management (retry/backoff, FreeBSD jail confinement)
colibri-client — CLI tools (colibri, colibri-test-agent)
colibri-glasspane-tui— ratatui dashboard
Probe binaries
# DeepSeek cache probe (needs DEEPSEEK_API_KEY)
cargo run --release --bin colibri-probe
# Runtime inventory manifest
cargo run --release --bin colibri-runtime-inventory
FreeBSD
Target x86_64-unknown-freebsd (Rust Tier-2). TLS uses rustls for clean
static linking (no openssl-sys dependency). Default DB path: /var/db/colibri/colibri.sqlite.