Linux/FreeBSD Cross-platform Rust control plane core https://clawdie.si
Sam & Claude 89e47363ef feat(store): T2.x Phase 1 eval harness — agent self-report
Schema + store + daemon hook for the eval harness (Phase 1 of T2.x).

Per docs/wiki/t2x-eval-harness.md, the eval harness records multi-dimensional
success measurement per task — beyond the boolean 'did it exit 0?' that T1.5
already captures. Phase 1 uses agent self-report (exit code → quality 1.0 or
0.0). Phases 2/3/4 will layer on local-llm eval, cloud-llm eval, and
model-selection routing.
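
As a minimal sketch of the Phase 1 self-report rule described above (exit code mapped to quality 1.0 or 0.0), the mapping could look like the following. The function name `quality_from_exit_code` is hypothetical and not taken from the codebase; only the 1.0 / 0.0 values come from the text.

```rust
/// Phase 1 self-report: derive a quality score from the agent's exit code.
/// Hypothetical helper; the real daemon may compute this inline.
pub fn quality_from_exit_code(exit_code: i32) -> f64 {
    // Exit code 0 counts as full quality, anything else as zero.
    if exit_code == 0 {
        1.0
    } else {
        0.0
    }
}
```

Later phases would presumably replace this binary score with a graded one from a local or cloud evaluator.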

Schema (colibri-store):
- New task_evals table: task_id, agent_id, eval_mode, completion_status,
  quality_score, correctness_check, eval_provider, eval_latency_ms,
  eval_cost_usd, evaluated_at. CHECK constraints enforce the enum fields.
  Intentionally no FK to tasks — we don't want DELETE CASCADE to destroy
  eval history and we don't want a missing task row to block eval writes.
- task_costs gets quality_score and eval_mode columns for dashboard display.
- Migrations use IF NOT EXISTS / try-block pattern for idempotent reopens.
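
To illustrate what the CHECK constraints on the enum fields guard against, here is a hedged Rust-side mirror of the eval_mode column. It assumes the allowed values are exactly the four named in the upgrade path (skip, agent, local-llm, cloud-llm); the `EvalMode` type itself is hypothetical and the actual schema may differ.

```rust
use std::str::FromStr;

/// Hypothetical Rust mirror of the eval_mode CHECK constraint.
/// Assumes the allowed set is exactly the four modes named in the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EvalMode {
    Skip,
    Agent,
    LocalLlm,
    CloudLlm,
}

impl EvalMode {
    /// The string form stored in the eval_mode column.
    pub fn as_str(self) -> &'static str {
        match self {
            EvalMode::Skip => "skip",
            EvalMode::Agent => "agent",
            EvalMode::LocalLlm => "local-llm",
            EvalMode::CloudLlm => "cloud-llm",
        }
    }
}

impl FromStr for EvalMode {
    type Err = String;

    /// Rejects any value outside the allowed set, like a CHECK constraint would.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "skip" => Ok(EvalMode::Skip),
            "agent" => Ok(EvalMode::Agent),
            "local-llm" => Ok(EvalMode::LocalLlm),
            "cloud-llm" => Ok(EvalMode::CloudLlm),
            other => Err(format!("invalid eval_mode: {other}")),
        }
    }
}
```

Validating in Rust as well as in SQL means a bad value is caught before the write, while the CHECK constraint remains the final guard.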

Store API:
- write_task_eval: INSERT OR REPLACE — same task_id can be upgraded
  (e.g. skip → agent → local-llm → cloud-llm)
- read_task_eval
- list_task_evals_by_agent
- list_all_task_evals
- eval_summary(window_hours): aggregated rollup for Phase 4 routing
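
The store semantics listed above can be sketched with an in-memory map standing in for SQLite. This is an illustration under stated assumptions, not the colibri-store implementation: the `EvalStore` type, the reduced field set, the (count, mean) summary shape, and the extra `now` parameter (added so the window is deterministic) are all hypothetical.

```rust
use std::collections::HashMap;

/// Reduced, hypothetical view of a task_evals row.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskEval {
    pub task_id: String,
    pub agent_id: String,
    pub eval_mode: String,
    pub quality_score: f64,
    pub evaluated_at: u64, // Unix seconds (assumed unit)
}

/// In-memory stand-in for the SQLite-backed store.
#[derive(Debug, Default)]
pub struct EvalStore {
    rows: HashMap<String, TaskEval>,
}

impl EvalStore {
    /// INSERT OR REPLACE semantics: a later eval for the same task_id
    /// overwrites the earlier one.
    pub fn write_task_eval(&mut self, eval: TaskEval) {
        self.rows.insert(eval.task_id.clone(), eval);
    }

    pub fn read_task_eval(&self, task_id: &str) -> Option<&TaskEval> {
        self.rows.get(task_id)
    }

    pub fn list_task_evals_by_agent(&self, agent_id: &str) -> Vec<&TaskEval> {
        self.rows
            .values()
            .filter(|e| e.agent_id == agent_id)
            .collect()
    }

    /// Returns (count, mean quality) for evals newer than `window_hours`
    /// before `now`. The real rollup may carry more dimensions.
    pub fn eval_summary(&self, now: u64, window_hours: u64) -> (usize, f64) {
        let cutoff = now.saturating_sub(window_hours * 3600);
        let scores: Vec<f64> = self
            .rows
            .values()
            .filter(|e| e.evaluated_at >= cutoff)
            .map(|e| e.quality_score)
            .collect();
        let mean = if scores.is_empty() {
            0.0
        } else {
            scores.iter().sum::<f64>() / scores.len() as f64
        };
        (scores.len(), mean)
    }
}
```

Keying on task_id alone is what makes the upgrade path work: writing an agent eval and then a local-llm eval for the same task leaves exactly one row, holding the newer mode.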

Daemon integration:
- New TaskCompletion struct consolidates what used to be 8 args to an
  inline cost-capture closure. The struct is a stable API that future
  eval modes (local-llm, cloud-llm) can populate with eval_provider,
  eval_latency_ms, eval_cost_usd without touching the hook signature.
- record_task_completion(state, &TaskCompletion): single atomic hook now
  writes both task_costs AND task_evals. Called from heartbeat's poll_exit
  path; designed so RPC-completion and periodic-snapshot paths (the gap
  flagged in feat/rpc-task-dispatch for persistent RPC agents) can call
  the same function.
- Hardcoded eval_mode='agent' in Phase 1 — future phases pass different
  values; the function itself is mode-agnostic.
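
The shape of the hook described above can be sketched as follows. Everything here is an assumption made for illustration: the field set of `TaskCompletion`, the `State`, `CostRow`, and `EvalRow` types, the `phase1_completion` helper, and the use of in-memory maps in place of the two SQLite tables. In the real daemon the two writes would presumably share a transaction to be atomic.

```rust
use std::collections::HashMap;

/// Hypothetical field set; the optional eval_* fields stay None in Phase 1.
#[derive(Debug, Clone)]
pub struct TaskCompletion {
    pub task_id: String,
    pub agent_id: String,
    pub cost_usd: f64,
    pub eval_mode: String,
    pub quality_score: f64,
    pub eval_provider: Option<String>,
    pub eval_latency_ms: Option<u64>,
    pub eval_cost_usd: Option<f64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct CostRow {
    pub cost_usd: f64,
    pub quality_score: f64,
    pub eval_mode: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EvalRow {
    pub agent_id: String,
    pub eval_mode: String,
    pub quality_score: f64,
    pub eval_provider: Option<String>,
}

/// In-memory stand-in for the task_costs and task_evals tables.
#[derive(Debug, Default)]
pub struct State {
    pub task_costs: HashMap<String, CostRow>,
    pub task_evals: HashMap<String, EvalRow>,
}

/// Phase 1 caller: hardcodes eval_mode = "agent" and scores by exit code.
pub fn phase1_completion(task_id: &str, agent_id: &str, exit_code: i32, cost_usd: f64) -> TaskCompletion {
    TaskCompletion {
        task_id: task_id.to_string(),
        agent_id: agent_id.to_string(),
        cost_usd,
        eval_mode: "agent".to_string(),
        quality_score: if exit_code == 0 { 1.0 } else { 0.0 },
        eval_provider: None,
        eval_latency_ms: None,
        eval_cost_usd: None,
    }
}

/// Mode-agnostic hook: writes both rows from whatever the caller supplied.
pub fn record_task_completion(state: &mut State, c: &TaskCompletion) {
    state.task_costs.insert(
        c.task_id.clone(),
        CostRow {
            cost_usd: c.cost_usd,
            quality_score: c.quality_score,
            eval_mode: c.eval_mode.clone(),
        },
    );
    state.task_evals.insert(
        c.task_id.clone(),
        EvalRow {
            agent_id: c.agent_id.clone(),
            eval_mode: c.eval_mode.clone(),
            quality_score: c.quality_score,
            eval_provider: c.eval_provider.clone(),
        },
    );
}
```

Because the mode and score travel inside the struct, a later local-llm or cloud-llm caller only needs to build a different `TaskCompletion`; the hook signature stays the same.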

MCP tool:
- colibri_get_task_eval(task_id): returns the eval record for a task.
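
For orientation, a call to this tool over the MCP stdio transport would be a JSON-RPC `tools/call` request on a single line. The sketch below builds such a line with the standard library only. The argument key `task_id` is inferred from the tool signature above, and the helper names are hypothetical; the response shape is not shown because the source does not specify it.

```rust
/// Minimal JSON string escaping for this sketch
/// (quotes, backslashes, and control characters only).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for ch in s.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds one newline-terminated JSON-RPC 2.0 `tools/call` request
/// for colibri_get_task_eval. Hypothetical helper, not part of colibri-mcp.
pub fn get_task_eval_request(id: u64, task_id: &str) -> String {
    let mut line = format!(
        r#"{{"jsonrpc":"2.0","id":{},"method":"tools/call","params":{{"name":"colibri_get_task_eval","arguments":{{"task_id":"{}"}}}}}}"#,
        id,
        json_escape(task_id)
    );
    line.push('\n');
    line
}
```

In practice an MCP client library or the editor integration would produce this message; the sketch only shows what crosses the wire.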

Client:
- Client::get_task_eval() async method.

Tests:
- 6 new store tests: roundtrip, insert-or-replace upgrade path,
  list-by-agent filter, eval_summary aggregation, CHECK constraint
  enforcement, export_json integration.
- tool_dispatch test updated for new tool count (20 → 21).

All gates green: cargo fmt, clippy -D warnings, cargo test workspace,
wiki-lint --strict (187/0).

Sam & Claude
2026-06-28 08:23:05 +02:00
.agent/skills fix(skills): remove duplicate PF validate line in freebsd-admin SKILL 2026-06-28 00:20:33 +02:00
.forgejo/workflows chore(ci): add wiki-lint to CI for parity with ci-checks.sh 2026-06-25 22:50:19 +02:00
astro/wiki fix(astro): EN index reads from src/content, not ../../docs/wiki 2026-06-28 00:59:13 +02:00
crates feat(store): T2.x Phase 1 eval harness — agent self-report 2026-06-28 08:23:05 +02:00
docs docs: remove legacy references — positive framing pass (11 files) (#248) 2026-06-28 00:07:17 +02:00
manifests refactor: rename smoke→test across provider contracts and docs 2026-06-27 11:54:30 +02:00
packaging fix(dashboard): restore dual-proof lightbox — screenshot + text badges 2026-06-27 22:34:40 +02:00
scripts style: restore main green — fmt + prettier drift (Sam & Claude) 2026-06-27 17:19:57 +02:00
src refactor: clear pi-era residue from the harness-neutral agent path 2026-06-23 18:04:45 +02:00
tests feat(rc): rename test agent and load provider env (Sam & Codex) 2026-06-15 07:35:44 +02:00
.env.example Auto-load .env for the DeepSeek probe; gitignore .env (Sam & Claude) 2026-05-26 14:27:41 +02:00
.gitignore Auto-load .env for the DeepSeek probe; gitignore .env (Sam & Claude) 2026-05-26 14:27:41 +02:00
.prettierignore chore: adopt markdown formatting gate + one-shot prettier sweep (Sam & Claude) 2026-06-04 20:13:47 +02:00
.prettierrc chore: adopt markdown formatting gate + one-shot prettier sweep (Sam & Claude) 2026-06-04 20:13:47 +02:00
AGENTS.md docs: delete 3 stale docs; repoint refs to successor 2026-06-24 16:58:49 +02:00
Cargo.lock feat(deploy): add colibri-deploy crate + MCP tools 2026-06-27 18:57:55 +02:00
Cargo.toml feat(deploy): add colibri-deploy crate + MCP tools 2026-06-27 18:57:55 +02:00
LICENSE release: colibri 0.11.0 + relicense AGPL-3.0 -> MIT 2026-06-20 22:05:47 +02:00
README.md docs: fix README referrer to moved headroom-sidecar wiki page 2026-06-24 17:34:42 +02:00
rust-toolchain.toml Scaffold Colibri Phase 1: colibri-probe DeepSeek cache smoke (Sam & Claude) 2026-05-26 10:08:23 +02:00

Colibri

The Clawdie control plane core — a small, cross-platform (FreeBSD + Linux) Rust daemon. Developed from an operator USB environment; deploys as the Clawdie service on bare FreeBSD hardware (ZFS RAID1, PostgreSQL + pgvector, bhyve VMs, Bastille jails). Unifies coordination (task board, agent registry, skills catalog) with cache-first cost discipline (byte-stable prompt prefixes, cache-hit metering).

Status: the workspace gates (fmt, clippy, test, release) are green, and the Round 2 audit is closed. Current priorities: ISO boot/runtime validation, Pi spawn end-to-end, and cost-mode enforcement (see docs/MULTI-AGENT-HOST-PLAN.md). For current counts, always query live state: see the crate table below and run the gate commands.

FreeBSD build lane handoff: docs/FREEBSD-BUILD-LANE-HANDOFF.md. ISO acceptance runbook: docs/ISO-ACCEPTANCE-RUNBOOK.md. Clawdie Studio/Zed proposal: docs/CLAWDIE-STUDIO-PROPOSAL.md. External MCP host prototype: docs/COLIBRI-EXTERNAL-MCP-PROTOTYPE.md. Optional Headroom compression sidecar: docs/wiki/headroom-sidecar.md.

Workspace

| Crate | Role |
| --- | --- |
| colibri (root) | Workspace root + probe binaries (colibri-probe, runtime-inventory) |
| colibri-mcp | MCP bridge for editor integration (Zed, Claude Code) via stdio JSON-RPC |
| colibri-contracts | JSON schema contracts (golden tests) |
| colibri-deepseek | DeepSeek cache-hit probe, prefix metering |
| colibri-runtime | Host status ingestion, runtime inventory |
| colibri-glasspane | Agent 5-state machine (zot/pi JSONL events → state) |
| colibri-daemon | Always-on Unix socket server, session lifecycle |
| colibri-client | Typed Unix-socket client + operator CLI |
| colibri-glasspane-tui | ratatui live dashboard (FreeBSD-native) |
| colibri-store | Embedded SQLite coordination (task board, agents, skills) |
| colibri-skills | Skills catalog crate |
| clawdie | Host installer/deployer: ZFS layout + clawdie service (FreeBSD/Linux) |

Build

cargo build --release

Test

cargo test --workspace
cargo clippy --workspace --all-targets -- -D warnings

Architecture

colibri-daemon (always-on Unix socket server)
  ├── glasspane      — agent state machine (zot/pi JSONL → idle/working/blocked/done)
  ├── store          — SQLite coordination (tasks, agents, skills)
  ├── socket         — newline-JSON socket API
  ├── session        — append-only JSONL sessions, 3-region prompt assembly
  └── spawner        — agent subprocess management (retry/backoff, FreeBSD jail confinement)

colibri-client        — CLI tools (colibri, colibri-test-agent)
colibri-glasspane-tui — ratatui dashboard
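
The newline-JSON socket API in the diagram above implies a simple framing: one JSON document per line in each direction. A minimal client-side sketch using only the standard library might look like this. The helper name `request_line` is hypothetical, and the sketch assumes a strict one-request, one-response exchange; the daemon's actual command set and socket path are not specified here.

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Sends one JSON request as a single line and reads one response line.
/// Hypothetical helper; assumes the peer answers each request with exactly
/// one newline-terminated JSON document.
pub fn request_line(stream: &mut UnixStream, request_json: &str) -> std::io::Result<String> {
    // Frame the request: payload followed by a single newline.
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    // Read until the next newline on a cloned handle to the same socket.
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end_matches('\n').to_string())
}
```

A real caller would obtain the stream with `UnixStream::connect` on the daemon's socket path. Because the framing relies on newlines, request payloads must not contain raw line breaks.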

Probe binaries

# DeepSeek cache probe (needs DEEPSEEK_API_KEY)
cargo run --release --bin colibri-probe

# Runtime inventory manifest
cargo run --release --bin colibri-runtime-inventory

FreeBSD

Target x86_64-unknown-freebsd (Rust Tier-2). TLS uses rustls for clean static linking (no openssl-sys dependency). Default DB path: /var/db/colibri/colibri.sqlite.