feat: T2.x eval harness + RPC task dispatch #264

Merged

clawdie merged 4 commits from feat/rpc-eval-combined into main

2026-06-28 08:43:34 +02:00

Author	SHA1	Message	Date
Sam & Claude	ed35e3ffb0	style: cargo fmt on RPC dispatch method chains	2026-06-28 08:37:47 +02:00
    Some checks failed: CI / rust, CI / markdown, CI / port, and CI / agent-jail-pkgs (pull_request) were cancelled.
Sam & Claude	514105b44d	fix(clippy): map_or → is_some_and in ollama probe fallback	2026-06-28 08:27:01 +02:00
    Clippy 1.94 lint: unnecessary_map_or on socket.rs:931. Part of the eval
    harness probe_capabilities fallback (#260).

    Combined PR: eval harness Phase 1 + RPC task dispatch.
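The lint fix in this commit swaps `Option::map_or(false, ...)` for `Option::is_some_and(...)`. The code at socket.rs:931 is not shown on this page, so the following is only a minimal sketch of the equivalence. The helper names `has_model_old` and `has_model_new` and the "list of model names" shape are hypothetical, chosen for illustration.

```rust
/// Hypothetical helper (not taken from socket.rs): does the probed model
/// list contain `wanted`? `None` stands for "probe returned nothing".
fn has_model_old(models: Option<&[String]>, wanted: &str) -> bool {
    // The pattern flagged by clippy::unnecessary_map_or.
    models.map_or(false, |m| m.iter().any(|name| name == wanted))
}

/// Same check written with Option::is_some_and (stable since Rust 1.70).
fn has_model_new(models: Option<&[String]>, wanted: &str) -> bool {
    // Returns false for None, otherwise the result of the predicate.
    models.is_some_and(|m| m.iter().any(|name| name == wanted))
}
```

Both forms return `false` for `None`, so the change is behavior-preserving; `is_some_and` just states the intent directly.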
Sam & Claude	5227b2cd25	feat(daemon): dispatch claimed tasks to running RPC agents	2026-06-28 08:23:05 +02:00
    Adds RPC dispatch to poll_tasks(): when a claimed task has an agent_id
    matching a running autospawned agent (zot rpc), the daemon sends the task
    description via the existing RPC channel and transitions the task to
    'started'.

    Key changes:
    - Resolves store row ID → spawn handle ID via get_agent().name
    - Falls back to spawn-per-task path if no RPC agent found
    - Uses existing send_prompt() on RpcSender

    Pipeline verified end-to-end:
    intake-task → queued → scheduler tick → claimed → poll_tasks RPC dispatch → started ✅

    Remaining: persistent RPC agents don't exit after one task, so the current
    poll_exit-based cost capture (triggered by process exit) doesn't fire.
    Periodic pane-usage snapshot needed for long-running RPC agents.
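The dispatch decision described in this commit (resolve the agent, send over RPC if it is running, otherwise fall back) can be sketched roughly as follows. This is an illustration under stated assumptions, not the daemon's code: the `Daemon` struct, its two lookup maps, the `Dispatch` enum, and the `sent` log standing in for `send_prompt()` on `RpcSender` are all hypothetical. In particular, how `get_agent().name` maps to a spawn handle ID is assumed here to be a simple name-keyed lookup.

```rust
use std::collections::HashMap;

/// Outcome of trying to dispatch one claimed task (hypothetical type).
#[derive(Debug, PartialEq)]
enum Dispatch {
    /// Prompt was sent to a running RPC agent; the task would move to 'started'.
    Rpc { handle_id: String },
    /// No running RPC agent matched; the caller uses the spawn-per-task path.
    SpawnPerTask,
}

/// Hypothetical stand-in for the relevant daemon state.
struct Daemon {
    /// Store row ID -> agent name (what get_agent().name is assumed to yield).
    agent_names: HashMap<i64, String>,
    /// Agent name -> spawn handle ID, for running autospawned RPC agents only.
    running_rpc: HashMap<String, String>,
    /// Log of (handle_id, prompt) pairs; stands in for the real RPC send.
    sent: Vec<(String, String)>,
}

impl Daemon {
    fn dispatch(&mut self, agent_row_id: Option<i64>, description: &str) -> Dispatch {
        // Resolve row ID -> name -> handle; any missing link yields None.
        let handle = agent_row_id
            .and_then(|id| self.agent_names.get(&id))
            .and_then(|name| self.running_rpc.get(name))
            .cloned();

        match handle {
            Some(handle_id) => {
                // Send the task description over the (simulated) RPC channel.
                self.sent.push((handle_id.clone(), description.to_string()));
                Dispatch::Rpc { handle_id }
            }
            // Unknown agent, agent not running, or no agent_id at all.
            None => Dispatch::SpawnPerTask,
        }
    }
}
```

Keeping the fallback as an explicit enum variant makes the "no RPC agent found" case visible to the caller instead of hiding it behind an error.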
Sam & Claude	89e47363ef	feat(store): T2.x Phase 1 eval harness - agent self-report	2026-06-28 08:23:05 +02:00
    Schema + store + daemon hook for the eval harness (Phase 1 of T2.x).

    Per docs/wiki/t2x-eval-harness.md, the eval harness records
    multi-dimensional success measurement per task, beyond the boolean 'did it
    exit 0?' that T1.5 already captures. Phase 1 uses agent self-report (exit
    code → quality 1.0 or 0.0). Phases 2/3/4 will layer on local-llm eval,
    cloud-llm eval, and model-selection routing.

    Schema (colibri-store):
    - New task_evals table: task_id, agent_id, eval_mode, completion_status,
      quality_score, correctness_check, eval_provider, eval_latency_ms,
      eval_cost_usd, evaluated_at. CHECK constraints enforce the enum fields.
      Intentionally no FK to tasks: we don't want DELETE CASCADE to destroy
      eval history and we don't want a missing task row to block eval writes.
    - task_costs gets quality_score and eval_mode columns for dashboard display.
    - Migrations use IF NOT EXISTS / try-block pattern for idempotent reopens.

    Store API:
    - write_task_eval: INSERT OR REPLACE, so the same task_id can be upgraded
      (e.g. skip → agent → local-llm → cloud-llm)
    - read_task_eval
    - list_task_evals_by_agent
    - list_all_task_evals
    - eval_summary(window_hours): aggregated rollup for Phase 3 routing

    Daemon integration:
    - New TaskCompletion struct consolidates what used to be 8 args to an
      inline cost-capture closure. The struct is a stable API that future eval
      modes (local-llm, cloud-llm) can populate with eval_provider,
      eval_latency_ms, eval_cost_usd without touching the hook signature.
    - record_task_completion(state, &TaskCompletion): single atomic hook now
      writes both task_costs AND task_evals. Called from heartbeat's poll_exit
      path; designed so RPC-completion and periodic-snapshot paths (the gap
      flagged in feat/rpc-task-dispatch for persistent RPC agents) can call the
      same function.
    - Hardcoded eval_mode='agent' in Phase 1. Future phases pass different
      values; the function itself is mode-agnostic.

    MCP tool:
    - colibri_get_task_eval(task_id): returns the eval record for a task.

    Client:
    - Client::get_task_eval() async method.

    Tests:
    - 6 new store tests: roundtrip, insert-or-replace upgrade path,
      list-by-agent filter, eval_summary aggregation, CHECK constraint
      enforcement, export_json integration.
    - tool_dispatch test updated for new tool count (20 → 21).

    All gates green: cargo fmt, clippy -D warnings, cargo test workspace,
    wiki-lint --strict (187/0).

    Sam & Claude
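The Phase 1 flow described in this commit (exit code → quality score, written through a single completion hook, with insert-or-replace semantics so a later eval mode can upgrade the row) can be sketched as below. This is a simplified in-memory illustration, not the colibri-store implementation. The field subset, the `String` ID types, the `EvalStore` map standing in for the `task_evals` table, and the `self_report_quality` helper are assumptions. The real hook takes `state` and also writes `task_costs`, which is omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical subset of the task_evals columns named in the commit.
#[derive(Debug, Clone, PartialEq)]
struct TaskEval {
    task_id: String,
    agent_id: String,
    eval_mode: String, // "agent" in Phase 1
    quality_score: f64,
    eval_provider: Option<String>,
}

/// Hypothetical, reduced shape of the completion record passed to the hook.
struct TaskCompletion {
    task_id: String,
    agent_id: String,
    exit_code: i32,
}

/// Phase 1 self-report: exit code 0 maps to 1.0, anything else to 0.0.
fn self_report_quality(exit_code: i32) -> f64 {
    if exit_code == 0 { 1.0 } else { 0.0 }
}

/// In-memory stand-in for the task_evals table, keyed by task_id.
#[derive(Default)]
struct EvalStore {
    rows: HashMap<String, TaskEval>,
}

impl EvalStore {
    /// Mimics INSERT OR REPLACE: a later write for the same task_id
    /// replaces the earlier row instead of adding a second one.
    fn write_task_eval(&mut self, eval: TaskEval) {
        self.rows.insert(eval.task_id.clone(), eval);
    }

    fn read_task_eval(&self, task_id: &str) -> Option<&TaskEval> {
        self.rows.get(task_id)
    }
}

/// Simplified hook: records a Phase 1 ('agent' mode) eval for a finished task.
fn record_task_completion(store: &mut EvalStore, c: &TaskCompletion) {
    store.write_task_eval(TaskEval {
        task_id: c.task_id.clone(),
        agent_id: c.agent_id.clone(),
        eval_mode: "agent".to_string(),
        quality_score: self_report_quality(c.exit_code),
        eval_provider: None,
    });
}
```

A later phase could call `write_task_eval` again for the same `task_id` with, say, `eval_mode: "local-llm"` and a provider filled in; the row is replaced rather than duplicated, which is the upgrade path the store tests are described as covering.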