Adds RPC dispatch to poll_tasks(): when a claimed task's agent_id
matches a running autospawned agent (zot rpc), the daemon sends the
task description over the existing RPC channel and transitions the
task to 'started'.
Key changes:
- Resolves store row ID → spawn handle ID via get_agent().name
- Falls back to spawn-per-task path if no RPC agent found
- Uses existing send_prompt() on RpcSender
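
The dispatch decision described in the key changes can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: the PromptSender trait, the Dispatch enum, the Task struct, the two maps, and the RecordingSender test double are all hypothetical stand-ins, and the real send_prompt() signature on RpcSender may differ.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the daemon's RPC sender.
trait PromptSender {
    fn send_prompt(&mut self, prompt: &str) -> Result<(), String>;
}

/// Test double that records prompts instead of talking to a process.
#[derive(Default)]
struct RecordingSender {
    sent: Vec<String>,
}

impl PromptSender for RecordingSender {
    fn send_prompt(&mut self, prompt: &str) -> Result<(), String> {
        self.sent.push(prompt.to_string());
        Ok(())
    }
}

/// Outcome of trying to dispatch one claimed task.
#[derive(Debug, PartialEq)]
enum Dispatch {
    /// Prompt was sent over RPC; the caller moves the task to 'started'.
    Rpc { handle_id: String },
    /// No matching RPC agent; the caller uses the spawn-per-task path.
    SpawnPerTask,
}

/// Assumed minimal task shape for this sketch.
struct Task {
    agent_id: i64,
    description: String,
}

fn dispatch_task<S: PromptSender>(
    task: &Task,
    // Store row ID -> agent name (stands in for get_agent().name).
    agent_names: &HashMap<i64, String>,
    // Spawn handle ID -> sender for each running RPC agent.
    rpc_agents: &mut HashMap<String, S>,
) -> Result<Dispatch, String> {
    // Resolve the store row ID to a spawn handle ID.
    let handle_id = match agent_names.get(&task.agent_id) {
        Some(name) => name,
        None => return Ok(Dispatch::SpawnPerTask),
    };
    match rpc_agents.get_mut(handle_id) {
        Some(sender) => {
            sender.send_prompt(&task.description)?;
            Ok(Dispatch::Rpc { handle_id: handle_id.clone() })
        }
        // Agent row exists but no RPC agent is running under that handle.
        None => Ok(Dispatch::SpawnPerTask),
    }
}
```

Returning an enum rather than performing the state transition inline keeps the fallback decision testable without a live agent process.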
Pipeline verified end-to-end:
intake-task → queued → scheduler tick → claimed
→ poll_tasks RPC dispatch → started ✅
Remaining: persistent RPC agents don't exit after one task, so
the current poll_exit-based cost capture (triggered by process exit)
never fires for them. A periodic pane-usage snapshot is needed for
long-running RPC agents.
Schema + store + daemon hook for the eval harness (Phase 1 of T2.x).
Per docs/wiki/t2x-eval-harness.md, the eval harness records a
multi-dimensional success measurement per task, beyond the boolean
'did it exit 0?' that T1.5 already captures. Phase 1 uses agent
self-report (exit code → quality 1.0 or 0.0). Phases 2/3/4 will layer on
local-llm eval, cloud-llm eval, and model-selection routing.
Schema (colibri-store):
- New task_evals table: task_id, agent_id, eval_mode, completion_status,
  quality_score, correctness_check, eval_provider, eval_latency_ms,
  eval_cost_usd, evaluated_at. CHECK constraints enforce the enum fields.
  Intentionally no FK to tasks: DELETE CASCADE must not destroy eval
  history, and a missing task row must not block eval writes.
- task_costs gets quality_score and eval_mode columns for dashboard display.
- Migrations use IF NOT EXISTS / try-block pattern for idempotent reopens.
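
One way to mirror a CHECK-constrained enum column on the Rust side is a small enum with string conversion, so invalid values are rejected before they reach the database. The sketch below is an assumption, not the colibri-store code: the EvalMode type is hypothetical, and its four values are taken from the upgrade-path example in the Store API notes (skip, agent, local-llm, cloud-llm), which may not be the full set the CHECK constraint allows.

```rust
use std::str::FromStr;

/// Hypothetical Rust mirror of the eval_mode column's allowed values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvalMode {
    Skip,
    Agent,
    LocalLlm,
    CloudLlm,
}

impl EvalMode {
    /// String form as it would be stored in the eval_mode column.
    fn as_str(self) -> &'static str {
        match self {
            EvalMode::Skip => "skip",
            EvalMode::Agent => "agent",
            EvalMode::LocalLlm => "local-llm",
            EvalMode::CloudLlm => "cloud-llm",
        }
    }
}

impl FromStr for EvalMode {
    type Err = String;

    /// Rejects anything outside the known set, like the CHECK constraint would.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "skip" => Ok(EvalMode::Skip),
            "agent" => Ok(EvalMode::Agent),
            "local-llm" => Ok(EvalMode::LocalLlm),
            "cloud-llm" => Ok(EvalMode::CloudLlm),
            other => Err(format!("invalid eval_mode: {other}")),
        }
    }
}
```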
Store API:
- write_task_eval: INSERT OR REPLACE, so the eval for a task_id can be
  upgraded in place (e.g. skip → agent → local-llm → cloud-llm)
- read_task_eval
- list_task_evals_by_agent
- list_all_task_evals
- eval_summary(window_hours): aggregated rollup for model-selection routing
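
The upgrade semantics of write_task_eval and the windowed rollup can be illustrated with an in-memory stand-in for the table. Everything here is an assumption made for the sketch: the HashMap replaces the database (and presumes task_id is the unique key that INSERT OR REPLACE conflicts on), TaskEval carries only a subset of the columns, evaluated_at is taken to be Unix seconds, eval_summary takes an explicit `now` for determinism, and its (count, mean quality) return shape is invented.

```rust
use std::collections::HashMap;

/// Reduced eval row; the real table has more columns.
#[derive(Debug, Clone, PartialEq)]
struct TaskEval {
    task_id: String,
    agent_id: String,
    eval_mode: String,
    quality_score: f64,
    evaluated_at: u64, // assumed: Unix seconds
}

/// In-memory stand-in for the task_evals table, keyed by task_id.
#[derive(Default)]
struct EvalStore {
    rows: HashMap<String, TaskEval>,
}

impl EvalStore {
    /// Insert-or-replace: a later eval for the same task_id overwrites the earlier one.
    fn write_task_eval(&mut self, eval: TaskEval) {
        self.rows.insert(eval.task_id.clone(), eval);
    }

    fn read_task_eval(&self, task_id: &str) -> Option<&TaskEval> {
        self.rows.get(task_id)
    }

    fn list_task_evals_by_agent(&self, agent_id: &str) -> Vec<&TaskEval> {
        self.rows.values().filter(|e| e.agent_id == agent_id).collect()
    }

    /// (count, mean quality) over evals from the last `window_hours`, or None if empty.
    fn eval_summary(&self, now: u64, window_hours: u64) -> Option<(usize, f64)> {
        let cutoff = now.saturating_sub(window_hours.saturating_mul(3600));
        let scores: Vec<f64> = self
            .rows
            .values()
            .filter(|e| e.evaluated_at >= cutoff)
            .map(|e| e.quality_score)
            .collect();
        if scores.is_empty() {
            return None;
        }
        let mean = scores.iter().sum::<f64>() / scores.len() as f64;
        Some((scores.len(), mean))
    }
}
```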
Daemon integration:
- New TaskCompletion struct consolidates what used to be 8 args to an
  inline cost-capture closure. The struct is a stable API that future
  eval modes (local-llm, cloud-llm) can populate with eval_provider,
  eval_latency_ms, eval_cost_usd without touching the hook signature.
- record_task_completion(state, &TaskCompletion): single atomic hook that
  writes both task_costs AND task_evals. Called from heartbeat's poll_exit
  path; designed so RPC-completion and periodic-snapshot paths (the gap
  flagged in feat/rpc-task-dispatch for persistent RPC agents) can call
  the same function.
- eval_mode is hardcoded to 'agent' in Phase 1; future phases pass
  different values, and the function itself is mode-agnostic.
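
The shape of the hook might look something like the sketch below. It is illustrative only: the field set of TaskCompletion, the CostRow and EvalRow types, and the in-memory State are assumptions, and the exit-code mapping (0 scores 1.0, anything else 0.0) is inferred from the Phase 1 description. The real hook writes to the store, presumably inside one transaction; two Vec pushes here only stand in for that.

```rust
/// Hypothetical completion record; optional eval fields stay None in Phase 1.
#[derive(Debug, Clone, PartialEq)]
struct TaskCompletion {
    task_id: String,
    agent_id: String,
    exit_code: i32,
    cost_usd: f64,
    eval_provider: Option<String>,
    eval_latency_ms: Option<u64>,
    eval_cost_usd: Option<f64>,
}

/// Reduced stand-in for a task_costs row.
#[derive(Debug, PartialEq)]
struct CostRow {
    task_id: String,
    cost_usd: f64,
    quality_score: f64,
    eval_mode: String,
}

/// Reduced stand-in for a task_evals row.
#[derive(Debug, PartialEq)]
struct EvalRow {
    task_id: String,
    agent_id: String,
    eval_mode: String,
    quality_score: f64,
    eval_provider: Option<String>,
    eval_latency_ms: Option<u64>,
    eval_cost_usd: Option<f64>,
}

/// In-memory stand-in for daemon state backed by the store.
#[derive(Default)]
struct State {
    task_costs: Vec<CostRow>,
    task_evals: Vec<EvalRow>,
}

/// Single hook that records both the cost row and the eval row for a finished task.
fn record_task_completion(state: &mut State, done: &TaskCompletion) {
    // Phase 1: the mode is fixed and quality comes from the agent's exit code.
    let eval_mode = "agent";
    let quality_score = if done.exit_code == 0 { 1.0 } else { 0.0 };

    state.task_costs.push(CostRow {
        task_id: done.task_id.clone(),
        cost_usd: done.cost_usd,
        quality_score,
        eval_mode: eval_mode.to_string(),
    });
    state.task_evals.push(EvalRow {
        task_id: done.task_id.clone(),
        agent_id: done.agent_id.clone(),
        eval_mode: eval_mode.to_string(),
        quality_score,
        eval_provider: done.eval_provider.clone(),
        eval_latency_ms: done.eval_latency_ms,
        eval_cost_usd: done.eval_cost_usd,
    });
}
```

Because callers only build a TaskCompletion, a later RPC-completion or periodic-snapshot path could reuse the same function without a signature change.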
MCP tool:
- colibri_get_task_eval(task_id): returns the eval record for a task.
Client:
- Client::get_task_eval() async method.
Tests:
- 6 new store tests: roundtrip, insert-or-replace upgrade path,
  list-by-agent filter, eval_summary aggregation, CHECK constraint
  enforcement, export_json integration.
- tool_dispatch test updated for new tool count (20 → 21).
All gates green: cargo fmt, clippy -D warnings, cargo test --workspace,
wiki-lint --strict (187/0).
Sam & Claude