docs: post-Phase-3 wiki accuracy + task-dispatch-flow page
- model-selection-and-eval: status Design → Phases 1–3 shipped (#264/#280/#285); mark Phase 2/3 deliverables, add 3a scope note, fix stale routing-gap row. - hive-routing: status → partially shipped; scheduler row reflects pick_agent + select_model. - README + index: model-selection row reflects shipped, not "design". - New task-dispatch-flow.md: the verified queued→claim→spawn→register→dispatch→ cost chain with code anchors + "why a task stalls" (stale build, not RPC mode, registration linkage). Indexed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
parent
f10b369c1d
commit
04370dd869
5 changed files with 111 additions and 32 deletions
|
|
@ -67,7 +67,7 @@ Key pages:
|
|||
| [glasspane](docs/wiki/glasspane.md) | 5-state machine, attention system, TUI |
|
||||
| [hive-routing](docs/wiki/hive-routing.md) | Fleet routing, capability matrix, node registration |
|
||||
| [cost-dashboard](docs/wiki/cost-dashboard.md) | Per-task cost tracking, proof_text, live dashboard |
|
||||
| [model-selection-and-eval](docs/wiki/model-selection-and-eval.md) | T2.x eval harness + model selection design |
|
||||
| [model-selection-and-eval](docs/wiki/model-selection-and-eval.md) | Eval harness + eval-driven model selection (Ph 1–3) |
|
||||
| [mother-hive](docs/wiki/mother-hive.md) | Mother node coordination (PostgreSQL + MCP over SSH) |
|
||||
| [pull-requests](docs/wiki/pull-requests.md) | Branch naming, commit format, review workflow |
|
||||
| [quality-gates](docs/wiki/quality-gates.md) | CI pipeline, fmt/clippy/test/wiki-lint gates |
|
||||
|
|
|
|||
|
|
@ -1,6 +1,7 @@
|
|||
# Hive Member Tracking & Cost-Aware Routing
|
||||
|
||||
**Status:** 📋 Design
|
||||
**Status:** Partially shipped — capability matching + eval-driven model
|
||||
selection live (#285); stable node UUID and hive-level cost aggregation pending.
|
||||
**Date:** 24.jun.2026
|
||||
**Driven by:** T1.5 per-task cost tracking (shipped) → T2.x routing
|
||||
|
||||
|
|
@ -10,15 +11,15 @@
|
|||
|
||||
## What Exists Today
|
||||
|
||||
| Component | State | Gap |
|
||||
| ---------------------------------------- | --------------------------------------------------------------------------------------- | --------------------------------------------------------- |
|
||||
| `mother_schema.sql` | `hive_nodes` table with `hw_profile` + `capabilities` JSONB | No stable node UUID; hostname is the key |
|
||||
| `derive_capabilities()` trigger | Auto-computes `has_gpu`, `gpu_vendor`, `can_run_local_llm`, `max_model` from hw_profile | Only GPU/VRAM heuristics — doesn't probe running services |
|
||||
| `clawdie-system-probe` | Collects GPU, RAM, CPU, disks, ZFS, WiFi, Vulkan, Colibri status | No ollama/llama.cpp probing |
|
||||
| `node-register-mcp` | UPSERTs hw_profile into `hive_nodes` on join | No UUID generation at join time |
|
||||
| `crates/colibri-daemon/src/scheduler.rs` | Cron/interval/one-shot jobs, capability matching stubs | No cost-aware routing, no hive awareness |
|
||||
| `colibri-ledger` | Local SQLite `agents` table with UUID (v4 random) | UUID is session-local, not hive-stable |
|
||||
| T1.5 cost tracking | Per-task cost captured in local SQLite | No hive-level cost aggregation |
|
||||
| Component | State | Gap |
|
||||
| ---------------------------------------- | ------------------------------------------------------------------------------------------- | --------------------------------------------------------- |
|
||||
| `mother_schema.sql` | `hive_nodes` table with `hw_profile` + `capabilities` JSONB | No stable node UUID; hostname is the key |
|
||||
| `derive_capabilities()` trigger | Auto-computes `has_gpu`, `gpu_vendor`, `can_run_local_llm`, `max_model` from hw_profile | Only GPU/VRAM heuristics — doesn't probe running services |
|
||||
| `clawdie-system-probe` | Collects GPU, RAM, CPU, disks, ZFS, WiFi, Vulkan, Colibri status | No ollama/llama.cpp probing |
|
||||
| `node-register-mcp` | UPSERTs hw_profile into `hive_nodes` on join | No UUID generation at join time |
|
||||
| `crates/colibri-daemon/src/scheduler.rs` | Cron/interval/one-shot jobs, capability matching (`pick_agent`), eval-driven `select_model` | Selection is per-host; no cross-hive awareness yet |
|
||||
| `colibri-ledger` | Local SQLite `agents` table with UUID (v4 random) | UUID is session-local, not hive-stable |
|
||||
| T1.5 cost tracking | Per-task cost captured in local SQLite | No hive-level cost aggregation |
|
||||
|
||||
## Design Goals
|
||||
|
||||
|
|
@ -380,7 +381,7 @@ The capability matrix, stable UUIDs, and local LLM probes are the foundation —
|
|||
| `colibri_query_hive_capabilities` MCP tool | colibri-mcp |
|
||||
| `colibri_dispatch_to_node` MCP tool | colibri-mcp |
|
||||
| `hive-routing` skill | `.agent/skills/` |
|
||||
| `Task.routing` JSONB field in colibri-ledger | colibri-ledger |
|
||||
| `Task.routing` JSONB field in colibri-ledger | colibri-ledger |
|
||||
| Mother-side routing score as PostgreSQL function (optional — only if agent-driven routing proves insufficient) | mother_schema.sql |
|
||||
|
||||
---
|
||||
|
|
|
|||
|
|
@ -46,6 +46,7 @@ warning.
|
|||
| Page | What it covers |
|
||||
| --------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------- |
|
||||
| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
|
||||
| [task-dispatch-flow](./task-dispatch-flow.md) | End-to-end task path: queued → claimed → spawn → register → dispatch → cost; why a task stalls |
|
||||
| [agent-events-reference](./agent-events-reference.md) | Per-harness zot event reference, Glasspane mappings, and verified transcript fields |
|
||||
| [cost-model](./cost-model.md) | Byte-stable prefixes, cache-hit metering, auto-escalation, T14 compaction |
|
||||
| [glasspane](./glasspane.md) | Agent state machine, JSONL streaming, AgentRuntime taxonomy, snapshot API |
|
||||
|
|
@ -57,14 +58,14 @@ warning.
|
|||
| [hive-pane](./hive-pane.md) | Glasspane for the hive — multi-node cost observability, A2A discovery, and operator board |
|
||||
| [cost-dashboard](./cost-dashboard.md) | Mother-side cost observability — human gallery + JSON, screenshot proof linked from cost rows |
|
||||
| [a2a-complexity-audit](./a2a-complexity-audit.md) | A2A code complexity impact — 6-protocol surface audit, when A2A pays off |
|
||||
| [model-selection-and-eval](./model-selection-and-eval.md) | T2.x design: model selection (tier arbitrage) + evaluation harness (task success measurement) |
|
||||
| [model-selection-and-eval](./model-selection-and-eval.md) | Eval harness + eval-driven model selection — Phases 1–3 shipped, Phase 4 (cloud eval) planned |
|
||||
| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
|
||||
| [daemon-not-demon](./daemon-not-demon.md) | Why we say daemon (helper spirit) not demon (bad spirit) — English + Slovenian |
|
||||
| [layered-soul](./layered-soul.md) | How Colibri consumes the layered-soul reviewed-context repo today vs planned |
|
||||
| [task-board](./task-board.md) | Capability match scoring, cron scheduling, intake drain, SQLite backing |
|
||||
| [pull-requests](./pull-requests.md) | PR workflow — branching, review, gates, merge conventions |
|
||||
| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |
|
||||
| [contracts](./contracts.md) | Stable JSON schemas (run-manifest, runtime-inventory, provider-test), fixture tests |
|
||||
| [contracts](./contracts.md) | Stable JSON schemas (run-manifest, runtime-inventory, provider-test), fixture tests |
|
||||
| [store-schema](./store-schema.md) | SQLite coordination schema and migration discipline |
|
||||
| [external-mcp](./external-mcp.md) | MCP bridge for editors + external stdio MCP host; read/write/external-call gates |
|
||||
| [operator-cli](./operator-cli.md) | The `colibri` CLI as a thin typed Unix-socket client over the daemon API |
|
||||
|
|
@ -74,4 +75,4 @@ warning.
|
|||
| [skills-catalog](./skills-catalog.md) | Read-only runtime consumer for reviewed skill artifacts |
|
||||
| [vault-provision](./vault-provision.md) | Vaultwarden-driven env-file provisioning into jails after agent spawn |
|
||||
| [deployment](./deployment.md) | Host installer (clawdie): ZFS layout, rc.d/systemd service, dry-run safety |
|
||||
| [rust-glossary](./rust-glossary.md) | Quick reference for Rust terms in the codebase: serde, Result/Option, Arc/Box, derive macros |
|
||||
| [rust-glossary](./rust-glossary.md) | Quick reference for Rust terms in the codebase: serde, Result/Option, Arc/Box, derive macros |
|
||||
|
|
|
|||
|
|
@ -1,9 +1,16 @@
|
|||
# T2.x: Model Selection & Evaluation Harness
|
||||
|
||||
**Status:** 📋 Design
|
||||
**Status:** Phases 1–3 shipped (#264, #280, #285). Phase 4 (cloud eval) planned.
|
||||
**Date:** 25.jun.2026
|
||||
**Driven by:** T1.5 per-task cost tracking (shipped) → T2.x model selection + eval
|
||||
|
||||
Scope note (Phase 3, 3a): selection is **model-only** — success rate is the
|
||||
primary signal, cost is a tiebreaker, `cost_mode` owns token spend. Per-
|
||||
`task_type` routing is deferred to Phase 4 (no task_type data yet). The
|
||||
selector reads [`Store::model_success_rates`] and runs at agent spawn
|
||||
(long-lived harnesses, not per-task dispatch); gated by
|
||||
`COLIBRI_MODEL_SELECTION`, off by default.
|
||||
|
||||
> **Companion doc:** [hive-routing](./hive-routing.md) — the capability matrix,
|
||||
> machine identity, and routing engine. This doc covers what the routing engine
|
||||
> _optimizes for_ (model selection) and how it knows if it's winning (eval harness).
|
||||
|
|
@ -14,7 +21,7 @@
|
|||
| ------------------------- | ---------------------------------------------------------------------------------------------- | ---------------------------------------------------- |
|
||||
| `task_costs` (PostgreSQL) | Per-task cost rows with `provider`, `model`, `cost_usd`, `success` | `success` is boolean — agent process exited 0 or not |
|
||||
| `hive_nodes.capabilities` | JSONB with `has_gpu`, `can_run_local_llm`, `ollama_models` | No success-rate history per model per node |
|
||||
| Cost tiers (T0–T3) | Defined in hive-routing.md: local ($0), DeepSeek ($0.27/1M), Gemini ($0.15/1M), Claude ($3/1M) | No routing decision uses them yet |
|
||||
| Cost tiers (T0–T3) | Defined in hive-routing.md: local ($0), DeepSeek ($0.27/1M), Gemini ($0.15/1M), Claude ($3/1M) | Phase 3 `select_model` now factors cost into routing |
|
||||
| Agent harness | Spawns zot/pi, tracks session usage | No quality measurement beyond "did it exit 0?" |
|
||||
|
||||
## The Problem
|
||||
|
|
@ -344,7 +351,7 @@ task arrives at scheduler
|
|||
|
||||
**What this gives us:** Eval infrastructure is in place. We're collecting quality scores from agent self-report. This is the minimum viable eval.
|
||||
|
||||
### Phase 2 — Local LLM Eval (3 days)
|
||||
### Phase 2 — Local LLM Eval (shipped — PR #280)
|
||||
|
||||
**Goal:** Independent eval via local LLM.
|
||||
|
||||
|
|
@ -354,13 +361,13 @@ task arrives at scheduler
|
|||
| Local eval: spawn local LLM with eval prompt | colibri-daemon | ~60 |
|
||||
| Fallback logic: self-report → local → cloud → skipped | colibri-daemon | ~40 |
|
||||
| Eval job scheduler (async, fire-and-forget) | colibri-daemon | ~30 |
|
||||
| Eval result merge into task_eval | colibri-ledger | ~20 |
|
||||
| Eval result merge into task_eval | colibri-ledger | ~20 |
|
||||
|
||||
**Total:** ~180 lines, 3 days.
|
||||
|
||||
**What this gives us:** Independent eval for most tasks. Self-report is still the default, but local LLM eval can verify or override.
|
||||
|
||||
### Phase 3 — Model Selection (3 days)
|
||||
### Phase 3 — Model Selection (shipped — PR #285)
|
||||
|
||||
**Goal:** Data-driven routing decisions.
|
||||
|
||||
|
|
@ -383,7 +390,7 @@ task arrives at scheduler
|
|||
| Deliverable | Where | Lines |
|
||||
| --------------------------------------------------- | -------------- | ----- |
|
||||
| Cloud eval: call Claude/DeepSeek with eval prompt | colibri-daemon | ~50 |
|
||||
| Cost accounting: eval_cost_usd added to task_eval | colibri-ledger | ~10 |
|
||||
| Cost accounting: eval_cost_usd added to task_eval | colibri-ledger | ~10 |
|
||||
| Feedback loop: eval results → routing weight update | colibri-daemon | ~30 |
|
||||
| Eval aggregation: 5-minute rollup of success rates | colibri-mcp | ~25 |
|
||||
|
||||
|
|
@ -402,20 +409,21 @@ task arrives at scheduler
|
|||
- Daemon writes eval result ✅
|
||||
- Query API for eval data ✅
|
||||
|
||||
### Phase 2 — Local LLM Eval
|
||||
### Phase 2 — Local LLM Eval (shipped — PR #280)
|
||||
|
||||
- Eval prompt template (JSON schema)
|
||||
- Local eval job (spawn local LLM)
|
||||
- Fallback logic (self-report → local → cloud → skipped)
|
||||
- Async eval scheduler
|
||||
- Eval prompt template ✅
|
||||
- Local eval job (spawn local LLM via ollama) ✅
|
||||
- Fallback chain self-report → local ✅ (→ cloud lands in Phase 4)
|
||||
- Async eval (background `spawn_blocking`) ✅
|
||||
|
||||
### Phase 3 — Model Selection
|
||||
### Phase 3 — Model Selection (shipped — PR #285)
|
||||
|
||||
- `select_model()` function
|
||||
- Query eval success rates per (model, task_type)
|
||||
- Decision rationale logging
|
||||
- Configurable weights
|
||||
- Integration with task dispatch
|
||||
- `select_model()` function ✅
|
||||
- Query eval success rates per model — `Store::model_success_rates` ✅
|
||||
(per `task_type` deferred to Phase 4)
|
||||
- Decision rationale logging ✅
|
||||
- Configurable weights (`COLIBRI_MODEL_SELECTION_WEIGHT_*`) ✅
|
||||
- Integration at agent spawn (`recommend_model` → autospawn env) ✅
|
||||
|
||||
### Phase 4 — Cloud Eval + Feedback
|
||||
|
||||
|
|
|
|||
69
docs/wiki/task-dispatch-flow.md
Normal file
69
docs/wiki/task-dispatch-flow.md
Normal file
|
|
@ -0,0 +1,69 @@
|
|||
# Task dispatch flow (queued → processed → cost)
|
||||
|
||||
← [index](./index.md)
|
||||
|
||||
## What this is
|
||||
|
||||
The end-to-end path a task takes from submission to a running agent and back.
|
||||
This page exists because the chain spans three modules ([task-board](./task-board.md)
|
||||
scheduling, [agent-harness](./agent-harness.md) spawning, and the daemon poll
|
||||
loop), and it's a recurring source of "why won't the agent pick up my task?"
|
||||
confusion. Every stage below is in `main` today.
|
||||
|
||||
## The chain
|
||||
|
||||
```
|
||||
operator submits intake-task (socket) cmd_intake_task
|
||||
→ task row (queued) ───────────────────────────────► store.create_task
|
||||
│
|
||||
scheduler tick (~30s) pick best-fit agent Scheduler::tick
|
||||
→ claim_task ──────────────────────────────────────► pick_agent + claim_task
|
||||
│
|
||||
autospawn (once) spawn `zot rpc` (stdin piped) autospawn_agent_if_configured
|
||||
→ register agent ───────────────────────────────────► register_agent (name = spawn id)
|
||||
│
|
||||
daemon poll loop task text → agent stdin poll_tasks
|
||||
→ send_prompt ──────────────────────────────────────► rpc_sender().send_prompt(task)
|
||||
→ status: Started │
|
||||
▼
|
||||
agent works, emits JSONL → glasspane state → completion → set_task_cost
|
||||
→ write_task_eval (self-report) + background local eval → push_cost_to_mother → dashboard
|
||||
```
|
||||
|
||||
## Stages
|
||||
|
||||
| Stage | What happens | Code |
|
||||
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------------- |
|
||||
| **Submit** | A task row is created in the store with status `queued`. | [`cmd_intake_task`](../../crates/colibri-daemon/src/socket.rs) |
|
||||
| **Claim** | Each tick, the scheduler picks the best-fit agent by capability and claims the task (`queued → claimed`). | [`Scheduler::tick`](../../crates/colibri-daemon/src/scheduler.rs) → `pick_agent`, `claim_task` |
|
||||
| **Spawn** | Autospawn starts the harness as **`zot rpc`** (provider `local`), so stdin is piped and an `RpcSender` is available. | [`autospawn_agent_if_configured`](../../crates/colibri-daemon/src/socket.rs), `default_agent_args` |
|
||||
| **Register** | The spawned agent is registered in the store; its `name` column holds the live spawn-handle id used in `state.agents`. | `register_agent` (store row `name` = spawn id) |
|
||||
| **Dispatch** | The poll loop resolves the spawn handle from the claimed task's `agent_id`, gets `rpc_sender()`, and writes the task text to the agent's stdin, transitioning the task to `Started`. | [`poll_tasks`](../../crates/colibri-daemon/src/daemon.rs) → `send_prompt` |
|
||||
| **Process + cost** | The agent works and emits JSONL ([glasspane](./glasspane.md)). On completion, cost and an eval record are written, then pushed to mother for the [dashboard](./cost-dashboard.md). | `set_task_cost`, `write_task_eval`, `push_cost_to_mother` |
|
||||
|
||||
## Why a task can stall (and what it is _not_)
|
||||
|
||||
The dispatch logic above is all in `main` — a stalled task is almost never
|
||||
missing code. The usual causes, in order:
|
||||
|
||||
1. **Stale deployed build.** The host is running a colibri binary older than
|
||||
the current `poll_tasks` dispatch or agent-registration fixes. Check
|
||||
`git rev-parse HEAD` on the host against `origin/main`; reset, rebuild,
|
||||
restart the daemon.
|
||||
2. **Agent not in RPC mode.** If the process is `zot --mode json` (not
|
||||
`zot rpc`), stdin isn't piped, `rpc_sender()` is `None`, and no dispatch
|
||||
happens. Confirm with `ps`.
|
||||
3. **Registration linkage broken.** Dispatch needs the store agent row's
|
||||
`name` to equal the live spawn id in `state.agents`. A mismatch (older
|
||||
build) means `poll_tasks` can't find the sender.
|
||||
|
||||
If you're told "merge branch X to enable dispatch," verify X against `main`
|
||||
first — the chain is already merged, and re-pushing an auto-deleted branch
|
||||
hits the [branch recreation hazard](./pull-requests.md).
|
||||
|
||||
## See also
|
||||
|
||||
- [task-board](./task-board.md) — scheduler internals (capability scoring, intake drain)
|
||||
- [agent-harness](./agent-harness.md) — zot/Colibri split, autospawn, RPC driver
|
||||
- [glasspane](./glasspane.md) — how agent stdout becomes observable state
|
||||
- [cost-dashboard](./cost-dashboard.md) — where cost lands after completion
|
||||
Loading…
Add table
Reference in a new issue