Task dispatch flow (queued → processed → cost)
What this is
The end-to-end path a task takes from submission to a running agent and back.
This page exists because the chain spans three modules (task-board
scheduling, agent-harness spawning, and the daemon poll
loop), and it's a recurring source of "why won't the agent pick up my task?"
confusion. Every stage below is in main today.
The chain
operator submits        intake-task (socket)                  cmd_intake_task
  → task row (queued) ──────────────────────────────────────► store.create_task
                                 │
scheduler tick (~30s)   pick best-fit agent                   Scheduler::tick
  → claim_task ─────────────────────────────────────────────► pick_agent + claim_task
                                 │
autospawn (once)        spawn `zot rpc` (stdin piped)         autospawn_agent_if_configured
  → register agent ─────────────────────────────────────────► register_agent (name = spawn id)
                                 │
daemon poll loop        task text → agent stdin               poll_tasks
  → send_prompt ────────────────────────────────────────────► rpc_sender().send_prompt(task)
  → status: Started              │
                                 ▼
agent works, emits JSONL → glasspane state → completion → set_task_cost
  → write_task_eval (self-report) + background local eval → push_cost_to_mother → dashboard
Stages
| Stage | What happens | Code |
|---|---|---|
| Submit | A task row is created in the store with status `queued`. | `cmd_intake_task` |
| Claim | Each tick, the scheduler picks the best-fit agent by capability and claims the task (`queued` → `claimed`). | `Scheduler::tick` → `pick_agent`, `claim_task` |
| Spawn | Autospawn starts the harness as `zot rpc` (provider `local`), so stdin is piped and an `RpcSender` is available. | `autospawn_agent_if_configured`, `default_agent_args` |
| Register | The spawned agent is registered in the store; its `name` column holds the live spawn-handle id used in `state.agents`. | `register_agent` (store row `name` = spawn id) |
| Dispatch | The poll loop resolves the spawn handle from the claimed task's `agent_id`, gets `rpc_sender()`, and writes the task text to the agent's stdin, transitioning the task to `Started`. | `poll_tasks` → `send_prompt` |
| Process + cost | The agent works and emits JSONL (glasspane). On completion, cost and an eval record are written, then pushed to mother for the dashboard. | `set_task_cost`, `write_task_eval`, `push_cost_to_mother` |
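
The status changes named in the table (queued → claimed → Started) can be read as a small state machine. The sketch below is illustrative only: `TaskStatus`, `TaskEvent`, and `next_status` are hypothetical names, not the store's actual types, and any statuses the real store has beyond the three mentioned here are left out.

```rust
/// Task statuses named on this page. Hypothetical type; the real store may
/// define more states (for example, for completion) that are omitted here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
    Started,
}

/// Hypothetical events, one per stage that changes the task status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskEvent {
    /// A scheduler tick claimed the task for an agent.
    Claim,
    /// The poll loop wrote the task text to the agent's stdin.
    Dispatch,
}

/// Returns the next status, or `None` when the event does not apply to the
/// current status (for example, dispatching a task nobody has claimed).
pub fn next_status(current: TaskStatus, event: TaskEvent) -> Option<TaskStatus> {
    match (current, event) {
        (TaskStatus::Queued, TaskEvent::Claim) => Some(TaskStatus::Claimed),
        (TaskStatus::Claimed, TaskEvent::Dispatch) => Some(TaskStatus::Started),
        _ => None,
    }
}
```

Under this reading, a task that stays `Claimed` has passed the scheduler and is waiting on the Dispatch stage, which narrows where to look when it stalls.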
Why a task can stall (and what it is not)
The dispatch logic above is all in main, so a stalled task is almost never
caused by missing code. The usual causes, in order:
- Stale deployed build. The host is running a colibri binary older than the current `poll_tasks` dispatch or agent-registration fixes. Check `git rev-parse HEAD` on the host against `origin/main`; reset, rebuild, restart the daemon.
- Agent not in RPC mode. If the process is `zot --mode json` (not `zot rpc`), stdin isn't piped, `rpc_sender()` is `None`, and no dispatch happens. Confirm with `ps`.
- Registration linkage broken. Dispatch needs the store agent row's `name` to equal the live spawn id in `state.agents`. A mismatch (older build) means `poll_tasks` can't find the sender.
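
The last two causes both show up at the same lookup, so it can help to see them side by side. This is a minimal sketch under stated assumptions, not the daemon's code: `SpawnHandle`, its `rpc_mode` flag, `DispatchOutcome`, and `try_dispatch` are hypothetical, and `state.agents` is modeled as a plain `HashMap` keyed by spawn id.

```rust
use std::collections::HashMap;

/// Stand-in for a live spawn handle. `rpc_mode` is a hypothetical flag that
/// models whether stdin is piped (`zot rpc`) or not (`zot --mode json`).
pub struct SpawnHandle {
    pub rpc_mode: bool,
}

impl SpawnHandle {
    /// A sender exists only when the agent runs in RPC mode. The `&str`
    /// return value is a placeholder for the real sender type.
    pub fn rpc_sender(&self) -> Option<&'static str> {
        if self.rpc_mode { Some("sender") } else { None }
    }
}

/// Hypothetical outcomes of one dispatch attempt.
#[derive(Debug, PartialEq, Eq)]
pub enum DispatchOutcome {
    /// Handle found and a sender was available.
    Dispatched,
    /// The store row's name matches no live spawn id (linkage broken).
    NoLiveHandle,
    /// Handle found, but the agent is not in RPC mode.
    NotRpcMode,
}

/// Looks up the live handle by the store agent row's name, then asks it for
/// a sender. Both failure modes leave the task undelivered.
pub fn try_dispatch(
    agents: &HashMap<String, SpawnHandle>,
    store_agent_name: &str,
) -> DispatchOutcome {
    match agents.get(store_agent_name) {
        None => DispatchOutcome::NoLiveHandle,
        Some(handle) => match handle.rpc_sender() {
            None => DispatchOutcome::NotRpcMode,
            Some(_sender) => DispatchOutcome::Dispatched,
        },
    }
}
```

In this model, a store row whose name differs from every key in the map never reaches the sender check, which is why a registration mismatch and a non-RPC agent look the same from the operator's side: the task is claimed but never started.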
If you're told "merge branch X to enable dispatch," verify X against main
first — the chain is already merged, and re-pushing an auto-deleted branch
hits the branch recreation hazard.
See also
- task-board — scheduler internals (capability scoring, intake drain)
- agent-harness — zot/Colibri split, autospawn, RPC driver
- glasspane — how agent stdout becomes observable state
- cost-dashboard — where cost lands after completion