colibri/docs/wiki/task-dispatch-flow.md
Sam & Claude 04370dd869
docs: post-Phase-3 wiki accuracy + task-dispatch-flow page
- model-selection-and-eval: status Design → Phases 1–3 shipped (#264/#280/#285);
  mark Phase 2/3 deliverables, add 3a scope note, fix stale routing-gap row.
- hive-routing: status → partially shipped; scheduler row reflects pick_agent +
  select_model.
- README + index: model-selection row reflects shipped, not "design".
- New task-dispatch-flow.md: the verified queued→claim→spawn→register→dispatch→
  cost chain with code anchors + "why a task stalls" (stale build, not RPC mode,
  registration linkage). Indexed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 18:53:09 +02:00


Task dispatch flow (queued → processed → cost)

What this is

The end-to-end path a task takes from submission to a running agent and back. This page exists because the chain spans three modules (task-board scheduling, agent-harness spawning, and the daemon poll loop), and it's a recurring source of "why won't the agent pick up my task?" confusion. Every stage below is in main today.

The chain

operator submits        intake-task (socket)            cmd_intake_task
   → task row (queued) ───────────────────────────────► store.create_task
                                                              │
scheduler tick (~30s)   pick best-fit agent              Scheduler::tick
   → claim_task ──────────────────────────────────────► pick_agent + claim_task
                                                              │
autospawn (once)        spawn `zot rpc` (stdin piped)    autospawn_agent_if_configured
   → register agent ───────────────────────────────────► register_agent (name = spawn id)
                                                              │
daemon poll loop        task text → agent stdin          poll_tasks
   → send_prompt ──────────────────────────────────────► rpc_sender().send_prompt(task)
   → status: Started                                          │
                                                              ▼
agent works, emits JSONL → glasspane state → completion → set_task_cost
   → write_task_eval (self-report) + background local eval → push_cost_to_mother → dashboard

Stages

| Stage | What happens | Code |
| --- | --- | --- |
| Submit | A task row is created in the store with status queued. | `cmd_intake_task` |
| Claim | Each tick, the scheduler picks the best-fit agent by capability and claims the task (queued → claimed). | `Scheduler::tick` → `pick_agent`, `claim_task` |
| Spawn | Autospawn starts the harness as `zot rpc` (provider local), so stdin is piped and an `RpcSender` is available. | `autospawn_agent_if_configured`, `default_agent_args` |
| Register | The spawned agent is registered in the store; its name column holds the live spawn-handle id used in `state.agents`. | `register_agent` (store row name = spawn id) |
| Dispatch | The poll loop resolves the spawn handle from the claimed task's `agent_id`, gets `rpc_sender()`, and writes the task text to the agent's stdin, transitioning the task to Started. | `poll_tasks` → `send_prompt` |
| Process + cost | The agent works and emits JSONL (glasspane). On completion, cost and an eval record are written, then pushed to mother for the dashboard. | `set_task_cost`, `write_task_eval`, `push_cost_to_mother` |
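
The status changes named in the stages above (queued → claimed → Started, then completion) can be read as a small state machine. The sketch below is illustrative only: `TaskStatus`, `Event`, and `advance` are hypothetical names, not Colibri's actual types, and the `Completed` variant is an assumption standing in for whatever the store records when the agent finishes. Spawn and Register are left out because this page does not describe them as changing the task's status.

```rust
/// Hypothetical status enum mirroring the states named on this page.
/// `Completed` is an assumed name for the post-completion state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
    Started,
    Completed,
}

/// Hypothetical events, one per stage that moves the task forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    /// Scheduler tick: pick_agent + claim_task.
    Claim,
    /// Poll loop: task text written to the agent's stdin.
    Dispatch,
    /// Agent finished; cost and eval records are written afterwards.
    Complete,
}

/// Returns the next status, or an error for any transition that the
/// chain on this page does not describe.
pub fn advance(status: TaskStatus, event: Event) -> Result<TaskStatus, String> {
    match (status, event) {
        (TaskStatus::Queued, Event::Claim) => Ok(TaskStatus::Claimed),
        (TaskStatus::Claimed, Event::Dispatch) => Ok(TaskStatus::Started),
        (TaskStatus::Started, Event::Complete) => Ok(TaskStatus::Completed),
        (s, e) => Err(format!("invalid transition: {:?} on {:?}", s, e)),
    }
}
```

Under this model a task that sits in `Claimed` has passed the scheduler but has not been dispatched, which is the situation the stall section below addresses.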

Why a task can stall (and what it is not)

The dispatch logic above is all in main — a stalled task is almost never missing code. The usual causes, in order:

  1. Stale deployed build. The host is running a colibri binary older than the current poll_tasks dispatch or agent-registration fixes. Check git rev-parse HEAD on the host against origin/main; reset, rebuild, restart the daemon.
  2. Agent not in RPC mode. If the process is zot --mode json (not zot rpc), stdin isn't piped, rpc_sender() is None, and no dispatch happens. Confirm with ps.
  3. Registration linkage broken. Dispatch needs the store agent row's name to equal the live spawn id in state.agents. A mismatch (older build) means poll_tasks can't find the sender.
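
Causes 2 and 3 both come down to one lookup at dispatch time: find the live spawn handle by the store row's name, then check that it has an RPC sender. The following is a minimal sketch of that check under stated assumptions, not the real `poll_tasks` code. `SpawnHandle`, `DispatchCheck`, and `check_dispatch` are hypothetical, and `state.agents` is modeled as a plain `HashMap` keyed by spawn id.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a live spawn handle in `state.agents`.
pub struct SpawnHandle {
    /// True when the process runs as `zot rpc` with stdin piped,
    /// i.e. when `rpc_sender()` would return `Some`.
    pub has_rpc_sender: bool,
}

/// Outcome of the dispatch precondition check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DispatchCheck {
    /// Handle found and an RPC sender is available.
    Ready,
    /// Cause 3: the store row's name matches no live spawn id.
    NoLiveHandle,
    /// Cause 2: the agent is running, but not in RPC mode.
    NotRpcMode,
}

/// `agent_name` is the store agent row's name column; `live` models
/// `state.agents` as a map from spawn id to handle (an assumption).
pub fn check_dispatch(
    agent_name: &str,
    live: &HashMap<String, SpawnHandle>,
) -> DispatchCheck {
    match live.get(agent_name) {
        None => DispatchCheck::NoLiveHandle,
        Some(handle) if !handle.has_rpc_sender => DispatchCheck::NotRpcMode,
        Some(_) => DispatchCheck::Ready,
    }
}
```

Cause 1 (a stale build) is not modeled here, since it is a deployment check rather than a runtime condition.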

If you're told "merge branch X to enable dispatch," verify X against main first — the chain is already merged, and re-pushing an auto-deleted branch hits the branch recreation hazard.

See also

  • task-board — scheduler internals (capability scoring, intake drain)
  • agent-harness — zot/Colibri split, autospawn, RPC driver
  • glasspane — how agent stdout becomes observable state
  • cost-dashboard — where cost lands after completion