Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
Created: 2026-06-19 (Sam & Hermes) Status: Phase 1-2 ready for implementation
Context
The 0.10.0 milestone (ISO build, rc.d lifecycle, SIGTERM/socket fixes, release gate) is staged for the FreeBSD build host. The next milestone is proving the multi-agent, multi-host coordination model: multiple agents on different hosts reading from the same Colibri task board, each picking up work by capability, and reporting results back.
PR #83 landed the first cross-host plumbing: a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below shows, however, that the multi-host plane is packaged and documented yet almost entirely untested. This document defines the work needed to close that gap.
Current architecture (as of PR #83)
The multi-host stack lives outside the Rust daemon:
FreeBSD host (colibri-daemon)
└── Unix socket: /var/run/colibri/colibri.sock
└── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
└── TCP reachable from: debby, domedog
└── colibri_poll.py (Python, raw JSON-over-socket)
└── Hermes cronjob (2min poll / 5min work)
- Transport: tokio::net::UnixListener only; zero TCP in Rust. The socat bridge is a shell-level relay.
- Agent model: register-agent stores name + capabilities + status (active/idle/offline). No host field, no last_seen, no heartbeat, no lease/TTL.
- Task assignment: pick_agent() matches by capability score (partial match counts, highest score wins, tie → later-in-slice). claim_task() is a blind UPDATE with no concurrency guard.
- Polling: colibri_poll.py queries list-tasks status=started filtered by agent_id. colibri_task_done.py calls transition-task.
- Spawning: poll_tasks() in daemon.rs spawns agents for Claimed tasks, skipping those with an existing session (idempotency guard).
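The spawning rule in the last bullet can be sketched as follows. This is a minimal model, not the code in daemon.rs: the `Task` and `TaskStatus` types, the integer ids, and the `HashSet` of sessions are all hypothetical stand-ins chosen to show the idempotency guard in isolation.

```rust
use std::collections::HashSet;

// Hypothetical, simplified stand-ins for the daemon's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Intake,
    Claimed,
    Started,
}

#[derive(Debug, Clone)]
struct Task {
    id: u32,
    status: TaskStatus,
}

/// One polling pass. `sessions` holds the ids of tasks that already have a
/// session. Returns the ids of the tasks spawned on this pass.
fn poll_tasks(tasks: &mut [Task], sessions: &mut HashSet<u32>) -> Vec<u32> {
    let mut spawned = Vec::new();
    for task in tasks.iter_mut() {
        // Only Claimed tasks are candidates for spawning.
        if task.status != TaskStatus::Claimed {
            continue;
        }
        // Idempotency guard: `insert` returns false when a session for this
        // task already exists, so the task is skipped rather than respawned.
        if !sessions.insert(task.id) {
            continue;
        }
        task.status = TaskStatus::Started;
        spawned.push(task.id);
    }
    spawned
}
```

Running the pass twice over the same board spawns nothing the second time, which is the property the guard is meant to provide.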
Socket command inventory (17 commands, all Unix-socket)
| Category | Commands |
|---|---|
| Daemon | status, glasspane-snapshot, set-cost-mode |
| Session | list-sessions, get-session, compact-session |
| Agent process | spawn-agent, kill-agent |
| Board | list-tasks, create-task, transition-task, claim-task, intake-task |
| Agent registry | register-agent, list-agents |
| Skills | list-skills, register-skill |
CLI surface (10 of 17 commands exposed)
Missing from CLI: claim-task, transition-task, register-agent,
list-agents, and set-cost-mode are socket-only (register-skill is
exposed in the CLI). Remote agents must currently use raw Python
socket calls for these commands.
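For comparison, here is what a raw socket call involves, sketched in Rust with only the standard library. The framing is an assumption: one newline-terminated JSON request followed by one newline-terminated JSON reply. The function takes an already-serialized request because the wire schema is not described in this document, and `send_command` and `call_daemon` are hypothetical names.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::os::unix::net::UnixStream;

/// Writes one request line and reads one reply line.
/// ASSUMPTION: newline-delimited JSON framing in both directions.
fn send_command<S: Read + Write>(stream: &mut S, request_json: &str) -> io::Result<String> {
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;

    let mut reply = String::new();
    BufReader::new(&mut *stream).read_line(&mut reply)?;
    Ok(reply.trim_end().to_string())
}

/// Opens the daemon's Unix socket and performs a single request/reply exchange.
fn call_daemon(socket_path: &str, request_json: &str) -> io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    send_command(&mut stream, request_json)
}
```

A caller would pass the socket path from the architecture section, /var/run/colibri/colibri.sock, plus whatever request body the daemon expects. Every remote agent has to reimplement this plumbing until the CLI covers the missing commands.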
Gap analysis
What IS tested (single-host, single-agent)
- Agent spawn → JSONL → glasspane → Done lifecycle
- Task create/intake/claim/start/done over socket
- SIGTERM cleanup + stale socket safety
- Session isolation with 2 agents (bypasses task board)
- Cost mode derivation in background rotation
- pick_agent unit tests: best match, offline exclusion, no-match, empty-required
- Scheduler tick drains intake queue
- poll_tasks spawns agent for a claimed task
What is NOT tested
| # | Gap | Severity | Linux-doable? |
|---|---|---|---|
| 1 | Multi-agent task-board contention: pick_agent only tested with 0-1 agents; no capability-based multi-agent assignment test; no same-agent-multiple-tasks test | High | Yes |
| 2 | CLI surface gaps: claim-task, transition-task, register-agent, list-agents have no CLI; remote agents forced to use raw Python | Medium | Yes |
| 3 | Agent presence model: missing host, last_seen, and heartbeat/lease columns; add these schema fields to detect stale remote agents | High | Yes (schema change) |
| 4 | Remote-safe task claim: claim_task is a blind UPDATE, no concurrency safety, no lease/TTL | Medium | Yes |
| 5 | Python polling scripts: colibri_poll.py and colibri_task_done.py have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: no test simulates a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
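Gap 4 is easiest to see next to a guarded alternative. The sketch below models a compare-and-set claim over an in-memory board. It is an illustration of the concurrency guard only, not a proposed patch: `TaskState`, `ClaimError`, the integer ids, and the SQL in the comment (including its column names) are hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical in-memory model of the task board.
#[derive(Debug, Clone, PartialEq, Eq)]
enum TaskState {
    Unclaimed,
    Claimed { agent_id: u32 },
}

#[derive(Debug, PartialEq, Eq)]
enum ClaimError {
    UnknownTask,
    AlreadyClaimed { by: u32 },
}

/// Compare-and-set claim: succeeds only while the task is still unclaimed.
///
/// A SQL analogue (hypothetical schema) puts the expected state in the
/// WHERE clause and treats "0 rows affected" as a lost race:
///   UPDATE tasks SET agent_id = ?1, status = 'claimed'
///   WHERE id = ?2 AND agent_id IS NULL;
fn claim_task(
    board: &mut HashMap<u32, TaskState>,
    task_id: u32,
    agent_id: u32,
) -> Result<(), ClaimError> {
    match board.get_mut(&task_id) {
        None => Err(ClaimError::UnknownTask),
        Some(TaskState::Claimed { agent_id: by }) => {
            Err(ClaimError::AlreadyClaimed { by: *by })
        }
        Some(state) => {
            *state = TaskState::Claimed { agent_id };
            Ok(())
        }
    }
}
```

With a blind UPDATE the second caller silently overwrites the first. With the guard, the second caller gets an explicit error and can move on to another task.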
Implementation phases
Phase 1: Multi-agent task board tests (Linux, highest impact)
1a. Pure pick_agent unit tests — extend scheduler.rs test module
Existing tests cover: best match (2 agents, different caps), offline exclusion, no-match, empty-required. Add:
| Test | What it proves |
|---|---|
| test_pick_agent_partial_match_wins_over_no_match | Agent with ["rust","freebsd"] beats agent with ["python"] for required ["freebsd"] |
| test_pick_agent_tie_breaking | Two agents with same score: verify deterministic tie-break (later name wins) |
| test_pick_agent_multiple_required_capabilities | Required ["rust","freebsd"]: agent with both beats agent with one |
| test_pick_agent_active_status_eligible | status: "active" is treated same as "idle" (both eligible) |
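The four tests could take roughly the following shape. This is a self-contained sketch: `Agent`, `Status`, the `agent` helper, and this `pick_agent` are hypothetical stand-ins written to match the behavior described earlier (partial matches count, highest score wins, ties go to the later slice entry), not the implementation in scheduler.rs. Returning `None` when no agent scores above zero is an assumption.

```rust
// Hypothetical stand-ins; the real types live in the daemon crate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Active,
    Idle,
    Offline,
}

#[derive(Debug, Clone)]
struct Agent {
    name: &'static str,
    capabilities: Vec<&'static str>,
    status: Status,
}

fn agent(name: &'static str, caps: &[&'static str], status: Status) -> Agent {
    Agent { name, capabilities: caps.to_vec(), status }
}

/// Number of required capabilities the agent provides (partial matches count).
fn score(agent: &Agent, required: &[&str]) -> usize {
    required
        .iter()
        .filter(|req| agent.capabilities.iter().any(|cap| *cap == **req))
        .count()
}

/// Highest score wins; offline and zero-score agents are skipped.
/// `Iterator::max_by_key` returns the last maximum, so a tie goes to the
/// agent that appears later in the slice.
fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.status != Status::Offline)
        .filter(|a| score(a, required) > 0)
        .max_by_key(|a| score(a, required))
}

#[test]
fn test_pick_agent_partial_match_wins_over_no_match() {
    let agents = [
        agent("a", &["rust", "freebsd"], Status::Idle),
        agent("b", &["python"], Status::Idle),
    ];
    assert_eq!(pick_agent(&agents, &["freebsd"]).map(|a| a.name), Some("a"));
}

#[test]
fn test_pick_agent_tie_breaking() {
    let agents = [
        agent("alpha", &["freebsd"], Status::Idle),
        agent("beta", &["freebsd"], Status::Idle),
    ];
    // Same score: the later entry wins, and it wins on every run.
    assert_eq!(pick_agent(&agents, &["freebsd"]).map(|a| a.name), Some("beta"));
}

#[test]
fn test_pick_agent_multiple_required_capabilities() {
    let agents = [
        agent("both", &["rust", "freebsd"], Status::Idle),
        agent("one", &["rust"], Status::Idle),
    ];
    assert_eq!(pick_agent(&agents, &["rust", "freebsd"]).map(|a| a.name), Some("both"));
}

#[test]
fn test_pick_agent_active_status_eligible() {
    let agents = [agent("busy", &["freebsd"], Status::Active)];
    assert_eq!(pick_agent(&agents, &["freebsd"]).map(|a| a.name), Some("busy"));
}
```

In the multiple-capabilities test the stronger agent is listed first on purpose, so the result cannot be explained by slice position alone.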
1b. Multi-agent board integration test — new file crates/colibri-daemon/tests/multi_agent_board.rs
Full lifecycle: register 2 agents with different capabilities, submit 2 intake
tasks with matching capabilities, run scheduler tick, verify correct assignment,
run poll_tasks, verify both agents spawn and reach Done.
Register agent "freebsd-agent" with ["freebsd"]
Register agent "rust-agent" with ["rust"]
Submit intake "build on freebsd" required ["freebsd"]
Submit intake "write rust code" required ["rust"]
Run scheduler.tick(&state)
→ verify task A agent_id == freebsd-agent.id
→ verify task B agent_id == rust-agent.id
Run poll_tasks(&state)
→ verify 2 agent handles in state.agents
→ verify both tasks transitioned Claimed → Started
→ wait for glasspane Done on both panes
This proves the core multi-agent coordination loop: different agents get different tasks by capability.
1c. Same-capability multi-task test
Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"]
Run tick + poll_tasks
→ verify both tasks assigned to same agent (documents current behavior)
→ verify both agents spawn independently (session isolation)
→ verify both reach Done
Documents the current contention behavior (no guard against same agent getting multiple tasks) and proves session isolation when one agent handles multiple tasks.
Phase 2: Merge feat/cli-register-agent + add claim/transition CLI
2a. Merge feat/cli-register-agent (existing branch, 64 lines, client-only)
The branch is clean and ready:
- Command::RegisterAgent { name, capabilities } + Command::ListAgents
- parse_capabilities() helper (reuses --capability/--capabilities pattern)
- DaemonClient::register_agent() + DaemonClient::list_agents()
- Usage text
Enables: colibri register-agent osa-agent --capability freebsd and
colibri list-agents.
2b. Add claim-task and transition-task to CLI
These are the two commands that colibri_task_done.py currently issues over
the raw socket. Adding them to the CLI means remote agents can work entirely
through the colibri binary:
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
Implementation:
- Add Command::ClaimTask { task_id, agent_id } and Command::TransitionTask { task_id, status } variants
- Add DaemonClient::claim_task() and DaemonClient::transition_task()
- Add CLI parsing (follow existing --flag value pattern)
2c. Add CLI unit tests for new commands
Parse tests matching existing parses_task_commands style.
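A minimal sketch of the 2b variants and a 2c-style parse test, using only the standard library. The variant and flag names come from the plan above. The `parse` and `flag_value` functions, the `String` field types, and the error strings are assumptions, and the real parser in colibri.rs may be structured differently.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
}

/// Returns the value that follows `flag` in `args` (the `--flag value` pattern).
fn flag_value(args: &[&str], flag: &str) -> Result<String, String> {
    args.iter()
        .position(|a| *a == flag)
        .and_then(|i| args.get(i + 1))
        .map(|v| (*v).to_string())
        .ok_or_else(|| format!("missing {flag} <value>"))
}

/// Parses `claim-task` and `transition-task` invocations (hypothetical parser).
fn parse(args: &[&str]) -> Result<Command, String> {
    let (name, rest) = args.split_first().ok_or("missing command")?;
    match *name {
        "claim-task" => Ok(Command::ClaimTask {
            task_id: flag_value(rest, "--task-id")?,
            agent_id: flag_value(rest, "--agent-id")?,
        }),
        "transition-task" => {
            let status = flag_value(rest, "--status")?;
            // The usage line above only lists done and failed.
            if status != "done" && status != "failed" {
                return Err(format!("unsupported status: {status}"));
            }
            Ok(Command::TransitionTask {
                task_id: flag_value(rest, "--task-id")?,
                status,
            })
        }
        other => Err(format!("unknown command: {other}")),
    }
}

#[test]
fn parses_claim_and_transition_commands() {
    assert_eq!(
        parse(&["claim-task", "--task-id", "t-1", "--agent-id", "a-1"]),
        Ok(Command::ClaimTask {
            task_id: "t-1".to_string(),
            agent_id: "a-1".to_string(),
        })
    );
    assert!(parse(&["transition-task", "--task-id", "t-1"]).is_err());
}
```

Validating the status at parse time keeps typos from reaching the daemon. Whether that check belongs in the client or the daemon is a design choice this sketch does not settle.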
Phase 3: Agent presence schema (deferred)
Add host and last_seen columns to the agents table. Update register-agent
to accept an optional host parameter and update last_seen on each call. Add
a heartbeat socket command for liveness. Enables detecting stale remote agents.
Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
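To make the later design discussion concrete, here is one possible shape for the presence check. Everything in it is an assumption: the `AgentPresence` struct, timestamps as plain seconds, and a single TTL compared against last_seen. Lease semantics are left out on purpose.

```rust
// Hypothetical presence record: the two proposed columns plus the agent name.
#[derive(Debug, Clone)]
struct AgentPresence {
    name: String,
    host: Option<String>,
    last_seen_secs: u64,
}

impl AgentPresence {
    /// What a heartbeat (or a repeated register-agent call) would update.
    fn heartbeat(&mut self, now_secs: u64) {
        self.last_seen_secs = now_secs;
    }

    /// An agent is stale once it has been silent for longer than the TTL.
    fn is_stale(&self, now_secs: u64, ttl_secs: u64) -> bool {
        now_secs.saturating_sub(self.last_seen_secs) > ttl_secs
    }
}

/// Names of all agents whose last heartbeat is older than the TTL.
fn stale_agents(agents: &[AgentPresence], now_secs: u64, ttl_secs: u64) -> Vec<&str> {
    agents
        .iter()
        .filter(|a| a.is_stale(now_secs, ttl_secs))
        .map(|a| a.name.as_str())
        .collect()
}
```

`saturating_sub` keeps the check from underflowing if a heartbeat carries a timestamp slightly ahead of the caller's clock, which is plausible once agents run on different hosts.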
Phase 4: Polling workflow integration test (deferred)
Test the full poll → claim → work → done cycle from the agent's perspective,
simulating what colibri_poll.py does. Register two agents, create tasks with
different capabilities, verify each agent sees only its tasks via the poll
path, transition tasks to done.
Deferred — depends on Phase 2 CLI additions (so the test can use CLI commands instead of raw socket replication of the Python scripts).
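The property this phase would assert, that each agent sees only its own started tasks, can be modeled without the daemon. The types and function names below are hypothetical. The filter mirrors the list-tasks status=started query filtered by agent_id that colibri_poll.py is described as making.

```rust
// Hypothetical, simplified board model for the agent-side view.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TaskStatus {
    Claimed,
    Started,
    Done,
}

#[derive(Debug, Clone)]
struct Task {
    id: u32,
    agent_id: Option<u32>,
    status: TaskStatus,
}

/// Poll path: ids of Started tasks assigned to `agent_id`.
fn poll_for_agent(board: &[Task], agent_id: u32) -> Vec<u32> {
    board
        .iter()
        .filter(|t| t.status == TaskStatus::Started && t.agent_id == Some(agent_id))
        .map(|t| t.id)
        .collect()
}

/// Done path: marks a Started task as Done; returns false if no such task exists.
fn transition_done(board: &mut [Task], task_id: u32) -> bool {
    match board
        .iter_mut()
        .find(|t| t.id == task_id && t.status == TaskStatus::Started)
    {
        Some(task) => {
            task.status = TaskStatus::Done;
            true
        }
        None => false,
    }
}
```

An integration test would make the same assertions against the real board: two agents poll, each gets a disjoint task list, and a task disappears from its owner's list once it is transitioned to done.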
Phase 5: Bridge validation (FreeBSD-only)
Start colibri_bridge with socat on the FreeBSD host. Connect from a second
host via Tailscale TCP. Verify the round-trip: status, list-tasks, and
claim-task must all work over the bridge. This can only be done on FreeBSD 15
with the Tailscale mesh.
Summary
| Phase | What | Files | Linux? | Status |
|---|---|---|---|---|
| 1a | pick_agent unit tests | scheduler.rs tests | Yes | Ready |
| 1b | Multi-agent board integration test | tests/multi_agent_board.rs (new) | Yes | Ready |
| 1c | Same-capability multi-task test | Same file | Yes | Ready |
| 2a | Merge feat/cli-register-agent | colibri.rs + lib.rs | Yes | Branch exists |
| 2b | Add claim-task + transition-task CLI | colibri.rs + lib.rs | Yes | Ready |
| 2c | CLI parse tests | colibri.rs tests | Yes | Ready |
| 3 | Agent presence schema | schema.rs + lib.rs + socket.rs | Yes | Deferred |
| 4 | Polling workflow test | tests/ | Yes | Deferred (needs Phase 2) |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
Immediate scope: Phases 1-2. All testable on Linux with cargo test +
cargo clippy gate. No FreeBSD dependency for implementation.