Co-authored-by: Sam & Claude <hello@clawdie.si> Co-committed-by: Sam & Claude <hello@clawdie.si>
Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
Created: 19.jun.2026 (Sam & Hermes)
Updated: 25.jun.2026 (Sam & Claude), reflects 0.12.0 release; Phases 1 + 2 complete
Status: Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred
Context
Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running).
The tenant/vault provision chain has landed (register-tenant → jail spawn →
provision_tenant_env() → colibri-vault::provision). The next milestone is
proving the multi-agent, multi-host coordination model: multiple agents on
different hosts reading from the same Colibri task board, each picking up work
by capability, and reporting results back.
PR #83 landed the first cross-host plumbing — a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below defines what remains to close the multi-host testing gap.
Current architecture (as of 0.12.0)
The multi-host stack lives outside the Rust daemon:
FreeBSD host (colibri-daemon)
└── Unix socket: /var/run/colibri/colibri.sock
└── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
└── TCP reachable from: debby, domedog
└── colibri_poll.py (Python, raw JSON-over-socket)
└── Hermes cronjob (2min poll / 5min work)
- Transport: `tokio::net::UnixListener` only, zero TCP in Rust. The socat bridge is a shell-level relay.
- Agent model: `register-agent` stores name + capabilities + status (active/idle/offline). It still needs a `host` field, `last_seen`, heartbeat, and lease/TTL (Phase 3).
- Task assignment: `pick_agent()` matches by capability score (partial match counts, highest score wins, tie → later-in-slice). `claim_task()` was a blind UPDATE; it is now guarded on `status = 'queued'` (Gap 4, closed; see "Closed gaps").
- Polling: `colibri_poll.py` queries `list-tasks status=started` filtered by `agent_id`. `colibri_task_done.py` calls `transition-task`.
- Spawning: `poll_tasks()` in daemon.rs spawns agents for `Claimed` tasks, skipping those with an existing session (idempotency guard).
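
The scoring rule in the task-assignment bullet can be modelled in a few lines. This is a minimal Python sketch of the described behaviour, not the Rust code in scheduler.rs: the `Agent` type, the `score` helper, and the choice to return nothing when the best score is zero are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Statuses that may receive work; "offline" agents are never picked.
ELIGIBLE = {"active", "idle"}


@dataclass(frozen=True)
class Agent:
    name: str
    capabilities: tuple
    status: str


def score(agent: Agent, required: Sequence[str]) -> int:
    """Count the required capabilities the agent has (partial matches count)."""
    return sum(1 for cap in required if cap in agent.capabilities)


def pick_agent(agents: Sequence[Agent], required: Sequence[str]) -> Optional[Agent]:
    """Return the eligible agent with the highest score, or None if nobody matches."""
    best: Optional[Agent] = None
    best_score = 0
    for agent in agents:
        if agent.status not in ELIGIBLE:
            continue
        s = score(agent, required)
        # ">=" means a later agent replaces an earlier one with an equal score,
        # which gives the "tie -> later-in-slice" behaviour.
        if s > 0 and s >= best_score:
            best, best_score = agent, s
    return best
```

With two agents that both hold only `rust`, `pick_agent([a, c], ["rust"])` returns the second one in this model, which mirrors the tie-break the document describes.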
Socket command inventory (19 commands, all Unix-socket)
| Category | Commands |
|---|---|
| Daemon | status, glasspane-snapshot, set-cost-mode |
| Session | list-sessions, get-session, compact-session |
| Agent process | spawn-agent, kill-agent |
| Board | list-tasks, create-task, transition-task, claim-task, intake-task |
| Agent registry | register-agent, list-agents |
| Tenant | register-tenant, list-tenants |
| Skills | list-skills, register-skill |
CLI surface (19 of 19 commands exposed)
All socket commands now have CLI wrappers. claim-task, transition-task,
and set-cost-mode were added in Phase 2b (PR #138).
Gap analysis
What IS tested (single-host, single-agent)
- Agent spawn → JSONL → glasspane → Done lifecycle
- Task create/intake/claim/start/done over socket
- SIGTERM cleanup + stale socket safety
- Session isolation with 2 agents (bypasses task board)
- Cost mode derivation in background rotation
- `pick_agent` unit tests: best match (2 agents), offline exclusion, no-match, empty-required, partial scoring, none scoring
- Scheduler tick drains intake queue without deadlock
- `poll_tasks` spawns agent for a claimed task
- Double-spawn session isolation
- Tenant register + list over socket
Test targets (remaining gaps)
| # | Gap | Severity | Linux-doable? |
|---|---|---|---|
| 3 | Agent presence model: needs `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
| 5 | Python polling scripts: `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: needs a test simulating a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
Closed gaps (since the original 19.jun.2026 analysis)
- Remote-safe task claim (Gap 4): `claim_task` was a blind UPDATE (last writer wins). Now guarded on `status = 'queued'`, so the claim is atomic and exclusive: racing agents (exactly the contention the Tailscale bridge exposes) get a `Conflict` instead of silently stealing a claimed task. Covered by `test_claim_task_is_exclusive` (store) and `socket_rejects_double_claim_of_same_task` (daemon, end-to-end over the socket).
- Multi-agent task-board contention (Gap 1): tie-breaking, multi-required-capability, and active-status eligibility tests added (Phase 1a, PR #138). Full board lifecycle, capability routing, and contention tests added (Phase 1b/1c, PR #186). Key finding: capabilities must be registered as a JSON array (`["freebsd"]`); the object form (`{"freebsd":true}`) silently scores zero in `pick_agent` because it deserializes `Vec<String>`.
- CLI surface gaps (Gap 2): `claim-task`, `transition-task`, `set-cost-mode` added to CLI with parse tests (Phase 2b/2c, PR #138). CLI surface is now 19/19.
- CLI: register-agent + list-agents: merged (Phase 2a, PR #107)
- CLI: register-tenant + list-tenants + register-skill: merged
- pick_agent scoring: partial-match and no-match scoring tests added
- Tenant/vault provision chain: register-tenant, jail spawn flags, `provision_tenant_env()`, `colibri-vault::provision` all landed
- Issue #88 (CollectionNotFound): daemon passes `tenant_id` (collection name) to `vault::provision`
- Issue #91 (tenant provision target verification): `trim_trailing_slash` string-equality check
- Issue #92 (vault provision canonicalization): canonicalize + allowed-root containment (PR #119)
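
The Gap 4 fix is a conditional update: the claim only succeeds while the row is still `queued`. The sketch below illustrates that pattern with Python's `sqlite3`. It makes no claim about Colibri's actual store; the `tasks` table layout and the boolean return (standing in for the `Conflict` error) are assumptions.

```python
import sqlite3


def claim_task(conn: sqlite3.Connection, task_id: str, agent_id: str) -> bool:
    """Claim a task only if it is still queued; return False on conflict."""
    cur = conn.execute(
        "UPDATE tasks SET status = 'claimed', agent_id = ? "
        "WHERE id = ? AND status = 'queued'",
        (agent_id, task_id),
    )
    conn.commit()
    # Exactly one row changes for the winner; a late claimer changes none.
    return cur.rowcount == 1


# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL, agent_id TEXT)"
)
conn.execute("INSERT INTO tasks (id, status) VALUES ('t1', 'queued')")

first = claim_task(conn, "t1", "agent-a")   # wins the claim
second = claim_task(conn, "t1", "agent-b")  # rejected, the row is no longer queued
owner = conn.execute("SELECT agent_id FROM tasks WHERE id = 't1'").fetchone()[0]
```

Because the status check and the write happen in one statement, there is no window between "read status" and "write owner" for a second agent to slip through.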
Implementation phases
Phase 1: Multi-agent task board tests — COMPLETE
1a. Pure pick_agent unit tests — COMPLETE (PR #138)
Added to scheduler.rs test module:
| Test | What it proves |
|---|---|
| `test_pick_agent_tie_breaking` | Two agents with same score: verify deterministic tie-break (later-in-slice wins) |
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]`: agent with both beats agent with one |
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated same as `"idle"` (both eligible) |
1b. Multi-agent board integration test — COMPLETE (PR #186)
Full lifecycle via real Unix socket: register 2 agents with disjoint
capabilities, submit 2 intake tasks with matching required capabilities,
scheduler's pick_agent auto-routes each task to the capable agent (no manual
claim), verified by polling list-tasks status=claimed.
Register agent "sysadmin" with ["freebsd"] ← array form required
Register agent "db-admin" with ["postgres"]
Submit intake "scrub zroot" required ["freebsd"]
Submit intake "vacuum db" required ["postgres"]
Scheduler tick (50ms interval) auto-claims:
→ verify freebsd task agent_id == sysadmin.id
→ verify postgres task agent_id == db-admin.id
This proves the core multi-agent coordination loop: different agents get different tasks by capability, assigned by the scheduler.
Capability format: `pick_agent` deserializes `Vec<String>`, so capabilities must be registered as a JSON array (`["freebsd"]`). The object form (`{"freebsd":true}`) silently deserializes to an empty vec and scores zero. The board-mechanics tests (1a, 1c) use manual `claim-task` so the format is inert there; the routing test (1b) uses array form and documents this requirement.
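
Since the object form fails silently, a client can check the shape before it registers an agent. The helper below is a hypothetical client-side guard, not part of Colibri; it only accepts a JSON array of strings.

```python
import json


def parse_capabilities(raw: str) -> list:
    """Parse a capabilities JSON string, rejecting anything but an array of strings."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(c, str) for c in value):
        raise ValueError('capabilities must be a JSON array of strings, e.g. ["freebsd"]')
    return value
```

Calling `parse_capabilities('{"freebsd": true}')` raises `ValueError` in this sketch, turning the silent zero score into an error at registration time.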
1c. Same-capability multi-task test — COMPLETE (PR #186)
Register agent "worker" with ["freebsd"]
Create 2 plain board tasks (scrub zroot, check smart)
Same agent claims both via manual claim-task
→ verify both transition started → done
→ documents current contention behavior (no guard)
This test documents the current contention behavior (nothing prevents the same agent from holding multiple tasks) and proves session isolation when one agent handles multiple tasks.
Phase 2: CLI surface completion
2a. Merge feat/cli-register-agent — COMPLETE
register-agent and list-agents are in the CLI (merged via PR #107).
2b. Add claim-task, transition-task, and set-cost-mode to CLI — COMPLETE (PR #138)
All three commands are in the CLI. Remote agents can now work entirely through
the colibri binary:
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE
2c. CLI unit tests for new commands — COMPLETE (PR #138)
Parse tests added: parses_claim_task, parses_transition_task,
parses_set_cost_mode, rejects_claim_task_missing_flags,
rejects_transition_task_missing_flags, rejects_set_cost_mode_without_arg.
Phase 3: Agent presence schema (deferred)
Add host and last_seen columns to the agents table. Update register-agent
to accept an optional host parameter and update last_seen on each call. Add
a heartbeat socket command for liveness. Enables detecting stale remote agents.
Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
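
One possible shape for the stale-agent check, once `last_seen` exists, is a plain TTL comparison. This is a sketch under assumptions: lease semantics are still undecided, so the `ttl` parameter, the 5-minute value in the usage note, and the dictionary record layout are placeholders rather than a proposal.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_seen: datetime, now: datetime, ttl: timedelta) -> bool:
    """An agent is stale when its last heartbeat is older than the TTL."""
    return now - last_seen > ttl


def stale_agents(agents: list, now: datetime, ttl: timedelta) -> list:
    """Return the names of agents whose `last_seen` has expired."""
    return [a["name"] for a in agents if is_stale(a["last_seen"], now, ttl)]
```

For example, with `ttl=timedelta(minutes=5)`, an agent last seen ten minutes ago is reported and one seen a minute ago is not.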
Phase 4: Polling workflow integration test (deferred)
Test the full poll → claim → work → done cycle from the agent's perspective,
simulating what colibri_poll.py does. Register two agents, create tasks with
different capabilities, verify each agent sees only its tasks via the poll
path, transition tasks to done.
Deferred — Phase 2b CLI additions are now complete; this test can be written when prioritized.
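
A test of this kind could start from an in-memory stand-in for the board, so the poll path runs without a daemon. The sketch below is illustrative only: `FakeBoard`, `poll_once`, and the task dictionary keys (`id`, `status`, `agent_id`) are hypothetical and do not mirror the real `colibri_poll.py`.

```python
class FakeBoard:
    """In-memory stand-in for the task board, for testing poll logic."""

    def __init__(self, tasks):
        self.tasks = {t["id"]: dict(t) for t in tasks}

    def list_tasks(self, status, agent_id):
        # Mirrors the described poll query: status filter plus agent_id filter.
        return [
            t for t in self.tasks.values()
            if t["status"] == status and t["agent_id"] == agent_id
        ]

    def transition_task(self, task_id, status):
        self.tasks[task_id]["status"] = status


def poll_once(board, agent_id):
    """One poll cycle: take this agent's started tasks and mark them done."""
    finished = []
    for task in board.list_tasks("started", agent_id):
        board.transition_task(task["id"], "done")
        finished.append(task["id"])
    return finished
```

Two agents polling the same board should each see only their own task, and a second poll by the same agent should return nothing.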
Phase 5: Bridge validation (FreeBSD-only)
Closes Gap 6 (bridge round-trip) and Gap 7 (cross-host coordination) on the real
Tailscale mesh — an operational run, not more code. The bridge is the
colibri_bridge rc.d service running
socat TCP-LISTEN:9190,fork → UNIX-CONNECT:/var/run/colibri/colibri.sock.
Prerequisites
- `colibri_daemon` running (socket at `/var/run/colibri/colibri.sock`)
- `pkg install socat`
- both hosts on the tailnet; a `pf` rule allowing the bridge port inbound on `tailscale0` only, never the public interface
On the FreeBSD host (OSA)
sysrc colibri_bridge_enable=YES
sysrc colibri_bridge_listen_addr=<osa-tailnet-ip> # this host's tailnet address
service colibri_bridge start
sockstat -4 -l | grep 9190 # confirm socat is listening
From a second host (e.g. domedog) over Tailscale — the remote path is raw
TCP (the colibri_poll.py script speaks the local Unix socket; over the wire,
send newline-delimited JSON with nc):
printf '%s\n' '{"cmd":"status"}' | nc -w2 <osa-tailnet-ip> 9190
printf '%s\n' '{"cmd":"list-tasks"}' | nc -w2 <osa-tailnet-ip> 9190
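
The same round-trip can be scripted instead of typed into `nc`. This is a minimal Python sketch, assuming the daemon answers each request with a single newline-terminated JSON line; that reply framing, and the helper names `encode_command` and `send_command`, are assumptions rather than documented behaviour.

```python
import json
import socket


def encode_command(cmd: str, **fields) -> bytes:
    """Serialize one command as a single newline-terminated JSON line."""
    return (json.dumps({"cmd": cmd, **fields}) + "\n").encode("utf-8")


def send_command(host: str, port: int, cmd: str, timeout: float = 2.0, **fields):
    """Send one command over the TCP bridge and return the decoded reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(encode_command(cmd, **fields))
        with sock.makefile("rb") as reader:
            line = reader.readline()  # assumed: one JSON line per reply
    if not line:
        raise ConnectionError("bridge closed the connection without replying")
    return json.loads(line)
```

Usage would look like `send_command("<osa-tailnet-ip>", 9190, "list-tasks", status="claimed")`, with the tailnet address kept as a placeholder.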
Cross-host coordination — the real proof (Gap 7)
- Register a remote agent (array-form caps; object form scores zero in `pick_agent`): `{"cmd":"register-agent","name":"domedog","capabilities":["linux"]}`
- Submit an intake task requiring that capability: `{"cmd":"intake-task","title":"...","capabilities":["linux"]}`
- Confirm the scheduler routed it: `{"cmd":"list-tasks","status":"claimed"}` → the task's `agent_id` is the remote agent.
- From the remote side, `transition-task` it to `done` and verify.
Acceptance: a task created on OSA is claimed and driven to done by an agent
on a different host, entirely over the Tailscale bridge — the same routing the
scheduler_routes_intake_tasks_by_capability test proves on a single host.
Security: bind to the tailnet interface only and scope the pf rule to
tailscale0. Use placeholder tailnet addresses in any committed notes — never
paste real 100.x IPs into git. (The shipped colibri_bridge.in currently
hardcodes a real default listen_addr; that should be scrubbed to a placeholder
or required-via-rc.conf separately.)
Summary
| Phase | What | Files | Linux? | Status |
|---|---|---|---|---|
| 1a | `pick_agent` unit tests (tie-break, multi-cap, active) | `scheduler.rs` tests | Yes | Complete (PR #138) |
| 1b | Multi-agent board integration test (capability routing) | `tests/multi_agent_board.rs` | Yes | Complete (PR #186) |
| 1c | Same-capability multi-task test (contention) | `tests/multi_agent_board.rs` | Yes | Complete (PR #186) |
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | Complete (PR #107) |
| 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | Complete (PR #138) |
| 2c | CLI parse tests | `colibri.rs` tests | Yes | Complete (PR #138) |
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
| 4 | Polling workflow test | `tests/` | Yes | Deferred |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
Phases 1 + 2 complete. Next scope: Phase 3 (agent presence schema) or Phase 5 (FreeBSD bridge validation).