Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
Created: 19.jun.2026 (Sam & Hermes)
Updated: 21.jun.2026 (Sam & Claude), reflects 0.11.0 release and narrowed gaps
Status: Phase 2a complete; Phase 1 + Phase 2b ready for implementation
Context
Colibri 0.11.0 is released (MIT license, 230 tests, FreeBSD port + CI running).
The tenant/vault provision chain has landed (register-tenant → jail spawn →
provision_tenant_env() → colibri-vault::provision). The next milestone is
proving the multi-agent, multi-host coordination model: multiple agents on
different hosts reading from the same Colibri task board, each picking up work
by capability, and reporting results back.
PR #83 landed the first cross-host plumbing — a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below defines what remains to close the multi-host testing gap.
Current architecture (as of 0.11.0)
The multi-host stack lives outside the Rust daemon:
FreeBSD host (colibri-daemon)
  └── Unix socket: /var/run/colibri/colibri.sock
       └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
            └── TCP reachable from: debby, domedog
                 └── colibri_poll.py (Python, raw JSON-over-socket)
                      └── Hermes cronjob (2min poll / 5min work)
- Transport: tokio::net::UnixListener only, zero TCP in Rust. The socat bridge is a shell-level relay.
- Agent model: register-agent stores name + capabilities + status (active/idle/offline). Awaiting: host field, last_seen, heartbeat, and lease/TTL (Phase 3).
- Task assignment: pick_agent() matches by capability score (partial match counts, highest score wins, tie → later-in-slice). claim_task() is a blind UPDATE and awaits a concurrency guard (Gap 4).
- Polling: colibri_poll.py queries list-tasks status=started filtered by agent_id. colibri_task_done.py calls transition-task.
- Spawning: poll_tasks() in daemon.rs spawns agents for Claimed tasks, skipping those with an existing session (idempotency guard).
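The assignment rule in the Task assignment item can be modeled in a few lines. This is a minimal sketch, not the code in scheduler.rs: the Agent struct, the make_agent helper, and the choice to return None when no capability matches are assumptions made for illustration. It leans on the documented behavior of Iterator::max_by_key, which returns the last of several equally maximal elements and so yields a later-in-slice tie-break.

```rust
/// Hypothetical agent record; the real struct in the daemon may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Agent {
    pub name: String,
    pub capabilities: Vec<String>,
    /// One of "active", "idle", "offline".
    pub status: String,
}

/// Illustrative constructor to keep examples short (not part of Colibri).
pub fn make_agent(name: &str, caps: &[&str], status: &str) -> Agent {
    Agent {
        name: name.to_string(),
        capabilities: caps.iter().map(|c| c.to_string()).collect(),
        status: status.to_string(),
    }
}

/// Score = how many of the required capabilities the agent advertises,
/// so a partial match still counts.
pub fn score(agent: &Agent, required: &[&str]) -> usize {
    required
        .iter()
        .filter(|&&req| agent.capabilities.iter().any(|cap| cap.as_str() == req))
        .count()
}

/// Pick the highest-scoring agent that is not offline.
/// Assumption: a score of zero means "no match" and yields None.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.status != "offline")
        .map(|a| (score(a, required), a))
        .filter(|(s, _)| *s > 0)
        // max_by_key returns the LAST maximal element: later-in-slice wins ties.
        .max_by_key(|(s, _)| *s)
        .map(|(_, a)| a)
}
```

In this model, two idle agents that both advertise "freebsd" resolve to the second one in the slice, and an agent holding both "rust" and "freebsd" beats one holding only "rust" regardless of order.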
Socket command inventory (19 commands, all Unix-socket)
| Category | Commands |
|---|---|
| Daemon | status, glasspane-snapshot, set-cost-mode |
| Session | list-sessions, get-session, compact-session |
| Agent process | spawn-agent, kill-agent |
| Board | list-tasks, create-task, transition-task, claim-task, intake-task |
| Agent registry | register-agent, list-agents |
| Tenant | register-tenant, list-tenants |
| Skills | list-skills, register-skill |
CLI surface (16 of 19 commands exposed)
Awaiting CLI exposure: claim-task, transition-task, set-cost-mode
(Phase 2b). Remote agents currently use raw Python socket calls for these
three commands.
Gap analysis
What IS tested (single-host, single-agent)
- Agent spawn → JSONL → glasspane → Done lifecycle
- Task create/intake/claim/start/done over socket
- SIGTERM cleanup + stale socket safety
- Session isolation with 2 agents (bypasses task board)
- Cost mode derivation in background rotation
- pick_agent unit tests: best match (2 agents), offline exclusion, no-match, empty-required, partial scoring, none scoring
- Scheduler tick drains intake queue without deadlock
- poll_tasks spawns agent for a claimed task
- Double-spawn session isolation
- Tenant register + list over socket
Test targets (awaiting coverage)
| # | Gap | Severity | Linux-doable? |
|---|---|---|---|
| 1 | Multi-agent task-board contention: pick_agent tie-breaking, multi-required-capability, and active-status eligibility await dedicated tests | High | Yes |
| 2 | CLI surface gaps: claim-task, transition-task, set-cost-mode await CLI exposure (Phase 2b) | Medium | Yes |
| 3 | Agent presence model: awaits host, last_seen, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
| 4 | Remote-safe task claim: claim_task is a blind UPDATE and awaits a concurrency guard or lease/TTL | Medium | Yes |
| 5 | Python polling scripts: colibri_poll.py and colibri_task_done.py have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: awaits a test simulating a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
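To make Gap 4 concrete, the difference between a blind claim and a guarded one can be shown with an in-memory stand-in for the tasks table. This is only a sketch under stated assumptions: Board, ClaimError, the integer IDs (the real board uses UUIDs), and the SQL shapes quoted in the comments are all hypothetical and are not taken from the Colibri source.

```rust
use std::collections::HashMap;

/// Illustrative IDs; the real board uses UUIDs.
pub type TaskId = u32;
pub type AgentId = u32;

#[derive(Debug, PartialEq)]
pub enum ClaimError {
    UnknownTask,
    /// The task is already owned by this agent.
    AlreadyClaimed(AgentId),
}

/// In-memory stand-in for the tasks table: task id -> current owner.
#[derive(Default)]
pub struct Board {
    owners: HashMap<TaskId, Option<AgentId>>,
}

impl Board {
    pub fn add_task(&mut self, task: TaskId) {
        self.owners.insert(task, None);
    }

    pub fn owner(&self, task: TaskId) -> Option<AgentId> {
        self.owners.get(&task).copied().flatten()
    }

    /// Blind claim: overwrites any existing owner, comparable to
    /// `UPDATE tasks SET agent_id = ? WHERE id = ?` (hypothetical SQL).
    pub fn claim_blind(&mut self, task: TaskId, agent: AgentId) -> Result<(), ClaimError> {
        let slot = self.owners.get_mut(&task).ok_or(ClaimError::UnknownTask)?;
        *slot = Some(agent);
        Ok(())
    }

    /// Guarded claim: succeeds only while the task is unowned, comparable to
    /// adding `AND agent_id IS NULL` and checking the affected row count.
    pub fn claim_guarded(&mut self, task: TaskId, agent: AgentId) -> Result<(), ClaimError> {
        let slot = self.owners.get_mut(&task).ok_or(ClaimError::UnknownTask)?;
        match *slot {
            Some(current) => Err(ClaimError::AlreadyClaimed(current)),
            None => {
                *slot = Some(agent);
                Ok(())
            }
        }
    }
}
```

With the guarded variant, the second of two racing remote agents gets an explicit AlreadyClaimed error instead of silently replacing the first owner. A lease/TTL would extend the same check with an expiry comparison.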
Closed gaps (since the original 19.jun.2026 analysis)
- CLI: register-agent + list-agents: merged (Phase 2a, PR #107)
- CLI: register-tenant + list-tenants + register-skill: merged
- pick_agent scoring: partial-match and no-match scoring tests added
- Tenant/vault provision chain: register-tenant, jail spawn flags, provision_tenant_env(), colibri-vault::provision all landed
- Issue #88 (CollectionNotFound): daemon passes tenant_id (collection name) to vault::provision
- Issue #91 (tenant provision target verification): trim_trailing_slash string-equality check
- Issue #92 (vault provision canonicalization): canonicalize + allowed-root containment (PR #119)
Implementation phases
Phase 1: Multi-agent task board tests (Linux, highest impact)
1a. Pure pick_agent unit tests — extend scheduler.rs test module
Existing tests cover: best match (2 agents, different caps), offline exclusion, no-match, empty-required, partial scoring, none scoring, tick-drains-intake. Add:
| Test | What it proves |
|---|---|
| test_pick_agent_tie_breaking | Two agents with same score: verify deterministic tie-break (later-in-slice wins) |
| test_pick_agent_multiple_required_capabilities | Required ["rust","freebsd"]: agent with both beats agent with one |
| test_pick_agent_active_status_eligible | status: "active" is treated same as "idle" (both eligible) |
1b. Multi-agent board integration test — new file crates/colibri-daemon/tests/multi_agent_board.rs
Full lifecycle: register 2 agents with different capabilities, submit 2 intake
tasks with matching capabilities, run scheduler tick, verify correct assignment,
run poll_tasks, verify both agents spawn and reach Done.
Register agent "freebsd-agent" with ["freebsd"]
Register agent "rust-agent" with ["rust"]
Submit intake "build on freebsd" required ["freebsd"]
Submit intake "write rust code" required ["rust"]
Run scheduler.tick(&state)
  → verify task A agent_id == freebsd-agent.id
  → verify task B agent_id == rust-agent.id
Run poll_tasks(&state)
  → verify 2 agent handles in state.agents
  → verify both tasks transitioned Claimed → Started
  → wait for glasspane Done on both panes
This proves the core multi-agent coordination loop: different agents get different tasks by capability.
1c. Same-capability multi-task test
Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"]
Run tick + poll_tasks
  → verify both tasks assigned to same agent (documents current behavior)
  → verify one agent process spawns per task, independently (session isolation)
  → verify both reach Done
Documents the current contention behavior (no guard against same agent getting multiple tasks) and proves session isolation when one agent handles multiple tasks.
Phase 2: CLI surface completion
2a. Merge feat/cli-register-agent — COMPLETE
register-agent and list-agents are in the CLI (merged via PR #107).
2b. Add claim-task, transition-task, and set-cost-mode to CLI
These are the three commands that colibri_task_done.py currently reaches via
raw socket. Adding them to the CLI means remote agents can work entirely
through the colibri binary:
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE
Implementation:
- Add Command::ClaimTask { task_id, agent_id }, Command::TransitionTask { task_id, status }, and Command::SetCostMode { mode } variants
- Add DaemonClient::claim_task(), DaemonClient::transition_task(), and DaemonClient::set_cost_mode() methods
- Add CLI parsing (follow existing --flag value pattern)
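The parsing step above could look roughly like the following. This is a hedged sketch rather than the existing parser in colibri.rs: the String field types, the flag_value helper, the error strings, and the restriction of --status to done or failed (read off the usage line above) are assumptions. DaemonClient is deliberately left out, since its wire format is not described here.

```rust
/// The three variants named in the plan; field types are assumed to be strings.
#[derive(Debug, PartialEq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
    SetCostMode { mode: String },
}

/// Hypothetical helper: return the value that follows `flag`, if any.
fn flag_value(args: &[&str], flag: &str) -> Option<String> {
    args.iter()
        .position(|a| *a == flag)
        .and_then(|i| args.get(i + 1))
        .map(|v| v.to_string())
}

/// Parse `<command> [--flag value]...` into a Command.
pub fn parse(args: &[&str]) -> Result<Command, String> {
    let (name, rest) = args.split_first().ok_or("missing command")?;
    match *name {
        "claim-task" => Ok(Command::ClaimTask {
            task_id: flag_value(rest, "--task-id").ok_or("missing --task-id")?,
            agent_id: flag_value(rest, "--agent-id").ok_or("missing --agent-id")?,
        }),
        "transition-task" => {
            let task_id = flag_value(rest, "--task-id").ok_or("missing --task-id")?;
            let status = flag_value(rest, "--status").ok_or("missing --status")?;
            // Assumption: only the two statuses shown in the usage line are accepted.
            if status != "done" && status != "failed" {
                return Err(format!("invalid status: {}", status));
            }
            Ok(Command::TransitionTask { task_id, status })
        }
        "set-cost-mode" => {
            // MODE is positional in the usage line above.
            let mode = rest.first().ok_or("missing MODE")?;
            Ok(Command::SetCostMode { mode: mode.to_string() })
        }
        other => Err(format!("unknown command: {}", other)),
    }
}
```

Parse tests for Phase 2c can then compare the result of parse against the expected Command value, which works because the enum derives PartialEq and Debug.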
2c. Add CLI unit tests for new commands
Parse tests matching existing parses_task_commands style.
Phase 3: Agent presence schema (deferred)
Add host and last_seen columns to the agents table. Update register-agent
to accept an optional host parameter and update last_seen on each call. Add
a heartbeat socket command for liveness. Enables detecting stale remote agents.
Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
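As a rough illustration of what the presence columns would enable, staleness can be expressed as a comparison between last_seen and a TTL. The AgentPresence struct, the function names, and the 300-second TTL in the usage note are assumptions for this sketch. The actual schema and lease semantics are still open.

```rust
use std::time::Duration;

/// Hypothetical presence record mirroring the proposed columns.
#[derive(Debug, Clone)]
pub struct AgentPresence {
    pub name: String,
    pub host: Option<String>,
    /// Unix timestamp (seconds) of the last register-agent or heartbeat call.
    pub last_seen: u64,
}

/// A heartbeat simply refreshes last_seen.
pub fn heartbeat(agent: &mut AgentPresence, now: u64) {
    agent.last_seen = now;
}

/// An agent is stale once more than `ttl` has passed since last_seen.
/// saturating_sub guards against clock skew making `now` smaller than last_seen.
pub fn is_stale(agent: &AgentPresence, now: u64, ttl: Duration) -> bool {
    now.saturating_sub(agent.last_seen) > ttl.as_secs()
}

/// Names of all agents whose presence has expired.
pub fn stale_names<'a>(agents: &'a [AgentPresence], now: u64, ttl: Duration) -> Vec<&'a str> {
    agents
        .iter()
        .filter(|a| is_stale(a, now, ttl))
        .map(|a| a.name.as_str())
        .collect()
}
```

With a TTL of 300 seconds, an agent last seen 700 seconds ago is reported as stale, while one seen 100 seconds ago is not. A heartbeat at the current time clears the stale flag.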
Phase 4: Polling workflow integration test (deferred)
Test the full poll → claim → work → done cycle from the agent's perspective,
simulating what colibri_poll.py does. Register two agents, create tasks with
different capabilities, verify each agent sees only its tasks via the poll
path, transition tasks to done.
Deferred — depends on Phase 2b CLI additions (so the test can use CLI commands instead of raw socket replication of the Python scripts).
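The property this phase would check, that each agent sees only its own started tasks, can be stated as a small filter. The Task struct and poll_for below are hypothetical stand-ins for the list-tasks status=started query filtered by agent_id. They do not reproduce the Python scripts or the socket protocol.

```rust
/// Hypothetical task row with only the fields the poll path filters on.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub id: String,
    pub status: String,
    pub agent_id: Option<String>,
}

/// Return the tasks a given agent would see when polling:
/// status must be "started" and the task must be assigned to that agent.
pub fn poll_for<'a>(tasks: &'a [Task], agent_id: &str) -> Vec<&'a Task> {
    tasks
        .iter()
        .filter(|t| t.status == "started" && t.agent_id.as_deref() == Some(agent_id))
        .collect()
}
```

An integration test could register two agents, create one task for each, and assert that poll_for (or its real counterpart) returns disjoint task sets for the two agent IDs.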
Phase 5: Bridge validation (FreeBSD-only)
Start colibri_bridge with socat on the FreeBSD host. Connect from a second
host via Tailscale TCP. Verify round-trip: status, list-tasks, claim-task all
work over the bridge. This can only be done on FreeBSD 15 with the Tailscale
mesh.
Summary
| Phase | What | Files | Linux? | Status |
|---|---|---|---|---|
| 1a | pick_agent unit tests (3 remaining) | scheduler.rs tests | Yes | Ready |
| 1b | Multi-agent board integration test | tests/multi_agent_board.rs (new) | Yes | Ready |
| 1c | Same-capability multi-task test | Same file | Yes | Ready |
| 2a | Merge feat/cli-register-agent | colibri.rs + lib.rs | Yes | Complete |
| 2b | Add claim-task + transition-task + set-cost-mode CLI | colibri.rs + lib.rs | Yes | Ready |
| 2c | CLI parse tests | colibri.rs tests | Yes | Ready |
| 3 | Agent presence schema | schema.rs + lib.rs + socket.rs | Yes | Deferred |
| 4 | Polling workflow test | tests/ | Yes | Deferred (needs Phase 2b) |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
Immediate scope: Phases 1 + 2b. All testable on Linux with cargo test +
cargo clippy gate. No FreeBSD dependency for implementation.