Sam & Claude 95bf3f396d
fix(store): atomic+exclusive claim_task — close Gap 4 concurrency guard (#190)
Co-authored-by: Sam & Claude <hello@clawdie.si>
Co-committed-by: Sam & Claude <hello@clawdie.si>
2026-06-25 17:33:15 +02:00

Multi-Agent Multi-Host — Gap Analysis & Implementation Plan

Created: 19.jun.2026 (Sam & Hermes)
Updated: 25.jun.2026 (Sam & Claude); reflects the 0.12.0 release
Status: Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred

Context

Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running). The tenant/vault provision chain has landed (register-tenant → jail spawn → provision_tenant_env() → colibri-vault::provision). The next milestone is proving the multi-agent, multi-host coordination model: multiple agents on different hosts reading from the same Colibri task board, each picking up work by capability and reporting results back.

PR #83 landed the first cross-host plumbing — a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below defines what remains to close the multi-host testing gap.


Current architecture (as of 0.12.0)

The multi-host stack lives outside the Rust daemon:

```text
  FreeBSD host (colibri-daemon)
    └── Unix socket: /var/run/colibri/colibri.sock
         └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
              └── TCP reachable from: debby, domedog
                   └── colibri_poll.py (Python, raw JSON-over-socket)
                        └── Hermes cronjob (2min poll / 5min work)
```
  • Transport: tokio::net::UnixListener only — zero TCP in Rust. The socat bridge is a shell-level relay.
  • Agent model: register-agent stores name + capabilities + status (active/idle/offline). There is no host field, last_seen, heartbeat, or lease/TTL yet (Phase 3).
  • Task assignment: pick_agent() matches by capability score (partial matches count, highest score wins, ties go to the agent later in the slice). claim_task() is an atomic, exclusive UPDATE guarded on status = 'queued'; a racing second claim gets a Conflict (Gap 4, closed).
  • Polling: colibri_poll.py queries list-tasks status=started filtered by agent_id. colibri_task_done.py calls transition-task.
  • Spawning: poll_tasks() in daemon.rs spawns agents for Claimed tasks, skipping those with an existing session (idempotency guard).
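The idempotency guard in poll_tasks() reduces to a filter over the board: only Claimed tasks without a session get a spawn, so re-running the poll is harmless. A minimal sketch, assuming hypothetical Task and TaskStatus types with numeric ids (the real board uses UUIDs) and a precomputed set of task ids that already own a session. It is not the daemon.rs code.

```rust
use std::collections::HashSet;

/// Hypothetical task states; the real board defines its own enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
    Done,
}

/// Hypothetical board row (real ids are UUIDs).
#[derive(Clone, Debug)]
pub struct Task {
    pub id: u64,
    pub status: TaskStatus,
}

/// Ids of tasks that need a freshly spawned agent: Claimed tasks with no
/// existing session. Calling this on every poll tick does not double-spawn,
/// as long as the session is recorded before the next tick.
pub fn tasks_to_spawn(tasks: &[Task], has_session: &HashSet<u64>) -> Vec<u64> {
    tasks
        .iter()
        .filter(|t| t.status == TaskStatus::Claimed && !has_session.contains(&t.id))
        .map(|t| t.id)
        .collect()
}
```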

Socket command inventory (19 commands, all Unix-socket)

| Category | Commands |
|---|---|
| Daemon | status, glasspane-snapshot, set-cost-mode |
| Session | list-sessions, get-session, compact-session |
| Agent process | spawn-agent, kill-agent |
| Board | list-tasks, create-task, transition-task, claim-task, intake-task |
| Agent registry | register-agent, list-agents |
| Tenant | register-tenant, list-tenants |
| Skills | list-skills, register-skill |
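All 19 commands share one framing: a JSON object with a cmd field, sent as a single line. A minimal client sketch over the local socket, assuming the daemon answers each request with exactly one newline-terminated JSON line. send_command is a hypothetical helper and is not part of the colibri crate.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

/// Sends one newline-delimited JSON request and reads one response line.
/// Assumption: the daemon replies with exactly one line per request.
pub fn send_command(stream: &UnixStream, request_json: &str) -> io::Result<String> {
    // `&UnixStream` implements both Write and Read, so no clone is needed.
    let mut writer = stream;
    writer.write_all(request_json.as_bytes())?;
    writer.write_all(b"\n")?;
    writer.flush()?;

    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    reader.read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

Against a running daemon, the call would be UnixStream::connect("/var/run/colibri/colibri.sock") followed by send_command(&stream, r#"{"cmd":"status"}"#).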

CLI surface (19 of 19 commands exposed)

All socket commands now have CLI wrappers. claim-task, transition-task, and set-cost-mode were added in Phase 2b (PR #138).


Gap analysis

What IS tested (single-host, single-agent)

  • Agent spawn → JSONL → glasspane → Done lifecycle
  • Task create/intake/claim/start/done over socket
  • SIGTERM cleanup + stale socket safety
  • Session isolation with 2 agents (bypasses task board)
  • Cost mode derivation in background rotation
  • pick_agent unit tests: best match (2 agents), offline exclusion, no-match, empty-required, partial scoring, none scoring
  • Scheduler tick drains intake queue without deadlock
  • poll_tasks spawns agent for a claimed task
  • Double-spawn session isolation
  • Tenant register + list over socket

Test targets (remaining gaps)

| # | Gap | Severity | Linux-doable? |
|---|---|---|---|
| 3 | Agent presence model: no host, last_seen, or heartbeat/lease columns yet, so stale remote agents cannot be detected (Phase 3) | High | Yes (schema change) |
| 5 | Python polling scripts: colibri_poll.py and colibri_task_done.py have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: the socat bridge is untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: no test yet simulates a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |

Closed gaps (since the original 19.jun.2026 analysis)

  • Remote-safe task claim (Gap 4): claim_task was a blind UPDATE (last writer wins). It is now guarded on status = 'queued', so the claim is atomic and exclusive. Racing agents, which is exactly the contention the Tailscale bridge exposes, get a Conflict instead of silently stealing an already-claimed task. Covered by test_claim_task_is_exclusive (store) and socket_rejects_double_claim_of_same_task (daemon, end-to-end over the socket).
  • Multi-agent task-board contention (Gap 1): tie-breaking, multiple-required-capability, and active-status eligibility tests added (Phase 1a, PR #138). Full board lifecycle, capability routing, and contention tests added (Phase 1b/1c, PR #186). Key finding: capabilities must be registered as a JSON array (["freebsd"]). The object form ({"freebsd":true}) silently scores zero in pick_agent, because pick_agent deserializes capabilities as Vec<String>.
  • CLI surface gaps (Gap 2): claim-task, transition-task, and set-cost-mode added to the CLI with parse tests (Phase 2b/2c, PR #138). The CLI surface is now 19/19.
  • CLI: register-agent + list-agents — merged (Phase 2a, PR #107)
  • CLI: register-tenant + list-tenants + register-skill — merged
  • pick_agent scoring — partial-match and no-match scoring tests added
  • Tenant/vault provision chain — register-tenant, jail spawn flags, provision_tenant_env(), colibri-vault::provision all landed
  • Issue #88 (CollectionNotFound) — daemon passes tenant_id (collection name) to vault::provision
  • Issue #91 (tenant provision target verification) — trim_trailing_slash string-equality check
  • Issue #92 (vault provision canonicalization) — canonicalize + allowed-root containment (PR #119)
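The Gap 4 guard amounts to a compare-and-set on the task's status. The sketch below is minimal and assumes an in-memory store in place of the real one. The Mutex turns the status check and the write into one atomic step. That is the guarantee a conditional UPDATE ... WHERE status = 'queued' gives the real store, where an UPDATE that touches zero rows is reported as a Conflict. Store, Status, and ClaimError are hypothetical names, not the colibri store API.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Queued,
    Claimed,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClaimError {
    NotFound,
    Conflict, // the task was no longer queued when this claim ran
}

#[derive(Debug)]
struct Row {
    status: Status,
    agent_id: Option<u64>,
}

/// Hypothetical in-memory stand-in for the task store.
#[derive(Default)]
pub struct Store {
    rows: Mutex<HashMap<u64, Row>>,
}

impl Store {
    pub fn add_queued(&self, task_id: u64) {
        let row = Row { status: Status::Queued, agent_id: None };
        self.rows.lock().unwrap().insert(task_id, row);
    }

    /// Exclusive claim: check-and-write under one lock, mirroring a
    /// conditional UPDATE whose "0 rows affected" becomes Conflict.
    pub fn claim_task(&self, task_id: u64, agent_id: u64) -> Result<(), ClaimError> {
        let mut rows = self.rows.lock().unwrap();
        let row = rows.get_mut(&task_id).ok_or(ClaimError::NotFound)?;
        if row.status != Status::Queued {
            return Err(ClaimError::Conflict);
        }
        row.status = Status::Claimed;
        row.agent_id = Some(agent_id);
        Ok(())
    }

    pub fn owner(&self, task_id: u64) -> Option<u64> {
        self.rows.lock().unwrap().get(&task_id).and_then(|r| r.agent_id)
    }
}
```

When any number of threads race to claim one queued task, exactly one gets Ok and the rest get Conflict. This is the exclusivity property that test_claim_task_is_exclusive covers for the real store.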

Implementation phases

Phase 1: Multi-agent task board tests — COMPLETE

1a. Pure pick_agent unit tests — COMPLETE (PR #138)

Added to scheduler.rs test module:

| Test | What it proves |
|---|---|
| test_pick_agent_tie_breaking | Two agents with the same score; verifies the deterministic tie-break (later-in-slice wins) |
| test_pick_agent_multiple_required_capabilities | Required ["rust","freebsd"]: the agent with both beats the agent with one |
| test_pick_agent_active_status_eligible | status: "active" is treated the same as "idle" (both eligible) |
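The three behaviours in the table fall out of a small scoring loop. This minimal sketch follows the documented rules: offline agents are excluded, partial matches count, the highest score wins, and ties go to the later agent. It assumes a simplified Agent struct and treats a zero score as no match. The real scheduler.rs signature, and how it handles an empty required list, may differ.

```rust
/// Hypothetical agent record; the real registry row has more fields.
#[derive(Debug)]
pub struct Agent {
    pub name: String,
    pub status: String, // "active" | "idle" | "offline"
    pub capabilities: Vec<String>,
}

/// Returns the eligible agent holding the most required capabilities.
/// Assumption: a score of zero means "no match" and yields None.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    let mut best: Option<(&'a Agent, usize)> = None;
    for agent in agents {
        if agent.status == "offline" {
            continue; // "active" and "idle" are both eligible
        }
        // Partial matches count: one point per required capability held.
        let score = required
            .iter()
            .filter(|cap| agent.capabilities.iter().any(|c| c.as_str() == **cap))
            .count();
        if score == 0 {
            continue;
        }
        // `>=` rather than `>`: an equal score replaces the earlier pick,
        // so the later agent in the slice wins a tie deterministically.
        if best.map_or(true, |(_, s)| score >= s) {
            best = Some((agent, score));
        }
    }
    best.map(|(agent, _)| agent)
}
```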

1b. Multi-agent board integration test — COMPLETE (PR #186)

Full lifecycle via a real Unix socket: register 2 agents with disjoint capabilities and submit 2 intake tasks with matching required capabilities. The scheduler's pick_agent then auto-routes each task to the capable agent (no manual claim), which is verified by polling list-tasks status=claimed.

```text
Register agent "sysadmin" with ["freebsd"]   ← array form required
Register agent "db-admin" with ["postgres"]
Submit intake "scrub zroot" required ["freebsd"]
Submit intake "vacuum db" required ["postgres"]
Scheduler tick (50ms interval) auto-claims:
  → verify freebsd task agent_id == sysadmin.id
  → verify postgres task agent_id == db-admin.id
```

This proves the core multi-agent coordination loop: different agents get different tasks by capability, assigned by the scheduler.

Capability format: pick_agent deserializes Vec<String>, so capabilities must be registered as a JSON array (["freebsd"]). The object form ({"freebsd":true}) silently deserializes to an empty vec and scores zero. The board-mechanics tests (1a, 1c) use manual claim-task so the format is inert there; the routing test (1b) uses array form and documents this requirement.

1c. Same-capability multi-task test — COMPLETE (PR #186)

```text
Register agent "worker" with ["freebsd"]
Create 2 plain board tasks (scrub zroot, check smart)
Same agent claims both via manual claim-task
  → verify both transition started → done
  → documents current behavior: no per-agent limit on claims
```

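Documents the current contention behavior (no guard against the same agent getting multiple tasks) and proves session isolation when one agent handles multiple tasks. The Gap 4 guard is per task (status = 'queued'), so it does not limit how many distinct queued tasks a single agent claims.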

Phase 2: CLI surface completion

2a. Merge feat/cli-register-agent — COMPLETE

register-agent and list-agents are in the CLI (merged via PR #107).

2b. Add claim-task, transition-task, and set-cost-mode to CLI — COMPLETE (PR #138)

All three commands are in the CLI. Remote agents can now work entirely through the colibri binary:

```sh
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE
```

2c. CLI unit tests for new commands — COMPLETE (PR #138)

Parse tests added: parses_claim_task, parses_transition_task, parses_set_cost_mode, rejects_claim_task_missing_flags, rejects_transition_task_missing_flags, rejects_set_cost_mode_without_arg.
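The parse tests listed above check two things per command: the required flags are extracted, and a missing one is rejected. Below is a hand-rolled sketch of that contract. It assumes nothing about how colibri.rs actually parses arguments; it may well use an argument-parsing crate. Command, parse, and flag_value are hypothetical names.

```rust
/// Hypothetical parsed form of the three Phase 2b commands.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
    SetCostMode { mode: String },
}

/// Returns the value after `flag` (e.g. `--task-id <UUID>`), if present.
fn flag_value(args: &[&str], flag: &str) -> Option<String> {
    let i = args.iter().position(|a| *a == flag)?;
    args.get(i + 1).map(|v| v.to_string())
}

/// Parses the arguments after the program name. A missing flag or MODE is
/// an error, the behaviour the rejects_* parse tests pin down.
pub fn parse(args: &[&str]) -> Result<Command, String> {
    let (sub, rest) = args.split_first().ok_or("missing subcommand")?;
    let need = |flag: &str| flag_value(rest, flag).ok_or_else(|| format!("missing {flag}"));
    match *sub {
        "claim-task" => Ok(Command::ClaimTask {
            task_id: need("--task-id")?,
            agent_id: need("--agent-id")?,
        }),
        "transition-task" => Ok(Command::TransitionTask {
            task_id: need("--task-id")?,
            status: need("--status")?,
        }),
        "set-cost-mode" => rest
            .first()
            .map(|mode| Command::SetCostMode { mode: mode.to_string() })
            .ok_or_else(|| "missing MODE".to_string()),
        other => Err(format!("unknown command {other}")),
    }
}
```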

Phase 3: Agent presence schema (deferred)

Add host and last_seen columns to the agents table. Update register-agent to accept an optional host parameter and update last_seen on each call. Add a heartbeat socket command for liveness. Enables detecting stale remote agents.
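Lease semantics are still open, but the core liveness check that Phase 3 enables is simple: an agent is stale when its last_seen is older than some TTL. This minimal sketch assumes a hypothetical Presence record holding the proposed host and last_seen columns. The TTL is left to the caller because no value has been decided.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical presence record for the deferred Phase 3 columns.
pub struct Presence {
    pub host: Option<String>,
    pub last_seen: SystemTime,
}

impl Presence {
    /// What a heartbeat (or a register-agent call) would do: bump last_seen.
    pub fn heartbeat(&mut self, now: SystemTime) {
        self.last_seen = now;
    }

    /// Stale once no heartbeat arrived within `ttl`. If last_seen is in the
    /// future (clock skew), the agent is treated as fresh rather than stale.
    pub fn is_stale(&self, now: SystemTime, ttl: Duration) -> bool {
        match now.duration_since(self.last_seen) {
            Ok(elapsed) => elapsed > ttl,
            Err(_) => false,
        }
    }
}
```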

Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.

Phase 4: Polling workflow integration test (deferred)

Test the full poll → claim → work → done cycle from the agent's perspective, simulating what colibri_poll.py does. Register two agents, create tasks with different capabilities, verify each agent sees only its tasks via the poll path, transition tasks to done.
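From the agent's side, the poll path is a filter on the list-tasks reply: keep the started tasks whose agent_id is the polling agent, which is what colibri_poll.py requests. This minimal sketch assumes a hypothetical TaskRow shape for the reply entries. A Phase 4 test would assert that two agents polling the same board see disjoint sets.

```rust
/// Hypothetical shape of one list-tasks reply entry.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TaskRow {
    pub id: String,
    pub status: String,
    pub agent_id: Option<String>,
}

/// One poll from the agent's point of view: started tasks assigned to `me`.
/// Tasks owned by other agents, and unassigned tasks, are filtered out.
pub fn my_work<'a>(rows: &'a [TaskRow], me: &str) -> Vec<&'a TaskRow> {
    rows.iter()
        .filter(|r| r.status == "started")
        .filter(|r| r.agent_id.as_deref() == Some(me))
        .collect()
}
```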

Deferred — Phase 2b CLI additions are now complete; this test can be written when prioritized.

Phase 5: Bridge validation (FreeBSD-only)

Closes Gap 6 (bridge round-trip) and Gap 7 (cross-host coordination) on the real Tailscale mesh; this is an operational run, not more code. The bridge is the colibri_bridge rc.d service, which runs socat to relay TCP port 9190 to the daemon's Unix socket, essentially socat TCP-LISTEN:9190,fork UNIX-CONNECT:/var/run/colibri/colibri.sock.
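What socat does here can be sketched with the Rust standard library. The relay accepts a TCP connection, connects to the daemon's Unix socket, and copies bytes in both directions until each side closes. This is illustrative only: the shipped bridge is socat under rc.d, and colibri-daemon has no TCP listener. relay_once is a hypothetical name. It handles a single connection, whereas socat's fork option serves each connection in a child process.

```rust
use std::io;
use std::net::{Shutdown, TcpListener};
use std::os::unix::net::UnixStream;
use std::path::Path;
use std::thread;

/// Accepts one TCP client and relays it to the Unix socket at `sock_path`.
pub fn relay_once(listener: &TcpListener, sock_path: &Path) -> io::Result<()> {
    let (tcp, _peer) = listener.accept()?;
    let unix = UnixStream::connect(sock_path)?;

    // Client -> daemon runs on a helper thread.
    let mut tcp_in = tcp.try_clone()?;
    let mut unix_out = unix.try_clone()?;
    let upstream = thread::spawn(move || -> io::Result<()> {
        io::copy(&mut tcp_in, &mut unix_out)?;
        // Best effort: pass the client's EOF on to the daemon.
        let _ = unix_out.shutdown(Shutdown::Write);
        Ok(())
    });

    // Daemon -> client runs here until the daemon closes its end.
    let mut unix_in = unix;
    let mut tcp_out = tcp;
    io::copy(&mut unix_in, &mut tcp_out)?;
    // Best effort: signal EOF so the client stops reading.
    let _ = tcp_out.shutdown(Shutdown::Write);

    upstream.join().expect("relay thread panicked")
}
```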

Prerequisites

  • colibri_daemon running (socket at /var/run/colibri/colibri.sock)
  • pkg install socat
  • both hosts on the tailnet; a pf rule allowing the bridge port inbound on tailscale0 only, never the public interface

On the FreeBSD host (OSA)

```sh
sysrc colibri_bridge_enable=YES
sysrc colibri_bridge_listen_addr=<osa-tailnet-ip>   # this host's tailnet address
service colibri_bridge start
sockstat -4 -l | grep 9190                           # confirm socat is listening
```

From a second host (e.g. domedog) over Tailscale — the remote path is raw TCP (the colibri_poll.py script speaks the local Unix socket; over the wire, send newline-delimited JSON with nc):

```sh
printf '%s\n' '{"cmd":"status"}'     | nc -w2 <osa-tailnet-ip> 9190
printf '%s\n' '{"cmd":"list-tasks"}' | nc -w2 <osa-tailnet-ip> 9190
```
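The same probe can be scripted without nc. This minimal sketch assumes the bridge passes one newline-terminated JSON request through and the daemon answers with one line. probe is a hypothetical helper, and its 2-second read timeout only approximates nc -w2.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Sends one JSON request line to the bridge and returns the first reply line.
pub fn probe<A: ToSocketAddrs>(addr: A, request_json: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    // Rough analogue of `nc -w2`: give up if the daemon stays silent.
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;

    let mut line = String::new();
    BufReader::new(&stream).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}
```

Called as probe((host, 9190), r#"{"cmd":"status"}"#), with host set to the placeholder tailnet address, it returns the daemon's reply line.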

Cross-host coordination — the real proof (Gap 7)

  1. Register a remote agent (array-form caps — object form scores zero in pick_agent): {"cmd":"register-agent","name":"domedog","capabilities":["linux"]}
  2. Submit an intake task requiring that capability: {"cmd":"intake-task","title":"...","capabilities":["linux"]}
  3. Confirm the scheduler routed it: {"cmd":"list-tasks","status":"claimed"} → the task's agent_id is the remote agent.
  4. From the remote side, transition-task it to done and verify.
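Steps 1 to 3 are plain JSON lines, so they can be generated rather than hand-typed. This minimal sketch builds exactly the request shapes listed in those steps. A small escaper keeps a task title that contains quotes valid as JSON. The helper names are hypothetical. Step 4 is left out because this plan does not spell out the socket field names for transition-task.

```rust
/// Escapes `s` as a JSON string literal (quotes, backslashes, controls).
fn json_str(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Array form only: the object form would score zero in pick_agent.
fn json_list(items: &[&str]) -> String {
    let parts: Vec<String> = items.iter().map(|s| json_str(s)).collect();
    format!("[{}]", parts.join(","))
}

/// Step 1: register the remote agent.
pub fn register_agent(name: &str, caps: &[&str]) -> String {
    format!(
        r#"{{"cmd":"register-agent","name":{},"capabilities":{}}}"#,
        json_str(name),
        json_list(caps)
    )
}

/// Step 2: submit an intake task requiring the same capability.
pub fn intake_task(title: &str, caps: &[&str]) -> String {
    format!(
        r#"{{"cmd":"intake-task","title":{},"capabilities":{}}}"#,
        json_str(title),
        json_list(caps)
    )
}

/// Step 3: list claimed tasks to confirm the scheduler's routing.
pub fn list_claimed() -> String {
    r#"{"cmd":"list-tasks","status":"claimed"}"#.to_string()
}
```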

Acceptance: a task created on OSA is claimed and driven to done by an agent on a different host, entirely over the Tailscale bridge — the same routing the scheduler_routes_intake_tasks_by_capability test proves on a single host.

Security: bind to the tailnet interface only and scope the pf rule to tailscale0. Use placeholder tailnet addresses in any committed notes — never paste real 100.x IPs into git. (The shipped colibri_bridge.in currently hardcodes a real default listen_addr; that should be scrubbed to a placeholder or required-via-rc.conf separately.)
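One cheap guard against binding the bridge to the wrong interface is to check, before starting, that the configured listen address is a tailnet address. This sketch assumes IPv4 and relies on Tailscale assigning IPv4 tailnet addresses from 100.64.0.0/10. The check is hypothetical, since colibri_bridge is an rc.d shell script. It is also only a sanity check, because that range is shared CGNAT space and not unique to Tailscale.

```rust
use std::net::Ipv4Addr;

/// True if `addr` lies in 100.64.0.0/10, the range Tailscale draws IPv4
/// tailnet addresses from. Anything else is likely a LAN or public address.
pub fn is_tailnet_v4(addr: Ipv4Addr) -> bool {
    let bits = u32::from(addr);
    // Top 10 bits must equal those of 100.64.0.0 (0x6440_0000).
    (bits & 0xFFC0_0000) == 0x6440_0000
}
```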


Summary

| Phase | What | Files | Linux? | Status |
|---|---|---|---|---|
| 1a | pick_agent unit tests (tie-break, multi-cap, active) | scheduler.rs tests | Yes | Complete (PR #138) |
| 1b | Multi-agent board integration test (capability routing) | tests/multi_agent_board.rs | Yes | Complete (PR #186) |
| 1c | Same-capability multi-task test (contention) | tests/multi_agent_board.rs | Yes | Complete (PR #186) |
| 2a | Merge feat/cli-register-agent | colibri.rs + lib.rs | Yes | Complete (PR #107) |
| 2b | Add claim-task + transition-task + set-cost-mode CLI | colibri.rs + lib.rs | Yes | Complete (PR #138) |
| 2c | CLI parse tests | colibri.rs tests | Yes | Complete (PR #138) |
| 3 | Agent presence schema | schema.rs + lib.rs + socket.rs | Yes | Deferred |
| 4 | Polling workflow test | tests/ | Yes | Deferred |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |

Phases 1 + 2 complete. Next scope: Phase 3 (agent presence schema) or Phase 5 (FreeBSD bridge validation).