colibri/docs/MULTI-AGENT-HOST-PLAN.md

Multi-Agent Multi-Host — Gap Analysis & Implementation Plan

Created: 2026-06-19 (Sam & Hermes)
Status: Phase 1-2 ready for implementation

Context

The 0.10.0 milestone (ISO build, rc.d lifecycle, SIGTERM/socket fixes, release gate) is staged for the FreeBSD build host. The next milestone is proving the multi-agent, multi-host coordination model: multiple agents on different hosts reading from the same Colibri task board, each picking up work by capability, and reporting results back.

PR #83 landed the first cross-host plumbing: a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. However, the gap analysis below shows that the multi-host plane, while packaged and documented, is almost entirely untested. This document defines what needs to happen to close that gap.


Current architecture (as of PR #83)

The multi-host stack lives outside the Rust daemon:

  FreeBSD host (colibri-daemon)
    └── Unix socket: /var/run/colibri/colibri.sock
         └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
              └── TCP reachable from: debby, domedog
                   └── colibri_poll.py (Python, raw JSON-over-socket)
                        └── Hermes cronjob (2min poll / 5min work)
  • Transport: tokio::net::UnixListener only — zero TCP in Rust. The socat bridge is a shell-level relay.
  • Agent model: register-agent stores name + capabilities + status (active/idle/offline). No host field, no last_seen, no heartbeat, no lease/TTL.
  • Task assignment: pick_agent() matches by capability score (partial match counts, highest score wins, tie → later-in-slice). claim_task() is a blind UPDATE with no concurrency guard.
  • Polling: colibri_poll.py queries list-tasks status=started filtered by agent_id. colibri_task_done.py calls transition-task.
  • Spawning: poll_tasks() in daemon.rs spawns agents for Claimed tasks, skipping those with an existing session (idempotency guard).
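The spawning rule in the last bullet can be sketched as a pure selection step. This is an illustrative model under stated assumptions, not the code in daemon.rs: Task, TaskStatus, the sessions set, and tasks_to_spawn are hypothetical names, and task ids are plain integers for brevity.

```rust
use std::collections::HashSet;

/// Hypothetical task states; the real board may track more.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Intake,
    Claimed,
    Started,
    Done,
}

/// Hypothetical, simplified task record.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: u32,
    pub status: TaskStatus,
}

/// Returns the ids of tasks that need a new agent process: Claimed tasks
/// that do not yet have a session. Tasks that already have a session are
/// skipped, which is the idempotency guard.
pub fn tasks_to_spawn(tasks: &[Task], sessions: &HashSet<u32>) -> Vec<u32> {
    tasks
        .iter()
        .filter(|t| t.status == TaskStatus::Claimed)
        .filter(|t| !sessions.contains(&t.id))
        .map(|t| t.id)
        .collect()
}
```

Running the selection twice against an unchanged session set yields the same ids; once a session exists for a task, that task drops out, which is what makes repeated polling safe.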

Socket command inventory (17 commands, all Unix-socket)

| Category | Commands |
| --- | --- |
| Daemon | status, glasspane-snapshot, set-cost-mode |
| Session | list-sessions, get-session, compact-session |
| Agent process | spawn-agent, kill-agent |
| Board | list-tasks, create-task, transition-task, claim-task, intake-task |
| Agent registry | register-agent, list-agents |
| Skills | list-skills, register-skill |

CLI surface (10 of 17 commands exposed)

Missing from CLI: claim-task, transition-task, register-agent, list-agents, set-cost-mode. These commands are socket-only (register-skill is already exposed in the CLI), so remote agents currently have to use raw Python socket calls.


Gap analysis

What IS tested (single-host, single-agent)

  • Agent spawn → JSONL → glasspane → Done lifecycle
  • Task create/intake/claim/start/done over socket
  • SIGTERM cleanup + stale socket safety
  • Session isolation with 2 agents (bypasses task board)
  • Cost mode derivation in background rotation
  • pick_agent unit tests: best match, offline exclusion, no-match, empty-required
  • Scheduler tick drains intake queue
  • poll_tasks spawns agent for a claimed task

What is NOT tested

| # | Gap | Severity | Linux-doable? |
| --- | --- | --- | --- |
| 1 | Multi-agent task-board contention: pick_agent only tested with 0-1 agents; no capability-based multi-agent assignment test; no same-agent-multiple-tasks test | High | Yes |
| 2 | CLI surface gaps: claim-task, transition-task, register-agent, list-agents have no CLI; remote agents forced to use raw Python | Medium | Yes |
| 3 | Agent presence model: missing host, last_seen, and heartbeat/lease columns; add these schema fields to detect stale remote agents | High | Yes (schema change) |
| 4 | Remote-safe task claim: claim_task is a blind UPDATE, no concurrency safety, no lease/TTL | Medium | Yes |
| 5 | Python polling scripts: colibri_poll.py and colibri_task_done.py have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: no test simulates a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
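Gap 4 can be made concrete with a small in-memory comparison of a blind claim and a guarded one. This is a sketch under assumptions: TaskRow, ClaimError, claim_blind, and claim_guarded are hypothetical names, and a HashMap stands in for the tasks table. In SQL terms the guarded form would roughly correspond to making the UPDATE conditional on the task being unclaimed and checking the affected-row count; lease/TTL expiry is left out.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified task row: `agent_id` is None until claimed.
#[derive(Debug, Default)]
pub struct TaskRow {
    pub agent_id: Option<u32>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum ClaimError {
    UnknownTask,
    /// The task is already held by the agent with this id.
    AlreadyClaimed(u32),
}

/// Blind claim: overwrites any existing holder, so the last writer wins.
pub fn claim_blind(
    board: &mut HashMap<u32, TaskRow>,
    task_id: u32,
    agent_id: u32,
) -> Result<(), ClaimError> {
    let row = board.get_mut(&task_id).ok_or(ClaimError::UnknownTask)?;
    row.agent_id = Some(agent_id);
    Ok(())
}

/// Guarded claim: succeeds only while the task is unclaimed and reports
/// the current holder otherwise.
pub fn claim_guarded(
    board: &mut HashMap<u32, TaskRow>,
    task_id: u32,
    agent_id: u32,
) -> Result<(), ClaimError> {
    let row = board.get_mut(&task_id).ok_or(ClaimError::UnknownTask)?;
    match row.agent_id {
        Some(holder) => Err(ClaimError::AlreadyClaimed(holder)),
        None => {
            row.agent_id = Some(agent_id);
            Ok(())
        }
    }
}
```

With the guarded form, the second of two racing agents gets an explicit error instead of silently taking over a task that another agent may already be working on.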

Implementation phases

Phase 1: Multi-agent task board tests (Linux, highest impact)

1a. Pure pick_agent unit tests — extend scheduler.rs test module

Existing tests cover: best match (2 agents, different caps), offline exclusion, no-match, empty-required. Add:

| Test | What it proves |
| --- | --- |
| test_pick_agent_partial_match_wins_over_no_match | Agent with ["rust","freebsd"] beats agent with ["python"] for required ["freebsd"] |
| test_pick_agent_tie_breaking | Two agents with same score: verify deterministic tie-break (later name wins) |
| test_pick_agent_multiple_required_capabilities | Required ["rust","freebsd"]: agent with both beats agent with one |
| test_pick_agent_active_status_eligible | status: "active" is treated same as "idle" (both eligible) |
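The shape of these tests can be sketched against a stand-in for pick_agent. Everything here is an assumption for illustration: the Agent struct, its new constructor, score, and this pick_agent body are hypothetical and only model the behavior described in the architecture section (partial matches count, highest score wins, ties go to the later agent in the slice). Treating a score of zero as no match is also an assumption.

```rust
/// Hypothetical, simplified agent record; the real registry row may differ.
#[derive(Debug, Clone)]
pub struct Agent {
    pub name: String,
    pub capabilities: Vec<String>,
    /// "active", "idle", or "offline".
    pub status: String,
}

impl Agent {
    pub fn new(name: &str, capabilities: &[&str], status: &str) -> Self {
        Agent {
            name: name.to_string(),
            capabilities: capabilities.iter().map(|c| c.to_string()).collect(),
            status: status.to_string(),
        }
    }
}

/// Counts how many required capabilities the agent offers (partial matches count).
fn score(agent: &Agent, required: &[&str]) -> usize {
    required
        .iter()
        .filter(|req| agent.capabilities.iter().any(|cap| cap.as_str() == **req))
        .count()
}

/// Stand-in for pick_agent: offline agents are skipped and the highest score
/// wins. `max_by_key` returns the last maximum, so ties go to the agent that
/// appears later in the slice. Assumption: a score of zero is no match.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.status != "offline")
        .filter(|a| score(a, required) > 0)
        .max_by_key(|a| score(a, required))
}

#[test]
fn test_pick_agent_partial_match_wins_over_no_match() {
    let agents = [
        Agent::new("a", &["rust", "freebsd"], "idle"),
        Agent::new("b", &["python"], "idle"),
    ];
    let picked = pick_agent(&agents, &["freebsd"]).map(|a| a.name.as_str());
    assert_eq!(picked, Some("a"));
}

#[test]
fn test_pick_agent_tie_breaking() {
    let agents = [
        Agent::new("a", &["rust"], "idle"),
        Agent::new("b", &["rust"], "idle"),
    ];
    // Equal scores: the later entry in the slice is expected to win.
    let picked = pick_agent(&agents, &["rust"]).map(|a| a.name.as_str());
    assert_eq!(picked, Some("b"));
}
```

The remaining two tests follow the same pattern, varying only the agent list and the required capabilities.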

1b. Multi-agent board integration test — new file crates/colibri-daemon/tests/multi_agent_board.rs

Full lifecycle: register 2 agents with different capabilities, submit 2 intake tasks with matching capabilities, run scheduler tick, verify correct assignment, run poll_tasks, verify both agents spawn and reach Done.

Register agent "freebsd-agent" with ["freebsd"]
Register agent "rust-agent" with ["rust"]
Submit intake "build on freebsd" required ["freebsd"]
Submit intake "write rust code" required ["rust"]
Run scheduler.tick(&state)
  → verify task A agent_id == freebsd-agent.id
  → verify task B agent_id == rust-agent.id
Run poll_tasks(&state)
  → verify 2 agent handles in state.agents
  → verify both tasks transitioned Claimed → Started
  → wait for glasspane Done on both panes

This proves the core multi-agent coordination loop: different agents get different tasks by capability.

1c. Same-capability multi-task test

Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"]
Run tick + poll_tasks
  → verify both tasks assigned to same agent (documents current behavior)
  → verify two agent processes spawn independently, one per task (session isolation)
  → verify both reach Done

This documents the current contention behavior (no guard against the same agent getting multiple tasks) and proves session isolation when one agent handles multiple tasks.
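The assignment step in 1b and 1c can be modeled without the daemon. The sketch below is a stand-in for the scheduler tick under stated assumptions: Agent, Task, overlap, and tick are hypothetical names, agents are identified by name rather than id, and spawning and the Done transition are left out. It only illustrates that nothing limits how many tasks one agent receives.

```rust
/// Hypothetical, simplified registry row.
#[derive(Debug, Clone)]
pub struct Agent {
    pub name: String,
    pub capabilities: Vec<String>,
}

/// Hypothetical, simplified board row.
#[derive(Debug, Clone)]
pub struct Task {
    pub title: String,
    pub required: Vec<String>,
    /// Name of the assigned agent; None while the task sits in intake.
    pub agent: Option<String>,
}

/// Counts how many of the task's required capabilities the agent offers.
fn overlap(agent: &Agent, task: &Task) -> usize {
    task.required
        .iter()
        .filter(|r| agent.capabilities.contains(*r))
        .count()
}

/// One scheduler pass: assigns every unassigned task to the best-matching
/// agent. Nothing tracks how many tasks an agent already holds, so one
/// agent can receive several tasks in the same pass.
pub fn tick(agents: &[Agent], tasks: &mut [Task]) {
    for task in tasks.iter_mut().filter(|t| t.agent.is_none()) {
        let best = agents
            .iter()
            .filter(|a| overlap(a, task) > 0)
            .max_by_key(|a| overlap(a, task))
            .map(|a| a.name.clone());
        task.agent = best;
    }
}
```

With a single "worker" agent and two tasks that both require ["freebsd"], one pass assigns both tasks to "worker"; with two agents holding disjoint capabilities, each task lands on the matching agent.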

Phase 2: Merge feat/cli-register-agent + add claim/transition CLI

2a. Merge feat/cli-register-agent (existing branch, 64 lines, client-only)

The branch is clean and ready:

  • Command::RegisterAgent { name, capabilities } + Command::ListAgents
  • parse_capabilities() helper (reuses --capability/--capabilities pattern)
  • DaemonClient::register_agent() + DaemonClient::list_agents()
  • Usage text

Enables: colibri register-agent osa-agent --capability freebsd and colibri list-agents.

2b. Add claim-task and transition-task to CLI

These are the two commands that colibri_task_done.py currently issues over a raw socket. Adding them to the CLI lets remote agents work entirely through the colibri binary:

colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed

Implementation:

  • Add Command::ClaimTask { task_id, agent_id } and Command::TransitionTask { task_id, status } variants
  • Add DaemonClient::claim_task() and DaemonClient::transition_task()
  • Add CLI parsing (follow existing --flag value pattern)

2c. Add CLI unit tests for new commands

Parse tests matching existing parses_task_commands style.
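A minimal sketch of 2b and 2c, assuming the --flag value pattern described above. Command here is a hypothetical two-variant subset, ids are plain strings rather than UUIDs so that no external crate is needed, and flag_value and parse are illustrative helpers rather than the existing parser in colibri.rs.

```rust
/// Hypothetical subset of the CLI command enum.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
}

/// Returns the value that follows `flag`, e.g. `--task-id <UUID>`.
fn flag_value(args: &[&str], flag: &str) -> Result<String, String> {
    args.iter()
        .position(|a| *a == flag)
        .and_then(|i| args.get(i + 1))
        .map(|v| v.to_string())
        .ok_or_else(|| format!("missing {} <value>", flag))
}

/// Parses the arguments that follow the binary name.
pub fn parse(args: &[&str]) -> Result<Command, String> {
    let (name, rest) = args.split_first().ok_or("missing command")?;
    match *name {
        "claim-task" => Ok(Command::ClaimTask {
            task_id: flag_value(rest, "--task-id")?,
            agent_id: flag_value(rest, "--agent-id")?,
        }),
        "transition-task" => {
            let status = flag_value(rest, "--status")?;
            // Only the two statuses named in the usage line are accepted.
            if status != "done" && status != "failed" {
                return Err(format!("unsupported status: {}", status));
            }
            Ok(Command::TransitionTask {
                task_id: flag_value(rest, "--task-id")?,
                status,
            })
        }
        other => Err(format!("unknown command: {}", other)),
    }
}

#[test]
fn parses_claim_and_transition_commands() {
    let claim = parse(&["claim-task", "--task-id", "t-1", "--agent-id", "a-1"]);
    let expected = Command::ClaimTask { task_id: "t-1".into(), agent_id: "a-1".into() };
    assert_eq!(claim, Ok(expected));
    let bad = parse(&["transition-task", "--task-id", "t-1", "--status", "paused"]);
    assert!(bad.is_err());
}
```

Rejecting unknown statuses at parse time keeps a typo from reaching the daemon as a transition request.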

Phase 3: Agent presence schema (deferred)

Add host and last_seen columns to the agents table. Update register-agent to accept an optional host parameter and to update last_seen on each call. Add a heartbeat socket command for liveness. Together these enable detection of stale remote agents.

Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
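One way the proposed columns could support stale-agent detection is sketched below. AgentPresence, heartbeat, is_stale, and the TTL value are hypothetical; timestamps are seconds since the Unix epoch, and lease semantics remain an open design question.

```rust
use std::time::Duration;

/// Hypothetical presence record carrying the proposed `host` and
/// `last_seen` fields.
#[derive(Debug, Clone)]
pub struct AgentPresence {
    pub name: String,
    pub host: Option<String>,
    /// Seconds since the Unix epoch of the last register-agent or
    /// heartbeat call.
    pub last_seen: u64,
}

impl AgentPresence {
    /// Records a heartbeat received at `now` (seconds since the Unix epoch).
    pub fn heartbeat(&mut self, now: u64) {
        self.last_seen = now;
    }

    /// An agent is stale once it has been silent for longer than `ttl`.
    /// `saturating_sub` keeps a clock that runs backwards from underflowing.
    pub fn is_stale(&self, now: u64, ttl: Duration) -> bool {
        now.saturating_sub(self.last_seen) > ttl.as_secs()
    }
}
```

Passing `now` in explicitly, rather than reading the system clock inside the method, keeps the staleness rule deterministic in unit tests.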

Phase 4: Polling workflow integration test (deferred)

Test the full poll → claim → work → done cycle from the agent's perspective, simulating what colibri_poll.py does. Register two agents, create tasks with different capabilities, verify each agent sees only its tasks via the poll path, transition tasks to done.

Deferred — depends on Phase 2 CLI additions (so the test can use CLI commands instead of raw socket replication of the Python scripts).

Phase 5: Bridge validation (FreeBSD-only)

Start colibri_bridge with socat on the FreeBSD host. Connect from a second host via Tailscale TCP. Verify the round-trip: status, list-tasks, and claim-task all work over the bridge. This can only be done on FreeBSD 15 with the Tailscale mesh.


Summary

| Phase | What | Files | Linux? | Status |
| --- | --- | --- | --- | --- |
| 1a | pick_agent unit tests | scheduler.rs tests | Yes | Ready |
| 1b | Multi-agent board integration test | tests/multi_agent_board.rs (new) | Yes | Ready |
| 1c | Same-capability multi-task test | Same file | Yes | Ready |
| 2a | Merge feat/cli-register-agent | colibri.rs + lib.rs | Yes | Branch exists |
| 2b | Add claim-task + transition-task CLI | colibri.rs + lib.rs | Yes | Ready |
| 2c | CLI parse tests | colibri.rs tests | Yes | Ready |
| 3 | Agent presence schema | schema.rs + lib.rs + socket.rs | Yes | Deferred |
| 4 | Polling workflow test | tests/ | Yes | Deferred (needs Phase 2) |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |

Immediate scope: Phases 1-2. All testable on Linux with cargo test + cargo clippy gate. No FreeBSD dependency for implementation.