docs: multi-agent multi-host gap analysis + implementation plan (Sam & Hermes) #84
3 changed files with 245 additions and 8 deletions
docs/MULTI-AGENT-HOST-PLAN.md (new file, +225)

@@ -0,0 +1,225 @@

# Multi-Agent Multi-Host: Gap Analysis & Implementation Plan

**Created:** 2026-06-19 (Sam & Hermes)
**Status:** Phases 1-2 ready for implementation

## Context

The 0.10.0 milestone (ISO build, rc.d lifecycle, SIGTERM/socket fixes, release
gate) is staged for the FreeBSD build host. The next milestone is proving the
multi-agent, multi-host coordination model: multiple agents on different hosts
reading from the same Colibri task board, each picking up work by capability,
and reporting results back.

PR #83 landed the first cross-host plumbing: a socat TCP bridge, Python polling
scripts, and a Hermes cronjob configuration. But the gap analysis below shows
that **the multi-host plane is packaged and documented but almost entirely
untested**. This document defines what needs to happen to close that gap.

---

## Current architecture (as of PR #83)

The multi-host stack lives **outside the Rust daemon**:

```
FreeBSD host (colibri-daemon)
└── Unix socket: /var/run/colibri/colibri.sock
    └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
        └── TCP reachable from: debby, domedog
            └── colibri_poll.py (Python, raw JSON-over-socket)
                └── Hermes cronjob (2min poll / 5min work)
```

- **Transport:** `tokio::net::UnixListener` only; there is zero TCP in Rust. The
  socat bridge is a shell-level relay.
- **Agent model:** `register-agent` stores name + capabilities + status
  (`active`/`idle`/`offline`). No `host` field, no `last_seen`, no heartbeat,
  no lease/TTL.
- **Task assignment:** `pick_agent()` matches by capability score (partial
  matches count, highest score wins, ties go to the later agent in the slice).
  `claim_task()` is a blind UPDATE with no concurrency guard.
- **Polling:** `colibri_poll.py` queries `list-tasks status=started` filtered
  by `agent_id`. `colibri_task_done.py` calls `transition-task`.
- **Spawning:** `poll_tasks()` in `daemon.rs` spawns agents for `Claimed` tasks,
  skipping those with an existing session (idempotency guard).
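
The idempotency guard in the **Spawning** bullet can be stated as a pure filter: spawn only for `Claimed` tasks that have no session yet. The sketch below is hypothetical, not the `daemon.rs` code. `Task`, `TaskStatus`, `tasks_to_spawn`, and keying live sessions by task id are all assumptions made for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical subset of board states (illustrative, not the real enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Intake,
    Claimed,
    Started,
    Done,
}

/// Hypothetical, simplified task row.
#[derive(Debug, Clone)]
pub struct Task {
    pub id: String,
    pub status: TaskStatus,
}

/// Returns the ids of tasks that should get a freshly spawned agent:
/// only `Claimed` tasks, and only when no session exists for them yet.
/// Assumes sessions are keyed by task id (an illustrative assumption).
pub fn tasks_to_spawn<'a>(tasks: &'a [Task], live_sessions: &HashSet<String>) -> Vec<&'a str> {
    tasks
        .iter()
        .filter(|t| t.status == TaskStatus::Claimed)
        .filter(|t| !live_sessions.contains(&t.id))
        .map(|t| t.id.as_str())
        .collect()
}
```

Running the filter again after the first spawn returns nothing, which is the property that keeps repeated `poll_tasks()` calls from double-spawning.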

### Socket command inventory (17 commands, all Unix-socket)

| Category | Commands |
|----------|----------|
| Daemon | `status`, `glasspane-snapshot`, `set-cost-mode` |
| Session | `list-sessions`, `get-session`, `compact-session` |
| Agent process | `spawn-agent`, `kill-agent` |
| Board | `list-tasks`, `create-task`, `transition-task`, `claim-task`, `intake-task` |
| Agent registry | `register-agent`, `list-agents` |
| Skills | `list-skills`, `register-skill` |
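
If the CLI gaps are closed one command at a time, a single enum over the wire names makes it easy to check which commands the CLI already covers. This is a hypothetical sketch: `SocketCommand`, `ALL`, `wire_name`, and `parse` are illustrative names, not the daemon's types. Only the 17 command names come from the table above.

```rust
/// Hypothetical enum mirroring the 17 socket commands in the inventory table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SocketCommand {
    // Daemon
    Status,
    GlasspaneSnapshot,
    SetCostMode,
    // Session
    ListSessions,
    GetSession,
    CompactSession,
    // Agent process
    SpawnAgent,
    KillAgent,
    // Board
    ListTasks,
    CreateTask,
    TransitionTask,
    ClaimTask,
    IntakeTask,
    // Agent registry
    RegisterAgent,
    ListAgents,
    // Skills
    ListSkills,
    RegisterSkill,
}

impl SocketCommand {
    /// Every command, in inventory order.
    pub const ALL: [SocketCommand; 17] = [
        Self::Status, Self::GlasspaneSnapshot, Self::SetCostMode,
        Self::ListSessions, Self::GetSession, Self::CompactSession,
        Self::SpawnAgent, Self::KillAgent,
        Self::ListTasks, Self::CreateTask, Self::TransitionTask, Self::ClaimTask, Self::IntakeTask,
        Self::RegisterAgent, Self::ListAgents,
        Self::ListSkills, Self::RegisterSkill,
    ];

    /// The name used on the socket, as listed in the inventory.
    pub fn wire_name(self) -> &'static str {
        match self {
            Self::Status => "status",
            Self::GlasspaneSnapshot => "glasspane-snapshot",
            Self::SetCostMode => "set-cost-mode",
            Self::ListSessions => "list-sessions",
            Self::GetSession => "get-session",
            Self::CompactSession => "compact-session",
            Self::SpawnAgent => "spawn-agent",
            Self::KillAgent => "kill-agent",
            Self::ListTasks => "list-tasks",
            Self::CreateTask => "create-task",
            Self::TransitionTask => "transition-task",
            Self::ClaimTask => "claim-task",
            Self::IntakeTask => "intake-task",
            Self::RegisterAgent => "register-agent",
            Self::ListAgents => "list-agents",
            Self::ListSkills => "list-skills",
            Self::RegisterSkill => "register-skill",
        }
    }

    /// Looks a command up by its wire name; `None` for unknown commands.
    pub fn parse(name: &str) -> Option<Self> {
        Self::ALL.iter().copied().find(|c| c.wire_name() == name)
    }
}
```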

### CLI surface (10 of 17 commands exposed)

Socket-only commands include `claim-task`, `transition-task`, `register-agent`,
`list-agents`, and `set-cost-mode`; `register-skill` is already in the CLI.
Remote agents currently must use raw Python socket calls.

---

## Gap analysis

### What IS tested (single-host, single-agent)

- Agent spawn → JSONL → glasspane → Done lifecycle
- Task create/intake/claim/start/done over socket
- SIGTERM cleanup + stale socket safety
- Session isolation with 2 agents (bypasses task board)
- Cost mode derivation in background rotation
- `pick_agent` unit tests: best match, offline exclusion, no-match, empty-required
- Scheduler tick drains intake queue
- `poll_tasks` spawns agent for a claimed task

### What is NOT tested

| # | Gap | Severity | Linux-doable? |
|---|-----|----------|---------------|
| 1 | **Multi-agent task-board contention**: board-level tests only exercise `pick_agent` with 0-1 registered agents; no capability-based multi-agent assignment test; no same-agent-multiple-tasks test | High | Yes |
| 2 | **CLI surface gaps**: `claim-task`, `transition-task`, `register-agent`, `list-agents` have no CLI; remote agents are forced to use raw Python | Medium | Yes |
| 3 | **Agent presence model**: no `host` column, no `last_seen`, no heartbeat/lease; stale remote agents cannot be detected | High | Yes (schema change) |
| 4 | **Remote-safe task claim**: `claim_task` is a blind UPDATE with no concurrency safety and no lease/TTL | Medium | Yes |
| 5 | **Python polling scripts**: `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
| 6 | **TCP bridge round-trip**: socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | **Cross-host coordination**: no test simulates a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
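
For gap 4, a remote-safe claim is a compare-and-set: the claim takes effect only if the task is still unclaimed, and the caller learns whether it won. In SQL this would typically be the same UPDATE with an extra `agent_id IS NULL` predicate plus a check that exactly one row changed. The column name and NULL convention are assumptions about the schema. The sketch below models that rule in memory so the race can be exercised without a database. `Board`, `ClaimError`, and plain string ids are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, PartialEq, Eq)]
pub enum ClaimError {
    UnknownTask,
    AlreadyClaimed { by: String },
}

/// In-memory stand-in for the task table: task id -> claiming agent, if any.
pub struct Board {
    claims: Mutex<HashMap<String, Option<String>>>,
}

impl Board {
    pub fn new(task_ids: &[&str]) -> Self {
        let claims: HashMap<String, Option<String>> =
            task_ids.iter().map(|id| (id.to_string(), None)).collect();
        Board { claims: Mutex::new(claims) }
    }

    /// Compare-and-set claim: the check and the write happen under one lock,
    /// the in-memory analogue of a conditional UPDATE plus a rows-affected check.
    pub fn claim(&self, task_id: &str, agent_id: &str) -> Result<(), ClaimError> {
        let mut claims = self.claims.lock().unwrap();
        match claims.get_mut(task_id) {
            None => Err(ClaimError::UnknownTask),
            Some(Some(owner)) => Err(ClaimError::AlreadyClaimed { by: owner.clone() }),
            Some(slot) => {
                *slot = Some(agent_id.to_string());
                Ok(())
            }
        }
    }
}
```

Unlike a blind UPDATE, a losing claimant gets `AlreadyClaimed` back instead of silently overwriting the winner.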

---

## Implementation phases

### Phase 1: Multi-agent task board tests (Linux, highest impact)

#### 1a. Pure `pick_agent` unit tests: extend the `scheduler.rs` test module

Existing tests cover: best match (2 agents, different caps), offline exclusion,
no-match, empty-required. Add:

| Test | What it proves |
|------|----------------|
| `test_pick_agent_partial_match_wins_over_no_match` | Agent with `["rust","freebsd"]` beats agent with `["python"]` for required `["freebsd"]` |
| `test_pick_agent_tie_breaking` | Two agents with the same score: verify the deterministic tie-break (the later agent in the slice wins) |
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]`: agent with both beats agent with one |
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated the same as `"idle"` (both eligible) |
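
The four new cases can be pinned down against a stand-in for `pick_agent`. The sketch below encodes the rules stated in the architecture section: the score is the number of required capabilities the agent has, `offline` agents are excluded, the highest score wins, and ties go to the later agent in the slice. It also assumes a zero score means "no match" and does not try to reproduce the existing empty-required behavior. `Agent` and the signature are hypothetical, not the `scheduler.rs` types. `Iterator::max_by_key` returns the last of several equal maxima, which yields the later-in-slice tie-break directly.

```rust
/// Hypothetical, simplified agent record (the real registry row has more fields).
#[derive(Debug, Clone)]
pub struct Agent {
    pub name: String,
    pub capabilities: Vec<String>,
    pub status: String, // "active" | "idle" | "offline"
}

/// Stand-in for `pick_agent`: the eligible agent holding the most required
/// capabilities. Assumes a score of zero means "no match".
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        // `active` and `idle` are both eligible; only `offline` is excluded.
        .filter(|a| a.status != "offline")
        .map(|a| {
            // Partial matches count: one point per required capability held.
            let score = required
                .iter()
                .filter(|cap| a.capabilities.iter().any(|c| c == *cap))
                .count();
            (a, score)
        })
        .filter(|&(_, score)| score > 0)
        // `max_by_key` returns the last maximum, so ties go to the later agent.
        .max_by_key(|&(_, score)| score)
        .map(|(a, _)| a)
}
```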

#### 1b. Multi-agent board integration test: new file `crates/colibri-daemon/tests/multi_agent_board.rs`

Full lifecycle: register 2 agents with different capabilities, submit 2 intake
tasks with matching capabilities, run a scheduler tick, and verify correct
assignment; then run `poll_tasks` and verify both agents spawn and reach Done.

```
Register agent "freebsd-agent" with ["freebsd"]
Register agent "rust-agent" with ["rust"]
Submit intake "build on freebsd" required ["freebsd"]
Submit intake "write rust code" required ["rust"]
Run scheduler.tick(&state)
  → verify task A agent_id == freebsd-agent.id
  → verify task B agent_id == rust-agent.id
Run poll_tasks(&state)
  → verify 2 agent handles in state.agents
  → verify both tasks transitioned Claimed → Started
  → wait for glasspane Done on both panes
```

This proves the core multi-agent coordination loop: **different agents get
different tasks by capability**.
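
The assignment half of this loop can be modeled without a daemon: drain the intake queue and record whichever agent the picker returns for each task. This is a hypothetical sketch. `IntakeTask`, `Assignment`, `tick`, and passing the picker in as a closure are assumptions that keep it independent of the real `scheduler.tick` and `pick_agent` signatures. It reports unmatched tasks as `agent: None` rather than modeling whatever the real scheduler does with them.

```rust
use std::collections::VecDeque;

/// Hypothetical intake entry: a title plus the capabilities it requires.
#[derive(Debug)]
pub struct IntakeTask {
    pub title: String,
    pub required: Vec<String>,
}

/// One assignment decision made during a tick.
#[derive(Debug, PartialEq)]
pub struct Assignment {
    pub title: String,
    pub agent: Option<String>,
}

/// Drains the intake queue, asking `pick` for an agent per task.
pub fn tick<F>(intake: &mut VecDeque<IntakeTask>, pick: F) -> Vec<Assignment>
where
    F: Fn(&[String]) -> Option<String>,
{
    intake
        .drain(..)
        .map(|task| Assignment {
            // Fields are evaluated in order: borrow `required` before moving `title`.
            agent: pick(&task.required),
            title: task.title,
        })
        .collect()
}
```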

#### 1c. Same-capability multi-task test

```
Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"]
Run tick + poll_tasks
  → verify both tasks assigned to the same agent (documents current behavior)
  → verify two agent processes spawn independently (session isolation)
  → verify both reach Done
```

Documents the current contention behavior (no guard against the same agent
getting multiple tasks) and proves session isolation when one agent handles
multiple tasks.

### Phase 2: Merge `feat/cli-register-agent` + add claim/transition CLI

#### 2a. Merge `feat/cli-register-agent` (existing branch, 64 lines, client-only)

The branch is clean and ready:

- `Command::RegisterAgent { name, capabilities }` + `Command::ListAgents`
- `parse_capabilities()` helper (reuses the `--capability`/`--capabilities` pattern)
- `DaemonClient::register_agent()` + `DaemonClient::list_agents()`
- Usage text

Enables: `colibri register-agent osa-agent --capability freebsd` and
`colibri list-agents`.
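
The branch's `parse_capabilities()` is described only as reusing the `--capability`/`--capabilities` pattern. A minimal sketch of what that pattern commonly means, assuming `--capability` is repeatable and `--capabilities` takes a comma-separated list. Both are assumptions, not read from the branch, and the trimming and de-duplication are illustrative choices.

```rust
/// Hypothetical sketch: collects capabilities from repeated `--capability X`
/// and comma-separated `--capabilities a,b` flags, in order, without duplicates.
pub fn parse_capabilities(args: &[&str]) -> Result<Vec<String>, String> {
    let mut caps: Vec<String> = Vec::new();
    let mut iter = args.iter();
    while let Some(&flag) = iter.next() {
        let values: Vec<&str> = match flag {
            "--capability" => vec![*iter.next().ok_or("--capability needs a value")?],
            "--capabilities" => iter
                .next()
                .ok_or("--capabilities needs a value")?
                .split(',')
                .collect(),
            _ => continue, // positional args and other flags belong to the caller
        };
        for v in values.into_iter().map(str::trim).filter(|v| !v.is_empty()) {
            if !caps.iter().any(|c| c == v) {
                caps.push(v.to_string());
            }
        }
    }
    Ok(caps)
}
```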

#### 2b. Add `claim-task` and `transition-task` to CLI

These are the two commands that `colibri_task_done.py` currently sends over the
raw socket. Adding them to the CLI means remote agents can work entirely through
the `colibri` binary:

```
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
```

Implementation:

- Add `Command::ClaimTask { task_id, agent_id }` and
  `Command::TransitionTask { task_id, status }` variants
- Add `DaemonClient::claim_task()` and `DaemonClient::transition_task()`
- Add CLI parsing (follow the existing `--flag value` pattern)
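
The `--flag value` parsing for the two new variants might look like the sketch below. It is hypothetical: the real `Command` enum has many more variants, and the `TaskStatus` type, error strings, `flag_value`, and `parse_command` entry point are assumptions. Only `done` and `failed` are accepted for `--status`, matching the usage lines in 2b.

```rust
/// Terminal states a remote agent can report (assumed subset).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Done,
    Failed,
}

/// Hypothetical slice of the CLI `Command` enum: just the two new variants.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: TaskStatus },
}

/// Returns the value following `flag`, e.g. `--task-id <UUID>`.
fn flag_value(args: &[&str], flag: &str) -> Result<String, String> {
    let pos = args
        .iter()
        .position(|a| *a == flag)
        .ok_or_else(|| format!("missing {}", flag))?;
    args.get(pos + 1)
        .map(|v| v.to_string())
        .ok_or_else(|| format!("{} needs a value", flag))
}

pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args.first().copied() {
        Some("claim-task") => Ok(Command::ClaimTask {
            task_id: flag_value(args, "--task-id")?,
            agent_id: flag_value(args, "--agent-id")?,
        }),
        Some("transition-task") => {
            let status = match flag_value(args, "--status")?.as_str() {
                "done" => TaskStatus::Done,
                "failed" => TaskStatus::Failed,
                other => return Err(format!("unsupported status: {}", other)),
            };
            let task_id = flag_value(args, "--task-id")?;
            Ok(Command::TransitionTask { task_id, status })
        }
        Some(other) => Err(format!("unknown command: {}", other)),
        None => Err("no command given".to_string()),
    }
}
```

Rejecting unknown `--status` values at parse time means a typo fails locally instead of reaching the daemon.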

#### 2c. Add CLI unit tests for new commands

Add parse tests in the style of the existing `parses_task_commands` test.

### Phase 3: Agent presence schema (deferred)

Add `host` and `last_seen` columns to the agents table. Update `register-agent`
to accept an optional `host` parameter and update `last_seen` on each call. Add
a `heartbeat` socket command for liveness. This enables detecting stale remote
agents.

**Deferred**: requires a schema migration and broader design discussion about
lease semantics. Not blocking the multi-agent test coverage goal.
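
Phase 3's liveness check reduces to comparing `last_seen` against a lease TTL. A minimal sketch, assuming `last_seen` is stored as Unix seconds, is refreshed by `register-agent`/`heartbeat`, and that an agent past its TTL should look `offline` to the scheduler. `Presence`, `is_stale`, `effective_status`, and the TTL values in the usage are hypothetical placeholders for the deferred lease-semantics discussion.

```rust
/// Hypothetical presence row; `last_seen` is in Unix seconds.
#[derive(Debug, Clone)]
pub struct Presence {
    pub name: String,
    pub host: Option<String>,
    pub last_seen: u64,
    pub status: String,
}

impl Presence {
    /// True when no heartbeat has arrived within `ttl_secs`.
    pub fn is_stale(&self, now: u64, ttl_secs: u64) -> bool {
        now.saturating_sub(self.last_seen) > ttl_secs
    }

    /// Status the scheduler should see: a stale agent counts as offline,
    /// whatever status it last reported.
    pub fn effective_status(&self, now: u64, ttl_secs: u64) -> &str {
        if self.is_stale(now, ttl_secs) {
            "offline"
        } else {
            self.status.as_str()
        }
    }

    /// Heartbeat handler: refresh `last_seen`, and `host` if one was sent.
    pub fn heartbeat(&mut self, now: u64, host: Option<&str>) {
        self.last_seen = now;
        if let Some(h) = host {
            self.host = Some(h.to_string());
        }
    }
}
```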

### Phase 4: Polling workflow integration test (deferred)

Test the full poll → claim → work → done cycle from the agent's perspective,
simulating what `colibri_poll.py` does. Register two agents, create tasks with
different capabilities, verify each agent sees only its tasks via the poll
path, and transition tasks to done.

**Deferred**: depends on the Phase 2 CLI additions (so the test can use CLI
commands instead of replicating the Python scripts' raw socket calls).
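
The poll path this test would simulate is a `list-tasks status=started` query filtered by `agent_id`, which is what the architecture section says `colibri_poll.py` does. A minimal Rust model of that filter makes the "each agent sees only its tasks" property precise. The `BoardTask` type, string statuses, and `poll_for` are assumptions for illustration.

```rust
/// Hypothetical board row as seen by a polling agent.
#[derive(Debug, Clone)]
pub struct BoardTask {
    pub id: String,
    pub status: String,
    pub agent_id: Option<String>,
}

/// Equivalent of `list-tasks status=started`, filtered to one agent.
pub fn poll_for<'a>(tasks: &'a [BoardTask], agent_id: &str) -> Vec<&'a BoardTask> {
    tasks
        .iter()
        .filter(|t| t.status == "started" && t.agent_id.as_deref() == Some(agent_id))
        .collect()
}
```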

### Phase 5: Bridge validation (FreeBSD-only)

Start `colibri_bridge` with socat on the FreeBSD host. Connect from a second
host via Tailscale TCP. Verify that `status`, `list-tasks`, and `claim-task` all
round-trip over the bridge. **Can only be done on FreeBSD 15 with the Tailscale
mesh.**
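
A bridge check only needs to prove that a request written to the TCP port comes back as a response. The sketch below is a std-only probe built on two stated assumptions: the daemon answers one newline-terminated JSON request with one line, and the request body used in the usage check is a placeholder, since this plan does not specify the socket wire format. `probe` is a hypothetical helper; host and port are parameters.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Sends one newline-terminated request and returns the first response line.
/// Assumes a line-oriented JSON protocol (an assumption, not a documented contract).
pub fn probe(addr: &str, request: &str, timeout: Duration) -> std::io::Result<String> {
    let sock_addr = addr.to_socket_addrs()?.next().ok_or_else(|| {
        std::io::Error::new(std::io::ErrorKind::InvalidInput, "no address")
    })?;
    let mut stream = TcpStream::connect_timeout(&sock_addr, timeout)?;
    stream.set_read_timeout(Some(timeout))?;
    stream.set_write_timeout(Some(timeout))?;
    stream.write_all(request.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;
    let mut line = String::new();
    if BufReader::new(stream).read_line(&mut line)? == 0 {
        return Err(std::io::Error::new(
            std::io::ErrorKind::UnexpectedEof,
            "bridge closed without a response",
        ));
    }
    Ok(line.trim_end().to_string())
}
```

On the mesh, the same call would be pointed at the FreeBSD host's Tailscale address on port 9190, once per command under test.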

---

## Summary

| Phase | What | Files | Linux? | Status |
|-------|------|-------|--------|--------|
| 1a | `pick_agent` unit tests | `scheduler.rs` tests | Yes | Ready |
| 1b | Multi-agent board integration test | `tests/multi_agent_board.rs` (new) | Yes | Ready |
| 1c | Same-capability multi-task test | Same file | Yes | Ready |
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | Branch exists |
| 2b | Add `claim-task` + `transition-task` CLI | `colibri.rs` + `lib.rs` | Yes | Ready |
| 2c | CLI parse tests | `colibri.rs` tests | Yes | Ready |
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
| 4 | Polling workflow test | `tests/` | Yes | Deferred (needs Phase 2) |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |

**Immediate scope:** Phases 1-2. All testable on Linux with a `cargo test` +
`cargo clippy` gate. No FreeBSD dependency for implementation.

@@ -1,16 +1,27 @@
 # Priority Handoff: Three Focus Items Toward ISO Gate 1
 
-**Created:** 2026-06-14 (Sam & Hermes)
-**Status:** open for any agent to pick up
-**Replaces:** ad-hoc ISO work-lane priorities
+**Created:** 2026-06-14 (Sam & Hermes) · **Updated:** 2026-06-19
+**Status:** Priorities 2 & 3 **done** · Priority 1 **staged for FreeBSD build**
+**Superseded by:** `MULTI-AGENT-HOST-PLAN.md` for the next sprint
 
-Round 2 audit is fully closed. All repos are green (164 tests, clippy clean,
-fmt clean). The three items below are the highest-leverage work toward getting
+Round 2 audit is fully closed. All repos are green (211 tests, clippy clean,
+fmt clean). The three items below were the highest-leverage work toward getting
 a Colibri-backed ISO candidate and delivering on the core cost-discipline
 promise.
 
-Each item is independently implementable on Linux with FreeBSD validation as
-the final step. Items can be worked in parallel by different agents.
+**Current status of each item:**
+
+- **Priority 1 (ISO boot validation):** Build wiring done, release runbook
+landed (`clawdie-iso/docs/RELEASE-BUILD-RUNBOOK.md`), artifacts built on the
+FreeBSD host. Awaiting the 0.10.0 release build execution.
+- **Priority 2 (Pi spawn end-to-end):** **Done.** `poll_tasks()` wired in
+`9d443a4`, integration test `poll_tasks_spawns_agent_for_claimed_task` passes.
+- **Priority 3 (Cost mode enforcement):** **Done.** The cost mode is the single
+source of truth; `session_max_bytes`/`max_uncompacted_turns` removed from
+`DaemonConfig`; per-append compaction derives from `CostMode::parse()`.
+
+The next sprint is multi-agent multi-host coordination; see
+[`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md).
 
 ---
 

@@ -14,4 +14,5 @@ A quick-reference guide to every document in this folder.
 | [`INTEGRATION-LAYERED-SOUL.md`](INTEGRATION-LAYERED-SOUL.md) | How Colibri consumes `layered-soul` reviewed context today vs planned | Agents |
 | [`ISO-ACCEPTANCE-RUNBOOK.md`](ISO-ACCEPTANCE-RUNBOOK.md) | Post-boot acceptance commands after staging Colibri into an ISO | Codex (FreeBSD) |
 | [`ISO-SERVICE-LAYOUT.md`](ISO-SERVICE-LAYOUT.md) | `rc.conf` service layout for the ISO image | All |
-| [`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](PRIORITY-HANDOFF-ISO-SPAWN-COST.md) | **Current sprint**: ISO staging wiring, Pi spawn path, cost mode enforcement | All agents |
+| [`MULTI-AGENT-HOST-PLAN.md`](MULTI-AGENT-HOST-PLAN.md) | **Current sprint**: multi-agent task-board tests + CLI surface gaps | All agents |
+| [`PRIORITY-HANDOFF-ISO-SPAWN-COST.md`](PRIORITY-HANDOFF-ISO-SPAWN-COST.md) | ISO boot validation, Pi spawn path, cost mode enforcement (P2/P3 done) | All agents |