docs: refresh MULTI-AGENT-HOST-PLAN for 0.12.0 — mark phases 1+2 complete #187

Merged
clawdie merged 2 commits from docs/multi-agent-plan-0.12-refresh into main 2026-06-25 16:58:39 +02:00


@ -1,12 +1,12 @@
# Multi-Agent Multi-Host — Gap Analysis & Implementation Plan # Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
**Created:** 19.jun.2026 (Sam & Hermes) **Created:** 19.jun.2026 (Sam & Hermes)
**Updated:** 21.jun.2026 (Sam & Claude) — reflects 0.11.0 release and narrowed gaps **Updated:** 25.jun.2026 (Sam & Claude) — reflects 0.12.0 release; Phases 1 + 2 complete
**Status:** Phase 2a complete; Phase 1 + Phase 2b ready for implementation **Status:** Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred
## Context ## Context
Colibri 0.11.0 is released (MIT license, 230 tests, FreeBSD port + CI running). Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running).
The tenant/vault provision chain has landed (`register-tenant` → jail spawn → The tenant/vault provision chain has landed (`register-tenant` → jail spawn →
`provision_tenant_env()` → `colibri-vault::provision`). The next milestone is `provision_tenant_env()` → `colibri-vault::provision`). The next milestone is
proving the multi-agent, multi-host coordination model: multiple agents on proving the multi-agent, multi-host coordination model: multiple agents on
@ -19,7 +19,7 @@ remains to close the multi-host testing gap.
--- ---
## Current architecture (as of 0.11.0) ## Current architecture (as of 0.12.0)
The multi-host stack lives **outside the Rust daemon**: The multi-host stack lives **outside the Rust daemon**:
@ -57,11 +57,10 @@ The multi-host stack lives **outside the Rust daemon**:
| Tenant | `register-tenant`, `list-tenants` | | Tenant | `register-tenant`, `list-tenants` |
| Skills | `list-skills`, `register-skill` | | Skills | `list-skills`, `register-skill` |
### CLI surface (16 of 19 commands exposed) ### CLI surface (19 of 19 commands exposed)
Awaiting CLI exposure: `claim-task`, `transition-task`, `set-cost-mode` All socket commands now have CLI wrappers. `claim-task`, `transition-task`,
(Phase 2b). Remote agents currently use raw Python socket calls for these and `set-cost-mode` were added in Phase 2b (PR #138).
three commands.
--- ---
@ -81,12 +80,10 @@ three commands.
- Double-spawn session isolation - Double-spawn session isolation
- Tenant register + list over socket - Tenant register + list over socket
### Test targets (awaiting coverage) ### Test targets (remaining gaps)
| # | Gap | Severity | Linux-doable? | | # | Gap | Severity | Linux-doable? |
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- | | --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- |
| 1 | **Multi-agent task-board contention** — `pick_agent` tie-breaking, multi-required-capability, and active-status eligibility still need dedicated tests | High | Yes |
| 2 | **CLI surface gaps** — `claim-task`, `transition-task`, `set-cost-mode` still need CLI exposure (Phase 2b) | Medium | Yes |
| 3 | **Agent presence model** — needs `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) | | 3 | **Agent presence model** — needs `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
| 4 | **Remote-safe task claim** — `claim_task` is a blind UPDATE; needs a concurrency guard or lease/TTL | Medium | Yes | | 4 | **Remote-safe task claim** — `claim_task` is a blind UPDATE; needs a concurrency guard or lease/TTL | Medium | Yes |
| 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes | | 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
@ -95,6 +92,15 @@ three commands.
### Closed gaps (since the original 19.jun.2026 analysis) ### Closed gaps (since the original 19.jun.2026 analysis)
- **Multi-agent task-board contention (Gap 1)** — tie-breaking, multi-required-
capability, and active-status eligibility tests added (Phase 1a, PR #138).
Full board lifecycle, capability routing, and contention tests added
(Phase 1b/1c, PR #186). Key finding: capabilities must be registered as a
JSON array (`["freebsd"]`); the object form (`{"freebsd":true}`) silently
scores zero in `pick_agent` because capabilities deserialize as `Vec<String>`.
- **CLI surface gaps (Gap 2)** — `claim-task`, `transition-task`,
`set-cost-mode` added to CLI with parse tests (Phase 2b/2c, PR #138).
CLI surface is now 19/19.
- **CLI: register-agent + list-agents** — merged (Phase 2a, PR #107) - **CLI: register-agent + list-agents** — merged (Phase 2a, PR #107)
- **CLI: register-tenant + list-tenants + register-skill** — merged - **CLI: register-tenant + list-tenants + register-skill** — merged
- **pick_agent scoring** — partial-match and no-match scoring tests added - **pick_agent scoring** — partial-match and no-match scoring tests added
@ -111,13 +117,11 @@ three commands.
## Implementation phases ## Implementation phases
### Phase 1: Multi-agent task board tests (Linux, highest impact) ### Phase 1: Multi-agent task board tests — COMPLETE
#### 1a. Pure `pick_agent` unit tests — extend `scheduler.rs` test module #### 1a. Pure `pick_agent` unit tests — COMPLETE (PR #138)
Existing tests cover: best match (2 agents, different caps), offline exclusion, Added to `scheduler.rs` test module:
no-match, empty-required, partial scoring, none scoring, tick-drains-intake.
Add:
| Test | What it proves | | Test | What it proves |
| ------------------------------------------------ | --------------------------------------------------------------------------------- | | ------------------------------------------------ | --------------------------------------------------------------------------------- |
@ -125,38 +129,41 @@ Add:
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]` — agent with both beats agent with one | | `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]` — agent with both beats agent with one |
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated same as `"idle"` (both eligible) | | `test_pick_agent_active_status_eligible` | `status: "active"` is treated same as `"idle"` (both eligible) |
#### 1b. Multi-agent board integration test — new file `crates/colibri-daemon/tests/multi_agent_board.rs` #### 1b. Multi-agent board integration test — COMPLETE (PR #186)
Full lifecycle: register 2 agents with different capabilities, submit 2 intake Full lifecycle via real Unix socket: register 2 agents with disjoint
tasks with matching capabilities, run scheduler tick, verify correct assignment, capabilities, submit 2 intake tasks with matching required capabilities,
run `poll_tasks`, verify both agents spawn and reach Done. scheduler's `pick_agent` auto-routes each task to the capable agent (no manual
claim), verified by polling `list-tasks status=claimed`.
``` ```
Register agent "freebsd-agent" with ["freebsd"] Register agent "sysadmin" with ["freebsd"] ← array form required
Register agent "rust-agent" with ["rust"] Register agent "db-admin" with ["postgres"]
Submit intake "build on freebsd" required ["freebsd"] Submit intake "scrub zroot" required ["freebsd"]
Submit intake "write rust code" required ["rust"] Submit intake "vacuum db" required ["postgres"]
Run scheduler.tick(&state) Scheduler tick (50ms interval) auto-claims:
→ verify task A agent_id == freebsd-agent.id → verify freebsd task agent_id == sysadmin.id
→ verify task B agent_id == rust-agent.id → verify postgres task agent_id == db-admin.id
Run poll_tasks(&state)
→ verify 2 agent handles in state.agents
→ verify both tasks transitioned Claimed → Started
→ wait for glasspane Done on both panes
``` ```
This proves the core multi-agent coordination loop: **different agents get This proves the core multi-agent coordination loop: **different agents get
different tasks by capability**. different tasks by capability**, assigned by the scheduler.
#### 1c. Same-capability multi-task test > **Capability format:** `pick_agent` deserializes `Vec<String>`, so
> capabilities must be registered as a JSON array (`["freebsd"]`). The object
> form (`{"freebsd":true}`) silently deserializes to an empty vec and scores
> zero. The board-mechanics tests (1a, 1c) use manual `claim-task` so the
> format is inert there; the routing test (1b) uses array form and documents
> this requirement.
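The capability-format pitfall can be illustrated with a small Python analogue. This is a sketch only: `parse_capabilities` and `score` are hypothetical helpers that mimic the behavior described above (array form kept, anything else treated as an empty list); they are not the daemon's Rust code, and the real `pick_agent` scoring may differ in detail.

```python
import json


def parse_capabilities(raw: str) -> list[str]:
    """Return capabilities only when `raw` is a JSON array of strings.

    Assumption: any other shape is treated as "no capabilities",
    mirroring the empty-vec outcome described in the note above.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value
    return []  # object form such as {"freebsd": true} ends up here


def score(required: list[str], capabilities: list[str]) -> int:
    """Hypothetical score: how many required capabilities the agent has."""
    return sum(1 for cap in required if cap in capabilities)


array_form = parse_capabilities('["freebsd"]')
object_form = parse_capabilities('{"freebsd": true}')

print(score(["freebsd"], array_form))   # 1
print(score(["freebsd"], object_form))  # 0
```

The object form is valid JSON and raises no error, which is why the mismatch only shows up as a task that is never routed.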
#### 1c. Same-capability multi-task test — COMPLETE (PR #186)
``` ```
Register agent "worker" with ["freebsd"] Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"] Create 2 plain board tasks (scrub zroot, check smart)
Run tick + poll_tasks Same agent claims both via manual claim-task
→ verify both tasks assigned to same agent (documents current behavior) → verify both transition started → done
→ verify both agents spawn independently (session isolation) → documents current contention behavior (no guard)
→ verify both reach Done
``` ```
Documents the current contention behavior (no guard against same agent getting Documents the current contention behavior (no guard against same agent getting
@ -169,11 +176,10 @@ tasks.
`register-agent` and `list-agents` are in the CLI (merged via PR #107). `register-agent` and `list-agents` are in the CLI (merged via PR #107).
#### 2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI #### 2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI — COMPLETE (PR #138)
The three commands `colibri_task_done.py` currently reaches via raw socket. All three commands are in the CLI. Remote agents can now work entirely through
Adding them to the CLI means remote agents can work entirely through the the `colibri` binary:
`colibri` binary:
``` ```
colibri claim-task --task-id <UUID> --agent-id <UUID> colibri claim-task --task-id <UUID> --agent-id <UUID>
@ -181,18 +187,11 @@ colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE colibri set-cost-mode MODE
``` ```
Implementation: #### 2c. CLI unit tests for new commands — COMPLETE (PR #138)
- Add `Command::ClaimTask { task_id, agent_id }`, Parse tests added: `parses_claim_task`, `parses_transition_task`,
`Command::TransitionTask { task_id, status }`, and `parses_set_cost_mode`, `rejects_claim_task_missing_flags`,
`Command::SetCostMode { mode }` variants `rejects_transition_task_missing_flags`, `rejects_set_cost_mode_without_arg`.
- Add `DaemonClient::claim_task()`, `DaemonClient::transition_task()`, and
`DaemonClient::set_cost_mode()` methods
- Add CLI parsing (follow existing `--flag value` pattern)
#### 2c. Add CLI unit tests for new commands
Parse tests matching existing `parses_task_commands` style.
### Phase 3: Agent presence schema (deferred) ### Phase 3: Agent presence schema (deferred)
@ -210,15 +209,61 @@ simulating what `colibri_poll.py` does. Register two agents, create tasks with
different capabilities, verify each agent sees only its tasks via the poll different capabilities, verify each agent sees only its tasks via the poll
path, transition tasks to done. path, transition tasks to done.
**Deferred** — depends on Phase 2b CLI additions (so the test can use CLI **Deferred** — Phase 2b CLI additions are now complete; this test can be
commands instead of raw socket replication of the Python scripts). written when prioritized.
### Phase 5: Bridge validation (FreeBSD-only) ### Phase 5: Bridge validation (FreeBSD-only)
Start `colibri_bridge` with socat on the FreeBSD host. Connect from a second Closes Gap 6 (bridge round-trip) and Gap 7 (cross-host coordination) on the real
host via Tailscale TCP. Verify round-trip: status, list-tasks, claim-task all Tailscale mesh — an operational run, not more code. The bridge is the
work over the bridge. **Can only be done on FreeBSD 15 with the Tailscale `colibri_bridge` rc.d service running
mesh.** `socat TCP-LISTEN:9190,fork → UNIX-CONNECT:/var/run/colibri/colibri.sock`.
**Prerequisites**
- `colibri_daemon` running (socket at `/var/run/colibri/colibri.sock`)
- `pkg install socat`
- both hosts on the tailnet; a `pf` rule allowing the bridge port **inbound on
`tailscale0` only**, never the public interface
**On the FreeBSD host (OSA)**
```sh
sysrc colibri_bridge_enable=YES
sysrc colibri_bridge_listen_addr=<osa-tailnet-ip> # this host's tailnet address
service colibri_bridge start
sockstat -4 -l | grep 9190 # confirm socat is listening
```
**From a second host (e.g. domedog) over Tailscale** — the remote path is raw
TCP (the `colibri_poll.py` script speaks the local Unix socket; over the wire,
send newline-delimited JSON with `nc`):
```sh
printf '%s\n' '{"cmd":"status"}' | nc -w2 <osa-tailnet-ip> 9190
printf '%s\n' '{"cmd":"list-tasks"}' | nc -w2 <osa-tailnet-ip> 9190
```
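The same round-trip can be scripted from the remote host instead of using `nc`. The following is a minimal sketch, assuming the bridge answers each request with a single JSON document terminated by a newline (or by closing the connection); `send_command` is a hypothetical helper, not part of the shipped scripts.

```python
import json
import socket


def send_command(host: str, port: int, command: dict, timeout: float = 2.0) -> dict:
    """Send one newline-delimited JSON command and return the decoded reply.

    Assumption: the reply is one JSON document, ended by a newline or by
    the peer closing the connection.
    """
    line = json.dumps(command).encode("utf-8") + b"\n"
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line)
        buf = b""
        # Read until the reply line is complete or the peer closes.
        while not buf.endswith(b"\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
    return json.loads(buf.decode("utf-8"))
```

A call such as `send_command("<osa-tailnet-ip>", 9190, {"cmd": "status"})` corresponds to the first `nc` line above. An empty reply raises `json.JSONDecodeError`, which is a reasonable signal that the bridge or the daemon behind it is not answering.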
**Cross-host coordination — the real proof (Gap 7)**
1. Register a remote agent (array-form caps — object form scores zero in
`pick_agent`):
`{"cmd":"register-agent","name":"domedog","capabilities":["linux"]}`
2. Submit an intake task requiring that capability:
`{"cmd":"intake-task","title":"...","capabilities":["linux"]}`
3. Confirm the scheduler routed it: `{"cmd":"list-tasks","status":"claimed"}`
→ the task's `agent_id` is the remote agent.
4. From the remote side, `transition-task` it to `done` and verify.
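The request lines for steps 1 to 3 can be built programmatically, which makes it harder to slip back into the object-form capabilities. This is a sketch under assumptions: the request fields are the ones shown in the steps above, while the reply shape used by `routed_to` (a list of task dicts with `title` and `agent_id` keys) is a guess for illustration and should be checked against the daemon's actual output. The `transition-task` payload is left out because its socket field names are not given here.

```python
import json


def encode_line(command: dict) -> bytes:
    """Encode a command as one compact, newline-terminated JSON line."""
    return json.dumps(command, separators=(",", ":")).encode("utf-8") + b"\n"


def _require_array(capabilities) -> list[str]:
    # Reject the object form early; it would silently score zero.
    if not isinstance(capabilities, list) or not all(
        isinstance(c, str) for c in capabilities
    ):
        raise TypeError("capabilities must be a list of strings")
    return list(capabilities)


def register_agent(name: str, capabilities: list[str]) -> dict:
    return {"cmd": "register-agent", "name": name,
            "capabilities": _require_array(capabilities)}


def intake_task(title: str, capabilities: list[str]) -> dict:
    return {"cmd": "intake-task", "title": title,
            "capabilities": _require_array(capabilities)}


def list_claimed() -> dict:
    return {"cmd": "list-tasks", "status": "claimed"}


def routed_to(tasks: list[dict], title: str, agent_id: str) -> bool:
    """True if a task with this title is assigned to `agent_id`.

    Assumption: each task dict carries `title` and `agent_id` keys.
    """
    return any(
        t.get("title") == title and t.get("agent_id") == agent_id for t in tasks
    )


print(encode_line(register_agent("domedog", ["linux"])))
```

Each encoded line can be sent over the bridge exactly like the `nc` examples, one request per connection.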
**Acceptance:** a task created on OSA is claimed and driven to `done` by an agent
on a *different* host, entirely over the Tailscale bridge — the same routing the
`scheduler_routes_intake_tasks_by_capability` test proves on a single host.
**Security:** bind to the tailnet interface only and scope the `pf` rule to
`tailscale0`. Use placeholder tailnet addresses in any committed notes — never
paste real `100.x` IPs into git. (The shipped `colibri_bridge.in` currently
hardcodes a real default `listen_addr`; that should be scrubbed to a placeholder
or required-via-rc.conf separately.)
--- ---
@ -226,15 +271,15 @@ mesh.**
| Phase | What | Files | Linux? | Status | | Phase | What | Files | Linux? | Status |
| ----- | ---------------------------------------------------------- | ------------------------------------ | ------ | ------------------------- | | ----- | ---------------------------------------------------------- | ------------------------------------ | ------ | ------------------------- |
| 1a | `pick_agent` unit tests (3 remaining) | `scheduler.rs` tests | Yes | Ready | | 1a | `pick_agent` unit tests (tie-break, multi-cap, active) | `scheduler.rs` tests | Yes | **Complete** (PR #138) |
| 1b | Multi-agent board integration test | `tests/multi_agent_board.rs` (new) | Yes | Ready | | 1b | Multi-agent board integration test (capability routing) | `tests/multi_agent_board.rs` | Yes | **Complete** (PR #186) |
| 1c | Same-capability multi-task test | Same file | Yes | Ready | | 1c | Same-capability multi-task test (contention) | `tests/multi_agent_board.rs` | Yes | **Complete** (PR #186) |
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** | | 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #107) |
| 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | Ready | | 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #138) |
| 2c | CLI parse tests | `colibri.rs` tests | Yes | Ready | | 2c | CLI parse tests | `colibri.rs` tests | Yes | **Complete** (PR #138) |
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred | | 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
| 4 | Polling workflow test | `tests/` | Yes | Deferred (needs Phase 2b) | | 4 | Polling workflow test | `tests/` | Yes | Deferred |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane | | 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
**Immediate scope:** Phases 1 + 2b. All testable on Linux with `cargo test` + **Phases 1 + 2 complete.** Next scope: Phase 3 (agent presence schema) or
`cargo clippy` gate. No FreeBSD dependency for implementation. Phase 5 (FreeBSD bridge validation).