docs: refresh MULTI-AGENT-HOST-PLAN for 0.12.0 — mark phases 1+2 complete #187
1 changed file with 113 additions and 68 deletions
|
|
@ -1,12 +1,12 @@
|
||||||
# Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
|
# Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
|
||||||
|
|
||||||
**Created:** 19.jun.2026 (Sam & Hermes)
|
**Created:** 19.jun.2026 (Sam & Hermes)
|
||||||
**Updated:** 21.jun.2026 (Sam & Claude) — reflects 0.11.0 release and narrowed gaps
|
**Updated:** 25.jun.2026 (Sam & Claude) — reflects 0.12.0 release; Phases 1 + 2 complete
|
||||||
**Status:** Phase 2a complete; Phase 1 + Phase 2b ready for implementation
|
**Status:** Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred
|
||||||
|
|
||||||
## Context
|
## Context
|
||||||
|
|
||||||
Colibri 0.11.0 is released (MIT license, 230 tests, FreeBSD port + CI running).
|
Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running).
|
||||||
The tenant/vault provision chain has landed (`register-tenant` → jail spawn →
|
The tenant/vault provision chain has landed (`register-tenant` → jail spawn →
|
||||||
`provision_tenant_env()` → `colibri-vault::provision`). The next milestone is
|
`provision_tenant_env()` → `colibri-vault::provision`). The next milestone is
|
||||||
proving the multi-agent, multi-host coordination model: multiple agents on
|
proving the multi-agent, multi-host coordination model: multiple agents on
|
||||||
|
|
@ -19,7 +19,7 @@ remains to close the multi-host testing gap.
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
## Current architecture (as of 0.11.0)
|
## Current architecture (as of 0.12.0)
|
||||||
|
|
||||||
The multi-host stack lives **outside the Rust daemon**:
|
The multi-host stack lives **outside the Rust daemon**:
|
||||||
|
|
||||||
|
|
@ -57,11 +57,10 @@ The multi-host stack lives **outside the Rust daemon**:
|
||||||
| Tenant | `register-tenant`, `list-tenants` |
|
| Tenant | `register-tenant`, `list-tenants` |
|
||||||
| Skills | `list-skills`, `register-skill` |
|
| Skills | `list-skills`, `register-skill` |
|
||||||
|
|
||||||
### CLI surface (16 of 19 commands exposed)
|
### CLI surface (19 of 19 commands exposed)
|
||||||
|
|
||||||
Awaiting CLI exposure: `claim-task`, `transition-task`, `set-cost-mode`
|
All socket commands now have CLI wrappers. `claim-task`, `transition-task`,
|
||||||
(Phase 2b). Remote agents currently use raw Python socket calls for these
|
and `set-cost-mode` were added in Phase 2b (PR #138).
|
||||||
three commands.
|
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -81,12 +80,10 @@ three commands.
|
||||||
- Double-spawn session isolation
|
- Double-spawn session isolation
|
||||||
- Tenant register + list over socket
|
- Tenant register + list over socket
|
||||||
|
|
||||||
### Test targets (awaiting coverage)
|
### Test targets (remaining gaps)
|
||||||
|
|
||||||
| # | Gap | Severity | Linux-doable? |
|
| # | Gap | Severity | Linux-doable? |
|
||||||
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- |
|
| --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- |
|
||||||
| 1 | **Multi-agent task-board contention** — `pick_agent` tie-breaking, multi-required-capability, and active-status eligibility await dedicated tests | High | Yes |
|
|
||||||
| 2 | **CLI surface gaps** — `claim-task`, `transition-task`, `set-cost-mode` await CLI exposure (Phase 2b) | Medium | Yes |
|
|
||||||
| 3 | **Agent presence model** — await `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
|
| 3 | **Agent presence model** — await `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
|
||||||
| 4 | **Remote-safe task claim** — `claim_task` is a blind UPDATE; await a concurrency guard or lease/TTL | Medium | Yes |
|
| 4 | **Remote-safe task claim** — `claim_task` is a blind UPDATE; await a concurrency guard or lease/TTL | Medium | Yes |
|
||||||
| 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
|
| 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
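Gap 4 describes `claim_task` as a blind UPDATE. One common concurrency guard is a conditional (compare-and-set) UPDATE that only succeeds while the task is still unclaimed, with the caller checking the affected-row count. The sketch below uses Python's built-in `sqlite3` purely as a stand-in store. The `tasks` table, its columns, and the `pending`/`claimed` status values are hypothetical, not Colibri's actual schema, and the lease/TTL alternative that Gap 4 also lists is not shown.

```python
import sqlite3


def open_board() -> sqlite3.Connection:
    """Create an in-memory board with a hypothetical `tasks` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT NOT NULL, agent_id TEXT)"
    )
    return conn


def claim_task(conn: sqlite3.Connection, task_id: str, agent_id: str) -> bool:
    """Claim `task_id` for `agent_id` only if it is still unclaimed.

    The `status = 'pending'` condition is the guard: a second claimer matches
    zero rows and gets False, instead of silently overwriting the first
    claimer's agent_id the way a blind UPDATE would.
    """
    with conn:  # commits on success, rolls back on exception
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', agent_id = ? "
            "WHERE id = ? AND status = 'pending'",
            (agent_id, task_id),
        )
    return cur.rowcount == 1


board = open_board()
with board:
    board.execute("INSERT INTO tasks (id, status) VALUES ('t1', 'pending')")
print(claim_task(board, "t1", "sysadmin"))  # True: first claim wins
print(claim_task(board, "t1", "db-admin"))  # False: already claimed
```

Because the check and the write happen in a single statement, two racing claimers cannot both observe the task as unclaimed.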
|
||||||
|
|
@ -95,6 +92,15 @@ three commands.
|
||||||
|
|
||||||
### Closed gaps (since the original 19.jun.2026 analysis)
|
### Closed gaps (since the original 19.jun.2026 analysis)
|
||||||
|
|
||||||
|
- **Multi-agent task-board contention (Gap 1)** — tie-breaking,
  multi-required-capability, and active-status eligibility tests added (Phase 1a, PR #138).
  Full board lifecycle, capability routing, and contention tests added
  (Phase 1b/1c, PR #186). Key finding: capabilities must be registered as a
  JSON array (`["freebsd"]`); the object form (`{"freebsd":true}`) silently
  scores zero in `pick_agent`, which deserializes capabilities as `Vec<String>`.
|
||||||
|
- **CLI surface gaps (Gap 2)** — `claim-task`, `transition-task`,
|
||||||
|
`set-cost-mode` added to CLI with parse tests (Phase 2b/2c, PR #138).
|
||||||
|
CLI surface is now 19/19.
|
||||||
- **CLI: register-agent + list-agents** — merged (Phase 2a, PR #107)
|
- **CLI: register-agent + list-agents** — merged (Phase 2a, PR #107)
|
||||||
- **CLI: register-tenant + list-tenants + register-skill** — merged
|
- **CLI: register-tenant + list-tenants + register-skill** — merged
|
||||||
- **pick_agent scoring** — partial-match and no-match scoring tests added
|
- **pick_agent scoring** — partial-match and no-match scoring tests added
|
||||||
|
|
@ -111,13 +117,11 @@ three commands.
|
||||||
|
|
||||||
## Implementation phases
|
## Implementation phases
|
||||||
|
|
||||||
### Phase 1: Multi-agent task board tests (Linux, highest impact)
|
### Phase 1: Multi-agent task board tests — COMPLETE
|
||||||
|
|
||||||
#### 1a. Pure `pick_agent` unit tests — extend `scheduler.rs` test module
|
#### 1a. Pure `pick_agent` unit tests — COMPLETE (PR #138)
|
||||||
|
|
||||||
Existing tests cover: best match (2 agents, different caps), offline exclusion,
|
Added to `scheduler.rs` test module:
|
||||||
no-match, empty-required, partial scoring, none scoring, tick-drains-intake.
|
|
||||||
Add:
|
|
||||||
|
|
||||||
| Test | What it proves |
|
| Test | What it proves |
|
||||||
| ------------------------------------------------ | --------------------------------------------------------------------------------- |
|
| ------------------------------------------------ | --------------------------------------------------------------------------------- |
|
||||||
|
|
@ -125,38 +129,41 @@ Add:
|
||||||
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]` — agent with both beats agent with one |
|
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]` — agent with both beats agent with one |
|
||||||
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated the same as `"idle"` (both eligible) |
|
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated the same as `"idle"` (both eligible) |
|
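The behaviors the 1a tests pin down (both `idle` and `active` agents are eligible, offline agents are excluded, and an agent covering more required capabilities wins) can be illustrated with a small model. This is a hypothetical Python sketch, not the Rust `pick_agent` in `scheduler.rs`: the `Agent` fields, the count-of-matches score, returning the first eligible agent when nothing is required, and resolving ties to the first-listed agent are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Statuses treated as eligible (assumption mirroring
# test_pick_agent_active_status_eligible: "active" counts the same as "idle").
ELIGIBLE = {"idle", "active"}


@dataclass
class Agent:
    name: str
    status: str
    capabilities: list[str] = field(default_factory=list)


def score(agent: Agent, required: list[str]) -> int:
    """Number of required capabilities this agent covers."""
    return len(set(required) & set(agent.capabilities))


def pick_agent(agents: list[Agent], required: list[str]) -> Optional[Agent]:
    """Return the eligible agent covering the most required capabilities.

    Returns None when no eligible agent covers any required capability.
    With no requirements, the first eligible agent is returned (assumption).
    Ties go to the earliest agent in `agents`, since max() keeps the first
    maximal element.
    """
    eligible = [a for a in agents if a.status in ELIGIBLE]
    if not eligible:
        return None
    if not required:
        return eligible[0]
    best = max(eligible, key=lambda a: score(a, required))
    return best if score(best, required) > 0 else None


both = Agent("both", "active", ["rust", "freebsd"])
one = Agent("one", "idle", ["rust"])
print(pick_agent([one, both], ["rust", "freebsd"]).name)  # both
```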
||||||
|
|
||||||
#### 1b. Multi-agent board integration test — new file `crates/colibri-daemon/tests/multi_agent_board.rs`
|
#### 1b. Multi-agent board integration test — COMPLETE (PR #186)
|
||||||
|
|
||||||
Full lifecycle: register 2 agents with different capabilities, submit 2 intake
|
Full lifecycle via a real Unix socket: register 2 agents with disjoint
|
||||||
tasks with matching capabilities, run scheduler tick, verify correct assignment,
|
capabilities, submit 2 intake tasks with matching required capabilities;
|
||||||
run `poll_tasks`, verify both agents spawn and reach Done.
|
the scheduler's `pick_agent` then auto-routes each task to the capable agent
|
||||||

(no manual claim). Routing is verified by polling `list-tasks status=claimed`.
|
||||||
|
|
||||||
```
|
```
|
||||||
Register agent "freebsd-agent" with ["freebsd"]
|
Register agent "sysadmin" with ["freebsd"] ← array form required
|
||||||
Register agent "rust-agent" with ["rust"]
|
Register agent "db-admin" with ["postgres"]
|
||||||
Submit intake "build on freebsd" required ["freebsd"]
|
Submit intake "scrub zroot" required ["freebsd"]
|
||||||
Submit intake "write rust code" required ["rust"]
|
Submit intake "vacuum db" required ["postgres"]
|
||||||
Run scheduler.tick(&state)
|
Scheduler tick (50ms interval) auto-claims:
|
||||||
→ verify task A agent_id == freebsd-agent.id
|
→ verify freebsd task agent_id == sysadmin.id
|
||||||
→ verify task B agent_id == rust-agent.id
|
→ verify postgres task agent_id == db-admin.id
|
||||||
Run poll_tasks(&state)
|
|
||||||
→ verify 2 agent handles in state.agents
|
|
||||||
→ verify both tasks transitioned Claimed → Started
|
|
||||||
→ wait for glasspane Done on both panes
|
|
||||||
```
|
```
|
||||||
|
|
||||||
This proves the core multi-agent coordination loop: **different agents get
|
This proves the core multi-agent coordination loop: **different agents get
|
||||||
different tasks by capability**.
|
different tasks by capability**, assigned by the scheduler.
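Because routing happens asynchronously on the scheduler's 50ms tick, the test has to poll `list-tasks status=claimed` until every task is assigned or a deadline passes, rather than checking once. A minimal sketch of that wait loop follows. `fetch_claimed` is a hypothetical stand-in for the socket call, assumed to return a mapping of task title to `agent_id`; the timeout and interval defaults are illustrative.

```python
import time
from typing import Callable, Mapping


def wait_for_routing(
    fetch_claimed: Callable[[], Mapping[str, str]],
    expected: Mapping[str, str],
    timeout: float = 2.0,
    interval: float = 0.05,
) -> dict[str, str]:
    """Poll until every expected task is claimed by the expected agent.

    `fetch_claimed` returns {task_title: agent_id} for claimed tasks.
    Raises TimeoutError with the last observed state if routing has not
    converged within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        seen = dict(fetch_claimed())
        if all(seen.get(title) == agent for title, agent in expected.items()):
            return seen
        if time.monotonic() >= deadline:
            raise TimeoutError(f"routing incomplete: {seen!r}")
        time.sleep(interval)
```

Polling against a deadline keeps the test from flaking when a tick lands late, while still failing with a useful snapshot if routing is wrong.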
|
||||||
|
|
||||||
#### 1c. Same-capability multi-task test
|
> **Capability format:** `pick_agent` deserializes `Vec<String>`, so
|
||||||
|
> capabilities must be registered as a JSON array (`["freebsd"]`). The object
|
||||||
|
> form (`{"freebsd":true}`) silently deserializes to an empty vec and scores
|
||||||
|
> zero. The board-mechanics tests (1a, 1c) use manual `claim-task` so the
|
||||||
|
> format is inert there; the routing test (1b) uses array form and documents
|
||||||
|
> this requirement.
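The pitfall is easy to reproduce outside Rust. In serde, deserializing a JSON object into `Vec<String>` is a type error, so the silent empty vec described above implies the daemon falls back to a default when parsing fails; that fallback is an assumption here. The hypothetical Python sketch below mirrors it with the standard `json` module: array form parses to a list, while object form falls back to empty and therefore matches nothing.

```python
import json


def parse_capabilities(raw: str) -> list[str]:
    """Parse a stored capabilities value the way the note describes.

    Only a JSON array of strings is accepted. Anything else, such as the
    object form {"freebsd": true}, falls back to an empty list instead of
    raising, which is what makes the mistake silent.
    """
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return value
    return []


def match_count(raw_caps: str, required: list[str]) -> int:
    """How many required capabilities the stored value satisfies."""
    return len(set(required) & set(parse_capabilities(raw_caps)))


print(match_count('["freebsd"]', ["freebsd"]))        # 1: array form matches
print(match_count('{"freebsd": true}', ["freebsd"]))  # 0: object form scores zero
```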
|
||||||
|
|
||||||
|
#### 1c. Same-capability multi-task test — COMPLETE (PR #186)
|
||||||
|
|
||||||
```
|
```
|
||||||
Register agent "worker" with ["freebsd"]
|
Register agent "worker" with ["freebsd"]
|
||||||
Submit 2 intake tasks both requiring ["freebsd"]
|
Create 2 plain board tasks (scrub zroot, check smart)
|
||||||
Run tick + poll_tasks
|
Same agent claims both via manual claim-task
|
||||||
→ verify both tasks assigned to same agent (documents current behavior)
|
→ verify both transition started → done
|
||||||
→ verify both agents spawn independently (session isolation)
|
→ documents current contention behavior (no guard)
|
||||||
→ verify both reach Done
|
|
||||||
```
|
```
|
||||||
|
|
||||||
Documents the current contention behavior (no guard against same agent getting
|
Documents the current contention behavior (no guard against same agent getting
|
||||||
|
|
@ -169,11 +176,10 @@ tasks.
|
||||||
|
|
||||||
`register-agent` and `list-agents` are in the CLI (merged via PR #107).
|
`register-agent` and `list-agents` are in the CLI (merged via PR #107).
|
||||||
|
|
||||||
#### 2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI
|
#### 2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI — COMPLETE (PR #138)
|
||||||
|
|
||||||
The three commands `colibri_task_done.py` currently reaches via raw socket.
|
All three commands are in the CLI. Remote agents can now work entirely through
|
||||||
Adding them to the CLI means remote agents can work entirely through the
|
the `colibri` binary:
|
||||||
`colibri` binary:
|
|
||||||
|
|
||||||
```
|
```
|
||||||
colibri claim-task --task-id <UUID> --agent-id <UUID>
|
colibri claim-task --task-id <UUID> --agent-id <UUID>
|
||||||
|
|
@ -181,18 +187,11 @@ colibri transition-task --task-id <UUID> --status done|failed
|
||||||
colibri set-cost-mode MODE
|
colibri set-cost-mode MODE
|
||||||
```
|
```
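A remote agent script could drive these commands through the binary instead of hand-built socket JSON. The Python sketch below only assembles argument vectors from the flags shown above and runs them with `subprocess`; it assumes `colibri` is on `PATH`, and the helper names are hypothetical. Status and mode values are passed through unchecked, since the full set each command accepts is not listed here.

```python
import subprocess
from typing import Sequence


def claim_task_argv(task_id: str, agent_id: str) -> list[str]:
    return ["colibri", "claim-task", "--task-id", task_id, "--agent-id", agent_id]


def transition_task_argv(task_id: str, status: str) -> list[str]:
    # Usage above shows done|failed; other values are passed through as-is.
    return ["colibri", "transition-task", "--task-id", task_id, "--status", status]


def set_cost_mode_argv(mode: str) -> list[str]:
    return ["colibri", "set-cost-mode", mode]


def run(argv: Sequence[str]) -> str:
    """Run a colibri command; raises CalledProcessError on non-zero exit."""
    result = subprocess.run(argv, check=True, capture_output=True, text=True)
    return result.stdout


# Example (requires a running daemon):
# run(claim_task_argv("<UUID>", "<UUID>"))
```

Keeping argv construction separate from execution lets the argument shapes be unit-tested without a daemon.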
|
||||||
|
|
||||||
Implementation:
|
#### 2c. CLI unit tests for new commands — COMPLETE (PR #138)
|
||||||
|
|
||||||
- Add `Command::ClaimTask { task_id, agent_id }`,
|
Parse tests added: `parses_claim_task`, `parses_transition_task`,
|
||||||
`Command::TransitionTask { task_id, status }`, and
|
`parses_set_cost_mode`, `rejects_claim_task_missing_flags`,
|
||||||
`Command::SetCostMode { mode }` variants
|
`rejects_transition_task_missing_flags`, `rejects_set_cost_mode_without_arg`.
|
||||||
- Add `DaemonClient::claim_task()`, `DaemonClient::transition_task()`, and
|
|
||||||
`DaemonClient::set_cost_mode()` methods
|
|
||||||
- Add CLI parsing (follow existing `--flag value` pattern)
|
|
||||||
|
|
||||||
#### 2c. Add CLI unit tests for new commands
|
|
||||||
|
|
||||||
Parse tests matching existing `parses_task_commands` style.
|
|
||||||
|
|
||||||
### Phase 3: Agent presence schema (deferred)
|
### Phase 3: Agent presence schema (deferred)
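Gap 3 proposes `host`, `last_seen`, and heartbeat/lease columns so the daemon can tell a live remote agent from a stale one. A minimal sketch of that check, assuming each heartbeat refreshes `last_seen` and an agent counts as stale once its lease has lapsed. The `AgentPresence` record, the 30-second TTL, and the use of UNIX-epoch seconds are hypothetical.

```python
from dataclasses import dataclass

LEASE_TTL_SECS = 30.0  # hypothetical lease length


@dataclass
class AgentPresence:
    name: str
    host: str
    last_seen: float  # UNIX epoch seconds of the most recent heartbeat


def is_stale(agent: AgentPresence, now: float, ttl: float = LEASE_TTL_SECS) -> bool:
    """Stale once more than `ttl` seconds have passed without a heartbeat."""
    return now - agent.last_seen > ttl


def live_agents(agents: list[AgentPresence], now: float) -> list[AgentPresence]:
    """Agents whose lease is still valid, i.e. candidates for assignment."""
    return [a for a in agents if not is_stale(a, now)]
```

Passing `now` explicitly keeps the check deterministic in tests; a caller would pass `time.time()`.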
|
||||||
|
|
||||||
|
|
@ -210,15 +209,61 @@ simulating what `colibri_poll.py` does. Register two agents, create tasks with
|
||||||
different capabilities, verify each agent sees only its tasks via the poll
|
different capabilities, verify each agent sees only its tasks via the poll
|
||||||
path, transition tasks to done.
|
path, transition tasks to done.
|
||||||
|
|
||||||
**Deferred** — depends on Phase 2b CLI additions (so the test can use CLI
|
**Deferred** — Phase 2b CLI additions are now complete; this test can be
|
||||||
commands instead of raw socket replication of the Python scripts).
|
written when prioritized.
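The property Phase 4 would check, that each agent sees only its own tasks via the poll path, reduces to filtering the board by `agent_id`. A hypothetical sketch of that filter, assuming poll results are dicts with `agent_id` and `status` keys; the real `colibri_poll.py` response shape and the `pending` status name are assumptions.

```python
from typing import Iterable


def tasks_for_agent(tasks: Iterable[dict], agent_id: str, status: str = "claimed") -> list[dict]:
    """Tasks assigned to `agent_id` with the given status, in board order."""
    return [t for t in tasks if t.get("agent_id") == agent_id and t.get("status") == status]


board = [
    {"title": "scrub zroot", "agent_id": "sysadmin", "status": "claimed"},
    {"title": "vacuum db", "agent_id": "db-admin", "status": "claimed"},
    {"title": "check smart", "agent_id": None, "status": "pending"},
]
for agent in ("sysadmin", "db-admin"):
    print(agent, [t["title"] for t in tasks_for_agent(board, agent)])
```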
|
||||||
|
|
||||||
### Phase 5: Bridge validation (FreeBSD-only)
|
### Phase 5: Bridge validation (FreeBSD-only)
|
||||||
|
|
||||||
Start `colibri_bridge` with socat on the FreeBSD host. Connect from a second
|
Closes Gap 6 (bridge round-trip) and Gap 7 (cross-host coordination) on the real
|
||||||
host via Tailscale TCP. Verify round-trip: status, list-tasks, claim-task all
|
Tailscale mesh — an operational run, not more code. The bridge is the
|
||||||
work over the bridge. **Can only be done on FreeBSD 15 with the Tailscale
|
`colibri_bridge` rc.d service running
|
||||||
mesh.**
|
`socat TCP-LISTEN:9190,fork → UNIX-CONNECT:/var/run/colibri/colibri.sock`.
|
||||||
|
|
||||||
|
**Prerequisites**
|
||||||
|
|
||||||
|
- `colibri_daemon` running (socket at `/var/run/colibri/colibri.sock`)
|
||||||
|
- `pkg install socat`
|
||||||
|
- both hosts on the tailnet; a `pf` rule allowing the bridge port **inbound on
|
||||||
|
`tailscale0` only**, never the public interface
|
||||||
|
|
||||||
|
**On the FreeBSD host (OSA)**
|
||||||
|
|
||||||
|
```sh
sysrc colibri_bridge_enable=YES
sysrc colibri_bridge_listen_addr=<osa-tailnet-ip>   # this host's tailnet address
service colibri_bridge start
sockstat -4 -l | grep 9190                           # confirm socat is listening
```
|
||||||
|
|
||||||
|
**From a second host (e.g. domedog) over Tailscale** — the remote path is raw
|
||||||
|
TCP (the `colibri_poll.py` script speaks the local Unix socket; over the wire,
|
||||||
|
send newline-delimited JSON with `nc`):
|
||||||
|
|
||||||
|
```sh
printf '%s\n' '{"cmd":"status"}' | nc -w2 <osa-tailnet-ip> 9190
printf '%s\n' '{"cmd":"list-tasks"}' | nc -w2 <osa-tailnet-ip> 9190
```
|
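The same round-trip can be scripted with Python's standard `socket` and `json` modules, which is closer to what a remote agent would run than piping through `nc`. This sketch assumes the bridge answers each newline-terminated JSON request with one newline-terminated JSON reply; the reply shape is not specified above, and `bridge_request` is a hypothetical helper name.

```python
import json
import socket

BRIDGE_PORT = 9190  # port the colibri_bridge socat listener uses


def bridge_request(host: str, payload: dict, port: int = BRIDGE_PORT, timeout: float = 2.0) -> dict:
    """Send one newline-delimited JSON request and parse one reply line.

    Raises json.JSONDecodeError if the bridge closes without replying.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall((json.dumps(payload) + "\n").encode())
        # Read until the first newline (or EOF) so exactly one reply is parsed.
        buf = b""
        while b"\n" not in buf:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return json.loads(buf.split(b"\n", 1)[0])


# Usage (placeholder address; never commit a real tailnet IP):
# bridge_request("<osa-tailnet-ip>", {"cmd": "status"})
```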
||||||
|
|
||||||
|
**Cross-host coordination — the real proof (Gap 7)**
|
||||||
|
|
||||||
|
1. Register a remote agent (array-form caps — object form scores zero in
|
||||||
|
`pick_agent`):
|
||||||
|
`{"cmd":"register-agent","name":"domedog","capabilities":["linux"]}`
|
||||||
|
2. Submit an intake task requiring that capability:
|
||||||
|
`{"cmd":"intake-task","title":"...","capabilities":["linux"]}`
|
||||||
|
3. Confirm the scheduler routed it: `{"cmd":"list-tasks","status":"claimed"}`
|
||||||
|
→ the task's `agent_id` is the remote agent.
|
||||||
|
4. From the remote side, `transition-task` it to `done` and verify.
|
||||||
|
|
||||||
|
**Acceptance:** a task created on OSA is claimed and driven to `done` by an agent
|
||||||
|
on a *different* host, entirely over the Tailscale bridge — the same routing the
|
||||||
|
`scheduler_routes_intake_tasks_by_capability` test proves on a single host.
|
||||||
|
|
||||||
|
**Security:** bind to the tailnet interface only and scope the `pf` rule to
|
||||||
|
`tailscale0`. Use placeholder tailnet addresses in any committed notes — never
|
||||||
|
paste real `100.x` IPs into git. (The shipped `colibri_bridge.in` currently
|
||||||
|
hardcodes a real default `listen_addr`; that should be scrubbed to a placeholder
|
||||||

or made a required `rc.conf` setting, in a separate change.)
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
|
|
@ -226,15 +271,15 @@ mesh.**
|
||||||
|
|
||||||
| Phase | What | Files | Linux? | Status |
|
| Phase | What | Files | Linux? | Status |
|
||||||
| ----- | ---------------------------------------------------------- | ------------------------------------ | ------ | ------------------------- |
|
| ----- | ---------------------------------------------------------- | ------------------------------------ | ------ | ------------------------- |
|
||||||
| 1a | `pick_agent` unit tests (3 remaining) | `scheduler.rs` tests | Yes | Ready |
|
| 1a | `pick_agent` unit tests (tie-break, multi-cap, active) | `scheduler.rs` tests | Yes | **Complete** (PR #138) |
|
||||||
| 1b | Multi-agent board integration test | `tests/multi_agent_board.rs` (new) | Yes | Ready |
|
| 1b | Multi-agent board integration test (capability routing) | `tests/multi_agent_board.rs` | Yes | **Complete** (PR #186) |
|
||||||
| 1c | Same-capability multi-task test | Same file | Yes | Ready |
|
| 1c | Same-capability multi-task test (contention) | `tests/multi_agent_board.rs` | Yes | **Complete** (PR #186) |
|
||||||
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** |
|
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #107) |
|
||||||
| 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | Ready |
|
| 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #138) |
|
||||||
| 2c | CLI parse tests | `colibri.rs` tests | Yes | Ready |
|
| 2c | CLI parse tests | `colibri.rs` tests | Yes | **Complete** (PR #138) |
|
||||||
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
|
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
|
||||||
| 4 | Polling workflow test | `tests/` | Yes | Deferred (needs Phase 2b) |
|
| 4 | Polling workflow test | `tests/` | Yes | Deferred |
|
||||||
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
|
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
|
||||||
|
|
||||||
**Immediate scope:** Phases 1 + 2b. All testable on Linux with `cargo test` +
|
**Phases 1 + 2 complete.** Next scope: Phase 3 (agent presence schema) or
|
||||||
`cargo clippy` gate. No FreeBSD dependency for implementation.
|
Phase 5 (FreeBSD bridge validation).
|
||||||
|
|
|
||||||