docs(plan): refresh MULTI-AGENT-HOST-PLAN for current state
- Phase 3 schema landed (PR #204): columns exist, wiring pending.
- Bridge IP scrubbed; health/status unscrambled.
- Linux packaging added (PR #203).
- Firewall rules live (pf on OSA, ufw on domedog).
- Gap 4 (claim_task atomicity) marked closed.
- Test count: 256.
- Accurate status: Phase 3 is schema-in/logic-pending (not 'deferred' and not 'done'). Heartbeat/lease/TTL remain open.
Parent: ca5a226dce
Commit: 4e509c3e37
1 changed file with 24 additions and 17 deletions
@@ -1,12 +1,12 @@
 # Multi-Agent Multi-Host — Gap Analysis & Implementation Plan

 **Created:** 19.jun.2026 (Sam & Hermes)
-**Updated:** 25.jun.2026 (Sam & Claude) — reflects 0.12.0 release; Phases 1 + 2 complete
-**Status:** Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred
+**Updated:** 26.jun.2026 (Sam & Claude) — Phase 3 schema landed; bridge packaging + firewall live
+**Status:** Phases 1 + 2 complete; Phase 3 schema in, agent-presence logic + heartbeat/lease pending; Phase 5 bridge ready

 ## Context

-Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running).
+Colibri 0.12.0 is released (MIT license, 256 tests, FreeBSD port + CI running).
 The tenant/vault provision chain has landed (`register-tenant` → jail spawn →
 `provision_tenant_env()` → `colibri-vault::provision`). The next milestone is
 proving the multi-agent, multi-host coordination model: multiple agents on
@@ -35,11 +35,12 @@ The multi-host stack lives **outside the Rust daemon**:
 - **Transport:** `tokio::net::UnixListener` only — zero TCP in Rust. The socat
 bridge is a shell-level relay.
 - **Agent model:** `register-agent` stores name + capabilities + status
-(`active`/`idle`/`offline`). Awaiting `host` field, `last_seen`, heartbeat,
-and lease/TTL (Phase 3).
+(`active`/`idle`/`offline`). `host` and `last_seen` columns landed
+(Phase 3 schema, PR #204); the `_host` arg is still ignored in the handler
+— wiring + heartbeat/lease/TTL pending.
 - **Task assignment:** `pick_agent()` matches by capability score (partial
-match counts, highest score wins, tie → later-in-slice). `claim_task()` is a
-blind UPDATE; await a concurrency guard (Gap 4).
+match counts, highest score wins, tie → later-in-slice). `claim_task()` is
+atomic (gated on `status = 'queued'`); Gap 4 closed (PR #190).
 - **Polling:** `colibri_poll.py` queries `list-tasks status=started` filtered
 by `agent_id`. `colibri_task_done.py` calls `transition-task`.
 - **Spawning:** `poll_tasks()` in daemon.rs spawns agents for `Claimed` tasks,
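The task-assignment rules in the hunk above can be sketched in Rust, the daemon's language. This is a minimal model under stated assumptions, not the code in `lib.rs`. `Agent`, `TaskStore`, and `TaskStatus` are hypothetical in-memory stand-ins for the agents and tasks tables. An agent scoring 0 is assumed never to be picked. `claim_task` here mirrors a conditional update gated on `status = 'queued'`; the real SQL is not part of this diff. The sketch shows two things. First, "tie → later-in-slice" falls out of `Iterator::max_by_key`, which returns the last of several equal maxima. Second, gating the claim on the queued status lets exactly one concurrent claimer win.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a row in the agents table.
#[derive(Debug)]
pub struct Agent {
    pub id: u32,
    pub capabilities: Vec<String>,
}

/// Scores each agent by how many required capabilities it has (partial
/// matches count) and returns the best one. `max_by_key` returns the LAST
/// of several equal maxima, which yields "tie -> later-in-slice".
/// Assumption: an agent matching zero capabilities is never picked.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        .map(|agent| {
            let score = required
                .iter()
                .filter(|cap| agent.capabilities.iter().any(|c| c.as_str() == **cap))
                .count();
            (agent, score)
        })
        .filter(|&(_, score)| score > 0)
        .max_by_key(|&(_, score)| score)
        .map(|(agent, _)| agent)
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
}

/// Hypothetical in-memory stand-in for the tasks table: id -> (status, agent_id).
pub struct TaskStore {
    tasks: Mutex<HashMap<u32, (TaskStatus, Option<u32>)>>,
}

impl TaskStore {
    pub fn new() -> Self {
        TaskStore { tasks: Mutex::new(HashMap::new()) }
    }

    pub fn insert_queued(&self, task_id: u32) {
        self.tasks.lock().unwrap().insert(task_id, (TaskStatus::Queued, None));
    }

    /// Compare-and-set claim: succeeds only if the task is still Queued,
    /// like an UPDATE gated on status = 'queued' reporting one affected row.
    /// The check and the write happen under one lock, so they cannot interleave.
    pub fn claim_task(&self, task_id: u32, agent_id: u32) -> bool {
        let mut tasks = self.tasks.lock().unwrap();
        match tasks.get_mut(&task_id) {
            Some(entry) if entry.0 == TaskStatus::Queued => {
                *entry = (TaskStatus::Claimed, Some(agent_id));
                true
            }
            _ => false, // unknown task, or already claimed by another agent
        }
    }
}
```

Suppose two agents tie on score. The one later in the slice is picked. A second `claim_task` on the same task returns `false`, because the first call already moved the task out of `Queued`. Contrast this with a blind UPDATE, which would let the second caller overwrite the first.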
@@ -84,7 +85,7 @@ and `set-cost-mode` were added in Phase 2b (PR #138).

 | # | Gap | Severity | Linux-doable? |
 | --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- |
-| 3 | **Agent presence model** — await `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
+| 3 | **Agent presence wiring** — `host` and `last_seen` columns landed (Phase 3 schema, PR #204); `_host` arg still ignored, heartbeat/lease/TTL pending | High | Yes (follow-up PR) |
 | 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
 | 6 | **TCP bridge round-trip** — socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
 | 7 | **Cross-host coordination** — await a test simulating a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
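Gap 3 hinges on deciding when a remote agent counts as stale. Lease/TTL semantics are still open in this plan, so the following is only a hypothetical Rust sketch, built on four assumptions:

- `is_stale` and `stale_agents` are illustrative names.
- `last_seen` is modeled as a `SystemTime` rather than the `TEXT` column from PR #204.
- An agent with no recorded heartbeat is treated as stale.
- A `last_seen` ahead of `now` (clock skew between hosts) is treated as fresh.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical liveness rule: an agent is stale once more than `ttl` has
/// passed since its last heartbeat. `now` is passed in so the check is
/// deterministic and easy to test.
pub fn is_stale(last_seen: Option<SystemTime>, now: SystemTime, ttl: Duration) -> bool {
    match last_seen {
        // No heartbeat recorded (e.g. a NULL last_seen): treat as stale (assumption).
        None => true,
        Some(seen) => match now.duration_since(seen) {
            Ok(elapsed) => elapsed > ttl,
            // last_seen is ahead of `now` (clock skew between hosts): treat as fresh.
            Err(_) => false,
        },
    }
}

/// Returns the ids of agents whose heartbeat has expired, in input order.
pub fn stale_agents(
    agents: &[(u32, Option<SystemTime>)],
    now: SystemTime,
    ttl: Duration,
) -> Vec<u32> {
    agents
        .iter()
        .filter(|(_, seen)| is_stale(*seen, now, ttl))
        .map(|(id, _)| *id)
        .collect()
}
```

Under this model, a `heartbeat` command would only need to refresh `last_seen`; staleness is then a pure function of that timestamp, the current time, and the chosen TTL.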
@@ -198,14 +199,16 @@ Parse tests added: `parses_claim_task`, `parses_transition_task`,
 `parses_set_cost_mode`, `rejects_claim_task_missing_flags`,
 `rejects_transition_task_missing_flags`, `rejects_set_cost_mode_without_arg`.

-### Phase 3: Agent presence schema (deferred)
+### Phase 3: Agent presence schema (schema landed, logic pending)

 Add `host` and `last_seen` columns to the agents table. Update `register-agent`
 to accept an optional `host` parameter and update `last_seen` on each call. Add
 a `heartbeat` socket command for liveness. Enables detecting stale remote agents.

-**Deferred** — requires schema migration and broader design discussion about
-lease semantics. Not blocking the multi-agent test coverage goal.
+**Schema landed (PR #204).** `MIGRATIONS` adds `host TEXT` and `last_seen TEXT`
+columns idempotently. The `_host` arg is accepted but ignored in the handler —
+agent presence is not functional yet. Heartbeat dispatch, host wiring, and
+lease/TTL semantics remain open.

 ### Phase 4: Polling workflow integration test (deferred)

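The Phase 3 note says `MIGRATIONS` adds the presence columns idempotently but does not show how. One common way to get that property is to compare the desired columns against those already present and emit `ALTER TABLE ... ADD COLUMN` only for the missing ones. In SQLite, for example, the existing columns are reported by `PRAGMA table_info`. The Rust sketch below assumes that approach. `PRESENCE_COLUMNS` and `pending_alters` are hypothetical names, not the contents of `schema.rs`.

```rust
/// Hypothetical migration list: (table, column, SQL type). Not the real
/// `MIGRATIONS` definition, just the two presence columns from PR #204.
const PRESENCE_COLUMNS: &[(&str, &str, &str)] = &[
    ("agents", "host", "TEXT"),
    ("agents", "last_seen", "TEXT"),
];

/// Given the columns that `table` already has, returns only the ALTER
/// statements still needed. After they are applied, calling this again
/// yields an empty list, which is what makes the migration safe to rerun.
pub fn pending_alters(table: &str, existing: &[&str]) -> Vec<String> {
    PRESENCE_COLUMNS
        .iter()
        .filter(|(t, col, _)| *t == table && !existing.contains(col))
        .map(|(t, col, ty)| format!("ALTER TABLE {t} ADD COLUMN {col} {ty}"))
        .collect()
}
```

On a database that already has both columns, the plan is empty, so rerunning the daemon's migrations does not fail on a duplicate column.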
@@ -266,9 +269,12 @@ on a *different* host, entirely over the Tailscale bridge — the same routing t

 **Security:** bind to the tailnet interface only and scope the `pf` rule to
 `tailscale0`. Use placeholder tailnet addresses in any committed notes — never
-paste real `100.x` IPs into git. (The shipped `colibri_bridge.in` currently
-hardcodes a real default `listen_addr`; that should be scrubbed to a placeholder
-or required-via-rc.conf separately.)
+paste real `100.x` IPs into git. **Done (PR #204):** `colibri_bridge.in`
+default listen_addr is now `TAILSCALE_IP_REQUIRED` with a prestart guard
+that fails loud if unconfigured. Linux bridge packaging landed (PR #203 —
+systemd unit, nft rules, env example). Firewall rules live: pf rule on OSA
+(port 9190, tailscale0 only), ufw rule on domedog (same). Health/status
+functions unscrambled (PR #204).

 ---

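The prestart guard described above lives in the `colibri_bridge.in` service script, which this diff does not show. To illustrate the same fail-loud idea in the daemon's language, here is a hypothetical Rust check. It rejects an empty value or the `TAILSCALE_IP_REQUIRED` placeholder, and it rejects anything that does not parse as an IPv4 address. As an extra assumption beyond the shipped guard, it also rejects addresses outside 100.64.0.0/10, the range Tailscale assigns tailnet IPv4 addresses from. `validate_listen_addr` is an illustrative name only.

```rust
use std::net::Ipv4Addr;

/// Placeholder default shipped in colibri_bridge.in (per PR #204).
const PLACEHOLDER: &str = "TAILSCALE_IP_REQUIRED";

/// Hypothetical fail-loud check for the bridge's listen address.
pub fn validate_listen_addr(raw: &str) -> Result<Ipv4Addr, String> {
    if raw.is_empty() || raw == PLACEHOLDER {
        return Err(format!(
            "listen_addr is unconfigured ({raw:?}); set it to this host's tailnet IP"
        ));
    }
    let addr = raw
        .parse::<Ipv4Addr>()
        .map_err(|e| format!("listen_addr {raw:?} is not an IPv4 address: {e}"))?;
    // Extra assumption beyond the shipped guard: Tailscale hands out IPv4
    // addresses from 100.64.0.0/10, i.e. first octet 100, second octet 64..=127.
    let [a, b, _, _] = addr.octets();
    if a != 100 || !(64..=127).contains(&b) {
        return Err(format!(
            "listen_addr {addr} is outside the tailnet range 100.64.0.0/10"
        ));
    }
    Ok(addr)
}
```

Called with the shipped default, this returns an error that names the placeholder. A misconfigured host therefore stops at startup instead of binding to an unintended interface.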
@@ -282,9 +288,10 @@ or required-via-rc.conf separately.)
 | 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #107) |
 | 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #138) |
 | 2c | CLI parse tests | `colibri.rs` tests | Yes | **Complete** (PR #138) |
-| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
+| 3 | Agent presence schema (WIP) | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Schema in (PR #204); wiring + heartbeat/lease pending |
 | 4 | Polling workflow test | `tests/` | Yes | Deferred |
 | 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
+| — | Bridge packaging (FreeBSD + Linux) | `packaging/freebsd/` + `linux/` | Yes | **Complete** (PR #203, #204) |
+| — | Firewall rules (pf + ufw) | OSA + domedog | Both | **Live** |

-**Phases 1 + 2 complete.** Next scope: Phase 3 (agent presence schema) or
-Phase 5 (FreeBSD bridge validation).
+**Phases 1 + 2 complete. Phase 3 schema in (wiring pending). Phase 5 bridge packaging + firewall live — operational validation next.**