From 4e509c3e37d3ef49b4d032f4fe1c9532df8d8d15 Mon Sep 17 00:00:00 2001
From: Sam & Claude
Date: Fri, 26 Jun 2026 01:49:46 +0200
Subject: [PATCH] docs(plan): refresh MULTI-AGENT-HOST-PLAN for current state
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Phase 3 schema landed (PR #204) — columns exist, wiring pending. Bridge
IP scrubbed, health/status unscrambled. Linux packaging added (PR #203).
Firewall rules live (pf OSA + ufw domedog). Gap 4 (claim_task atomicity)
marked closed. Test count: 256. Accurate status: Phase 3 is
schema-in/logic-pending (not 'deferred' and not 'done').
Heartbeat/lease/TTL remain open.
---
 docs/MULTI-AGENT-HOST-PLAN.md | 41 ++++++++++++++++++++---------------
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/docs/MULTI-AGENT-HOST-PLAN.md b/docs/MULTI-AGENT-HOST-PLAN.md
index 00a97a6..e4c0f6c 100644
--- a/docs/MULTI-AGENT-HOST-PLAN.md
+++ b/docs/MULTI-AGENT-HOST-PLAN.md
@@ -1,12 +1,12 @@
 # Multi-Agent Multi-Host — Gap Analysis & Implementation Plan
 
 **Created:** 19.jun.2026 (Sam & Hermes)
-**Updated:** 25.jun.2026 (Sam & Claude) — reflects 0.12.0 release; Phases 1 + 2 complete
-**Status:** Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred
+**Updated:** 26.jun.2026 (Sam & Claude) — Phase 3 schema landed; bridge packaging + firewall live
+**Status:** Phases 1 + 2 complete; Phase 3 schema in, agent-presence logic + heartbeat/lease pending; Phase 5 bridge ready
 
 ## Context
 
-Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running).
+Colibri 0.12.0 is released (MIT license, 256 tests, FreeBSD port + CI running).
 The tenant/vault provision chain has landed (`register-tenant` → jail spawn →
 `provision_tenant_env()` → `colibri-vault::provision`). The next milestone is
 proving the multi-agent, multi-host coordination model: multiple agents on
@@ -35,11 +35,12 @@ The multi-host stack lives **outside the Rust daemon**:
 - **Transport:** `tokio::net::UnixListener` only — zero TCP in Rust. The socat
   bridge is a shell-level relay.
 - **Agent model:** `register-agent` stores name + capabilities + status
-  (`active`/`idle`/`offline`). Awaiting `host` field, `last_seen`, heartbeat,
-  and lease/TTL (Phase 3).
+  (`active`/`idle`/`offline`). `host` and `last_seen` columns landed
+  (Phase 3 schema, PR #204); the `_host` arg is still ignored in the handler
+  — wiring + heartbeat/lease/TTL pending.
 - **Task assignment:** `pick_agent()` matches by capability score (partial
-  match counts, highest score wins, tie → later-in-slice). `claim_task()` is a
-  blind UPDATE; await a concurrency guard (Gap 4).
+  match counts, highest score wins, tie → later-in-slice). `claim_task()` is
+  atomic (gated on `status = 'queued'`); Gap 4 closed (PR #190).
 - **Polling:** `colibri_poll.py` queries `list-tasks status=started` filtered by
   `agent_id`. `colibri_task_done.py` calls `transition-task`.
 - **Spawning:** `poll_tasks()` in daemon.rs spawns agents for `Claimed` tasks,
@@ -84,7 +85,7 @@ and `set-cost-mode` were added in Phase 2b (PR #138).
 
 | # | Gap | Severity | Linux-doable? |
 | --- | ------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | -------------------------------- |
-| 3 | **Agent presence model** — await `host`, `last_seen`, and heartbeat/lease columns to detect stale remote agents (Phase 3) | High | Yes (schema change) |
+| 3 | **Agent presence wiring** — `host` and `last_seen` columns landed (Phase 3 schema, PR #204); `_host` arg still ignored, heartbeat/lease/TTL pending | High | Yes (follow-up PR) |
 | 5 | **Python polling scripts** — `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
 | 6 | **TCP bridge round-trip** — socat bridge untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
 | 7 | **Cross-host coordination** — await a test simulating a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |
@@ -198,14 +199,16 @@ Parse tests added: `parses_claim_task`, `parses_transition_task`,
 `parses_set_cost_mode`, `rejects_claim_task_missing_flags`,
 `rejects_transition_task_missing_flags`, `rejects_set_cost_mode_without_arg`.
 
-### Phase 3: Agent presence schema (deferred)
+### Phase 3: Agent presence schema (schema landed, logic pending)
 
 Add `host` and `last_seen` columns to the agents table. Update `register-agent`
 to accept an optional `host` parameter and update `last_seen` on each call. Add
 a `heartbeat` socket command for liveness. Enables detecting stale remote agents.
 
-**Deferred** — requires schema migration and broader design discussion about
-lease semantics. Not blocking the multi-agent test coverage goal.
+**Schema landed (PR #204).** `MIGRATIONS` adds `host TEXT` and `last_seen TEXT`
+columns idempotently. The `_host` arg is accepted but ignored in the handler —
+agent presence is not functional yet. Heartbeat dispatch, host wiring, and
+lease/TTL semantics remain open.
 
 ### Phase 4: Polling workflow integration test (deferred)
 
@@ -266,9 +269,12 @@ on a *different* host, entirely over the Tailscale bridge — the same routing t
 
 **Security:** bind to the tailnet interface only and scope the `pf` rule to
 `tailscale0`. Use placeholder tailnet addresses in any committed notes — never
-paste real `100.x` IPs into git. (The shipped `colibri_bridge.in` currently
-hardcodes a real default `listen_addr`; that should be scrubbed to a placeholder
-or required-via-rc.conf separately.)
+paste real `100.x` IPs into git. **Done (PR #204):** `colibri_bridge.in`
+default listen_addr is now `TAILSCALE_IP_REQUIRED` with a prestart guard
+that fails loud if unconfigured. Linux bridge packaging landed (PR #203 —
+systemd unit, nft rules, env example). Firewall rules live: pf rule on OSA
+(port 9190, tailscale0 only), ufw rule on domedog (same). Health/status
+functions unscrambled (PR #204).
 
 ---
 
@@ -282,9 +288,10 @@ or required-via-rc.conf separately.)
 | 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #107) |
 | 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | **Complete** (PR #138) |
 | 2c | CLI parse tests | `colibri.rs` tests | Yes | **Complete** (PR #138) |
-| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
+| 3 | Agent presence schema (WIP) | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Schema in (PR #204); wiring + heartbeat/lease pending |
 | 4 | Polling workflow test | `tests/` | Yes | Deferred |
 | 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |
+| — | Bridge packaging (FreeBSD + Linux) | `packaging/freebsd/` + `linux/` | Yes | **Complete** (PR #203, #204) |
+| — | Firewall rules (pf + ufw) | OSA + domedog | Both | **Live** |
 
-**Phases 1 + 2 complete.** Next scope: Phase 3 (agent presence schema) or
-Phase 5 (FreeBSD bridge validation).
+**Phases 1 + 2 complete. Phase 3 schema in (wiring pending). Phase 5 bridge packaging + firewall live — operational validation next.**
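
Reviewer note: the plan text in this patch describes two mechanisms, columns added "idempotently" and a `claim_task()` that is atomic because it is gated on `status = 'queued'`. The sketch below shows one way those two ideas could look, using Python's built-in `sqlite3` module purely for illustration. The table layout, the `'claimed'` status value, and the `migrate` helper are assumptions made for this example; the patch does not show the daemon's actual schema or Rust code.

```python
import sqlite3

# Hypothetical minimal schema; the real Colibri tables are not shown in this patch.
BASE_SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS tasks (
    id       INTEGER PRIMARY KEY,
    status   TEXT NOT NULL DEFAULT 'queued',
    agent_id INTEGER
);
"""


def migrate(conn):
    """Add the presence columns only if they are missing, so reruns are harmless."""
    conn.executescript(BASE_SCHEMA)
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(agents)")}
    for column in ("host", "last_seen"):
        if column not in existing:
            conn.execute(f"ALTER TABLE agents ADD COLUMN {column} TEXT")
    conn.commit()


def claim_task(conn, task_id, agent_id):
    """Claim a task only while it is still queued.

    The status check lives in the WHERE clause, so checking and updating
    happen in one statement. Of several competing callers, at most one
    sees an affected row; the others get False.
    """
    cur = conn.execute(
        "UPDATE tasks SET status = 'claimed', agent_id = ? "
        "WHERE id = ? AND status = 'queued'",
        (agent_id, task_id),
    )
    conn.commit()
    return cur.rowcount == 1
```

Compared with a blind UPDATE, the guarded form needs no separate read before the write, which is what removes the window in which two agents could both believe they own the same task. Running `migrate` a second time is a no-op because each column is looked up before it is added.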