Multi-Agent Multi-Host — Gap Analysis & Implementation Plan

Created: 19.jun.2026 (Sam & Hermes)
Updated: 21.jun.2026 (Sam & Claude); reflects 0.11.0 release and narrowed gaps
Status: Phase 2a complete; Phase 1 + Phase 2b ready for implementation

Context

Colibri 0.11.0 is released (MIT license, 230 tests, FreeBSD port + CI running). The tenant/vault provision chain has landed (register-tenant → jail spawn → provision_tenant_env() → colibri-vault::provision). The next milestone is proving the multi-agent, multi-host coordination model: multiple agents on different hosts reading from the same Colibri task board, each picking up work by capability, and reporting results back.

PR #83 landed the first cross-host plumbing — a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below defines what remains to close the multi-host testing gap.

Current architecture (as of 0.11.0)

The multi-host stack lives outside the Rust daemon:

  FreeBSD host (colibri-daemon)
    └── Unix socket: /var/run/colibri/colibri.sock
         └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
              └── TCP reachable from: debby, domedog
                   └── colibri_poll.py (Python, raw JSON-over-socket)
                        └── Hermes cronjob (2min poll / 5min work)

Transport: tokio::net::UnixListener only — zero TCP in Rust. The socat bridge is a shell-level relay.
Agent model: register-agent stores name + capabilities + status (active/idle/offline). A host field, last_seen, heartbeat, and lease/TTL are not implemented yet (Phase 3).
Task assignment: pick_agent() matches by capability score (partial match counts, highest score wins, tie → later-in-slice). claim_task() is a blind UPDATE with no concurrency guard yet (Gap 4).
Polling: colibri_poll.py queries list-tasks status=started filtered by agent_id. colibri_task_done.py calls transition-task.
Spawning: poll_tasks() in daemon.rs spawns agents for Claimed tasks, skipping those with an existing session (idempotency guard).

Socket command inventory (19 commands, all Unix-socket)

Category	Commands
Daemon	`status`, `glasspane-snapshot`, `set-cost-mode`
Session	`list-sessions`, `get-session`, `compact-session`
Agent process	`spawn-agent`, `kill-agent`
Board	`list-tasks`, `create-task`, `transition-task`, `claim-task`, `intake-task`
Agent registry	`register-agent`, `list-agents`
Tenant	`register-tenant`, `list-tenants`
Skills	`list-skills`, `register-skill`

CLI surface (16 of 19 commands exposed)

Not yet exposed in the CLI: claim-task, transition-task, set-cost-mode (Phase 2b). Remote agents currently use raw Python socket calls for these three commands.

Gap analysis

What IS tested (single-host, single-agent)

Agent spawn → JSONL → glasspane → Done lifecycle
Task create/intake/claim/start/done over socket
SIGTERM cleanup + stale socket safety
Session isolation with 2 agents (bypasses task board)
Cost mode derivation in background rotation
pick_agent unit tests: best match (2 agents), offline exclusion, no-match, empty-required, partial scoring, none scoring
Scheduler tick drains intake queue without deadlock
poll_tasks spawns agent for a claimed task
Double-spawn session isolation
Tenant register + list over socket

Test targets (awaiting coverage)

#	Gap	Severity	Linux-doable?
1	Multi-agent task-board contention: `pick_agent` tie-breaking, multi-required-capability, and active-status eligibility have no dedicated tests	High	Yes
2	CLI surface gaps: `claim-task`, `transition-task`, `set-cost-mode` are not yet exposed in the CLI (Phase 2b)	Medium	Yes
3	Agent presence model: no `host`, `last_seen`, or heartbeat/lease columns, so stale remote agents cannot be detected (Phase 3)	High	Yes (schema change)
4	Remote-safe task claim: `claim_task` is a blind UPDATE with no concurrency guard or lease/TTL	Medium	Yes
5	Python polling scripts: `colibri_poll.py` and `colibri_task_done.py` have zero test coverage	Medium	Yes
6	TCP bridge round-trip: socat bridge untested end-to-end	Medium	Partial (needs socat or FreeBSD)
7	Cross-host coordination: no test simulates a remote agent claiming/transitioning a task over the bridge	High	FreeBSD only
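
To make Gap 4 concrete, the following is a minimal in-memory sketch contrasting a blind claim with a guarded one. It is not the daemon's code: `Task`, `Board`, `claim_blind`, and `claim_guarded` are hypothetical names, and the assumption that a task is claimable only while it is unowned and in `intake` is ours, not something the current schema enforces.

```rust
use std::collections::HashMap;

/// Hypothetical task row; field names are illustrative, not the real schema.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub status: String,        // e.g. "intake", "claimed", "started"
    pub agent_id: Option<u32>, // None while the task is unowned
}

/// Hypothetical in-memory stand-in for the task board.
#[derive(Default)]
pub struct Board {
    pub tasks: HashMap<u32, Task>,
}

impl Board {
    pub fn add_intake(&mut self, task_id: u32) {
        let task = Task { status: "intake".to_string(), agent_id: None };
        self.tasks.insert(task_id, task);
    }

    /// Blind claim: overwrites any existing owner, so the last writer wins.
    pub fn claim_blind(&mut self, task_id: u32, agent_id: u32) -> bool {
        match self.tasks.get_mut(&task_id) {
            Some(task) => {
                task.agent_id = Some(agent_id);
                task.status = "claimed".to_string();
                true
            }
            None => false,
        }
    }

    /// Guarded claim: succeeds only while the task is still unowned intake.
    /// SQL analogue under an assumed schema:
    ///   UPDATE tasks SET agent_id = ?, status = 'claimed'
    ///   WHERE id = ? AND status = 'intake';
    /// followed by a check that exactly one row was affected.
    pub fn claim_guarded(&mut self, task_id: u32, agent_id: u32) -> bool {
        match self.tasks.get_mut(&task_id) {
            Some(task) if task.status == "intake" && task.agent_id.is_none() => {
                task.agent_id = Some(agent_id);
                task.status = "claimed".to_string();
                true
            }
            _ => false,
        }
    }
}
```

With the guard, a second agent calling `claim_guarded` on an already-claimed task gets `false` instead of silently taking ownership, which is the property a remote-safe claim would need.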

Closed gaps (since the original 19.jun.2026 analysis)

CLI: register-agent + list-agents — merged (Phase 2a, PR #107)
CLI: register-tenant + list-tenants + register-skill — merged
pick_agent scoring — partial-match and no-match scoring tests added
Tenant/vault provision chain — register-tenant, jail spawn flags, provision_tenant_env(), colibri-vault::provision all landed
Issue #88 (CollectionNotFound) — daemon passes tenant_id (collection name) to vault::provision
Issue #91 (tenant provision target verification) — trim_trailing_slash string-equality check
Issue #92 (vault provision canonicalization) — canonicalize + allowed-root containment (PR #119)

Implementation phases

Phase 1: Multi-agent task board tests (Linux, highest impact)

1a. Pure `pick_agent` unit tests — extend `scheduler.rs` test module

Existing tests cover: best match (2 agents, different caps), offline exclusion, no-match, empty-required, partial scoring, none scoring, tick-drains-intake. Add:

Test	What it proves
`test_pick_agent_tie_breaking`	Two agents with same score — verify deterministic tie-break (later-in-slice wins)
`test_pick_agent_multiple_required_capabilities`	Required `["rust","freebsd"]` — agent with both beats agent with one
`test_pick_agent_active_status_eligible`	`status: "active"` is treated same as `"idle"` (both eligible)
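
The three tests could look roughly like the sketch below. To keep it self-contained it includes a stand-in `pick_agent`; the `Agent` struct, its `new` constructor, the `score` helper, and the rule that zero-scoring agents are never picked are assumptions for illustration, and the real signatures in `scheduler.rs` may differ. The tie-break relies on `Iterator::max_by_key`, which returns the last of several equal maxima.

```rust
/// Hypothetical stand-in for an agent registry row; the real struct may differ.
#[derive(Debug, Clone, PartialEq)]
pub struct Agent {
    pub id: u32,
    pub capabilities: Vec<String>,
    pub status: String, // "active", "idle", or "offline"
}

impl Agent {
    pub fn new(id: u32, caps: &[&str], status: &str) -> Self {
        Agent {
            id,
            capabilities: caps.iter().map(|c| c.to_string()).collect(),
            status: status.to_string(),
        }
    }
}

/// Count how many required capabilities the agent offers (partial matches count).
fn score(agent: &Agent, required: &[&str]) -> usize {
    required
        .iter()
        .filter(|&&cap| agent.capabilities.iter().any(|c| c.as_str() == cap))
        .count()
}

/// Pick the highest-scoring agent that is not offline.
/// Assumption: agents scoring zero are never picked.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[&str]) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.status != "offline" && score(a, required) > 0)
        // `max_by_key` returns the last maximum, so ties go to the later agent.
        .max_by_key(|a| score(a, required))
}

#[test]
fn test_pick_agent_tie_breaking() {
    let agents = [Agent::new(1, &["rust"], "idle"), Agent::new(2, &["rust"], "idle")];
    assert_eq!(pick_agent(&agents, &["rust"]).map(|a| a.id), Some(2));
}

#[test]
fn test_pick_agent_multiple_required_capabilities() {
    let agents = [
        Agent::new(1, &["rust", "freebsd"], "idle"),
        Agent::new(2, &["rust"], "idle"),
    ];
    assert_eq!(pick_agent(&agents, &["rust", "freebsd"]).map(|a| a.id), Some(1));
}

#[test]
fn test_pick_agent_active_status_eligible() {
    let agents = [Agent::new(1, &["rust"], "active")];
    assert_eq!(pick_agent(&agents, &["rust"]).map(|a| a.id), Some(1));
}
```

In the multi-capability test the fully matching agent is placed first on purpose, so the result cannot be explained by the later-in-slice tie-break.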

1b. Multi-agent board integration test — new file `crates/colibri-daemon/tests/multi_agent_board.rs`

Full lifecycle: register 2 agents with different capabilities, submit 2 intake tasks with matching capabilities, run scheduler tick, verify correct assignment, run poll_tasks, verify both agents spawn and reach Done.

Register agent "freebsd-agent" with ["freebsd"]
Register agent "rust-agent" with ["rust"]
Submit intake "build on freebsd" required ["freebsd"]
Submit intake "write rust code" required ["rust"]
Run scheduler.tick(&state)
  → verify task A agent_id == freebsd-agent.id
  → verify task B agent_id == rust-agent.id
Run poll_tasks(&state)
  → verify 2 agent handles in state.agents
  → verify both tasks transitioned Claimed → Started
  → wait for glasspane Done on both panes

This proves the core multi-agent coordination loop: different agents get different tasks by capability.

1c. Same-capability multi-task test

Register agent "worker" with ["freebsd"]
Submit 2 intake tasks both requiring ["freebsd"]
Run tick + poll_tasks
  → verify both tasks assigned to same agent (documents current behavior)
  → verify two agent processes spawn independently, one per task (session isolation)
  → verify both reach Done

Documents the current contention behavior (no guard against same agent getting multiple tasks) and proves session isolation when one agent handles multiple tasks.

Phase 2: CLI surface completion

2a. Merge `feat/cli-register-agent` — COMPLETE

register-agent and list-agents are in the CLI (merged via PR #107).

2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI

These are the three commands that remote agents currently reach only through raw Python socket calls. Adding them to the CLI means remote agents can work entirely through the colibri binary:

colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE

Implementation:

Add Command::ClaimTask { task_id, agent_id }, Command::TransitionTask { task_id, status }, and Command::SetCostMode { mode } variants
Add DaemonClient::claim_task(), DaemonClient::transition_task(), and DaemonClient::set_cost_mode() methods
Add CLI parsing (follow existing --flag value pattern)
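
As a rough illustration of the parsing step, here is a self-contained sketch of the three variants and a `--flag value` parser. The variant names come from the list above; the `String` field types, the `flag_value` helper, the `parse_command` function, and the error strings are assumptions, and the existing parser in `colibri.rs` may be structured differently.

```rust
/// Variants named in the plan; the plain-string field types are an assumption.
#[derive(Debug, PartialEq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
    SetCostMode { mode: String },
}

/// Return the value following `flag`, or an error naming the missing flag.
fn flag_value(args: &[&str], flag: &str) -> Result<String, String> {
    args.iter()
        .position(|a| *a == flag)
        .and_then(|i| args.get(i + 1))
        .map(|v| v.to_string())
        .ok_or_else(|| format!("missing {flag} <value>"))
}

/// Parse `args` (without the binary name) into one of the three new commands.
pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    let (name, rest) = args
        .split_first()
        .ok_or_else(|| "missing command".to_string())?;
    match *name {
        "claim-task" => Ok(Command::ClaimTask {
            task_id: flag_value(rest, "--task-id")?,
            agent_id: flag_value(rest, "--agent-id")?,
        }),
        "transition-task" => {
            let status = flag_value(rest, "--status")?;
            // The usage line above lists only `done` and `failed`.
            if status != "done" && status != "failed" {
                return Err(format!("unsupported status: {status}"));
            }
            Ok(Command::TransitionTask {
                task_id: flag_value(rest, "--task-id")?,
                status,
            })
        }
        // MODE is positional in the usage line above.
        "set-cost-mode" => rest
            .first()
            .map(|m| Command::SetCostMode { mode: m.to_string() })
            .ok_or_else(|| "set-cost-mode requires MODE".to_string()),
        other => Err(format!("unknown command: {other}")),
    }
}
```

Deriving `PartialEq` on `Command` lets the Phase 2c parse tests compare the parsed value against an expected variant with a single `assert_eq!`.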

2c. Add CLI unit tests for new commands

Parse tests matching existing parses_task_commands style.

Phase 3: Agent presence schema (deferred)

Add host and last_seen columns to the agents table. Update register-agent to accept an optional host parameter and update last_seen on each call. Add a heartbeat socket command for liveness. Together these make it possible to detect stale remote agents.

Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
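
One possible shape for the staleness check, sketched under assumptions since the lease semantics are still undecided: `AgentPresence`, `touch`, `is_stale`, and `stale_agents` are hypothetical names, `last_seen` is modelled as Unix seconds, and the TTL is whatever the eventual design chooses.

```rust
use std::time::Duration;

/// Hypothetical presence row carrying the proposed `host` and `last_seen` columns.
#[derive(Debug, Clone)]
pub struct AgentPresence {
    pub name: String,
    pub host: Option<String>, // optional, as proposed for register-agent
    pub last_seen: u64,       // Unix seconds; the real column type is undecided
}

/// What register-agent or a heartbeat would do: refresh `last_seen`.
pub fn touch(agent: &mut AgentPresence, now: u64) {
    agent.last_seen = now;
}

/// An agent is stale once more than `ttl` has passed since it was last seen.
/// `saturating_sub` avoids underflow if `last_seen` is ahead of `now`.
pub fn is_stale(agent: &AgentPresence, now: u64, ttl: Duration) -> bool {
    now.saturating_sub(agent.last_seen) > ttl.as_secs()
}

/// Names of all agents whose last heartbeat is older than `ttl`.
pub fn stale_agents(agents: &[AgentPresence], now: u64, ttl: Duration) -> Vec<&str> {
    agents
        .iter()
        .filter(|a| is_stale(a, now, ttl))
        .map(|a| a.name.as_str())
        .collect()
}
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the check deterministic in unit tests.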

Phase 4: Polling workflow integration test (deferred)

Test the full poll → claim → work → done cycle from the agent's perspective, simulating what colibri_poll.py does. Register two agents, create tasks with different capabilities, verify each agent sees only its tasks via the poll path, transition tasks to done.

Deferred — depends on Phase 2b CLI additions (so the test can use CLI commands instead of raw socket replication of the Python scripts).
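
The per-agent visibility check at the heart of this test can be sketched without the socket layer. The `TaskRow` struct and the `poll_started` and `mark_done` helpers below are hypothetical stand-ins for the `list-tasks status=started` query filtered by agent_id and for `transition-task`; the real test would go through the daemon instead.

```rust
/// Hypothetical task row as a polling agent would see it.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskRow {
    pub id: u32,
    pub status: String,
    pub agent_id: Option<u32>,
}

/// Stand-in for the poll path: started tasks owned by `agent_id` only.
pub fn poll_started(tasks: &[TaskRow], agent_id: u32) -> Vec<u32> {
    tasks
        .iter()
        .filter(|t| t.status == "started" && t.agent_id == Some(agent_id))
        .map(|t| t.id)
        .collect()
}

/// Stand-in for `transition-task --status done`; false if the task is unknown.
pub fn mark_done(tasks: &mut [TaskRow], task_id: u32) -> bool {
    match tasks.iter_mut().find(|t| t.id == task_id) {
        Some(task) => {
            task.status = "done".to_string();
            true
        }
        None => false,
    }
}
```

Once a task is marked done it drops out of the owning agent's next poll, which is the end state the integration test would assert for both agents.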

Phase 5: Bridge validation (FreeBSD-only)

Start colibri_bridge with socat on the FreeBSD host. Connect from a second host via Tailscale TCP. Verify round-trip: status, list-tasks, claim-task all work over the bridge. This can only be done on FreeBSD 15 with the Tailscale mesh.

Summary

Phase	What	Files	Linux?	Status
1a	`pick_agent` unit tests (3 remaining)	`scheduler.rs` tests	Yes	Ready
1b	Multi-agent board integration test	`tests/multi_agent_board.rs` (new)	Yes	Ready
1c	Same-capability multi-task test	Same file	Yes	Ready
2a	Merge `feat/cli-register-agent`	`colibri.rs` + `lib.rs`	Yes	Complete
2b	Add `claim-task` + `transition-task` + `set-cost-mode` CLI	`colibri.rs` + `lib.rs`	Yes	Ready
2c	CLI parse tests	`colibri.rs` tests	Yes	Ready
3	Agent presence schema	`schema.rs` + `lib.rs` + `socket.rs`	Yes	Deferred
4	Polling workflow test	`tests/`	Yes	Deferred (needs Phase 2b)
5	TCP bridge validation	FreeBSD host	No	FreeBSD lane

Immediate scope: Phases 1 + 2b. All testable on Linux with cargo test + cargo clippy gate. No FreeBSD dependency for implementation.
