clawdie/colibri

Sam & Claude 95bf3f396d

fix(store): atomic+exclusive claim_task — close Gap 4 concurrency guard (#190 )

Co-authored-by: Sam & Claude <hello@clawdie.si>
Co-committed-by: Sam & Claude <hello@clawdie.si>

2026-06-25 17:33:15 +02:00

Multi-Agent Multi-Host — Gap Analysis & Implementation Plan

Created: 19.jun.2026 (Sam & Hermes)

Updated: 25.jun.2026 (Sam & Claude); reflects the 0.12.0 release with Phases 1 + 2 complete

Status: Phases 1 + 2 complete; Phase 3 (agent presence schema) deferred

Context

Colibri 0.12.0 is released (MIT license, 258 tests, FreeBSD port + CI running). The tenant/vault provision chain has landed (register-tenant → jail spawn → provision_tenant_env() → colibri-vault::provision). The next milestone is proving the multi-agent, multi-host coordination model: multiple agents on different hosts reading from the same Colibri task board, each picking up work by capability, and reporting results back.

PR #83 landed the first cross-host plumbing — a socat TCP bridge, Python polling scripts, and a Hermes cronjob configuration. The gap analysis below defines what remains to close the multi-host testing gap.

Current architecture (as of 0.12.0)

The multi-host stack lives outside the Rust daemon:

```text
FreeBSD host (colibri-daemon)
  └── Unix socket: /var/run/colibri/colibri.sock
       └── socat bridge (colibri_bridge rc.d, port 9190, Tailscale)
            └── TCP reachable from: debby, domedog
                 └── colibri_poll.py (Python, raw JSON-over-socket)
                      └── Hermes cronjob (2min poll / 5min work)
```

- Transport: `tokio::net::UnixListener` only; there is no TCP in the Rust code. The socat bridge is a shell-level relay.
- Agent model: `register-agent` stores name, capabilities, and status (active/idle/offline). There is no host field, `last_seen`, heartbeat, or lease/TTL yet (Phase 3).
- Task assignment: `pick_agent()` matches by capability score (partial matches count, the highest score wins, ties go to the agent later in the slice). `claim_task()` is an atomic, exclusive UPDATE guarded on `status = 'queued'`, so a second claimant gets a Conflict (Gap 4, closed).
- Polling: `colibri_poll.py` queries `list-tasks status=started` filtered by `agent_id`; `colibri_task_done.py` calls `transition-task`.
- Spawning: `poll_tasks()` in `daemon.rs` spawns agents for Claimed tasks, skipping those that already have a session (idempotency guard).
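The spawning rule for `poll_tasks()` described above can be sketched as a pure filter: spawn only for tasks in Claimed state, and skip any task that already has a session, so a repeated tick is a no-op. This is a minimal sketch, not the code in `daemon.rs`; the `Task` and `TaskStatus` types and the `tasks_to_spawn` helper are hypothetical, and real task ids are UUIDs rather than integers.

```rust
use std::collections::HashSet;

/// Hypothetical subset of board states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskStatus {
    Queued,
    Claimed,
    Started,
    Done,
}

/// Hypothetical trimmed-down task row (real ids are UUIDs).
#[derive(Debug, Clone)]
pub struct Task {
    pub id: u32,
    pub status: TaskStatus,
}

/// Ids of tasks that need an agent spawned on this tick: Claimed tasks with
/// no existing session. Tasks that already have a session are skipped, so
/// calling this again after those sessions exist returns nothing new.
pub fn tasks_to_spawn(tasks: &[Task], tasks_with_session: &HashSet<u32>) -> Vec<u32> {
    tasks
        .iter()
        .filter(|t| t.status == TaskStatus::Claimed)
        .filter(|t| !tasks_with_session.contains(&t.id))
        .map(|t| t.id)
        .collect()
}
```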

Socket command inventory (19 commands, all Unix-socket)

| Category | Commands |
| --- | --- |
| Daemon | `status`, `glasspane-snapshot`, `set-cost-mode` |
| Session | `list-sessions`, `get-session`, `compact-session` |
| Agent process | `spawn-agent`, `kill-agent` |
| Board | `list-tasks`, `create-task`, `transition-task`, `claim-task`, `intake-task` |
| Agent registry | `register-agent`, `list-agents` |
| Tenant | `register-tenant`, `list-tenants` |
| Skills | `list-skills`, `register-skill` |

CLI surface (19 of 19 commands exposed)

All socket commands now have CLI wrappers. claim-task, transition-task, and set-cost-mode were added in Phase 2b (PR #138).

Gap analysis

What IS tested (single-host, single-agent)

- Agent spawn → JSONL → glasspane → Done lifecycle
- Task create/intake/claim/start/done over socket
- SIGTERM cleanup + stale socket safety
- Session isolation with 2 agents (bypasses task board)
- Cost mode derivation in background rotation
- `pick_agent` unit tests: best match (2 agents), offline exclusion, no-match, empty-required, partial scoring, none scoring
- Scheduler tick drains intake queue without deadlock
- `poll_tasks` spawns agent for a claimed task
- Double-spawn session isolation
- Tenant register + list over socket

Test targets (remaining gaps)

| # | Gap | Severity | Linux-doable? |
| --- | --- | --- | --- |
| 3 | Agent presence model: `host`, `last_seen`, and heartbeat/lease columns are still missing, so stale remote agents cannot be detected (Phase 3) | High | Yes (schema change) |
| 5 | Python polling scripts: `colibri_poll.py` and `colibri_task_done.py` have zero test coverage | Medium | Yes |
| 6 | TCP bridge round-trip: the socat bridge is untested end-to-end | Medium | Partial (needs socat or FreeBSD) |
| 7 | Cross-host coordination: no test yet simulates a remote agent claiming/transitioning a task over the bridge | High | FreeBSD only |

Closed gaps (since the original 19.jun.2026 analysis)

- Remote-safe task claim (Gap 4): `claim_task` was a blind UPDATE (last writer wins). It is now guarded on `status = 'queued'`, so the claim is atomic and exclusive: racing agents (exactly the contention the Tailscale bridge exposes) get a Conflict instead of silently stealing a claimed task. Covered by `test_claim_task_is_exclusive` (store) and `socket_rejects_double_claim_of_same_task` (daemon, end-to-end over the socket).
- Multi-agent task-board contention (Gap 1): tie-breaking, multiple-required-capability, and active-status eligibility tests added (Phase 1a, PR #138). Full board lifecycle, capability routing, and contention tests added (Phase 1b/1c, PR #186). Key finding: capabilities must be registered as a JSON array (`["freebsd"]`); the object form (`{"freebsd":true}`) silently scores zero in `pick_agent`, because `pick_agent` deserializes capabilities as `Vec<String>`.
- CLI surface gaps (Gap 2): `claim-task`, `transition-task`, and `set-cost-mode` added to the CLI with parse tests (Phase 2b/2c, PR #138). The CLI surface is now 19/19.
- CLI `register-agent` + `list-agents`: merged (Phase 2a, PR #107)
- CLI `register-tenant` + `list-tenants` + `register-skill`: merged
- `pick_agent` scoring: partial-match and no-match scoring tests added
- Tenant/vault provision chain: `register-tenant`, jail spawn flags, `provision_tenant_env()`, and `colibri-vault::provision` all landed
- Issue #88 (CollectionNotFound): the daemon passes `tenant_id` (the collection name) to `vault::provision`
- Issue #91 (tenant provision target verification): `trim_trailing_slash` string-equality check
- Issue #92 (vault provision canonicalization): canonicalize + allowed-root containment (PR #119)
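The Gap 4 fix amounts to a compare-and-set: the claim write only applies while the task is still queued, and a claim on an existing task that is no longer queued is reported as a Conflict. The sketch below models that with an in-memory map behind a `Mutex` standing in for the database. The `Store`, `ClaimError`, and `owner` names, the string statuses, and the table/column names in the SQL comment are assumptions, not the colibri store API.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ClaimError {
    NotFound,
    Conflict,
}

#[derive(Debug)]
struct Row {
    status: String,
    agent_id: Option<String>,
}

/// In-memory stand-in for the task table; the mutex plays the role of the
/// database's atomic single-statement UPDATE.
pub struct Store {
    rows: Mutex<HashMap<String, Row>>,
}

impl Store {
    pub fn new() -> Self {
        Store { rows: Mutex::new(HashMap::new()) }
    }

    pub fn create_task(&self, task_id: &str) {
        let row = Row { status: "queued".to_string(), agent_id: None };
        self.rows.lock().unwrap().insert(task_id.to_string(), row);
    }

    /// Compare-and-set claim, equivalent in spirit to (names assumed):
    ///   UPDATE tasks SET status = 'claimed', agent_id = ?2
    ///   WHERE id = ?1 AND status = 'queued';
    /// with "0 rows affected" on an existing task mapped to Conflict.
    pub fn claim_task(&self, task_id: &str, agent_id: &str) -> Result<(), ClaimError> {
        let mut rows = self.rows.lock().unwrap();
        let row = rows.get_mut(task_id).ok_or(ClaimError::NotFound)?;
        if row.status != "queued" {
            // Already claimed by someone: refuse instead of overwriting.
            return Err(ClaimError::Conflict);
        }
        row.status = "claimed".to_string();
        row.agent_id = Some(agent_id.to_string());
        Ok(())
    }

    pub fn owner(&self, task_id: &str) -> Option<String> {
        self.rows.lock().unwrap().get(task_id).and_then(|r| r.agent_id.clone())
    }
}
```

Whichever racing claimant takes the lock first wins; the other sees a non-queued status and gets `Conflict`, which is the behavior the store and socket tests named above check.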

Implementation phases

Phase 1: Multi-agent task board tests — COMPLETE

1a. Pure `pick_agent` unit tests — COMPLETE (PR #138)

Added to scheduler.rs test module:

| Test | What it proves |
| --- | --- |
| `test_pick_agent_tie_breaking` | Two agents with the same score: the tie-break is deterministic (later-in-slice wins) |
| `test_pick_agent_multiple_required_capabilities` | Required `["rust","freebsd"]`: the agent with both beats the agent with one |
| `test_pick_agent_active_status_eligible` | `status: "active"` is treated the same as `"idle"` (both eligible) |
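Together with the earlier scoring tests, these cases pin down the selection rule: score an agent by how many required capabilities it has (partial matches count), skip offline agents, treat active and idle alike, keep the highest score, and let a later agent win a tie. A minimal sketch of that rule follows. The `Agent` struct is hypothetical, and returning `None` when no agent has any overlap is an assumption about the no-match case rather than documented behavior.

```rust
/// Hypothetical agent row; capabilities must come from a JSON array such as ["freebsd"].
#[derive(Debug, Clone)]
pub struct Agent {
    pub name: String,
    /// "active", "idle", or "offline".
    pub status: String,
    pub capabilities: Vec<String>,
}

/// Pick the eligible agent covering the most `required` capabilities.
pub fn pick_agent<'a>(agents: &'a [Agent], required: &[String]) -> Option<&'a Agent> {
    let mut best: Option<(&'a Agent, usize)> = None;
    // Offline agents are never eligible; "active" and "idle" are treated alike.
    for agent in agents.iter().filter(|a| a.status != "offline") {
        // Partial matches count: one point per required capability the agent has.
        let score = required
            .iter()
            .filter(|cap| agent.capabilities.contains(*cap))
            .count();
        // Assumption: an agent with no overlap at all is not a candidate.
        if score == 0 {
            continue;
        }
        // `>=` lets a later agent with an equal score replace the current best,
        // so ties go to the agent later in the slice.
        let better = match best {
            Some((_, best_score)) => score >= best_score,
            None => true,
        };
        if better {
            best = Some((agent, score));
        }
    }
    best.map(|(agent, _)| agent)
}
```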

1b. Multi-agent board integration test — COMPLETE (PR #186)

Full lifecycle via real Unix socket: register 2 agents with disjoint capabilities, submit 2 intake tasks with matching required capabilities, scheduler's pick_agent auto-routes each task to the capable agent (no manual claim), verified by polling list-tasks status=claimed.

```text
Register agent "sysadmin" with ["freebsd"]   ← array form required
Register agent "db-admin" with ["postgres"]
Submit intake "scrub zroot" required ["freebsd"]
Submit intake "vacuum db" required ["postgres"]
Scheduler tick (50ms interval) auto-claims:
  → verify freebsd task agent_id == sysadmin.id
  → verify postgres task agent_id == db-admin.id
```

This proves the core multi-agent coordination loop: different agents get different tasks by capability, assigned by the scheduler.

Capability format: pick_agent deserializes Vec<String>, so capabilities must be registered as a JSON array (["freebsd"]). The object form ({"freebsd":true}) silently deserializes to an empty vec and scores zero. The board-mechanics tests (1a, 1c) use manual claim-task so the format is inert there; the routing test (1b) uses array form and documents this requirement.

1c. Same-capability multi-task test — COMPLETE (PR #186)

```text
Register agent "worker" with ["freebsd"]
Create 2 plain board tasks (scrub zroot, check smart)
Same agent claims both via manual claim-task
  → verify both transition started → done
  → documents current contention behavior (no per-agent task limit)
```

This documents the current contention behavior: nothing prevents one agent from holding several tasks at once (the Gap 4 guard only stops two claims on the same task). It also proves session isolation when one agent handles multiple tasks.

Phase 2: CLI surface completion

2a. Merge `feat/cli-register-agent` — COMPLETE

register-agent and list-agents are in the CLI (merged via PR #107).

2b. Add `claim-task`, `transition-task`, and `set-cost-mode` to CLI — COMPLETE (PR #138)

All three commands are in the CLI. Remote agents can now work entirely through the colibri binary:

```text
colibri claim-task --task-id <UUID> --agent-id <UUID>
colibri transition-task --task-id <UUID> --status done|failed
colibri set-cost-mode MODE
```

2c. CLI unit tests for new commands — COMPLETE (PR #138)

Parse tests added: parses_claim_task, parses_transition_task, parses_set_cost_mode, rejects_claim_task_missing_flags, rejects_transition_task_missing_flags, rejects_set_cost_mode_without_arg.
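The parse tests fix the contract: `claim-task` needs both `--task-id` and `--agent-id`, `transition-task` needs `--task-id` and `--status`, and `set-cost-mode` needs one positional MODE. A dependency-free sketch of that contract follows. The real `colibri.rs` may use a different argument parser, and the `Command` enum and `parse` function are hypothetical. Status and mode values are passed through unvalidated, since this document does not list the accepted MODE values.

```rust
/// Hypothetical parsed form of the three Phase 2b subcommands.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    ClaimTask { task_id: String, agent_id: String },
    TransitionTask { task_id: String, status: String },
    SetCostMode { mode: String },
}

/// Value that follows `name` (e.g. "--task-id") in `args`, if any.
fn flag(args: &[&str], name: &str) -> Option<String> {
    let i = args.iter().position(|a| *a == name)?;
    args.get(i + 1).map(|v| v.to_string())
}

/// Parse the arguments after the program name, e.g.
/// ["claim-task", "--task-id", "<UUID>", "--agent-id", "<UUID>"].
pub fn parse(args: &[&str]) -> Result<Command, String> {
    match args.split_first() {
        Some((&"claim-task", rest)) => Ok(Command::ClaimTask {
            task_id: flag(rest, "--task-id").ok_or("claim-task: missing --task-id")?,
            agent_id: flag(rest, "--agent-id").ok_or("claim-task: missing --agent-id")?,
        }),
        Some((&"transition-task", rest)) => Ok(Command::TransitionTask {
            task_id: flag(rest, "--task-id").ok_or("transition-task: missing --task-id")?,
            status: flag(rest, "--status").ok_or("transition-task: missing --status")?,
        }),
        // set-cost-mode takes one positional argument.
        Some((&"set-cost-mode", rest)) => match rest.first() {
            Some(mode) => Ok(Command::SetCostMode { mode: mode.to_string() }),
            None => Err("set-cost-mode: missing MODE".to_string()),
        },
        Some((other, _)) => Err(format!("unknown command: {other}")),
        None => Err("no command given".to_string()),
    }
}
```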

Phase 3: Agent presence schema (deferred)

Add host and last_seen columns to the agents table. Update register-agent to accept an optional host parameter and update last_seen on each call. Add a heartbeat socket command for liveness. Enables detecting stale remote agents.

Deferred — requires schema migration and broader design discussion about lease semantics. Not blocking the multi-agent test coverage goal.
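As one possible reading of the lease semantics still under discussion, a heartbeat could refresh `last_seen`, and an agent whose `last_seen` is older than a lease TTL could be treated as offline when routing. The sketch assumes Unix-second timestamps and a caller-chosen TTL. `Presence`, `touch`, `is_stale`, and `effective_status` are hypothetical names, not a proposed schema.

```rust
use std::time::Duration;

/// Hypothetical presence data Phase 3 might add to the agents table.
#[derive(Debug, Clone)]
pub struct Presence {
    pub host: Option<String>,
    /// Seconds since the Unix epoch; refreshed by register-agent and heartbeat.
    pub last_seen_unix: u64,
}

impl Presence {
    /// Record a sign of life (register-agent call or heartbeat).
    pub fn touch(&mut self, now_unix: u64) {
        self.last_seen_unix = now_unix;
    }

    /// Stale once more than `lease` has elapsed since the last sign of life.
    /// saturating_sub guards against a last_seen that is ahead of `now`.
    pub fn is_stale(&self, now_unix: u64, lease: Duration) -> bool {
        now_unix.saturating_sub(self.last_seen_unix) > lease.as_secs()
    }
}

/// Status the scheduler would route on: the stored status, downgraded to
/// "offline" once the lease has lapsed, so filtering that skips offline
/// agents stops sending work to a silent remote host.
pub fn effective_status<'a>(stored: &'a str, presence: &Presence, now_unix: u64, lease: Duration) -> &'a str {
    if presence.is_stale(now_unix, lease) {
        "offline"
    } else {
        stored
    }
}
```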

Phase 4: Polling workflow integration test (deferred)

Test the full poll → claim → work → done cycle from the agent's perspective, simulating what colibri_poll.py does. Register two agents, create tasks with different capabilities, verify each agent sees only its tasks via the poll path, transition tasks to done.

Deferred — Phase 2b CLI additions are now complete; this test can be written when prioritized.
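The property Phase 4 would assert is the agent-side view that `colibri_poll.py` builds: only tasks in started state whose `agent_id` matches the polling agent. A minimal sketch of that filter over an in-memory board follows. `BoardTask` and `poll_view` are hypothetical; the real script gets its list from `list-tasks` over the socket.

```rust
/// Hypothetical board row as returned by list-tasks.
#[derive(Debug, Clone)]
pub struct BoardTask {
    pub title: String,
    pub status: String,
    pub agent_id: Option<String>,
}

/// One agent's poll result: its own tasks in "started" state, mirroring
/// `list-tasks status=started` filtered by agent_id.
pub fn poll_view<'a>(tasks: &'a [BoardTask], agent_id: &str) -> Vec<&'a BoardTask> {
    tasks
        .iter()
        .filter(|t| t.status == "started" && t.agent_id.as_deref() == Some(agent_id))
        .collect()
}
```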

Phase 5: Bridge validation (FreeBSD-only)

Closes Gap 6 (bridge round-trip) and Gap 7 (cross-host coordination) on the real Tailscale mesh — an operational run, not more code. The bridge is the colibri_bridge rc.d service running socat TCP-LISTEN:9190,fork → UNIX-CONNECT:/var/run/colibri/colibri.sock.
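The bridge carries bytes only: each accepted TCP connection gets its own connection to the daemon socket (socat's `fork`), and data is copied in both directions until one side closes. A std-only Rust sketch of the same relay makes that concrete. It is illustrative and not a replacement for the shipped rc.d service; `relay` and `serve` are hypothetical names. Like socat here, it performs no authentication, which is why binding to the tailnet interface only matters.

```rust
use std::io;
use std::net::{Shutdown, TcpListener, TcpStream};
use std::os::unix::net::UnixStream;
use std::path::{Path, PathBuf};
use std::thread;

/// Relay one TCP client to a fresh connection on the daemon's Unix socket,
/// copying bytes in both directions until the streams close.
fn relay(tcp: TcpStream, socket_path: &Path) -> io::Result<()> {
    let unix = UnixStream::connect(socket_path)?;
    let mut tcp_rd = tcp.try_clone()?;
    let mut unix_wr = unix.try_clone()?;

    // Client -> daemon on a helper thread; when the client stops sending,
    // half-close the Unix side so the daemon sees EOF.
    let upstream = thread::spawn(move || {
        let _ = io::copy(&mut tcp_rd, &mut unix_wr);
        let _ = unix_wr.shutdown(Shutdown::Write);
    });

    // Daemon -> client on this thread; when the daemon closes its end,
    // half-close the TCP side so the client sees EOF.
    let (mut unix_rd, mut tcp_wr) = (unix, tcp);
    let _ = io::copy(&mut unix_rd, &mut tcp_wr);
    let _ = tcp_wr.shutdown(Shutdown::Write);

    let _ = upstream.join();
    Ok(())
}

/// Accept loop with one thread per connection, like socat's `fork` option.
/// Loops forever; failed accepts are skipped.
pub fn serve(listener: TcpListener, socket_path: PathBuf) {
    for conn in listener.incoming().flatten() {
        let path = socket_path.clone();
        thread::spawn(move || {
            let _ = relay(conn, &path);
        });
    }
}
```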

Prerequisites

- `colibri_daemon` running (socket at `/var/run/colibri/colibri.sock`)
- `pkg install socat`
- both hosts on the tailnet, plus a pf rule allowing the bridge port inbound on `tailscale0` only, never on the public interface

On the FreeBSD host (OSA)

```sh
sysrc colibri_bridge_enable=YES
sysrc colibri_bridge_listen_addr=<osa-tailnet-ip>   # this host's tailnet address
service colibri_bridge start
sockstat -4 -l | grep 9190                           # confirm socat is listening
```

From a second host (e.g. domedog) over Tailscale, the remote path is raw TCP. `colibri_poll.py` speaks the local Unix socket, so over the wire, send newline-delimited JSON with `nc`:

```sh
printf '%s\n' '{"cmd":"status"}'     | nc -w2 <osa-tailnet-ip> 9190
printf '%s\n' '{"cmd":"list-tasks"}' | nc -w2 <osa-tailnet-ip> 9190
```
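The same exchange can be scripted from Rust for a repeatable check: write one JSON request terminated by a newline and read one reply line. This sketch assumes the daemon answers each request with exactly one newline-terminated line; `roundtrip` and `query_bridge` are hypothetical helpers, and the address stays a placeholder such as `<osa-tailnet-ip>:9190`.

```rust
use std::io::{self, BufRead, BufReader, Read, Write};
use std::net::TcpStream;
use std::time::Duration;

/// Send one newline-delimited JSON request and return the first reply line,
/// without its trailing newline. The BufReader is dropped afterwards, so any
/// bytes read past that line are discarded: one request per call.
pub fn roundtrip<S: Read + Write>(stream: &mut S, request_json: &str) -> io::Result<String> {
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;
    stream.flush()?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line.trim_end().to_string())
}

/// Connect through the bridge, e.g. addr = "<osa-tailnet-ip>:9190" (placeholder).
/// The 2-second read timeout loosely mirrors `nc -w2`.
pub fn query_bridge(addr: &str, request_json: &str) -> io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(Duration::from_secs(2)))?;
    roundtrip(&mut stream, request_json)
}
```

Because `roundtrip` is generic over `Read + Write`, the same helper also works with a local `std::os::unix::net::UnixStream` on the daemon host.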

Cross-host coordination — the real proof (Gap 7)

1. Register a remote agent (array-form capabilities; the object form scores zero in `pick_agent`): `{"cmd":"register-agent","name":"domedog","capabilities":["linux"]}`
2. Submit an intake task requiring that capability: `{"cmd":"intake-task","title":"...","capabilities":["linux"]}`
3. Confirm the scheduler routed it: `{"cmd":"list-tasks","status":"claimed"}` should show the remote agent's id as the task's `agent_id`.
4. From the remote side, `transition-task` it to done and verify.

Acceptance: a task created on OSA is claimed and driven to done by an agent on a different host, entirely over the Tailscale bridge — the same routing the scheduler_routes_intake_tasks_by_capability test proves on a single host.

Security: bind to the tailnet interface only and scope the pf rule to `tailscale0`. Use placeholder tailnet addresses in any committed notes; never paste real 100.x IPs into git. (The shipped `colibri_bridge.in` currently hardcodes a real default `listen_addr`; that default should be replaced with a placeholder, or the address made mandatory via rc.conf, in a separate change.)
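A deployment or test harness could enforce the tailnet-only rule before starting the bridge. Tailscale assigns IPv4 node addresses from the CGNAT block 100.64.0.0/10, so a configured listen address outside that block, including the wildcard 0.0.0.0, can be rejected. The helpers below are hypothetical and not part of `colibri_bridge.in`; IPv6 tailnet addresses are not covered.

```rust
use std::net::Ipv4Addr;

/// True if `ip` lies in 100.64.0.0/10: first octet 100, and the top two bits
/// of the second octet are 01 (second octet 64..=127).
pub fn is_tailnet_ipv4(ip: Ipv4Addr) -> bool {
    let [a, b, _, _] = ip.octets();
    a == 100 && (b & 0b1100_0000) == 0b0100_0000
}

/// Validate a configured bridge listen address before binding.
pub fn check_listen_addr(addr: &str) -> Result<Ipv4Addr, String> {
    let ip: Ipv4Addr = addr
        .parse()
        .map_err(|e| format!("listen_addr {addr:?} is not an IPv4 address: {e}"))?;
    if is_tailnet_ipv4(ip) {
        Ok(ip)
    } else {
        // Covers 0.0.0.0 and any public or LAN address.
        Err(format!("refusing to bind bridge on non-tailnet address {ip}"))
    }
}
```

An unfilled placeholder such as `<osa-tailnet-ip>` fails to parse, so a copied-but-unedited config is rejected too.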

Summary

| Phase | What | Files | Linux? | Status |
| --- | --- | --- | --- | --- |
| 1a | `pick_agent` unit tests (tie-break, multi-cap, active) | `scheduler.rs` tests | Yes | Complete (PR #138) |
| 1b | Multi-agent board integration test (capability routing) | `tests/multi_agent_board.rs` | Yes | Complete (PR #186) |
| 1c | Same-capability multi-task test (contention) | `tests/multi_agent_board.rs` | Yes | Complete (PR #186) |
| 2a | Merge `feat/cli-register-agent` | `colibri.rs` + `lib.rs` | Yes | Complete (PR #107) |
| 2b | Add `claim-task` + `transition-task` + `set-cost-mode` CLI | `colibri.rs` + `lib.rs` | Yes | Complete (PR #138) |
| 2c | CLI parse tests | `colibri.rs` tests | Yes | Complete (PR #138) |
| 3 | Agent presence schema | `schema.rs` + `lib.rs` + `socket.rs` | Yes | Deferred |
| 4 | Polling workflow test | `tests/` | Yes | Deferred |
| 5 | TCP bridge validation | FreeBSD host | No | FreeBSD lane |

Phases 1 + 2 complete. Next scope: Phase 3 (agent presence schema) or Phase 5 (FreeBSD bridge validation).
