docs(routing): mark cross-host routing LIVE — socat bridge + poller/worker

Cross-host transport landed via colibri PR #83 (socat bridge on osa
100.72.229.63:9190, Tailscale-only, + poller/worker loop), validated
debby<->osa.

- HOST-MATRIX: Current-vs-Designed note -> Routing LIVE; Track C -> DONE
- CAPABILITY-ROUTING: banner, caveat, topology [PLANNED]->[LIVE], worked example

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sam & Claude 2026-06-19 16:51:27 +02:00
parent 0f6b5c4438
commit 8b88a030d1
2 changed files with 23 additions and 13 deletions

@ -1,6 +1,6 @@
# Capability-Based Task Routing
**LIVE VS PLANNED: this document describes the intended architecture.** Colibri's capability matcher exists (Colibri daemon) and works for a single-host daemon/agent pool. Cross-host routing across debby/domedog/osa is **not live yet** — that is the next wiring step. Sections below are labelled `[LIVE]` or `[PLANNED]`.
**LIVE VS PLANNED.** Colibri's capability matcher exists (Colibri daemon) and works for a single-host daemon/agent pool. **Cross-host transport is now LIVE** (2026-06-19): a `socat` bridge over Tailscale exposes osa's daemon, and a poller/worker loop runs assigned tasks across hosts — validated on the debby↔osa lane (colibri PR #83). Full capability-scored routing across all three hosts is maturing on top of this transport. Sections below are labelled `[LIVE]` or `[PLANNED]`.
**Principle: a tool that one OS can't support is not a loss — it's a routing
constraint.** In a multi-agent, multi-OS fleet we don't force every capability onto
@ -22,16 +22,23 @@ The matching engine exists today in `colibri-daemon` — this is working, per-ho
matches, `pick_agent` returns `None`: the task is created but left **unassigned until a
capable agent appears**.
> **Important caveat:** the daemon listens on a **local Unix socket only**. This means today the agent pool is per-host. An agent on osa cannot pick up a task from debby's daemon automatically.
> **Note:** the daemon itself listens on a **local Unix socket only**. Cross-host reach is
> provided by the bridge below, not by the daemon binding a network port directly.
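A minimal sketch of the matching behaviour described above, for illustration only. The `Agent` record and its fields are hypothetical; only the name `pick_agent` and its `None` fallback come from the text. The sketch takes the first agent whose capability set covers the task's requirements, which is simpler than any scoring the real daemon may apply.

```python
from dataclasses import dataclass, field
from typing import Iterable, Optional


@dataclass(frozen=True)
class Agent:
    # Hypothetical record shape; the daemon's real agent type may differ.
    agent_id: str
    host: str
    capabilities: frozenset = field(default_factory=frozenset)


def pick_agent(required: Iterable[str], agents: Iterable[Agent]) -> Optional[Agent]:
    """Return the first agent that advertises every required capability.

    Returns None when no agent qualifies, in which case the caller would
    leave the task unassigned until a capable agent appears.
    """
    needed = frozenset(required)
    for agent in agents:
        # Subset test: every required capability must be advertised.
        if needed <= agent.capabilities:
            return agent
    return None
```

With an illustrative pool in which only one agent advertises `screenshot`, `pick_agent(["screenshot"], pool)` returns that agent, while a requirement that nobody advertises yields `None`.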
## [PLANNED] Cross-host topology
## [LIVE] Cross-host topology
To route *across* hosts, agents on every host must be visible to one scheduler. Not implemented yet. Recommended approach:
Implemented 2026-06-19 (colibri PR #83), using the `socat`-over-Tailscale approach:
- **Central orchestrator daemon on debby (Hermes).** Agents on domedog/osa reach its
socket over Tailscale (forwarded via SSH/`socat`). Hermes is already the designated
orchestrator, so this matches the agent matrix.
- Alternative (heavier, deferred): daemon-to-daemon federation.
- **`socat` bridge** (`colibri_bridge` rc.d, daemon(8)-supervised) maps osa's daemon Unix
socket to a TCP port on the **Tailscale interface only** (`100.72.229.63:9190`, never
`0.0.0.0`), with a `pf` rule on `tailscale0`. The debby orchestrator reaches it over the
tailnet.
- **Poller/worker loop**: `colibri_poll.py` (filters by agent UUID) and
`colibri_task_done.py` (transition-task), driven at a 2 min (poller) / 5 min (worker) cadence by
Hermes' internal scheduler (see `packaging/freebsd/colibri-agent-loop.md`), not OS cron.
- **Validated** on the debby↔osa lane (real tasks completed end-to-end). domedog joins via
the same bridge pattern.
- Alternative (heavier, not pursued): daemon-to-daemon federation.
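The bridge and poller items above can be sketched as two small helpers. This is a sketch under stated assumptions, not the contents of the `colibri_bridge` rc.d script or of `colibri_poll.py`: the socket path, the task dictionary shape, and the `assigned_to` key are hypothetical. The `socat` address options used (`TCP-LISTEN`, `bind=`, `reuseaddr`, `fork`, `UNIX-CONNECT`) are standard, and the bind check relies on Tailscale IPv4 addresses falling in the `100.64.0.0/10` range.

```python
import ipaddress
from typing import Iterable, List

# Tailscale assigns IPv4 addresses from the CGNAT block.
TAILSCALE_CGNAT = ipaddress.ip_network("100.64.0.0/10")


def bridge_argv(bind_ip: str, port: int, socket_path: str) -> List[str]:
    """Build a socat command that maps a Unix socket to a tailnet-only TCP port.

    Raises ValueError for any bind address outside the Tailscale IPv4 range,
    which rules out 0.0.0.0 and other non-tailnet interfaces.
    """
    addr = ipaddress.ip_address(bind_ip)
    if addr not in TAILSCALE_CGNAT:
        raise ValueError(f"refusing to bind outside the tailnet: {bind_ip}")
    return [
        "socat",
        f"TCP-LISTEN:{port},bind={bind_ip},reuseaddr,fork",
        f"UNIX-CONNECT:{socket_path}",  # socket_path is a placeholder here
    ]


def tasks_for_agent(tasks: Iterable[dict], agent_uuid: str) -> List[dict]:
    """Keep only the tasks assigned to this agent (hypothetical task shape)."""
    return [t for t in tasks if t.get("assigned_to") == agent_uuid]
```

For example, `bridge_argv("100.72.229.63", 9190, "/tmp/example.sock")` produces an argument list suitable for a process supervisor, while passing `0.0.0.0` raises `ValueError` before anything is started. The check covers IPv4 only; a tailnet IPv6 address would need its own rule.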
## [LIVE] Capability vocabulary (initial)
@ -58,7 +65,7 @@ Hosts advertise only what they truly have. Example from the current fleet:
## [DESIGN] Worked example: the tmux-screenshot skill
This illustrates the intended routing flow (requires [PLANNED] cross-host topology above):
This illustrates the routing flow (now runnable over the [LIVE] cross-host topology above):
1. FreeBSD image drops Pillow — stays lean (`pkg-list` carries only `python312`).
2. The skill manifest declares `required_capabilities: ["screenshot"]` (or `image-render`).

@ -40,9 +40,12 @@ on any host fills in its own row. Source of truth for facts is the probe — not
>
> - Provider per agent (DeepSeek / OpenRouter / Z.AI / local) — fill in the per-host table.
> - One Telegram token per running service. Never share a token across instances.
> - **Current vs Designed**: Colibri has a capability matcher for agent routing, but today it
> works for local daemon/agent pools. Cross-host routing across debby/domedog/osa is the
> next wiring step — not live yet. `verify_facts_probe.py` is a required discipline/tool,
> - **Routing**: Colibri has a capability matcher for per-host agent pools, and **cross-host
> routing is LIVE** (2026-06-19): a `socat` bridge exposes osa's colibri-daemon on its
> Tailscale IP (`100.72.229.63:9190`, tailnet-only), the debby orchestrator dispatches over
> the tailnet, and a poller (2 min) / worker (5 min) loop executes assigned tasks. Validated
> on the debby↔osa lane; colibri PR #83. See [`CAPABILITY-ROUTING.md`](./CAPABILITY-ROUTING.md).
> - **Probe vs identity**: `verify_facts_probe.py` is a required discipline/tool,
> not an automatic startup hook — agents run it when grounding host facts, and HOST-MATRIX
> records the result. OS/hardware facts come from probes and the matrix, not from SOUL.md
> (which carries identity and values).
@ -137,7 +140,7 @@ host that fails. What you guess will be wrong; what you probe will be right.
- **Future tracks (separate, none blocking)**:
- Track A: daemon/rc.d promotion (hermes_daemon service, dedicated user)
- Track B: ~~Telegram/gateway integration~~ DONE (2026-06-17) — gateway daemonization (rc.d) still deferred
- Track C: Colibri cross-host routing (see CAPABILITY-ROUTING.md)
- Track C: ~~Colibri cross-host routing~~ **DONE (2026-06-19)**: `socat` bridge on osa `:9190` (Tailscale-only) + poller/worker loop; colibri PR #83 merged. See CAPABILITY-ROUTING.md
- Track D: old clawdie_glass cleanup
_See [`../AGENTS.md`](../AGENTS.md) for the canonical agent matrix and operating rules._