# Capability-Based Task Routing

**LIVE VS PLANNED.** Colibri's capability matcher exists in `colibri-daemon` and works for a single-host daemon/agent pool. **Cross-host transport is now LIVE** (2026-06-19): a `socat` bridge over Tailscale exposes osa's daemon, and a poller/worker loop runs assigned tasks across hosts — validated on the debby↔osa lane (colibri PR #83). Full capability-scored routing across all three hosts is maturing on top of this transport. Sections below are labelled `[LIVE]`, `[PLANNED]`, or `[DESIGN]`.

**Principle: a tool that one OS can't support is not a loss — it's a routing
constraint.** In a multi-agent, multi-OS fleet we don't force every capability onto
every host. We let each host advertise what it can do, let each task declare what it
needs, and let the scheduler send the task to a host that qualifies. FreeBSD stays lean;
the capability simply lives where it's cheap.

## [LIVE] What Colibri already provides (single host)

The matching engine exists today in `colibri-daemon` — this is working, per-host:

- **Agents carry capability tags** — `agents.capabilities` (JSON array) in the store
  (`colibri-store` schema); registered via `colibri` client / `--capabilities`.
- **Tasks declare requirements** — jobs and intake requests carry `required_capabilities`
  (`colibri intake-task --capabilities <csv>`).
- **The scheduler matches** — `pick_agent(required, agents)` scores each idle/active agent
  with `capability_match_score` and picks the best fit.
- **Unmatched = parked, not failed** — if requirements are non-empty and no online agent
  matches, `pick_agent` returns `None`: the task is created but left **unassigned until a
  capable agent appears**.

> **Note:** the daemon itself listens on a **local Unix socket only**. Cross-host reach is
> provided by the bridge below, not by the daemon binding a network port directly.
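
The match-or-park behaviour described in this section can be sketched in a few lines of Python. This is an illustration only, not the daemon's code: the `Agent` class, the status names, and the scoring rule (prefer the agent with the fewest surplus tags) are assumptions, and the real `capability_match_score` may weigh things differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical stand-in for a row in the agents table."""
    name: str
    status: str  # assumed values: "idle", "active", "offline"
    capabilities: set[str] = field(default_factory=set)


def capability_match_score(required: set[str], agent: Agent) -> Optional[int]:
    """Return a score if the agent covers every required tag, else None.

    Assumed scoring: fewer surplus tags scores higher, so a specialised
    agent is preferred over a generalist.
    """
    if not required <= agent.capabilities:  # exact string match per tag
        return None
    return -len(agent.capabilities - required)


def pick_agent(required: set[str], agents: list[Agent]) -> Optional[Agent]:
    """Pick the best-scoring idle/active agent, or None so the task parks."""
    best: Optional[Agent] = None
    best_score: Optional[int] = None
    for agent in agents:
        if agent.status not in ("idle", "active"):
            continue  # offline agents are never candidates
        score = capability_match_score(required, agent)
        if score is None:
            continue
        if best_score is None or score > best_score:
            best, best_score = agent, score
    return best


agents = [
    Agent("generalist", "idle", {"linux", "rust", "python3.12", "ffmpeg"}),
    Agent("rust-only", "active", {"linux", "rust"}),
    Agent("sleeper", "offline", {"linux", "gpu"}),
]

chosen = pick_agent({"rust"}, agents)
parked = pick_agent({"gpu"}, agents)  # only an offline agent has "gpu"
```

Here `chosen` is the `rust-only` agent and `parked` is `None`, which corresponds to a task that is created but left unassigned until a capable agent comes online.
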

## [LIVE] Cross-host topology

Implemented 2026-06-19 (colibri PR #83), using the `socat`-over-Tailscale approach:

- **`socat` bridge** (`colibri_bridge` rc.d, daemon(8)-supervised) maps osa's daemon Unix
  socket to a TCP port on the **Tailscale interface only** (`${OSA_TS_IP}:9190`, never
  `0.0.0.0`), with a `pf` rule on `tailscale0`. **osa is the always-on VPS** and hosts the
  board + orchestrator (hermes-osa); agents on debby/domedog reach it over the tailnet. (debby
  is an intermittent laptop — a client, never the hub.)
- **Poller/worker loop** — `colibri_poll.py` (filters by agent UUID) and
  `colibri_task_done.py` (transition-task), driven on the live 2 min / 5 min cadence by
  Hermes' internal scheduler (see `packaging/freebsd/colibri-agent-loop.md`), not OS cron.
- **Validated** on the debby↔osa lane (real tasks completed end-to-end). **domedog joined
  2026-06-19** via the same pattern — a client-side `socat` shim → osa `${OSA_TS_IP}:9190`.
- Alternative (heavier, not pursued): daemon-to-daemon federation.
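
The "Tailscale interface only, never `0.0.0.0`" rule for the bridge can be checked mechanically before a listener is started. The sketch below is a hypothetical guard, not part of `colibri_bridge`: the function name is invented, and it covers IPv4 only, relying on the fact that Tailscale hands out IPv4 addresses from the CGNAT range `100.64.0.0/10`.

```python
import ipaddress

# Tailscale assigns IPv4 addresses from the CGNAT range 100.64.0.0/10.
TAILNET_V4 = ipaddress.ip_network("100.64.0.0/10")


def is_safe_bridge_bind(addr: str) -> bool:
    """True only for a concrete IPv4 address inside the tailnet range.

    Rejects wildcards (0.0.0.0), loopback, public addresses, and anything
    that does not parse as an IP address (for example an unset variable).
    """
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return ip.version == 4 and ip in TAILNET_V4


# 100.101.102.103 is a made-up example address, not a real host's IP.
checks = {
    "100.101.102.103": is_safe_bridge_bind("100.101.102.103"),
    "0.0.0.0": is_safe_bridge_bind("0.0.0.0"),
    "127.0.0.1": is_safe_bridge_bind("127.0.0.1"),
    "": is_safe_bridge_bind(""),
}
```

Only the first entry in `checks` is `True`. A guard like this complements the `pf` rule on `tailscale0` and does not replace it.
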

## [LIVE] Capability vocabulary (initial)

| Piece                 | Status                                         | Action                         |
| --------------------- | ---------------------------------------------- | ------------------------------ |
| Capability vocabulary | tags are free-form (`rust`, `python`, `linux`) | Agree a shared tag set (below) |

Flat, explicit tags — the matcher does exact string comparison, no implied hierarchy.
Sourced from the probe and recorded per host in [`HOST-MATRIX.md`](./HOST-MATRIX.md).

| Category  | Tags                                 |
| --------- | ------------------------------------ |
| OS        | `linux`, `freebsd`                   |
| Isolation | `docker`, `freebsd-jail`             |
| Display   | `gui`, `screenshot`, `wayland`       |
| Hardware  | `gpu`, `zfs`                         |
| Runtime   | `python3.12`, `node24`, `rust`, `go` |
| Media     | `ffmpeg`, `pillow`/`image-render`    |
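
Because matching is exact string comparison, near-miss tags silently fail to match. A small lint along the following lines can surface them before an agent registers. It is a sketch under assumptions: the helper names are invented, and `pillow`/`image-render` is treated here as two separate accepted tags.

```python
# Tag set copied from the table above; the dict layout is illustrative.
VOCABULARY = {
    "os": {"linux", "freebsd"},
    "isolation": {"docker", "freebsd-jail"},
    "display": {"gui", "screenshot", "wayland"},
    "hardware": {"gpu", "zfs"},
    "runtime": {"python3.12", "node24", "rust", "go"},
    "media": {"ffmpeg", "pillow", "image-render"},
}
KNOWN_TAGS = set().union(*VOCABULARY.values())


def unknown_tags(advertised):
    """Return advertised tags that are not in the shared vocabulary."""
    return sorted(set(advertised) - KNOWN_TAGS)


def satisfies(required, advertised):
    """Exact-match check: every required tag must appear verbatim."""
    return set(required) <= set(advertised)


# No implied hierarchy: "python3" does not satisfy "python3.12",
# and "node" is not the same tag as "node24".
version_mismatch = satisfies({"python3.12"}, {"linux", "python3"})
stray = unknown_tags(["linux", "node", "rust"])
```

In this sketch `version_mismatch` is `False` and `stray` is `["node"]`. Tags outside the initial table are not necessarily wrong, since the vocabulary is still being agreed, but flagging them keeps spelling drift visible.
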

Hosts advertise only what they truly have. Actual registered agents (2026-06-19):

- **domedog (Linux, headless):** `linux`, `python3.12`, `rust`, `go`, `node`, `ffmpeg`,
  `image-render` — the media/compute lane. **No** `screenshot`/`gui` (headless VM), no `docker`.
- **debby / hermes-debby (Linux):** `linux`, `docker`, `shell`, `gateway`, `hermes`, `tailscale`.
- **osa / hermes-osa (FreeBSD):** `freebsd`, `shell`, `gateway`, `tailscale`, `rc.d`, `pf`,
  `nginx`, `acme`, `hermes` — no `image-render` (headless server; `py311-pillow` not installed
  there). The FreeBSD operator image enables it via `py311-pillow` (below).

## [DESIGN] Worked example: the tmux-screenshot skill

This illustrates the routing flow (now runnable over the [LIVE] cross-host topology above):

1. The FreeBSD operator image ships `py311-pillow` (clawdie-iso #85), so
   `clawdie-join-hive.sh` advertises `image-render` (Pillow on `python3` = 3.11) and
   `screenshot` when a display is present.
2. The skill manifest declares `required_capabilities: ["image-render"]` (or `screenshot`).
3. Both **domedog** (Linux) and the **FreeBSD operator image** advertise `image-render`.
   `screenshot` also needs a display, so a _headless_ host (domedog, osa) does not qualify —
   the operator image, with its XFCE session, does.
4. Colibri routes the task to a matching host automatically — **proven 19.jun.2026: an
   `image-render` task routed to domedog**; with no match it parks until a capable agent appears.
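
The four steps can be replayed against the registered tag sets listed earlier. This is a toy model, not Colibri's scheduler: the `operator-image` entry and its tags are assumed for illustration, and the helper only lists qualifying hosts rather than choosing between them.

```python
# Tag sets for the three registered agents, as listed in this document.
HOSTS = {
    "domedog": {"linux", "python3.12", "rust", "go", "node", "ffmpeg",
                "image-render"},
    "hermes-debby": {"linux", "docker", "shell", "gateway", "hermes",
                     "tailscale"},
    "hermes-osa": {"freebsd", "shell", "gateway", "tailscale", "rc.d", "pf",
                   "nginx", "acme", "hermes"},
}

# Assumed tags for a FreeBSD operator image with a display attached.
OPERATOR_IMAGE = {"operator-image": {"freebsd", "image-render", "screenshot"}}


def candidates(required, hosts):
    """Names of hosts whose tags cover every required capability."""
    return sorted(name for name, tags in hosts.items()
                  if set(required) <= tags)


fleet = {**HOSTS, **OPERATOR_IMAGE}

render_hosts = candidates(["image-render"], fleet)
shot_hosts = candidates(["screenshot"], fleet)
shot_without_image = candidates(["screenshot"], HOSTS)  # empty: task parks
```

With the operator image present, `render_hosts` is `["domedog", "operator-image"]` and `shot_hosts` is `["operator-image"]`. Without it, `shot_without_image` is empty, which is the case where the task stays parked rather than failing.
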

The capability moved hosts. It was never lost.

_See [`HIVE-ONBOARDING.md`](./HIVE-ONBOARDING.md) for the hive-onboarding vision built on
this routing layer, [`MCP-INTEGRATION.md`](./MCP-INTEGRATION.md) for connecting agents to the
board over MCP, [`AGENTS.md`](../AGENTS.md) for the agent matrix,
[`HOST-MATRIX.md`](./HOST-MATRIX.md) for per-host facts, and
[`TOOLCHAIN.md`](./TOOLCHAIN.md) for runtime versions._