Multi-OS routing: hosts advertise capability tags, tasks declare required_capabilities, Colibri's scheduler (pick_agent/capability_match_score, already implemented) places each task on a qualifying host. Documents the vocabulary, the probe->capability mapping, the SkillManifest.required_capabilities addition, central-daemon topology, and the tmux-screenshot skill as the worked example (why dropping FreeBSD Pillow loses no capability). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Capability-Based Task Routing
Principle: a tool that one OS can't support is not a loss — it's a routing constraint. In a multi-agent, multi-OS fleet we don't force every capability onto every host. We let each host advertise what it can do, let each task declare what it needs, and let the scheduler send the task to a host that qualifies. FreeBSD stays lean; the capability simply lives where it's cheap.
This is the operational payoff of the dual-OS survivability model: heterogeneous hosts, one task board, automatic placement.
What Colibri already provides
The matching engine exists today in colibri-daemon — this is wiring, not a rewrite:
- Agents carry capability tags: `agents.capabilities` (JSON array) in the store (`colibri-store` schema); registered via `colibri` client / `--capabilities`.
- Tasks declare requirements: jobs and intake requests carry `required_capabilities` (`colibri intake-task --capabilities <csv>`).
- The scheduler matches: `pick_agent(required, agents)` scores each idle/active agent with `capability_match_score` and picks the best fit.
- Unmatched = parked, not failed: if requirements are non-empty and no online agent matches, `pick_agent` returns `None`, so the task is created but left unassigned until a capable agent appears. This is exactly the behaviour we want: a screenshot task waits for a Linux host rather than failing on FreeBSD.
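The matching behaviour described above can be sketched as follows. This is a minimal illustration, not the code in colibri-daemon: the `Agent` struct, the `online` flag (standing in for the idle/active states), the `agent` helper, and the scoring rule (fewer surplus tags is a better fit) are all assumptions made for the example. Only the names `pick_agent` and `capability_match_score`, exact string matching, and the "no match returns `None`" outcome come from this document.

```rust
use std::collections::HashSet;

/// Hypothetical agent record; the real `agents` row in colibri-store may differ.
#[derive(Debug, Clone)]
pub struct Agent {
    pub name: String,
    pub online: bool, // stands in for the idle/active states
    pub capabilities: Vec<String>,
}

/// Assumed scoring rule: `None` if any required tag is missing (exact string
/// match), otherwise a score that is higher for agents with fewer surplus tags.
pub fn capability_match_score(required: &[String], agent: &Agent) -> Option<i64> {
    let have: HashSet<&str> = agent.capabilities.iter().map(String::as_str).collect();
    let need: HashSet<&str> = required.iter().map(String::as_str).collect();
    if !need.is_subset(&have) {
        return None;
    }
    // `need` is a subset of `have`, so this subtraction cannot underflow.
    Some(-((have.len() - need.len()) as i64))
}

/// Picks the best-scoring online agent, or `None` so the caller can park the task.
pub fn pick_agent<'a>(required: &[String], agents: &'a [Agent]) -> Option<&'a Agent> {
    agents
        .iter()
        .filter(|a| a.online)
        .filter_map(|a| capability_match_score(required, a).map(|score| (score, a)))
        .max_by_key(|(score, _)| *score)
        .map(|(_, a)| a)
}

/// Small constructor to keep examples short (illustrative helper).
pub fn agent(name: &str, online: bool, tags: &[&str]) -> Agent {
    Agent {
        name: name.to_string(),
        online,
        capabilities: tags.iter().map(|t| t.to_string()).collect(),
    }
}
```

In this sketch an empty requirement list is a subset of every tag set, so any online agent qualifies; a non-empty list with no qualifying online agent yields `None`, which is the parked case.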
What we add to realize it
| Piece | Status | Action |
|---|---|---|
| Capability vocabulary | tags are free-form (rust, python, linux) | Agree a shared tag set (below) |
| Agents advertise real capabilities | manual / ad-hoc | Derive from verify_facts_probe.py; register at agent start |
| Skills declare their needs | SkillManifest has no requirements field | Add required_capabilities: Vec<String>; scheduler reads it |
| Cross-host agent pool | daemon listens on a local Unix socket only | One orchestrator daemon (debby/Hermes); remote agents reach it over Tailscale |
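As a rough sketch of the "Skills declare their needs" row, the manifest change could look like the struct below. Only the field `required_capabilities: Vec<String>` comes from the table; the `name` field, the `capabilities_csv` helper, and the "empty means runs anywhere" reading are assumptions for illustration, and the real `SkillManifest` will have other fields.

```rust
/// Sketch of a skill manifest carrying the proposed field.
/// `name` and the helper method are illustrative, not the real definition.
#[derive(Debug, Clone, Default)]
pub struct SkillManifest {
    pub name: String,
    /// Tags a host must advertise before the scheduler may place this skill.
    /// Assumption: an empty list means the skill can run on any host.
    pub required_capabilities: Vec<String>,
}

impl SkillManifest {
    /// Renders the requirements as a comma-separated list, sorted and
    /// de-duplicated, in the shape `--capabilities <csv>` expects.
    pub fn capabilities_csv(&self) -> String {
        let mut tags: Vec<&str> = self
            .required_capabilities
            .iter()
            .map(|t| t.trim())
            .filter(|t| !t.is_empty())
            .collect();
        tags.sort_unstable();
        tags.dedup();
        tags.join(",")
    }
}
```

Keeping the field a flat `Vec<String>` matches the exact-string matcher: the manifest stores the same tags that agents advertise, with no translation layer in between.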
Cross-host topology (the one real decision)
The daemon's socket is local, so today the agent pool is per-host. To route across hosts, agents on every host must be visible to one scheduler. Recommended:
- Central orchestrator daemon on debby (Hermes). Agents on domedog/osa reach its socket over Tailscale (forwarded via SSH/`socat`). Hermes is already the designated orchestrator, so this matches the agent matrix.
- Alternative (heavier, deferred): daemon-to-daemon federation.
Capability vocabulary (initial)
Flat, explicit tags — the matcher does exact string comparison, no implied hierarchy.
Sourced from the probe and recorded per host in HOST-MATRIX.md.
| Category | Tags |
|---|---|
| OS | linux, freebsd |
| Isolation | docker, freebsd-jail |
| Display | gui, screenshot, wayland |
| Hardware | gpu, zfs |
| Runtime | python3.12, node24, rust, go |
| Media | ffmpeg, pillow/image-render |
Hosts advertise only what they truly have. Example from the current fleet:
- domedog / debby (Linux): `linux`, `docker`, `gui`, `screenshot`, `image-render`, …
- osa (FreeBSD): `freebsd`, `freebsd-jail`, `zfs`, `rust`, … (no `screenshot` / `image-render`)
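To make "advertise only what they truly have" concrete, here is one way probe results could be turned into tags at agent start. The `ProbeFacts` struct and its fields are hypothetical, since the actual output of verify_facts_probe.py is not shown here, and the rule that `screenshot` requires both a display and Pillow is an assumption made for the example.

```rust
/// Hypothetical subset of what a host probe might report.
#[derive(Debug, Clone, Default)]
pub struct ProbeFacts {
    pub os: String, // e.g. "linux" or "freebsd"
    pub has_docker: bool,
    pub has_jails: bool,
    pub has_zfs: bool,
    pub has_display: bool,
    pub has_pillow: bool,
}

/// Maps probed facts to flat capability tags. A tag is emitted only when the
/// corresponding fact was observed, so a host never over-advertises.
pub fn capabilities_from_probe(facts: &ProbeFacts) -> Vec<String> {
    let mut tags = vec![facts.os.to_lowercase()];
    let optional = [
        (facts.has_docker, "docker"),
        (facts.has_jails, "freebsd-jail"),
        (facts.has_zfs, "zfs"),
        (facts.has_display, "gui"),
        (facts.has_pillow, "image-render"),
        // Assumed rule: a screenshot needs a display and an image library.
        (facts.has_display && facts.has_pillow, "screenshot"),
    ];
    for (present, tag) in optional.iter() {
        if *present {
            tags.push(tag.to_string());
        }
    }
    tags
}
```

The resulting list can be joined with commas and passed to `--capabilities` when the agent registers.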
Worked example: the tmux-screenshot skill
This is why we could drop py312-pillow from the FreeBSD ISO without losing the skill:
- FreeBSD image drops Pillow and stays lean (`pkg-list` carries only `python312`).
- The skill manifest declares `required_capabilities: ["screenshot"]` (or `image-render`).
- Only Linux hosts advertise `screenshot` (Pillow is trivial there).
- Colibri routes any screenshot task to debby/domedog automatically; if both are offline the task parks until one returns.
The capability moved hosts. It was never lost.
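The park-then-place lifecycle of the screenshot task can be sketched as below. The `Host` and `Task` types and the `assign_parked` function are hypothetical stand-ins for the daemon's task board, and picking the first qualifying host (rather than scoring) is a simplification; the point is only that a task with no qualifying online host stays unassigned and is picked up once such a host returns.

```rust
/// Hypothetical host record for illustration only.
pub struct Host {
    pub name: &'static str,
    pub online: bool,
    pub tags: &'static [&'static str],
}

/// Hypothetical task record; `assigned_to == None` means the task is parked.
pub struct Task {
    pub skill: &'static str,
    pub required: &'static [&'static str],
    pub assigned_to: Option<&'static str>,
}

/// First online host advertising every required tag (exact string match).
fn qualifying_host(required: &[&str], hosts: &[Host]) -> Option<&'static str> {
    hosts
        .iter()
        .find(|h| h.online && required.iter().all(|t| h.tags.contains(t)))
        .map(|h| h.name)
}

/// Tries to place every parked task; returns how many were newly assigned.
pub fn assign_parked(tasks: &mut [Task], hosts: &[Host]) -> usize {
    let mut placed = 0;
    for task in tasks.iter_mut().filter(|t| t.assigned_to.is_none()) {
        if let Some(host) = qualifying_host(task.required, hosts) {
            task.assigned_to = Some(host);
            placed += 1;
        }
    }
    placed
}
```

Run with debby offline and osa online, a `tmux-screenshot` task requiring `screenshot` stays parked; calling `assign_parked` again after debby comes back assigns it there, and osa is never considered because it does not advertise the tag.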
See AGENTS.md for the agent matrix, HOST-MATRIX.md
for per-host facts, and TOOLCHAIN.md for runtime versions.