79 lines
3.9 KiB
Markdown
79 lines
3.9 KiB
Markdown
|
|
# Capability-Based Task Routing
|
||
|
|
|
||
|
|
**Principle: a tool that one OS can't support is not a loss — it's a routing
|
||
|
|
constraint.** In a multi-agent, multi-OS fleet we don't force every capability onto
|
||
|
|
every host. We let each host advertise what it can do, let each task declare what it
|
||
|
|
needs, and let the scheduler send the task to a host that qualifies. FreeBSD stays lean;
|
||
|
|
the capability simply lives where it's cheap.
|
||
|
|
|
||
|
|
This is the operational payoff of the dual-OS survivability model: heterogeneous hosts,
|
||
|
|
one task board, automatic placement.
|
||
|
|
|
||
|
|
## What Colibri already provides
|
||
|
|
|
||
|
|
The matching engine exists today in `colibri-daemon` — this is wiring, not a rewrite:
|
||
|
|
|
||
|
|
- **Agents carry capability tags** — `agents.capabilities` (JSON array) in the store
|
||
|
|
(`colibri-store` schema); registered via `colibri` client / `--capabilities`.
|
||
|
|
- **Tasks declare requirements** — jobs and intake requests carry `required_capabilities`
|
||
|
|
(`colibri intake-task --capabilities <csv>`).
|
||
|
|
- **The scheduler matches** — `pick_agent(required, agents)` scores each idle/active agent
|
||
|
|
with `capability_match_score` and picks the best fit.
|
||
|
|
- **Unmatched = parked, not failed** — if requirements are non-empty and no online agent
|
||
|
|
matches, `pick_agent` returns `None`: the task is created but left **unassigned until a
|
||
|
|
capable agent appears**. Exactly the behaviour we want — a screenshot task waits for a
|
||
|
|
Linux host rather than failing on FreeBSD.
|
||
|
|
|
||
|
|
## What we add to realize it
|
||
|
|
|
||
|
|
| Piece | Status | Action |
|
||
|
|
| ----- | ------ | ------ |
|
||
|
|
| Capability vocabulary | tags are free-form (`rust`, `python`, `linux`) | Agree a shared tag set (below) |
|
||
|
|
| Agents advertise real capabilities | manual / ad-hoc | Derive from `verify_facts_probe.py`; register at agent start |
|
||
|
|
| Skills declare their needs | `SkillManifest` has no requirements field | Add `required_capabilities: Vec<String>`; scheduler reads it |
|
||
|
|
| Cross-host agent pool | daemon listens on a **local Unix socket only** | One orchestrator daemon (debby/Hermes); remote agents reach it over Tailscale |
|
||
|
|
|
||
|
|
### Cross-host topology (the one real decision)
|
||
|
|
|
||
|
|
The daemon's socket is local, so today the agent pool is per-host. To route *across*
|
||
|
|
hosts, agents on every host must be visible to one scheduler. Recommended:
|
||
|
|
|
||
|
|
- **Central orchestrator daemon on debby (Hermes).** Agents on domedog/osa reach its
|
||
|
|
socket over Tailscale (forwarded via SSH/`socat`). Hermes is already the designated
|
||
|
|
orchestrator, so this matches the agent matrix.
|
||
|
|
- Alternative (heavier, deferred): daemon-to-daemon federation.
|
||
|
|
|
||
|
|
## Capability vocabulary (initial)
|
||
|
|
|
||
|
|
Flat, explicit tags — the matcher does exact string comparison, no implied hierarchy.
|
||
|
|
Sourced from the probe and recorded per host in [`HOST-MATRIX.md`](./HOST-MATRIX.md).
|
||
|
|
|
||
|
|
| Category | Tags |
|
||
|
|
| -------- | ---- |
|
||
|
|
| OS | `linux`, `freebsd` |
|
||
|
|
| Isolation | `docker`, `freebsd-jail` |
|
||
|
|
| Display | `gui`, `screenshot`, `wayland` |
|
||
|
|
| Hardware | `gpu`, `zfs` |
|
||
|
|
| Runtime | `python3.12`, `node24`, `rust`, `go` |
|
||
|
|
| Media | `ffmpeg`, `pillow`/`image-render` |
|
||
|
|
|
||
|
|
Hosts advertise only what they truly have. Example from the current fleet:
|
||
|
|
|
||
|
|
- **domedog / debby (Linux):** `linux`, `docker`, `gui`, `screenshot`, `image-render`, …
|
||
|
|
- **osa (FreeBSD):** `freebsd`, `freebsd-jail`, `zfs`, `rust`, … (no `screenshot`/`image-render`)
|
||
|
|
|
||
|
|
## Worked example: the tmux-screenshot skill
|
||
|
|
|
||
|
|
This is why we could drop `py312-pillow` from the FreeBSD ISO without losing the skill:
|
||
|
|
|
||
|
|
1. FreeBSD image drops Pillow — stays lean (`pkg-list` carries only `python312`).
|
||
|
|
2. The skill manifest declares `required_capabilities: ["screenshot"]` (or `image-render`).
|
||
|
|
3. Only Linux hosts advertise `screenshot` (Pillow is trivial there).
|
||
|
|
4. Colibri routes any screenshot task to debby/domedog automatically; if both are offline
|
||
|
|
the task parks until one returns.
|
||
|
|
|
||
|
|
The capability moved hosts. It was never lost.
|
||
|
|
|
||
|
|
_See [`AGENTS.md`](../AGENTS.md) for the agent matrix, [`HOST-MATRIX.md`](./HOST-MATRIX.md)
|
||
|
|
for per-host facts, and [`TOOLCHAIN.md`](./TOOLCHAIN.md) for runtime versions._
|