layered-soul/docs/CAPABILITY-ROUTING.md
Sam & Claude 2fd29cead7 docs: python 3.11/3.12 coexistence on FreeBSD; correct the Pillow rationale
Reconcile the toolchain + capability docs with clawdie-iso #84 (FreeBSD
PYTHON_DEFAULT=3.11):

- TOOLCHAIN.md: the FreeBSD column claimed `py312-*` flavors; reality is
  python312 (app) + python311 (pkg default, transitive), with py311-* prebuilt
  and py312-* absent in the quarterly repo. Added the 3.11/3.12 coexistence note
  ("3.12 floor" = floor for our code, not a ban on the base's 3.11).
- CAPABILITY-ROUTING.md: corrected the imprecise "Pillow dropped on FreeBSD"
  rationale. The blocker was the missing py312-pillow flavor, not Pillow itself;
  the prebuilt py311-pillow is available, so image-render can be restored on
  FreeBSD via 3.11. Clarified screenshot also needs a display (XFCE operator
  image yes, headless osa no → image-render only there).

prettier + layered_soul validate clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 09:44:49 +02:00

Capability-Based Task Routing

LIVE VS PLANNED. Colibri's capability matcher exists (Colibri daemon) and works for a single-host daemon/agent pool. Cross-host transport is now LIVE (2026-06-19): a socat bridge over Tailscale exposes osa's daemon, and a poller/worker loop runs assigned tasks across hosts. This was validated on the debby↔osa lane (colibri PR #83). Full capability-scored routing across all three hosts is still maturing on top of this transport. Sections below are labelled [LIVE] or [DESIGN].

Principle: a tool that one OS can't support is not a loss — it's a routing constraint. In a multi-agent, multi-OS fleet we don't force every capability onto every host. We let each host advertise what it can do, let each task declare what it needs, and let the scheduler send the task to a host that qualifies. FreeBSD stays lean; the capability simply lives where it's cheap.

[LIVE] What Colibri already provides (single host)

The matching engine exists today in colibri-daemon — this is working, per-host:

  • Agents carry capability tags: agents.capabilities (JSON array) in the store (colibri-store schema); registered via colibri client / --capabilities.
  • Tasks declare requirements: jobs and intake requests carry required_capabilities (colibri intake-task --capabilities <csv>).
  • The scheduler matches: pick_agent(required, agents) scores each idle/active agent with capability_match_score and picks the best fit.
  • Unmatched = parked, not failed: if requirements are non-empty and no online agent matches, pick_agent returns None, and the task is created but left unassigned until a capable agent appears.
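
The matching behaviour described in the list above can be sketched in a few lines. This is a minimal illustration, not the colibri-daemon implementation: the Agent class, the status names, and the scoring rule (all required tags must be present, fewer unrelated tags wins) are assumptions made for the example. Only the names pick_agent and capability_match_score and the "return None to park" behaviour come from the text.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical stand-in for a row in the agents table."""
    name: str
    capabilities: set[str] = field(default_factory=set)
    status: str = "idle"  # assumed states: "idle", "active", "offline"


def capability_match_score(required: set[str], agent: Agent) -> Optional[int]:
    """Return a score, or None when the agent cannot take the task.

    Assumed rule: every required tag must be present (exact string match);
    agents carrying fewer unrelated tags score higher.
    """
    if agent.status not in ("idle", "active"):
        return None
    if not required <= agent.capabilities:
        return None
    return -len(agent.capabilities - required)


def pick_agent(required: set[str], agents: list[Agent]) -> Optional[Agent]:
    """Pick the best-scoring agent, or None so the caller can park the task."""
    scored = [(capability_match_score(required, a), a) for a in agents]
    eligible = [(s, a) for s, a in scored if s is not None]
    if not eligible:
        return None
    return max(eligible, key=lambda pair: pair[0])[1]
```

Because the comparison is a plain set-subset test, a task asking for python would not match an agent that advertises only python3.12.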

Note: the daemon itself listens on a local Unix socket only. Cross-host reach is provided by the bridge below, not by the daemon binding a network port directly.

[LIVE] Cross-host topology

Implemented 2026-06-19 (colibri PR #83), using the socat-over-Tailscale approach:

  • socat bridge (colibri_bridge rc.d, daemon(8)-supervised) maps osa's daemon Unix socket to a TCP port on the Tailscale interface only (${OSA_TS_IP}:9190, never 0.0.0.0), with a pf rule on tailscale0. osa is the always-on VPS and hosts the board + orchestrator (hermes-osa); agents on debby/domedog reach it over the tailnet. (debby is an intermittent laptop — a client, never the hub.)
  • Poller/worker loop: colibri_poll.py (filters by agent UUID) and colibri_task_done.py (transition-task), driven on the live 2 min / 5 min cadence by Hermes' internal scheduler (see packaging/freebsd/colibri-agent-loop.md), not OS cron.
  • Validated on the debby↔osa lane (real tasks completed end-to-end). domedog joined 2026-06-19 via the same pattern — a client-side socat shim → osa ${OSA_TS_IP}:9190.
  • Alternative (heavier, not pursued): daemon-to-daemon federation.
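
One polling cycle of the poller/worker pattern above might look roughly like the sketch below. It does not reproduce colibri_poll.py or colibri_task_done.py: the function name poll_once, the task fields id and assigned_to, and the state names done and failed are all hypothetical. The transport is injected as plain callables so the sketch stays independent of the socat bridge.

```python
from typing import Callable, Iterable


def poll_once(
    agent_uuid: str,
    fetch_tasks: Callable[[], Iterable[dict]],
    run_task: Callable[[dict], bool],
    mark_done: Callable[[str, str], None],
) -> list[str]:
    """Run one polling cycle and return the ids of tasks that were handled.

    fetch_tasks, run_task and mark_done are injected so the transport
    (the socat bridge in the text) stays out of this sketch.
    """
    handled = []
    for task in fetch_tasks():
        # Only pick up work assigned to this agent, mirroring the UUID filter.
        if task.get("assigned_to") != agent_uuid:
            continue
        ok = run_task(task)
        # Hypothetical state names; the real transition-task values may differ.
        mark_done(task["id"], "done" if ok else "failed")
        handled.append(task["id"])
    return handled
```

A scheduler would call poll_once on each tick. Keeping the cycle free of sleeps and sockets makes it easy to exercise with an in-memory task list.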

[LIVE] Capability vocabulary (initial)

| Piece | Status | Action |
| --- | --- | --- |
| Capability vocabulary | tags are free-form (rust, python, linux) | Agree a shared tag set (below) |

Flat, explicit tags — the matcher does exact string comparison, no implied hierarchy. Sourced from the probe and recorded per host in HOST-MATRIX.md.

| Category | Tags |
| --- | --- |
| OS | linux, freebsd |
| Isolation | docker, freebsd-jail |
| Display | gui, screenshot, wayland |
| Hardware | gpu, zfs |
| Runtime | python3.12, node24, rust, go |
| Media | ffmpeg, pillow/image-render |
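
Because matching is exact, a shared tag set is only useful if advertised tags are checked against it. The sketch below shows one way to do that, under stated assumptions: the helper names unknown_tags and satisfies are hypothetical, and the pillow/image-render entry is treated as the single tag image-render.

```python
# Tag set copied from the tag list above; "pillow/image-render" is treated as
# the single tag "image-render" (an assumption).
VOCABULARY = frozenset({
    "linux", "freebsd",
    "docker", "freebsd-jail",
    "gui", "screenshot", "wayland",
    "gpu", "zfs",
    "python3.12", "node24", "rust", "go",
    "ffmpeg", "image-render",
})


def unknown_tags(advertised):
    """Return advertised tags that are not in the shared vocabulary, sorted."""
    return sorted(set(advertised) - VOCABULARY)


def satisfies(required, advertised):
    """Exact-string subset test: no prefixes, aliases, or implied hierarchy."""
    return set(required) <= set(advertised)
```

For example, satisfies(["gui"], ["wayland"]) is false: a host with a Wayland session still has to advertise gui explicitly if tasks are going to ask for it.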

Hosts advertise only what they truly have. Actual registered agents (2026-06-19):

  • domedog (Linux, headless): linux, python3.12, rust, go, node, ffmpeg, image-render — the media/compute lane. No screenshot/gui (headless VM), no docker.
  • debby / hermes-debby (Linux): linux, docker, shell, gateway, hermes, tailscale.
  • osa / hermes-osa (FreeBSD): freebsd, shell, gateway, tailscale, rc.d, pf, nginx, acme, hermes — no image-render today. Not because Pillow is unavailable: the blocker was that the py312-pillow flavor isn't in the quarterly repo. The prebuilt py311-pillow is available (FreeBSD's pkg default is 3.11, present transitively — clawdie-iso #84), so image-render can be restored on FreeBSD by adding py311-pillow and running it on 3.11. (screenshot additionally needs a display — see the worked example.)

[DESIGN] Worked example: the tmux-screenshot skill

This illustrates the routing flow (now runnable over the [LIVE] cross-host topology above):

  1. FreeBSD ships no py312-pillow flavor in the quarterly repo, so the image has stayed lean. But the default-flavor py311-pillow is prebuilt and python311 is already present (clawdie-iso #84), so image-render can be restored on FreeBSD via py311-pillow on 3.11. screenshot additionally needs a display — the XFCE operator image has one (so screenshots work there), but headless osa does not (image-render only).
  2. The skill manifest declares required_capabilities: ["image-render"] (or screenshot).
  3. Today only a Linux host advertises image-render: domedog carries image-render and ffmpeg, while osa does not advertise it until py311-pillow is added. screenshot additionally needs a display, so a headless host does not qualify for it; neither domedog nor osa advertises screenshot.
  4. Colibri routes the task to a matching host automatically — proven 2026-06-19: an image-render task routed to domedog; with no match it parks until a capable agent appears.
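
The flow in the steps above can be replayed with the tag lists of the registered agents. This is a simplified sketch rather than Colibri's scheduler: route takes the first covering host instead of scoring, and the function names are made up for the example.

```python
from collections import deque

# Tags as listed for the registered agents above.
FLEET = {
    "domedog": {"linux", "python3.12", "rust", "go", "node", "ffmpeg", "image-render"},
    "debby": {"linux", "docker", "shell", "gateway", "hermes", "tailscale"},
    "osa": {"freebsd", "shell", "gateway", "tailscale", "rc.d", "pf", "nginx", "acme", "hermes"},
}


def route(required, fleet):
    """Return the first host whose tags cover the requirement, else None."""
    for host, tags in fleet.items():
        if set(required) <= tags:
            return host
    return None


def dispatch(tasks, fleet):
    """Assign what can be assigned; return (assignments, parked queue)."""
    assigned, parked = {}, deque()
    for name, required in tasks:
        host = route(required, fleet)
        if host is None:
            parked.append((name, required))  # parked, not failed
        else:
            assigned[name] = host
    return assigned, parked


# Hypothetical task names; the requirements follow the skill manifest in step 2.
tasks = [("render-pane", ["image-render"]), ("grab-screen", ["screenshot"])]
assigned, parked = dispatch(tasks, FLEET)
```

With the fleet as registered, the image-render task lands on domedog and the screenshot task stays parked. Running dispatch again on the parked queue after a display-equipped host registers with screenshot would assign it without resubmitting the task.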

The capability moved hosts. It was never lost.

See HIVE-ONBOARDING.md for the hive-onboarding vision built on this routing layer, MCP-INTEGRATION.md for connecting agents to the board over MCP, AGENTS.md for the agent matrix, HOST-MATRIX.md for per-host facts, and TOOLCHAIN.md for runtime versions.