layered-soul/docs/HOST-MATRIX.md
Sam & Claude 815c482a7d docs: promote operator conventions + refresh stale facts
Pull durable knowledge out of agent session memory into the cross-harness
contract so every harness/agent honors it, not just this session:

- USER.md: new Conventions & voice section (EU date format DD.mon.YYYY,
  positive instruction framing, plain-language naming + detection not
  sniffing, lean/current docs). Colibri fact 12 -> 13 crates, MIT, v0.11.0.
- AGENTS.md: two operating rules (verify on the forge not local git status;
  CI dormant by choice, merges ride local gates, domedog stays Docker-free).
- HOST-MATRIX.md + AGENTS.md matrix: domedog isolation Docker -> host
  (no Docker), matching the probe in HOST-MATRIX section 3.
- curated/: colibri 13 crates/MIT/0.11.0 + vault, python3=3.11 policy,
  real Docker layout (debby only; domedog Docker-free), hermes-bsd row.

Validated: scripts/layered_soul.py validate . -> OK.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 13:02:26 +02:00


Host & Agent Matrix (shared, fill-as-you-go)

A living inventory of who runs where and what each host actually is. Any agent on any host fills in its own row. Source of truth for facts is the probe — not memory.

How to fill your row

cd ~/layered-soul
python3 scripts/verify_facts_probe.py --os --hardware --storage --network --text

Copy the verified values into the tables below, set Probed to today's UTC date, and commit. Never guess hardware, OS, or IPs — paste what the probe reports. On FreeBSD the probe synthesizes an OS-specific command map; trust its output over Linux habits.

Disk before action: before installing a toolchain or starting a build, check real free space (df -h /, or the probe's --storage) — never estimate. Keep the Disk (free) column current and flag any host past ~85%. See Disk discipline below.

Cost before buying: before purchasing or retiring infrastructure, record provider, plan/SKU, verified monthly cost, and the source of truth (invoice/control panel/utility bill). IP-range guesses are not billing proof. See Cost provenance below.

Never paste real IPs or bot handles here. Use ${HOST_TS_IP} and ${*_BOT} placeholders; real values live in fleet.env (gitignored) and can be read live from tailscale status. Copy fleet.env.example to fleet.env to resolve them. The probe prints real IPs — record them in fleet.env, not in this table.
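A minimal sketch of how the placeholders could be resolved locally. It assumes fleet.env holds plain KEY=VALUE lines, which this document does not confirm; the helper names `parse_env` and `resolve` and the sample values are hypothetical.

```python
from string import Template


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments (assumed format)."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(text: str, env: dict[str, str]) -> str:
    """Fill ${NAME} placeholders; names missing from env are left untouched."""
    return Template(text).safe_substitute(env)


if __name__ == "__main__":
    # Example values only; real ones belong in the gitignored fleet.env.
    sample_env = "# fleet.env (sample)\nOSA_TS_IP=100.64.0.10\nHERMES_BOT=@example_bot\n"
    env = parse_env(sample_env)
    print(resolve("bridge ${OSA_TS_IP}:9190, bot ${HERMES_BOT}, peer ${DEBBY_TS_IP}", env))
```

Leaving unknown placeholders intact (rather than raising) keeps a partially filled fleet.env usable and makes the missing keys visible in the output.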


1. Agent placement (who runs where)

| Agent | Host | OS / Isolation | Harness | Role | Bot / channel | Status |
|---|---|---|---|---|---|---|
| Hermes | debby | Debian 13 / Docker | Hermes Agent (upstream) | Secondary agent + soul backup (intermittent laptop) | ${HERMES_BOT} | LIVE (intermittent) |
| Zot | debby | Debian 13 / Docker | Zot RPC | Coding, media workflows | ${ZOT_BOT} | LIVE |
| Claude | domedog | Ubuntu 24.04 / host (no Docker) | Claude Code | Verification, review | — (CLI) | LIVE |
| Mevy | osa | FreeBSD 15 / host | Hermes Agent (upstream, CLI) | Consolidated into hermes-osa | ${HERMES_OSA_BOT} (OSA-bot) | LIVE — under hermes-osa |
| hermes-osa | osa | FreeBSD 15 / host | Hermes Agent (FreeBSD fork) | Orchestrator + board host (always-on VPS): chat + gateway | ${HERMES_OSA_BOT} (OSA-bot) | LIVE — chat + Telegram |
| Codex | osa | FreeBSD 15 / jail | Codex CLI | ISO builds, validation | — (CLI) | LIVE |
| domedog-agent | domedog | Ubuntu 24.04 / host | Colibri board agent | Headless Linux media/compute lane (image-render, ffmpeg, rust/go/py/node) | | LIVE — on central board 2026-06-19 |

Mevy vs hermes-osa distinction: Mevy (${HERMES_OSA_BOT} / OSA-bot) has been consolidated into hermes-osa as of 2026-06-17. The Telegram bot token was migrated from the old backup .env. hermes-osa now runs both the local CLI chat and the Telegram gateway (polling mode, tmux session hermes-gateway).

Status key: LIVE = running and validated right now. INSTALLED = binary present, not yet validated in role. PLANNED = not yet set up. No guessing.

Notes:

  • Provider per agent (DeepSeek / OpenRouter / Z.AI / local) — fill in the per-host table.
  • One Telegram token per running service. Never share a token across instances.
  • Orchestrator lives on the always-on host. osa is the always-on VPS and hosts the colibri board + orchestrator (hermes-osa). debby is an intermittent laptop (powers off periodically) — a secondary agent + soul backup, never the hub. The board must sit where it never disappears; tasks routed to debby simply park until it returns.
  • Routing: Colibri has a capability matcher for per-host agent pools, and cross-host routing is LIVE (2026-06-19): a socat bridge exposes osa's colibri-daemon on its Tailscale IP (${OSA_TS_IP}:9190, tailnet-only); agents on debby/domedog reach the osa board over the tailnet, and a poller (2 min) / worker (5 min) loop executes assigned tasks. Validated on the debby↔osa lane; colibri PR #83. See CAPABILITY-ROUTING.md.
  • Probe vs identity: verify_facts_probe.py is a required discipline/tool, not an automatic startup hook — agents run it when grounding host facts, and HOST-MATRIX records the result. OS/hardware facts come from probes and the matrix, not from SOUL.md (which carries identity and values).
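The routing note can be pictured as a subset test: a task is eligible for an agent when every capability it requires is advertised by that agent. The sketch below is illustrative only and is not Colibri's matcher or data model. The domedog-agent capability set is copied from the per-host detail in section 3; the other two sets and the function name `eligible_agents` are invented for the example.

```python
# Illustrative only: not Colibri's actual matcher or schema.
AGENT_CAPABILITIES: dict[str, set[str]] = {
    # Copied from the domedog detail in this document.
    "domedog-agent": {"linux", "python3.12", "rust", "go", "node", "ffmpeg", "image-render"},
    # Hypothetical capability sets, invented for this example.
    "hermes-osa": {"freebsd", "python3.11"},
    "hermes-debby": {"linux", "docker"},
}


def eligible_agents(required: set[str], pools: dict[str, set[str]]) -> list[str]:
    """Return the agents whose advertised capabilities cover every requirement."""
    return sorted(name for name, caps in pools.items() if required <= caps)


if __name__ == "__main__":
    print(eligible_agents({"image-render", "ffmpeg"}, AGENT_CAPABILITIES))
    print(eligible_agents({"screenshot"}, AGENT_CAPABILITIES))
```

An empty result corresponds to a task that no registered agent can take; what the real scheduler does in that case is not described here.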

2. Host hardware & facts (one row per host)

| Host | Tailscale IP | OS / Kernel | Virt | CPU | vCPU | RAM | Swap | Disk (free) | GPU | Probed | By |
|---|---|---|---|---|---|---|---|---|---|---|---|
| domedog | ${DOMEDOG_TS_IP} | Ubuntu 24.04.4 / 6.8.0-117 | KVM | AMD EPYC 7543P (32-core host) | 2 | 7.8 GiB | 2.0 GiB | 100 GB QEMU (51G free) | none (headless) | 2026-06-17 | Claude |
| debby | ${DEBBY_TS_IP} | Debian 13 / 6.12.90+deb13.1-amd64 | bare metal | AMD Ryzen 7 5700U (8-core) | 16 | 15 GiB | 15 GiB | nvme0n1p2 453G (23G free) | Radeon Graphics (iGPU) | 2026-06-17 | Hermes |
| osa | ${OSA_TS_IP} | FreeBSD 15.0-RELEASE-p10 / GENERIC | not reported by probe | Intel Core Processor (Haswell, no TSX) | 6 | 11 GiB | not reported by probe | ZFS pool: zroot (23.4G free) | not reported by probe | 2026-06-17 | Pi |

Disk discipline (check, don't guess)

Disk is a first-class fact, same as OS or CPU — measure it before you act, don't estimate.

  • Before installing a toolchain or starting a build, run df -h / (Linux) or zfs list / df -h (FreeBSD), or the probe's --storage. Confirm the headroom is really there.
  • Keep the Disk (free) column above current when you add or remove anything large.
  • Flag any host past ~85% used. Reference footprints to budget with: Go SDK ≈ 290 MB, Rust toolchain (~/.rustup + ~/.cargo) ≈ 1.8 GB, a Node version ≈ 150 MB; build/module caches grow on top of these.
  • Standing watch: debby runs ~95% full (23 GB free). Treat new installs/builds there as a deliberate decision, not a default — prefer the host with real headroom.

This is the survivability principle applied to storage: a host that silently fills up is a host that fails. What you guess will be wrong; what you probe will be right.
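As a rough illustration of the check-before-install rule, the sketch below measures the root filesystem and tests whether a planned install keeps a host at or under the ~85% mark. It is not part of the probe; the function names are hypothetical, and the footprints are the reference figures listed above, read as decimal MB/GB.

```python
import shutil

FLAG_THRESHOLD = 0.85  # "flag any host past ~85% used"
MB, GB = 10**6, 10**9

# Reference footprints from this document (build/module caches come on top).
FOOTPRINTS = {"go": 290 * MB, "rust": int(1.8 * GB), "node": 150 * MB}


def disk_report(path: str = "/") -> dict:
    """Measure the filesystem holding `path`; never estimate."""
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "used_fraction": used_fraction,
        "flagged": used_fraction > FLAG_THRESHOLD,
    }


def fits(total_bytes: int, free_bytes: int, install_bytes: int) -> bool:
    """True if the host stays at or under the threshold after the install."""
    used_after = total_bytes - free_bytes + install_bytes
    return used_after / total_bytes <= FLAG_THRESHOLD


if __name__ == "__main__":
    report = disk_report("/")
    print(report)
    print("rust fits:", fits(report["total_bytes"], report["free_bytes"], FOOTPRINTS["rust"]))
```

With the figures recorded in this matrix, a Rust toolchain fits on domedog (96 GB root, 51 GB free) but not on debby (453 GB root, 23 GB free), which is already past the threshold. The used fraction here may differ slightly from the percentage `df` prints, since `df` excludes reserved blocks.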

Cost provenance (invoice/control-panel facts, not guesses)

Hosting spend is a first-class fleet fact, but it must stay non-secret: record provider, plan/SKU, region, verified monthly cost, and the proof source. Do not commit invoice IDs, account numbers, billing addresses, or payment details. If a provider is inferred from an IP range, mark it TBD until the control panel or invoice confirms it.

Host / candidate Provider Plan / SKU Region Monthly cost Billing cycle Role paid for Source / proof Status / notes
osa TBD (verify; OVHcloud is suspected but not invoice-confirmed here) TBD TBD TBD TBD always-on orchestrator + board + Hermes gateway operator invoice/control panel needed Existing always-on VPS; do not treat IP range as proof.
domedog TBD TBD TBD TBD TBD Linux media/compute lane operator invoice/control panel needed Existing Linux VM; cost not tracked yet.
debby self-owned laptop local utility/power TBD intermittent secondary agent + soul backup local device + utility rate if needed Not an always-on hub; power cost only matters when left on.
mother-build (candidate) proposed OVHcloud TBD: Public Cloud hourly or Eco/dedicated TBD TBD TBD FreeBSD build host / poudriere / Rust+zot builds → serves pkg.clawdie.si (first-party pkg repo) OVH quote needed before purchase Prefer on-demand if builds are infrequent; dedicated only if build demand justifies standing cost.
ML350p Gen8 (candidate/retire) self-hosted hardware owned hardware local ~€53–63/mo @ 460 W high-load estimate utility bill multitenant/build candidate; fallback if TCO beats cloud GEN-I + URO tariff research; fan/PSU label, not wall-metered Use as planning band only; measure wall draw before committing tenants.
vultr-svc (Forgejo + Vaultwarden) Vultr TBD TBD (verify EU) TBD TBD git mirror (layered-soul + hermes-soul) + Vaultwarden secrets store DNS code/vault.smilepowered.org → Vultr (verified 2026-06-20); invoice needed Off-OVH backup target (good) BUT Forgejo + Vault share one box → SPOF for backups AND secrets; needs own off-box backup + EU-region verify + MFA

Cost discipline mirrors disk discipline: measure before action. For self-hosted hardware, calculate monthly power with watts / 1000 * 24 * 30 * €/kWh using measured idle/load wattage and the actual utility rate; do not compare cloud invoices to guessed electricity.

ML350p Gen8 planning note: for the multitenant/high-load case, use the visible fan/PSU-side 460 W mark as the conservative continuous-load assumption until a wall meter proves otherwise.

  • Monthly energy: 0.460 kW * 24 h * 30.4375 d = ~336 kWh/month.
  • GEN-I regular household ET energy price: 0.13286 EUR/kWh with VAT → ~€44.6/mo energy-only.
  • Add URO network-energy ET estimate (0.01864 EUR/kWh before VAT, 0.02274 EUR/kWh with VAT) → €52.3/mo variable electricity + network-energy estimate.
  • Practical planning band with smaller per-kWh state charges: ~€53/mo if 460 W is wall draw; ~€59–63/mo if 460 W is output-side load at ~90–85% PSU efficiency.
  • Annualized planning band: ~€640–760/year.
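The arithmetic in the planning note can be reproduced with a few lines. This is a sketch using only the rates quoted above; the function names are hypothetical, and the smaller per-kWh state charges are deliberately left out, so the output-side figures come out slightly below the planning band.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30.4375  # 365.25 / 12, as used in the planning note

ENERGY_EUR_PER_KWH = 0.13286   # GEN-I household ET price, with VAT
NETWORK_EUR_PER_KWH = 0.02274  # URO network-energy ET estimate, with VAT


def monthly_kwh(watts: float) -> float:
    """Energy used in an average month at a constant draw."""
    return watts / 1000 * HOURS_PER_DAY * DAYS_PER_MONTH


def monthly_cost(watts: float, eur_per_kwh: float) -> float:
    """Monthly cost in EUR for a constant draw at a flat per-kWh rate."""
    return monthly_kwh(watts) * eur_per_kwh


def wall_watts(output_watts: float, psu_efficiency: float) -> float:
    """Wall draw implied by an output-side load and a PSU efficiency (0-1)."""
    return output_watts / psu_efficiency


if __name__ == "__main__":
    combined = ENERGY_EUR_PER_KWH + NETWORK_EUR_PER_KWH
    print(f"{monthly_kwh(460):.1f} kWh/month")                    # about 336 kWh
    print(f"{monthly_cost(460, ENERGY_EUR_PER_KWH):.1f} EUR")     # about 44.6, energy only
    print(f"{monthly_cost(460, combined):.1f} EUR")               # about 52.3, energy + network
    for eff in (0.90, 0.85):
        print(eff, f"{monthly_cost(wall_watts(460, eff), combined):.1f} EUR")
```

Swap in a wall-metered wattage and the actual utility rate once they are known; the 460 W figure is a label reading, not a measurement.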

Registry & supply-chain provenance

What an agent consumes splits into two layers, each with its own registry. Record which are first-party (we run/sign them) versus third-party (external, untrusted until vetted). Rationale and the curation flow live in HIVE-ONBOARDING.md §10.

| Registry / source | Layer | Ownership | Direction | Status |
|---|---|---|---|---|
| pkg.clawdie.si (poudriere) | OS packages | first-party | we host/sign | [PLANNED] — on mother-build |
| first-party skill repo (proposed skills.clawdie.si) | skills | first-party | we host/sign | [PLANNED] |
| clawhub.ai (https://clawhub.ai/api/v1) | skills | third-party | we pull only | external — Hermes ClawHubSource |
| skills.sh, lobehub, browse.sh, claude-marketplace | skills | third-party | we pull only | external — Hermes community sources |
| public FreeBSD pkg mirrors | OS packages | third-party | we pull only | external — to be fronted by poudriere |

Key point: clawhub.ai is not Clawdie infrastructure and is unrelated to the planned pkg.clawdie.si — different layer (skills vs OS packages) and different ownership (upstream we consume vs server we operate). Paid tenants are provisioned from first-party rows only.
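The first-party rule can be expressed as a simple filter over the registry rows. The sketch below is a hypothetical data model, not an existing provisioning script; the `Registry` class and `tenant_sources` function are names made up for illustration, and the rows mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registry:
    name: str
    layer: str      # "OS packages" or "skills"
    ownership: str  # "first-party" or "third-party"


REGISTRIES = [
    Registry("pkg.clawdie.si", "OS packages", "first-party"),
    Registry("first-party skill repo", "skills", "first-party"),
    Registry("clawhub.ai", "skills", "third-party"),
    Registry("skills.sh, lobehub, browse.sh, claude-marketplace", "skills", "third-party"),
    Registry("public FreeBSD pkg mirrors", "OS packages", "third-party"),
]


def tenant_sources(registries: list[Registry]) -> list[Registry]:
    """Keep only the rows a paid tenant may be provisioned from."""
    return [r for r in registries if r.ownership == "first-party"]


if __name__ == "__main__":
    for reg in tenant_sources(REGISTRIES):
        print(f"{reg.layer}: {reg.name}")
```

Keeping layer and ownership as separate fields reflects the key point: clawhub.ai and pkg.clawdie.si differ on both axes, so neither field alone identifies a trusted source.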


3. Per-host detail (expand as needed)

domedog (Claude / verification) — probed 2026-06-17 by Claude

  • Identity: hostname domedog.pro, Tailscale ${DOMEDOG_TS_IP}
  • OS: Ubuntu 24.04.4 LTS, kernel 6.8.0-117-generic, x86_64, KVM guest
  • CPU: AMD EPYC 7543P 32-Core (2 vCPU exposed to guest)
  • Memory: 7.8 GiB RAM, 2.0 GiB swap
  • Storage: /dev/sda1 96 GB ext4 root, 51 GB free (QEMU HARDDISK). No ZFS.
  • GPU: none (headless VM)
  • Uptime at probe: ~3.5 weeks
  • Role here: Claude Code — verification & review lane. No Telegram bot.
  • Colibri agent (joined central board 2026-06-19) — the headless Linux media/compute lane:
    • Capabilities advertised: linux, python3.12, rust, go, node, ffmpeg, image-render. Not screenshot/gui (headless VM), not docker (absent). In the always-on fleet image-render/ffmpeg are domedog-only; the FreeBSD operator image (live USB) also advertises image-render + screenshot via py311-pillow (clawdie-iso #85).
    • Reach: client shim colibri-shim.service (system unit, User=clawdija, Restart=always, reboot-persistent) runs socat UNIX-LISTEN:~/.colibri/colibri.sock → TCP ${OSA_TS_IP}:9190 (osa bridge over Tailscale). A system unit, not --user: systemctl --user has no bus on this host.
    • Operate: ~/.colibri/agent.env holds COLIBRI_AGENT_ID + COLIBRI_SOCKET; helpers in ~/.colibri/colibri_cmd.py (raw JSON), colibri_poll.py, colibri_task_done.py.
    • Validated: register → scheduler routed an image-render task to domedog → poller saw it → worker marked it done (2026-06-19).
    • Executor pending (decision required): domedog receives capability-matched tasks, but no persistent execution loop runs yet — until one does, routed tasks sit started (no lease/reaper). Decide what executes (Claude Code worker / script) and with what authority before relying on autonomous domedog task completion.
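Before debugging the shim or the poller, it can help to confirm that the osa bridge port is reachable over the tailnet at all. The sketch below is a plain TCP connect test, not a Colibri health check and not one of the helpers listed above. The function name `can_reach` is hypothetical, and reading the address from an `OSA_TS_IP` environment variable is an assumption.

```python
import os
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unroutable hosts.
        return False


if __name__ == "__main__":
    # Assumption: the address is exported from fleet.env as OSA_TS_IP.
    host = os.environ.get("OSA_TS_IP", "127.0.0.1")
    print(f"{host}:9190 reachable:", can_reach(host, 9190))
```

A successful connect only shows that the socat bridge is listening; it says nothing about whether the daemon behind it accepts the agent's registration.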

debby (Hermes secondary + Zot — intermittent laptop) — probed 2026-06-17 by Hermes

  • Identity: hostname debby, Tailscale ${DEBBY_TS_IP}
  • OS: Debian 13 (Trixie), kernel 6.12.90+deb13.1-amd64, bare metal (KDE Plasma desktop)
  • CPU: AMD Ryzen 7 5700U with Radeon Graphics, 8 physical cores, 16 threads
  • Memory: 15 GiB RAM, 15 GiB swap
  • Storage: /dev/nvme0n1p2 453 GB ext4 root, 23 GB free (95% full). No ZFS.
  • GPU: AMD Radeon Graphics (integrated, Lucienne)
  • Containers: Docker 29.5.3 installed (daemon not currently running)
  • Hermes Agent: v0.16.0 (upstream f9c8d95e), DeepSeek v4 Pro primary provider, OpenRouter for vision/fallback, Z.AI/GLM available
  • Zot RPC: Go binary at ~/.local/bin/zot, GLM-5.1 model
  • Telegram: ${HERMES_BOT} + ${ZOT_BOT} in "My Debby" group
  • Layered soul: commit 817624c, 6 curated memories, 9 cross-harness skills

osa (FreeBSD: hermes-osa orchestrator + board host, always-on VPS; + Mevy + Codex) — probed 2026-06-17 by hermes-osa

  • Identity: hostname osa.smilepowered.org, Tailscale ${OSA_TS_IP}
  • OS: FreeBSD 15.0-RELEASE-p10, kernel FreeBSD osa.smilepowered.org 15.0-RELEASE-p10 FreeBSD 15.0-RELEASE-p10 releng/15.0-n281064-98258a339269 GENERIC amd64
  • CPU: Intel Core Processor (Haswell, no TSX), 6 vCPU
  • Memory: 11 GiB RAM
  • Storage: ZFS pool zroot, 98.5G ONLINE, 23.4G available
  • Jails: cms and worker (Bastille jails); Docker not installed
  • Agents on host:
    • hermes-osa — Hermes Agent v0.16.0 (hermes-bsd clean-room MIT fork), FreeBSD local CLI runtime + Telegram gateway. Status: LIVE — validated local chat + Telegram. Default provider: DeepSeek direct (provider: deepseek, default: deepseek-chat). OpenRouter available as fallback/manual lane. Telegram/gateway: LIVE — ${HERMES_OSA_BOT} (Mevy/OSA-bot), polling mode, tmux session hermes-gateway on osa. Daemon/rc.d: deferred (Track A).
    • Mevy — ${HERMES_OSA_BOT} (OSA-bot) — now consolidated under hermes-osa gateway. Token migrated from old backup .env.
    • Codex: codex-cli 0.117.0, ISO builds and validation. Runs in a Bastille jail.
    • Claude Code — installed (path: /home/clawdie/.npm-global/bin/claude), no dedicated role yet.
  • Provider stack (hermes-osa):
    provider: deepseek # primary — direct credits, proven DEEPSEEK_OK
    default: deepseek-chat
    fallback: openrouter # available manually, not auto-fallback configured yet
    
  • Z.AI: deferred (not configured for hermes-osa; available via OpenRouter if needed)
  • Telegram: LIVE — ${HERMES_OSA_BOT}, polling mode, connected 2026-06-17
  • Gateway: LIVE — running in tmux session hermes-gateway, manual start (no rc.d yet)
  • Launch command:
    tmux new -s hermes-osa
    cd /home/clawdie/ai/hermes-bsd
    export HERMES_HOME=/home/clawdie/.hermes
    source venv/bin/activate    # or: .venv/bin/activate
    hermes chat
    
  • Layered soul: commit c9c88fd, 10 skills, 7 curated memories
  • Future tracks (separate, none blocking):
    • Track A: daemon/rc.d promotion (hermes_daemon service, dedicated user)
    • Track B: Telegram/gateway integration DONE (2026-06-17) — gateway daemonization (rc.d) still deferred
    • Track C: Colibri cross-host routing DONE (2026-06-19): socat bridge on osa :9190 (Tailscale-only) + poller/worker loop; colibri PR #83 merged. See CAPABILITY-ROUTING.md
    • Track D: old clawdie_glass cleanup

See ../AGENTS.md for the canonical agent matrix and operating rules.

§4 Compliance standing constraints

  • EU region only: All OVHcloud resources in FR/DE/PL. Sidesteps non-EU transfer/SCC burden under GDPR.
  • Off-box backup before any reinstall: OVH DPA §10 + GTS §6.3/6.5/10.6 — reinstall/termination = irreversible deletion including OVH-side backups, no recovery, OVH not liable. Identity/skills covered by git (layered-soul + hermes-soul on Forgejo). Runtime state (ZFS snapshots, Vaultwarden DB) must be verified backed up outside OVH.
    • Backup independence (verified 2026-06-20): Forgejo and Vaultwarden both run on Vultr (the code / vault.smilepowered.org host) — a different provider than osa/OVH, so an OVH loss does not take the git backup (good). But Forgejo and Vaultwarden share that one Vultr box, making it a single point of failure for both the backups and all secrets. → that box needs its own off-box backup (Vaultwarden DB export + Forgejo data to a third location), and backups are unverified until test-restored (cost-discipline applies to backups: check, don't assume). Add the Vultr host to the provenance table; apply EU-region (verify) + MFA to it too.
  • MFA on every master-key account: GTS §2.3/2.4 — operator is liable for fraudulent account use. Enable MFA on OVH, Vultr, the domain registrar (clawdie.si / smilepowered.org), Forgejo admin, and Vaultwarden — each is a master key to the fleet. Auto-renew the domains: a lapsed domain silently kills pkg.clawdie.si, ACME certs, and SSH-by-hostname.
  • Billing hygiene: provider auto-renew is on by default (OVH/Vultr) — disable before the 19th of the month if not renewing. Commitment Periods lock you in (full term due, no refund for early cancel/non-use). Act on price-increase / end-of-life notices within the 30-day cancel window. Track renewal dates per provider in the provenance table.
  • Continuity plan (contractually required): OVH GTS §6.3 makes a recovery plan the Client's obligation, and §4/§10 cap provider liability at service credits — no data-loss or downtime damages. The fleet's multi-host survivability (Linux/Docker + FreeBSD/jails, relocatable via layered-soul) is the recovery plan; pair it with the off-box backups above.
  • Do not commit OVH contracts/credentials: GTS §13 makes contract terms confidential. A compliance summary only in public repos — no verbatim DPA/GTS text, no NIC handles or login credentials.

Multi-tenant GDPR gates (administrative, not technical)

These switch on when the hive goes multi-tenant. None block current internal use:

  • GDPR controller docs (privacy notice, legal basis for processing, ROPA)
  • DPIA only if agents make automated decisions about individuals with legal/significant effect (GDPR Art. 35/22) — the internal agent task scheduler (routing work to machines) does not trigger this
  • Pass OVH terms down to customers (GTS §10.6 — sub-licensing)
  • Third-party / "AAA" professional indemnity insurance (§10.6)
  • Customer sanctions screening (GTS §14.3 — denied parties / export controls)
  • Data Processing Agreement with each tenant (DPA §12 — controller→processor chain)

See HIVE-ONBOARDING.md §9 for the integration checklist.