docs(host-matrix): add infrastructure cost provenance (Sam & Pi)

Track hosting spend as a verified fleet fact alongside disk and hardware, seed TBD rows for osa/domedom/debby/proposed OVH build capacity/ML350p, and update HIVE status now that first-proof blockers are code-complete.\n\nValidation: npx --yes prettier@3 --check docs/HOST-MATRIX.md docs/HIVE-ONBOARDING.md; python3 scripts/layered_soul.py validate .
2026-06-20 09:48:12 +02:00 · 2026-06-20 09:48:12 +02:00 · 058e4ce926
commit 058e4ce926
parent 4192574f74
2 changed files with 71 additions and 44 deletions
--- a/docs/HIVE-ONBOARDING.md
+++ b/docs/HIVE-ONBOARDING.md
@ -2,7 +2,7 @@

 **LIVE VS PLANNED.** This is a **design/vision** doc. The building blocks are real and
 proven (Bastille jails on osa, capability routing, `register-agent`, and the
-`clawdie-vault-fetch` flow validated end-to-end on domedog 2026-06-19). The *platform*
+`clawdie-vault-fetch` flow validated end-to-end on domedog 2026-06-19). The _platform_
 described here — `colibri-vault` as a crate, multi-tenant buckets, the mother skill — is
 `[PLANNED]`. The thesis: it is mostly **composition of pieces we already have**, not new
 invention. Sections are tagged `[LIVE]` / `[PLANNED]`.
@ -14,32 +14,35 @@ invention. Sections are tagged `[LIVE]` / `[PLANNED]`.
 The four MVP steps (§8) are **code-complete on colibri `main`**:

 | MVP step                    | Status                    | Landed via                                  |
-| -------- | ------ | ---------- |
-| 1. `colibri-vault` crate | done; hardening in flight | #85 → #94 → PR #100 (server-match + serialize) |
+| --------------------------- | ------------------------- | ------------------------------------------- |
+| 1. `colibri-vault` crate    | done; hardening in flight | #85 → #94 → #100 (server-match + serialize) |
 | 2. `tenants` table          | on `main`                 | (PR #90 closed as superseded)               |
 | 3. spawner → provision hook | done                      | #91 (root-verify) → #94 (wired)             |
 | 4. `mother` skill           | done (draft)              | layered-soul                                |

 Supporting pieces merged: `agent-jail-bootstrap.sh` (#96 → #97 version-pin → #104
 cold-cache guard), `provider.env` staging (#69/#99), vault-fetch shell helper
-server-match (#67/#68/#69).
+server-match (#67/#68/#69), and the first-proof runbook (#103).

-**First proof is *not* code-blocked** — the chain works today via the interim manual
+**First proof is _not_ code-blocked** — the chain works today via the interim manual
 path in [`../docs/VAULT-PROVISION-FIRST-PROOF.md`](https://code.smilepowered.org/clawdie/colibri)
-(colibri). Critical path: merge PR #100 + #103 → run the runbook (scratch jail + test
-collection, manual SQLite tenant insert, raw-socket jailed spawn) → verify `.env` at
-`0600` + tenant `active`.
+(colibri). Critical path now: operator runs the runbook (scratch jail + test collection,
+manual SQLite tenant insert, raw-socket jailed spawn) → verify `.env` at `0600` + tenant
+`active`.

 Open work, categorized:

- **Hardening:** colibri PR #100 (closes #95), #92 (path canonicalization/containment).
+- **Hardening:** #92 (path canonicalization/containment).
 - **CLI-driveability (post-proof ergonomics, not proof blockers):** #101 (`register-tenant`
  command), #102 (`--jail` on `spawn-agent`) — these replace the runbook's manual steps.
 - **Source-of-truth/naming:** #98 (`npm-node24` vs `npm`), clawdie-iso #70 (agent-jail
  section in `pkg-list-jails.txt`).
+- **Cost/source-of-truth:** fill `docs/HOST-MATRIX.md` cost provenance rows before buying
+  or retiring build capacity; compare OVH quotes/invoices against measured self-host power.

-**One-line plan:** merge #100 + #103 → run the runbook for the first proof → then land
-#101/#102 for CLI driveability, and #92 before promoting past scratch.
+**One-line plan:** run the first-proof runbook → then land #101/#102 for CLI driveability,
+#92 before promoting past scratch, and fill verified OVH/self-host cost data before buying
+or depending on a new mother/build host.

 ---

@ -50,7 +53,7 @@ primitive**. Promote it from the `clawdie-vault-fetch` shell helper to a first-c
 crate, **`colibri-vault`**, sitting beside `colibri-spawner` / `colibri-store`:

 - **in:** a tenant id (→ a bucket) + a target jail/home
- **out:** a `0600` `.env` materialized *inside the jail*, owned by the jail user
+- **out:** a `0600` `.env` materialized _inside the jail_, owned by the jail user
 - wraps the `bw` CLI for now (do **not** reimplement the Bitwarden protocol), fail-closed,
  idempotent, no-op when there is no bucket

@ -76,8 +79,8 @@ indirection than that.

 On "folder vs bucket":

- **Folders** are personal-vault organization → fine for *Clawdie's own internal* agents.
- **Organization + Collections** give *access-scoped isolation* → the multi-tenant
+- **Folders** are personal-vault organization → fine for _Clawdie's own internal_ agents.
+- **Organization + Collections** give _access-scoped isolation_ → the multi-tenant
  primitive. One customer = one Collection; a scoped credential reads only that collection.
 - **Do not** run a separate Vaultwarden instance per customer — Collections are exactly
  this feature.
@ -91,7 +94,7 @@ On "folder vs bucket":
  the orchestrator, that can read any tenant collection to provision jails.

 Everything non-secret — harness, base config, model-routing prefs — **ships in the
-clawdie-iso image**. The image is the *body*; the bucket is the *one private nerve*.
+clawdie-iso image**. The image is the _body_; the bucket is the _one private nerve_.

 ## 5. [PLANNED] The mother skill

@ -105,7 +108,7 @@ mother := resolve-identity (layered-soul)
 ```

 - **Narrow:** onboarding — births one working agent from a bare jail.
- **Wide:** self-replication. An agent that *holds* the mother skill can spawn and
+- **Wide:** self-replication. An agent that _holds_ the mother skill can spawn and
  provision more jails (a queen births workers, each inheriting the mother skill), gated
  by capability/policy so it cannot run away. That is "agent swarms with a mother skill,"
  and `colibri-vault` is how each birth gets its one nerve.
@ -119,7 +122,7 @@ osa/FreeBSD/Bastille is the natural womb — cheap, dense, isolated jails.
 > already shipped.

 A one-key agent on osa needs `image-render`? It routes to a Linux lane (domedog). Needs a
-build? Routes to a capable host. The customer pays for *one agent* but stands on a
+build? Routes to a capable host. The customer pays for _one agent_ but stands on a
 survivable, multi-OS hive. Anyone can run an LLM in a container; few hand you a swarm
 behind one key — **capability routing is the differentiator.**

@ -133,7 +136,7 @@ behind one key — **capability routing is the differentiator.**
 **Bootstraps live on the host; jails hold only their resolved secrets.**

 - The orchestrator holds the org service-account credential. It fetches a tenant's
-  collection, writes the resolved `.env` *into* the jail, and the **bootstrap never enters
+  collection, writes the resolved `.env` _into_ the jail, and the **bootstrap never enters
  the jail**. A compromised jail cannot re-fetch and cannot reach another tenant.
 - Per-tenant blast radius = one collection. Scoped credential, never a master.
 - This is the same shape the domedog smoke test validated (bootstrap on host, `.env` is the
@ -151,14 +154,15 @@ Smallest path that is real:

 **First-proof policy.** The first proven end-to-end runs against a **scratch jail + a
 throwaway test collection only** — no real tenant data until the path hardening lands
-(canonicalize + allowed-root containment, colibri issue #92). The two first-proof blockers
-are colibri **#88** (resolve the collection by name) and **#89** (per-call unlock); #92 is
-hardening that follows. Tracker state lives on those issues.
+(canonicalize + allowed-root containment, colibri issue #92). The former first-proof
+blockers — colibri **#88** (resolve the collection by name) and **#89** (per-call unlock)
+— are resolved on `main`; the remaining first-proof step is the operator-run scratch
+runbook. #92 is hardening that follows before real tenant data.

 **Overengineering traps to avoid for now:** a custom Bitwarden web UI (Vaultwarden's own UI
-+ a Collection is enough to start), billing/metering, a native Bitwarden protocol in Rust,
-multi-region control plane, and recursive auto-spawn (gate it off until policy exists).
-Those are product layers; the four steps above are the engine.
+plus a Collection is enough to start), billing/metering, a native Bitwarden protocol in
+Rust, multi-region control plane, and recursive auto-spawn (gate it off until policy
+exists). Those are product layers; the four steps above are the engine.

 ---

--- a/docs/HOST-MATRIX.md
+++ b/docs/HOST-MATRIX.md
@ -19,6 +19,10 @@ on any host fills in its own row. Source of truth for facts is the probe — not
 > real free space (`df -h /`, or the probe's `--storage`) — never estimate. Keep the
 > **Disk (free)** column current and flag any host past ~85%. See _Disk discipline_ below.
 >
+> **Cost before buying:** before purchasing or retiring infrastructure, record provider,
+> plan/SKU, verified monthly cost, and the source of truth (invoice/control panel/utility
+> bill). IP-range guesses are not billing proof. See _Cost provenance_ below.
+>
 > **Never paste real IPs or bot handles here.** Use `${HOST_TS_IP}` and `${*_BOT}`
 > placeholders; real values live in `fleet.env` (gitignored) and are live via
 > `tailscale status`. Copy `fleet.env.example` → `fleet.env` to resolve them. The probe
@ -29,7 +33,7 @@ on any host fills in its own row. Source of truth for facts is the probe — not
 ## 1. Agent placement (who runs where)

 | Agent             | Host    | OS / Isolation        | Harness                      | Role                                                                      | Bot / channel               | Status                                 |
-| ----------- | ------- | --------------------------- | ---------------------------- | -------------------------------- | --------------------- | ----------------------------- |
+| ----------------- | ------- | --------------------- | ---------------------------- | ------------------------------------------------------------------------- | --------------------------- | -------------------------------------- |
 | Hermes            | debby   | Debian 13 / Docker    | Hermes Agent (upstream)      | Secondary agent + soul backup (intermittent laptop)                       | ${HERMES_BOT}               | LIVE (intermittent)                    |
 | Zot               | debby   | Debian 13 / Docker    | Zot RPC                      | Coding, media workflows                                                   | ${ZOT_BOT}                  | LIVE                                   |
 | Claude            | domedog | Ubuntu 24.04 / Docker | Claude Code                  | Verification, review                                                      | — (CLI)                     | LIVE                                   |
@ -65,7 +69,7 @@ on any host fills in its own row. Source of truth for facts is the probe — not
 ## 2. Host hardware & facts (one row per host)

 | Host        | Tailscale IP     | OS / Kernel                        | Virt                  | CPU                                    | vCPU | RAM     | Swap                  | Disk (free)                  | GPU                    | Probed     | By     |
-| ----------- | -------------- | ---------------------------------- | --------------------- | -------------------------------------- | ---- | ------- | --------------------- | ---------------------------- | ---------------------- | ---------- | ------ |
+| ----------- | ---------------- | ---------------------------------- | --------------------- | -------------------------------------- | ---- | ------- | --------------------- | ---------------------------- | ---------------------- | ---------- | ------ |
 | **domedog** | ${DOMEDOG_TS_IP} | Ubuntu 24.04.4 / 6.8.0-117         | KVM                   | AMD EPYC 7543P (32-core host)          | 2    | 7.8 GiB | 2.0 GiB               | 100 GB QEMU (51G free)       | none (headless)        | 2026-06-17 | Claude |
 | **debby**   | ${DEBBY_TS_IP}   | Debian 13 / 6.12.90+deb13.1-amd64  | bare metal            | AMD Ryzen 7 5700U (8-core)             | 16   | 15 GiB  | 15 GiB                | nvme0n1p2 453G (23G free)    | Radeon Graphics (iGPU) | 2026-06-17 | Hermes |
 | **osa**     | ${OSA_TS_IP}     | FreeBSD 15.0-RELEASE-p10 / GENERIC | not reported by probe | Intel Core Processor (Haswell, no TSX) | 6    | 11 GiB  | not reported by probe | ZFS pool: zroot (23.4G free) | not reported by probe  | 2026-06-17 | Pi     |
@ -87,6 +91,25 @@ Disk is a first-class fact, same as OS or CPU — **measure it before you act, d
 This is the survivability principle applied to storage: a host that silently fills up is a
 host that fails. What you guess will be wrong; what you probe will be right.

+### Cost provenance (invoice/control-panel facts, not guesses)
+
+Hosting spend is a first-class fleet fact, but it must stay non-secret: record provider,
+plan/SKU, region, verified monthly cost, and the proof source. Do **not** commit invoice
+IDs, account numbers, billing addresses, or payment details. If a provider is inferred from
+an IP range, mark it `TBD` until the control panel or invoice confirms it.
+
+| Host / candidate                   | Provider                                                           | Plan / SKU                                | Region | Monthly cost      | Billing cycle | Role paid for                                    | Source / proof                        | Status / notes                                                                                     |
+| ---------------------------------- | ------------------------------------------------------------------ | ----------------------------------------- | ------ | ----------------- | ------------- | ------------------------------------------------ | ------------------------------------- | -------------------------------------------------------------------------------------------------- |
+| **osa**                            | TBD (verify; OVHcloud is suspected but not invoice-confirmed here) | TBD                                       | TBD    | TBD               | TBD           | always-on orchestrator + board + Hermes gateway  | operator invoice/control panel needed | Existing always-on VPS; do not treat IP range as proof.                                            |
+| **domedog**                        | TBD                                                                | TBD                                       | TBD    | TBD               | TBD           | Linux media/compute lane                         | operator invoice/control panel needed | Existing Linux VM; cost not tracked yet.                                                           |
+| **debby**                          | self-owned laptop                                                  | —                                         | local  | utility/power TBD | —             | intermittent secondary agent + soul backup       | local device + utility rate if needed | Not an always-on hub; power cost only matters when left on.                                        |
+| **mother-build** (candidate)       | proposed OVHcloud                                                  | TBD: Public Cloud hourly or Eco/dedicated | TBD    | TBD               | TBD           | FreeBSD build host / poudriere / Rust+zot builds | OVH quote needed before purchase      | Prefer on-demand if builds are infrequent; dedicated only if build demand justifies standing cost. |
+| **ML350p Gen8** (candidate/retire) | self-hosted hardware                                               | owned hardware                            | local  | power TBD         | utility bill  | fallback build host only                         | measured watts + actual €/kWh needed  | Do not make critical paths depend on it until reliability and TCO beat cloud.                      |
+
+Cost discipline mirrors disk discipline: measure before action. For self-hosted hardware,
+calculate monthly power with `watts / 1000 * 24 * 30 * €/kWh` using measured idle/load
+wattage and the actual utility rate; do not compare cloud invoices to guessed electricity.
+
 ---

 ## 3. Per-host detail (expand as needed)
@ -113,7 +136,7 @@ host that fails. What you guess will be wrong; what you probe will be right.
    in `~/.colibri/` — `colibri_cmd.py` (raw JSON), `colibri_poll.py`, `colibri_task_done.py`.
  - **Validated**: register → scheduler routed an `image-render` task to domedog → poller saw
    it → worker marked it `done` (2026-06-19).
-  - **Executor pending (decision required)**: domedog *receives* capability-matched tasks, but
+  - **Executor pending (decision required)**: domedog _receives_ capability-matched tasks, but
    no persistent execution loop runs yet — until one does, routed tasks sit `started` (no
    lease/reaper). Decide what executes (Claude Code worker / script) and with what authority
    before relying on autonomous domedog task completion.