Hive Onboarding — colibri-vault and the "join the hive" primitive
LIVE VS PLANNED. This is a design/vision doc. The building blocks are real and
proven (Bastille jails on osa, capability routing, register-agent, and the
clawdie-vault-fetch flow validated end-to-end on domedog 2026-06-19). The platform
described here — colibri-vault as a crate, multi-tenant buckets, the mother skill — is
[PLANNED]. The thesis: it is mostly composition of pieces we already have, not new
invention. Sections are tagged [LIVE] / [PLANNED].
Status — 2026-06-20
The four MVP steps (§8) are code-complete on colibri main:
| MVP step | Status | Landed via |
|---|---|---|
| 1. `colibri-vault` crate | done; hardening in flight | #85 → #94 → #100 (server-match + serialize) |
| 2. `tenants` table | on main | (PR #90 closed as superseded) |
| 3. spawner → provision hook | done | #91 (root-verify) → #94 (wired) |
| 4. `mother` skill | done (draft) | layered-soul |
Supporting pieces merged: agent-jail-bootstrap.sh (#96 → #97 version-pin → #104
cold-cache guard), provider.env staging (#69/#99), vault-fetch shell helper
server-match (#67/#68/#69), and the first-proof runbook (#103).
First proof is not code-blocked — the chain works today via the interim manual
path in ../docs/VAULT-PROVISION-FIRST-PROOF.md
(colibri). Critical path now: operator runs the runbook (scratch jail + test collection,
manual SQLite tenant insert, raw-socket jailed spawn) → verify .env at 0600 + tenant
active.
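The final verification step (`.env` at 0600, tenant active) can be scripted. The following is a minimal sketch, not the runbook's own check: it assumes the tenants table is reachable as a SQLite file with `tenant_id` and `status` columns, and the function name and paths are hypothetical.

```python
import os
import sqlite3
import stat


def first_proof_ok(env_path: str, db_path: str, tenant_id: str) -> bool:
    """Return True only when the .env is mode 0600 and the tenant is 'active'."""
    try:
        st = os.stat(env_path)
    except FileNotFoundError:
        return False
    # Must be a regular file whose permission bits are exactly 0600.
    if not stat.S_ISREG(st.st_mode) or stat.S_IMODE(st.st_mode) != 0o600:
        return False
    # Avoid sqlite3.connect() silently creating an empty database.
    if not os.path.isfile(db_path):
        return False
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute(
            "SELECT status FROM tenants WHERE tenant_id = ?", (tenant_id,)
        ).fetchone()
    except sqlite3.Error:
        return False  # fail closed on a missing table or unreadable store
    finally:
        conn.close()
    return row is not None and row[0] == "active"
```

Every failure mode returns `False` rather than raising, so the check can gate a promotion step without extra error handling.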
Open work, categorized:
- Hardening: #92 (path canonicalization/containment).
- CLI-driveability (post-proof ergonomics, not proof blockers): #101 (`register-tenant` command), #102 (`--jail` on `spawn-agent`); these replace the runbook's manual steps.
- Source-of-truth/naming: #98 (`npm-node24` vs `npm`), clawdie-iso #70 (agent-jail section in `pkg-list-jails.txt`).
- Cost/source-of-truth: fill `docs/HOST-MATRIX.md` cost provenance rows before buying or retiring build capacity; compare OVH quotes/invoices against measured self-host power.
One-line plan: run the first-proof runbook → then land #101/#102 for CLI driveability, #92 before promoting past scratch, and fill verified OVH/self-host cost data before buying or depending on a new mother/build host.
1. The core idea
The Vaultwarden→.env fetch we proved is not a utility — it is the onboarding
primitive. Promote it from the clawdie-vault-fetch shell helper to a first-class
crate, colibri-vault, sitting beside colibri-spawner / colibri-store:
- in: a tenant id (→ a bucket) + a target jail/home
- out: a `0600` `.env` materialized inside the jail, owned by the jail user
- wraps the `bw` CLI for now (do not reimplement the Bitwarden protocol), fail-closed, idempotent, no-op when there is no bucket
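The contract above (in, out, fail-closed, idempotent, no-op) can be illustrated with a short sketch. It is written in Python for brevity, while the planned crate is Rust. The `fetch` callable is a hypothetical stand-in for the `bw` wrapper, and `provision` is an illustrative name, not the crate's API.

```python
import contextlib
import os
import stat
import tempfile
from typing import Callable, Mapping, Optional

# Hypothetical fetcher: returns a tenant's secrets, or None when the tenant
# has no bucket. A real implementation would wrap the bw CLI here.
Fetch = Callable[[str], Optional[Mapping[str, str]]]


def provision(tenant_id: str, jail_home: str, fetch: Fetch,
              uid: int = -1, gid: int = -1) -> bool:
    """Materialize <jail_home>/.env at mode 0600; return True if it changed."""
    secrets = fetch(tenant_id)  # exceptions propagate: fail closed, no partial file
    if secrets is None:
        return False  # no bucket: no-op
    # Sorted keys make reruns byte-identical (values are assumed newline-free).
    body = "".join(f"{key}={secrets[key]}\n" for key in sorted(secrets))
    target = os.path.join(jail_home, ".env")
    if os.path.isfile(target):
        with open(target, encoding="utf-8") as f:
            unchanged = f.read() == body
        if unchanged and stat.S_IMODE(os.stat(target).st_mode) == 0o600:
            return False  # idempotent: nothing to do
    # mkstemp creates the file 0600; write there, then rename atomically.
    fd, tmp = tempfile.mkstemp(dir=jail_home, prefix=".env.")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(body)
        os.chmod(tmp, 0o600)
        os.chown(tmp, uid, gid)  # -1 leaves that id unchanged
        os.replace(tmp, target)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return True
```

Writing to a temporary file in the same directory and renaming it means a reader inside the jail sees either the old `.env` or the new one, never a half-written file.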
It stops being "a thing you run" and becomes "a thing the hive does to you when you join."
2. [PLANNED] "Join the hive" = one composed step
spawn jail → colibri-vault provision → register-agent
(spawner,LIVE) (new crate, PLANNED) (LIVE)
The first and third primitives already exist. Vault-provision is the missing limb
between an empty Bastille jail and a participating hive member. Once secrets land and the
agent registers its capabilities, everything else — capability routing, poll/worker loop,
the cross-host bridge — is already live (see CAPABILITY-ROUTING.md).
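The ordering matters more than the individual steps: registration should never run for a jail whose secrets did not land. A minimal sketch of that sequencing follows. The step callables and the `JoinError` type are hypothetical placeholders, not the spawner's or `register-agent`'s real interfaces.

```python
from typing import Callable, List, Tuple

# Each step is (name, callable taking the tenant id). All are placeholders.
Step = Tuple[str, Callable[[str], None]]


class JoinError(RuntimeError):
    """Raised when a step fails; records which steps had already completed."""

    def __init__(self, step: str, completed: List[str]):
        super().__init__(f"join failed at {step!r} after {completed}")
        self.step = step
        self.completed = completed


def join_hive(tenant_id: str, steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first failure (fail closed)."""
    completed: List[str] = []
    for name, run in steps:
        try:
            run(tenant_id)
        except Exception as exc:
            raise JoinError(name, list(completed)) from exc
        completed.append(name)
    return completed
```

A caller would pass the three steps in the order shown in the diagram: spawn, vault-provision, register. What to do with a jail that spawned but failed to provision (tear down, retry) is left open here.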
3. The mapping (decided)
tenant_id == Bastille jail name == Vaultwarden bucket, 1:1:1. One row in
colibri-store: (tenant_id, jail, collection_id, status, created_at). No more
indirection than that.
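One way to make the 1:1:1 rule enforceable rather than conventional is to push it into the schema. The sketch below uses SQLite via Python's `sqlite3`. The column names come from the row described above, while the column types, the constraints, and the `insert_tenant` helper are assumptions, not the schema on main.

```python
import sqlite3

# Assumed schema: jail must equal tenant_id, and each collection maps to
# exactly one tenant. Types and constraints are illustrative.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tenants (
    tenant_id     TEXT PRIMARY KEY,
    jail          TEXT NOT NULL UNIQUE,
    collection_id TEXT NOT NULL UNIQUE,
    status        TEXT NOT NULL,
    created_at    TEXT NOT NULL DEFAULT (datetime('now')),
    CHECK (jail = tenant_id)
);
"""


def insert_tenant(conn: sqlite3.Connection, tenant_id: str,
                  collection_id: str, status: str) -> None:
    """Insert one tenant row; the jail name is the tenant id by definition."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO tenants (tenant_id, jail, collection_id, status) "
            "VALUES (?, ?, ?, ?)",
            (tenant_id, tenant_id, collection_id, status),
        )
```

With these constraints, reusing a collection for a second tenant or recording a jail name that differs from the tenant id fails with `sqlite3.IntegrityError` instead of silently breaking the mapping.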
On "folder vs bucket":
- Folders are personal-vault organization → fine for Clawdie's own internal agents.
- Organization + Collections give access-scoped isolation → the multi-tenant primitive. One customer = one Collection; a scoped credential reads only that collection.
- Do not run a separate Vaultwarden instance per customer — Collections are exactly this feature.
4. The "one key" ideal — actually two ones
- Customer's one key: a single provider key in their bucket. OpenRouter is the exemplar (one key → every model), but a single direct-provider key works too — DeepSeek alone is the currently validated single-key case. The point is one secret per tenant.
- Operator's one key: the Vaultwarden org service-account credential, held only on the orchestrator, that can read any tenant collection to provision jails.
Everything non-secret — harness, base config, model-routing prefs — ships in the clawdie-iso image. The image is the body; the bucket is the one private nerve.
5. [PLANNED] The mother skill
The genesis routine every image carries — the one skill that turns a jail into an agent:
mother := resolve-identity (layered-soul)
∘ acquire-secrets (colibri-vault)
∘ register (colibri capabilities)
∘ heartbeat / poll
- Narrow: onboarding — births one working agent from a bare jail.
- Wide: self-replication. An agent that holds the mother skill can spawn and provision more jails (a queen births workers, each inheriting the mother skill), gated by capability/policy so it cannot run away. That is "agent swarms with a mother skill," and `colibri-vault` is how each birth gets its one nerve.
osa/FreeBSD/Bastille is the natural womb — cheap, dense, isolated jails.
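What "gated by capability/policy" could mean in practice is sketched below. No such policy exists yet, so everything here (`SpawnPolicy`, the `can_spawn` flag, the depth and child limits) is a hypothetical shape for discussion rather than a design decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpawnPolicy:
    max_depth: int     # generations allowed below the first mother
    max_children: int  # jails a single agent may birth


@dataclass
class Agent:
    name: str
    depth: int = 0
    can_spawn: bool = False  # hypothetical capability flag
    children: int = 0


def may_spawn(agent: Agent, policy: SpawnPolicy) -> bool:
    """True only if the agent holds the capability and is within both limits."""
    return (agent.can_spawn
            and agent.depth < policy.max_depth
            and agent.children < policy.max_children)


def birth(parent: Agent, name: str, policy: SpawnPolicy) -> Agent:
    """Create a child one generation deeper, or refuse when the gate is closed."""
    if not may_spawn(parent, policy):
        raise PermissionError(f"{parent.name} may not spawn {name}")
    parent.children += 1
    # The child inherits the mother skill, but the depth limit still applies.
    return Agent(name=name, depth=parent.depth + 1, can_spawn=parent.can_spawn)
```

Under this shape, `SpawnPolicy(max_depth=0, max_children=0)` switches replication off entirely, which is a conservative default until a real policy is written.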
6. The product, and the moat
A customer pastes one key → gets a private agent in an isolated jail → that lean agent transparently borrows the whole multi-OS swarm's capabilities via the routing already shipped.
A one-key agent on osa needs image-render? It routes to a Linux lane (domedog). Needs a
build? Routes to a capable host. The customer pays for one agent but stands on a
survivable, multi-OS hive. Anyone can run an LLM in a container; few hand you a swarm
behind one key — capability routing is the differentiator.
- osa = the tenant-jail host (the hive body, dense Bastille jails)
- debby / domedog = capability lanes (specialized organs)
- Vaultwarden = per-tenant nerve store
- clawdie-iso = the shared body every jail boots from
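The borrowing described above reduces to a lookup: stay local when the origin host has the capability, otherwise hand the task to a lane that does. The toy sketch below is not the routing layer documented in CAPABILITY-ROUTING.md. The capability names, and in particular the assignment of `build` to debby, are hypothetical.

```python
from typing import Dict, FrozenSet, Optional

# Illustrative capability table. Only "image-render on domedog" comes from
# the text; the other entries are assumptions for the example.
LANES: Dict[str, FrozenSet[str]] = {
    "osa": frozenset({"tenant-jail"}),
    "domedog": frozenset({"image-render"}),
    "debby": frozenset({"build"}),
}


def route(capability: str, origin: str,
          lanes: Dict[str, FrozenSet[str]] = LANES) -> Optional[str]:
    """Pick a host for a capability: the origin if it can, else another lane."""
    if capability in lanes.get(origin, frozenset()):
        return origin
    # Deterministic fallback: first capable host in name order.
    for host in sorted(lanes):
        if capability in lanes[host]:
            return host
    return None  # no host advertises this capability
```

For example, `route("image-render", "osa")` selects `domedog`, matching the case in the text, and an unknown capability yields `None` so the caller can fail explicitly.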
7. The security invariant (non-negotiable)
Bootstraps live on the host; jails hold only their resolved secrets.
- The orchestrator holds the org service-account credential. It fetches a tenant's collection, writes the resolved `.env` into the jail, and the bootstrap never enters the jail. A compromised jail cannot re-fetch and cannot reach another tenant.
- Per-tenant blast radius = one collection. Scoped credential, never a master.
- This is the same shape the domedog smoke test validated (bootstrap on host, `.env` is the output), just made multi-tenant.
8. [PLANNED] Lean MVP — and what NOT to build yet
Smallest path that is real:
1. `colibri-vault` crate: lift `clawdie-vault-fetch` into Rust (lib + CLI), fetch a named collection → jail `.env`. Retire the shell helper.
2. `tenants` row in `colibri-store`: the 1:1:1 map.
3. Spawner hook: call vault-provision right after jail create.
4. `mother` skill in layered-soul: the genesis sequence above.
First-proof policy. The first proven end-to-end runs against a scratch jail + a
throwaway test collection only — no real tenant data until the path hardening lands
(canonicalize + allowed-root containment, colibri issue #92). The former first-proof
blockers — colibri #88 (resolve the collection by name) and #89 (per-call unlock)
— are resolved on main; the remaining first-proof step is the operator-run scratch
runbook. #92 is hardening that follows before real tenant data.
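For readers unfamiliar with the hardening named above, "canonicalize + allowed-root containment" generally means resolving a target path fully (symlinks and `..` included) and then refusing anything that lands outside an approved root. The sketch below shows the general technique in Python. It is not the fix tracked in #92, and `contained_path` is a hypothetical name.

```python
import os


def contained_path(allowed_root: str, candidate: str) -> str:
    """Resolve candidate under allowed_root; raise if it escapes the root."""
    root = os.path.realpath(allowed_root)
    # realpath collapses "..", resolves symlinks, and handles absolute input
    # (os.path.join discards root when candidate is absolute).
    path = os.path.realpath(os.path.join(root, candidate))
    if os.path.commonpath([root, path]) != root:
        raise ValueError(f"{candidate!r} escapes {allowed_root!r}")
    return path
```

A check like this is still subject to races if the filesystem changes between the check and the write, so it narrows the problem rather than closing it. That is one reason to keep real tenant data out until the actual hardening lands.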
Overengineering traps to avoid for now: a custom Bitwarden web UI (Vaultwarden's own UI plus a Collection is enough to start), billing/metering, a native Bitwarden protocol in Rust, multi-region control plane, and recursive auto-spawn (gate it off until policy exists). Those are product layers; the four steps above are the engine.
See CAPABILITY-ROUTING.md for the routing layer the moat rests
on, MCP-INTEGRATION.md for the board interface, and
../AGENTS.md for the agent matrix.