feat(wiki): mother-hive decisions page + strict lint gate + pre-push hook

- New docs/wiki/mother-hive.md — thin decisions page covering forced-command
  SSH boundary, single-home-in-colibri, hive_nodes rationale, peer auth,
  key-on-seed, and daemon-user design. Links to MOTHER-SETUP.md for setup
  instructions; never duplicates them.
- Flip wiki-lint to --strict in ci-checks.sh — drift failures now block the
  gate the same as clippy warnings. 42 PASS / 0 FAIL, clean since merge.
- New scripts/pre-push — runs ci-checks.sh on every git push to main. Install
  once: ln -sf ../../scripts/pre-push .git/hooks/pre-push. Bypass only with
  --no-verify. Closes the gap that let pi_binary reach main (gate existed but
  nobody was forced through it).
- Updated AGENTS.md, quality-gates.md, and index.md to reflect all three.
Sam & Claude 2026-06-24 13:26:40 +02:00
parent 6ab86275e2
commit 4f5876a7ea
6 changed files with 167 additions and 15 deletions


@@ -105,10 +105,13 @@ cargo build --workspace --release
 **Mandatory before every merge to `main`:** run the full gate via
 ```sh
-./scripts/ci-checks.sh # fmt --check, clippy -D warnings, test, markdown gate
-./scripts/wiki-lint # wiki ledger vs. codebase drift check (dangling refs, old names, orphans)
+./scripts/ci-checks.sh # fmt --check, clippy -D warnings, test, markdown gate, wiki-lint --strict
 ```
+A `git push` to `main` also runs this gate through the pre-push hook (install
+once: `ln -sf ../../scripts/pre-push .git/hooks/pre-push`). The hook rejects
+the push if any gate fails; bypass only in emergencies with `--no-verify`.
 `.forgejo/workflows/ci.yml` encodes the same checks, but **no Actions runner is
 currently registered**, so nothing enforces them server-side. Until a runner is
 active, `ci-checks.sh` passing locally is the only gate — a green run is a


@@ -47,10 +47,9 @@ Open drift already noted by hand:
 ## Pages
-| Page | What it covers |
-| ----------------------------------------- | ------------------------------------------------------------------------ |
-| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
-| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
-| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |
-_Pending: a `mother-hive` page once the mother MCP infra lands (colibri #161)._
+| Page | What it covers |
+| ----------------------------------------- | ------------------------------------------------------------------------------- |
+| [agent-harness](./agent-harness.md) | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver |
+| [mother-hive](./mother-hive.md) | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth |
+| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
+| [quality-gates](./quality-gates.md) | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before |

docs/wiki/mother-hive.md Normal file

@@ -0,0 +1,110 @@
# Mother hive
← [index](./index.md)
## What this is
The mother node (OSA) coordinates USB operator nodes via MCP over SSH →
PostgreSQL. USB nodes send hardware profiles; mother derives capabilities and
maintains the hive registry. This page records the **decisions** behind the
implementation — the rationale the code can't express. For setup instructions,
architecture diagrams, and the first-run checklist, see
[`packaging/mother/MOTHER-SETUP.md`](../packaging/mother/MOTHER-SETUP.md).
## Decisions
### Forced-command SSH boundary (not a listening daemon)
USB nodes reach mother by spawning `ssh colibri@mother` (no remote command).
On the mother side, `authorized_keys` enforces
`command="/usr/local/bin/colibri-mcp-ssh",restrict,...` — the connection
**cannot** run an interactive shell or any command except the wrapper.
The wrapper (`colibri-mcp-ssh`) further allowlists `SSH_ORIGINAL_COMMAND` to
`""` (stdio MCP mode) or `"tools"` (one-shot discovery). Every other value is
rejected.
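The allowlist step might look roughly like the sketch below. It is not the actual `colibri-mcp-ssh` source: the function name `classify_request` and the mode labels are hypothetical, and only the two allowed values (`""` and `"tools"`) come from the description above.

```sh
#!/bin/sh
# Hypothetical sketch of the SSH_ORIGINAL_COMMAND allowlist; not the real wrapper.
# Prints the mode for an allowed request and returns 1 for anything else.
classify_request() {
    case "$1" in
        "")    echo "stdio" ;;   # no remote command: stdio MCP mode
        tools) echo "tools" ;;   # one-shot discovery
        *)     return 1 ;;       # every other value is rejected
    esac
}

# sshd sets SSH_ORIGINAL_COMMAND when the client asked for a command;
# default to empty when it is unset.
if mode="$(classify_request "${SSH_ORIGINAL_COMMAND:-}")"; then
    echo "mode: $mode" >&2
else
    echo "rejected" >&2
fi
```

Matching whole strings in a `case` statement, rather than parsing or evaluating the client's command, means a value such as `tools; id` falls through to the reject branch.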
**Why not a listening daemon** (HTTP, gRPC, raw TCP): Tailscale encrypts the
wire, so the SSH layer adds authentication + confinement without extra
infrastructure (no TLS certs, no auth tokens, no open ports). The forced-command
boundary is a second lock on top of the SSH key — even a compromised USB that
holds the key can only invoke the wrapper, and the wrapper only delegates to
colibri-mcp. Defense in depth, deployed as one OpenSSH feature.
→ [`colibri-mcp-ssh`](../packaging/mother/colibri-mcp-ssh), [`MOTHER-SETUP.md` §Security](../packaging/mother/MOTHER-SETUP.md#security-properties)
### Single home for mother infra (colibri, not clawdie-iso)
The mother MCP scripts (`node-register-mcp`, `geodesic-dome-mcp`, etc.) were
originally copied into both repos. The clawdie-iso copy drifted: its
`node-register-mcp` built SQL with `E'${...}'` string interpolation
(SQL-injectable), while the colibri copy passed values as psql variables
(`psql -v`, referenced in SQL as `:'variable'`). The iso copy
was removed in clawdie-iso PR #129.
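To make the difference concrete, here is a minimal sketch of the two styles. It is illustrative only: the function names and the `hostname` column are assumptions, not taken from either copy of `node-register-mcp`.

```sh
#!/bin/sh
# Hypothetical sketch; the column name and function names are assumptions.

# Unsafe: the value is pasted into the SQL text, so a quote in it ends the literal.
unsafe_sql() {
    printf "INSERT INTO hive_nodes (hostname) VALUES (E'%s');\n" "$1"
}

# Safer: psql receives the value as a variable and :'hostname' expands to a
# correctly quoted literal. The SQL is fed on stdin because psql does not
# interpolate variables in -c command strings.
register_node_safe() {
    psql -d mother_hive -v ON_ERROR_STOP=1 -v hostname="$1" <<'SQL'
INSERT INTO hive_nodes (hostname) VALUES (:'hostname');
SQL
}

# Show what the unsafe form would send for a hostile value.
unsafe_sql "x'); DROP TABLE hive_nodes; --"
```

The last line prints a statement in which the attacker's quote closes the string literal early and a second statement follows, which is the failure mode the parameterized form avoids.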
**Lesson**: a script in two repos **will** drift. The wiki lint is single-repo
and can't see cross-repo duplicates. The mitigation is discipline: mother infra
lives in one place.
→ [naming-decisions §Structural](./naming-decisions.md#structural-decisions) ("Single home" row)
### `hive_nodes` — not `usb_nodes`
The original table name assumed only USB-booted nodes would register. But a
node is any host that joins the hive — USB, NVMe, a jail. Renamed to
`hive_nodes` with a `node_type` column (colibri #161). The `derive_capabilities()`
trigger is table-agnostic and auto-computes `has_gpu`, `gpu_vendor`,
`can_run_local_llm`, `has_wifi`, `max_model` on INSERT.
→ [`mother_schema.sql`](../packaging/mother/mother_schema.sql),
[naming-decisions](./naming-decisions.md) (`usb_nodes → hive_nodes` row)
### PostgreSQL peer auth (no passwords)
The `colibri` OS user connects to `mother_hive` via peer authentication — the
kernel attests the Unix user, no password needed. `node-register-mcp` runs as
this user and inherits the trust. No pgpass files, no env vars, no credential
rotation. One moving part: the `pg_hba.conf` peer rule must precede any
catch-all `local all all` line (first-match).
**Why not a password or certificate**: passwords rotate and leak; certificates
need a CA. Peer auth is built into PostgreSQL on every Unix and works for a
localhost connection with zero configuration beyond one `pg_hba.conf` line.
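Because `pg_hba.conf` is first-match, the ordering of the peer rule is easy to get wrong. The check below is a rough sketch, assuming the rule is written as `local mother_hive colibri peer`; the real line in a given installation may differ, and `peer_rule_order` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical check; the exact rule text in a real pg_hba.conf may differ.
# Prints "ok" if a peer rule for colibri on mother_hive comes before the first
# "local all all" line, "shadowed" if the catch-all comes first, and nothing
# if neither line is present.
peer_rule_order() {
    awk '
        $1 == "local" && $2 == "mother_hive" && $3 == "colibri" && $4 == "peer" { print "ok"; exit }
        $1 == "local" && $2 == "all" && $3 == "all" { print "shadowed"; exit }
    ' "$1"
}
```

Call it with the path to the server's `pg_hba.conf` as the only argument. Comment lines are skipped naturally because their first field never equals `local`.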
→ [`MOTHER-SETUP.md` §Setup step 6](../packaging/mother/MOTHER-SETUP.md#setup-one-time)
### Key on seed partition, not in the image
The `mother-mcp` private key is placed on the CLAWDIESEED partition, not baked
into the ISO. The build script has a release guard that **refuses** to bake it
into a release image. The seed importer (`clawdie-live-seed`) installs it at
boot time.
**Why**: a release ISO is a downloadable artifact. Baking a private key into it
would give every downloader access to the mother MCP. The seed partition is a
separate physical medium that the operator controls. Even without a seed, the
ISO boots and runs — the daemon's external MCP connection to mother fails
gracefully (SSH: "config file not found"), and the node operates standalone.
→ [naming-decisions](./naming-decisions.md) ("Known residue"), clawdie-iso #133
### Daemon user, not operator
The colibri daemon runs as the `colibri` user (`/var/db/colibri`), not as the
operator (`clawdie`, `/home/clawdie`). The external MCP SSH connection to mother
is spawned by the daemon — so the SSH key, config, and known_hosts must be in
the daemon's home. The seed importer installs SSH material to **both** homes
(operator + daemon).
**Why not just put it in clawdie's home and `sudo`**: the daemon is not the
operator. Running as a separate user means the blast radius of a daemon
compromise is limited to what the `colibri` user can do — MCP calls to mother,
not operator files or `sudo`.
→ [`clawdie-live-seed` (clawdie-iso)](https://code.smilepowered.org/clawdie/clawdie-iso/src/branch/main/live/operator-session/clawdie-live-seed),
[`MOTHER-SETUP.md` §Key management](../packaging/mother/MOTHER-SETUP.md#key-management)
## See also
- [agent-harness](./agent-harness.md) — the zot/Colibri split; autospawn
- [naming-decisions](./naming-decisions.md) — `usb_nodes → hive_nodes`, `AUTOSPAWN_PI → AUTOSPAWN`
- [quality-gates](./quality-gates.md) — the gate that should catch drift at PR time


@@ -7,13 +7,18 @@
 A change is not "done" until the gate passes locally:
 ```sh
-./scripts/ci-checks.sh # cargo fmt --check, clippy -D warnings, cargo test, markdown gate
+./scripts/ci-checks.sh # cargo fmt --check, clippy -D warnings, cargo test, markdown gate, wiki-lint --strict
 ```
+The pre-push hook (`scripts/pre-push`) runs this same gate on every `git push`
+to `main` — install once with `ln -sf ../../scripts/pre-push .git/hooks/pre-push`.
+The hook rejects the push if any gate fails; bypass only in emergencies with
+`--no-verify`.
 `.forgejo/workflows/ci.yml` encodes the same checks, but **no Forgejo Actions
 runner is registered**, so nothing enforces them server-side. Until a runner is
-active, `ci-checks.sh` passing locally is the only gate. Stated as mandatory in
-`AGENTS.md`.
+active, the local gate + pre-push hook are the enforcement layer. Stated as
+mandatory in `AGENTS.md`.
 ## Why this page exists
@@ -31,10 +36,11 @@ gate nobody runs (and that's red anyway) is the root cause of drift reaching
 ## Relationship to this wiki
-The [naming-decisions](./naming-decisions.md) ledger + a future `lint` pass are
+The [naming-decisions](./naming-decisions.md) ledger + `wiki-lint --strict` are
 the _semantic_ counterpart to `ci-checks.sh`: the compiler/clippy catch broken
 _code_, but not a doc that still describes the old design or a name that drifted.
-The wiki lint is meant to cover that gap (advisory first).
+The wiki lint covers that gap. It is now part of the mandatory gate — a drift
+failure blocks the push, same as a clippy warning.
 ## See also


@@ -3,7 +3,8 @@
 #
 # ./scripts/ci-checks.sh
 #
-# Gates: rustfmt, clippy (warnings = errors), workspace tests, markdown format.
+# Gates: rustfmt, clippy (warnings = errors), workspace tests, markdown format,
+# wiki-lint (dangling refs, resurrected old names, orphan pages).
 set -eu
@@ -22,4 +23,7 @@ cargo test --workspace
 echo "==> markdown format gate"
 ./scripts/check-format.sh
+echo "==> wiki-lint --strict"
+./scripts/wiki-lint --strict
 echo "All checks passed."

scripts/pre-push Executable file

@@ -0,0 +1,30 @@
#!/bin/sh
# Pre-push hook — run the full gate before allowing a push to main.
#
# Install: ln -sf ../../scripts/pre-push .git/hooks/pre-push
#
# This runs ci-checks.sh, which now includes wiki-lint --strict.
# If any check fails, the push is rejected. The gate is deterministic,
# has no network calls, and completes in under 2 minutes on a warm build.
#
# Bypass (emergency only): git push --no-verify
set -eu
REPO_ROOT="$(git rev-parse --show-toplevel)"
echo "=== pre-push gate ==="
echo ""
cd "$REPO_ROOT"
# Full CI gate (fmt, clippy, test, markdown, wiki-lint --strict)
if ! ./scripts/ci-checks.sh; then
echo ""
echo "PRE-PUSH REJECTED: ci-checks.sh failed."
echo "Fix the failures above, or push with --no-verify (emergency only)."
exit 1
fi
echo ""
echo "=== pre-push gate: PASS ==="
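
As written, the hook above does not look at which ref is being pushed, so it appears to run the gate for any `git push`, not only pushes to `main`. If the intent is to gate only `main`, one option is to read the ref list Git passes to `pre-push` on standard input (one line per ref: local ref, local object name, remote ref, remote object name). A minimal sketch, assuming a hypothetical helper named `pushes_main` that is not part of this commit:

```sh
#!/bin/sh
# Hypothetical helper, not part of the committed hook.
# Reads the pre-push ref list from stdin and returns 0 if any line targets
# refs/heads/main on the remote, 1 otherwise (including empty input).
pushes_main() {
    while read -r _local_ref _local_sha remote_ref _remote_sha; do
        [ "$remote_ref" = "refs/heads/main" ] && return 0
    done
    return 1
}

# Possible use near the top of the hook, before the gate runs:
#   if ! pushes_main; then
#       exit 0   # not a push to main: skip the gate
#   fi
```

Whether skipping the gate for other branches is desirable is a policy choice; running it on every push, as the committed script does, is the stricter option.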