diff --git a/AGENTS.md b/AGENTS.md
index d08c25f..ee94d42 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -105,10 +105,13 @@ cargo build --workspace --release
 **Mandatory before every merge to `main`:** run the full gate via
 
 ```sh
-./scripts/ci-checks.sh # fmt --check, clippy -D warnings, test, markdown gate
-./scripts/wiki-lint # wiki ledger vs. codebase drift check (dangling refs, old names, orphans)
+./scripts/ci-checks.sh # fmt --check, clippy -D warnings, test, markdown gate, wiki-lint --strict
 ```
 
+A `git push` to `main` also runs this gate through the pre-push hook (install
+once: `ln -sf ../../scripts/pre-push .git/hooks/pre-push`). The hook rejects
+the push if any gate fails; bypass only in emergencies with `--no-verify`.
+
 `.forgejo/workflows/ci.yml` encodes the same checks, but **no Actions runner is
 currently registered**, so nothing enforces them server-side. Until a runner is
 active, `ci-checks.sh` passing locally is the only gate — a green run is a
diff --git a/docs/wiki/index.md b/docs/wiki/index.md
index 0f5e502..cba5e28 100644
--- a/docs/wiki/index.md
+++ b/docs/wiki/index.md
@@ -47,10 +47,9 @@ Open drift already noted by hand:
 
 ## Pages
 
-| Page                                      | What it covers                                                           |
-| ----------------------------------------- | ------------------------------------------------------------------------ |
-| [agent-harness](./agent-harness.md)       | The zot (agent) + Colibri (control plane) split; autospawn + RPC driver  |
-| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight |
-| [quality-gates](./quality-gates.md)       | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before    |
-
-_Pending: a `mother-hive` page once the mother MCP infra lands (colibri #161)._
+| Page                                      | What it covers                                                                  |
+| ----------------------------------------- | ------------------------------------------------------------------------------- |
+| [agent-harness](./agent-harness.md)       | The zot (agent) + Colibri (control plane) split;
autospawn + RPC driver         |
+| [mother-hive](./mother-hive.md)           | Mother MCP architecture — forced-command SSH, single-home-in-colibri, peer auth |
+| [naming-decisions](./naming-decisions.md) | Ledger of harness-neutral / architecture renames — shipped and in-flight        |
+| [quality-gates](./quality-gates.md)       | `ci-checks.sh` as the pre-merge gate; why drift reached `main` before           |
diff --git a/docs/wiki/mother-hive.md b/docs/wiki/mother-hive.md
new file mode 100644
index 0000000..5838f8c
--- /dev/null
+++ b/docs/wiki/mother-hive.md
@@ -0,0 +1,110 @@
+# Mother hive
+
+← [index](./index.md)
+
+## What this is
+
+The mother node (OSA) coordinates USB operator nodes via MCP over SSH →
+PostgreSQL. USB nodes send hardware profiles; mother derives capabilities and
+maintains the hive registry. This page records the **decisions** behind the
+implementation — the rationale the code can't express. For setup instructions,
+architecture diagrams, and the first-run checklist, see
+[`packaging/mother/MOTHER-SETUP.md`](../packaging/mother/MOTHER-SETUP.md).
+
+## Decisions
+
+### Forced-command SSH boundary (not a listening daemon)
+
+USB nodes reach mother by spawning `ssh colibri@mother` (no remote command).
+On the mother side, `authorized_keys` enforces
+`command="/usr/local/bin/colibri-mcp-ssh",restrict,...` — the connection
+**cannot** run an interactive shell or any command except the wrapper.
+
+The wrapper (`colibri-mcp-ssh`) further allowlists `SSH_ORIGINAL_COMMAND` to
+`""` (stdio MCP mode) or `"tools"` (one-shot discovery). Every other value is
+rejected.
+
+**Why not a listening daemon** (HTTP, gRPC, raw TCP): Tailscale encrypts the
+wire, so the SSH layer adds authentication + confinement without extra
+infrastructure (no TLS certs, no auth tokens, no open ports). The forced-command
+boundary is a second lock on top of the SSH key — even a compromised USB that
+holds the key can only invoke the wrapper, and the wrapper only delegates to
+colibri-mcp.
Defense in depth, deployed as one OpenSSH feature.
+
+→ [`colibri-mcp-ssh`](../packaging/mother/colibri-mcp-ssh), [`MOTHER-SETUP.md` §Security](../packaging/mother/MOTHER-SETUP.md#security-properties)
+
+### Single home for mother infra (colibri, not clawdie-iso)
+
+The mother MCP scripts (`node-register-mcp`, `geodesic-dome-mcp`, etc.) were
+originally copied into both repos. The clawdie-iso copy drifted — its
+`node-register-mcp` used `E'${...}'` string interpolation (SQL-injectable)
+while the colibri copy used parameterized `psql -v :'variable'`. The iso copy
+was removed in clawdie-iso PR #129.
+
+**Lesson**: a script in two repos **will** drift. The wiki lint is single-repo
+and can't see cross-repo duplicates. The mitigation is discipline: mother infra
+lives in one place.
+
+→ [naming-decisions §Structural](./naming-decisions.md#structural-decisions) ("Single home" row)
+
+### `hive_nodes` — not `usb_nodes`
+
+The original table name assumed only USB-booted nodes would register. But a
+node is any host that joins the hive — USB, NVMe, a jail. Renamed to
+`hive_nodes` with a `node_type` column (colibri #161). The `derive_capabilities()`
+trigger is table-agnostic and auto-computes `has_gpu`, `gpu_vendor`,
+`can_run_local_llm`, `has_wifi`, `max_model` on INSERT.
+
+→ [`mother_schema.sql`](../packaging/mother/mother_schema.sql),
+[naming-decisions](./naming-decisions.md) (`usb_nodes → hive_nodes` row)
+
+### PostgreSQL peer auth (no passwords)
+
+The `colibri` OS user connects to `mother_hive` via peer authentication — the
+kernel attests the Unix user, no password needed. `node-register-mcp` runs as
+this user and inherits the trust. No pgpass files, no env vars, no credential
+rotation. One moving part: the `pg_hba.conf` peer rule must precede any
+catch-all `local all all` line (first-match).
+
+**Why not a password or certificate**: passwords rotate and leak; certificates
+need a CA.
Peer auth is built into PostgreSQL on every Unix and works for a
+localhost connection with zero configuration beyond one `pg_hba.conf` line.
+
+→ [`MOTHER-SETUP.md` §Setup step 6](../packaging/mother/MOTHER-SETUP.md#setup-one-time)
+
+### Key on seed partition, not in the image
+
+The `mother-mcp` private key is placed on the CLAWDIESEED partition, not baked
+into the ISO. The build script has a release guard that **refuses** to bake it
+into a release image. The seed importer (`clawdie-live-seed`) installs it at
+boot time.
+
+**Why**: a release ISO is a downloadable artifact. Baking a private key into it
+would give every downloader access to the mother MCP. The seed partition is a
+separate physical medium that the operator controls. Even without a seed, the
+ISO boots and runs — the daemon's external MCP connection to mother fails
+gracefully (SSH: "config file not found"), and the node operates standalone.
+
+→ [naming-decisions](./naming-decisions.md) ("Known residue"), clawdie-iso #133
+
+### Daemon user, not operator
+
+The colibri daemon runs as the `colibri` user (`/var/db/colibri`), not as the
+operator (`clawdie`, `/home/clawdie`). The external MCP SSH connection to mother
+is spawned by the daemon — so the SSH key, config, and known_hosts must be in
+the daemon's home. The seed importer installs SSH material to **both** homes
+(operator + daemon).
+
+**Why not just put it in clawdie's home and `sudo`**: the daemon is not the
+operator. Running as a separate user means the blast radius of a daemon
+compromise is limited to what the `colibri` user can do — MCP calls to mother,
+not operator files or `sudo`.
+
+→ [`clawdie-live-seed` (clawdie-iso)](https://code.smilepowered.org/clawdie/clawdie-iso/src/branch/main/live/operator-session/clawdie-live-seed),
+[`MOTHER-SETUP.md` §Key management](../packaging/mother/MOTHER-SETUP.md#key-management)
+
+## See also
+
+- [agent-harness](./agent-harness.md) — the zot/Colibri split; autospawn
+- [naming-decisions](./naming-decisions.md) — `usb_nodes → hive_nodes`, `AUTOSPAWN_PI → AUTOSPAWN`
+- [quality-gates](./quality-gates.md) — the gate that should catch drift at PR time
diff --git a/docs/wiki/quality-gates.md b/docs/wiki/quality-gates.md
index 353992c..26a91a5 100644
--- a/docs/wiki/quality-gates.md
+++ b/docs/wiki/quality-gates.md
@@ -7,13 +7,18 @@
 A change is not "done" until the gate passes locally:
 
 ```sh
-./scripts/ci-checks.sh # cargo fmt --check, clippy -D warnings, cargo test, markdown gate
+./scripts/ci-checks.sh # cargo fmt --check, clippy -D warnings, cargo test, markdown gate, wiki-lint --strict
 ```
 
+The pre-push hook (`scripts/pre-push`) runs this same gate on every `git push`
+to `main` — install once with `ln -sf ../../scripts/pre-push .git/hooks/pre-push`.
+The hook rejects the push if any gate fails; bypass only in emergencies with
+`--no-verify`.
+
 `.forgejo/workflows/ci.yml` encodes the same checks, but **no Forgejo Actions
 runner is registered**, so nothing enforces them server-side. Until a runner is
-active, `ci-checks.sh` passing locally is the only gate. Stated as mandatory in
-`AGENTS.md`.
+active, the local gate + pre-push hook are the enforcement layer. Stated as
+mandatory in `AGENTS.md`.
 
 ## Why this page exists
 
@@ -31,10 +36,11 @@ gate nobody runs (and that's red anyway) is the root cause of drift reaching
 
 ## Relationship to this wiki
 
-The [naming-decisions](./naming-decisions.md) ledger + a future `lint` pass are
+The [naming-decisions](./naming-decisions.md) ledger + `wiki-lint --strict` are
 the _semantic_ counterpart to `ci-checks.sh`: the compiler/clippy catch broken
 _code_, but not a doc that still describes the old design or a name that drifted.
-The wiki lint is meant to cover that gap (advisory first).
+The wiki lint covers that gap. It is now part of the mandatory gate — a drift
+failure blocks the push, same as a clippy warning.
 
 ## See also
 
diff --git a/scripts/ci-checks.sh b/scripts/ci-checks.sh
index bf33164..5a9e874 100755
--- a/scripts/ci-checks.sh
+++ b/scripts/ci-checks.sh
@@ -3,7 +3,8 @@
 #
 # ./scripts/ci-checks.sh
 #
-# Gates: rustfmt, clippy (warnings = errors), workspace tests, markdown format.
+# Gates: rustfmt, clippy (warnings = errors), workspace tests, markdown format,
+# wiki-lint (dangling refs, resurrected old names, orphan pages).
 
 set -eu
 
@@ -22,4 +23,7 @@ cargo test --workspace
 echo "==> markdown format gate"
 ./scripts/check-format.sh
 
+echo "==> wiki-lint --strict"
+./scripts/wiki-lint --strict
+
 echo "All checks passed."
diff --git a/scripts/pre-push b/scripts/pre-push
new file mode 100755
index 0000000..eac7c09
--- /dev/null
+++ b/scripts/pre-push
@@ -0,0 +1,30 @@
+#!/bin/sh
+# Pre-push hook — run the full gate before allowing a push to main.
+#
+# Install: ln -sf ../../scripts/pre-push .git/hooks/pre-push
+#
+# This runs the same checks as ci-checks.sh + wiki-lint --strict.
+# If either fails, the push is rejected. The gate is deterministic,
+# has no network calls, and completes in under 2 minutes on a warm build.
+#
+# Bypass (emergency only): git push --no-verify
+
+set -eu
+
+REPO_ROOT="$(git rev-parse --show-toplevel)"
+
+echo "=== pre-push gate ==="
+echo ""
+
+cd "$REPO_ROOT"
+
+# Full CI gate (fmt, clippy, test, markdown, wiki-lint --strict)
+if ! ./scripts/ci-checks.sh; then
+    echo ""
+    echo "PRE-PUSH REJECTED: ci-checks.sh failed."
+    echo "Fix the failures above, or push with --no-verify (emergency only)."
+    exit 1
+fi
+
+echo ""
+echo "=== pre-push gate: PASS ==="
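
Review note on `scripts/pre-push`: the header comment and both wiki pages describe the gate as running on a push to `main`, but the script as written never inspects which ref is being pushed, so it appears to run on every push. If the narrower behaviour is what is intended, one option is to read the ref list that git passes to a pre-push hook on stdin, one `<local ref> <local sha> <remote ref> <remote sha>` line per ref. The following is a minimal sketch, assuming the protected branch is `refs/heads/main`; the `pushes_main` helper is hypothetical and not part of this change.

```shell
#!/bin/sh
# Sketch only: report whether any ref in this push targets main.
# Git feeds a pre-push hook one line per ref on stdin:
#   <local ref> <local sha> <remote ref> <remote sha>
# pushes_main is a hypothetical helper, not part of scripts/pre-push.
pushes_main() {
    while read -r _local_ref _local_sha remote_ref _remote_sha; do
        # Assumption: the protected branch is refs/heads/main.
        if [ "$remote_ref" = "refs/heads/main" ]; then
            return 0
        fi
    done
    # No line targeted main (or stdin was empty).
    return 1
}
```

In the hook, a guard such as `pushes_main || exit 0` placed before the `ci-checks.sh` call would skip the gate for other branches. Keeping the gate on every push, as the script currently does, is the stricter choice; in that case the "to `main`" wording in the docs could be loosened instead.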