From e690b49d2737ecb89a081db4e904f575c2cd6b19 Mon Sep 17 00:00:00 2001 From: Sam & Claude Date: Tue, 12 May 2026 18:02:16 +0200 Subject: [PATCH] Add implementation review to live GUI proposal; escalate bridge naming bug (Sam & Claude) --- NETWORKING.md | 17 +++-- doc/ISO-LIVE-GUI-SETUP-PROPOSAL.md | 113 +++++++++++++++++++++++++++++ 2 files changed, 125 insertions(+), 5 deletions(-) diff --git a/NETWORKING.md b/NETWORKING.md index 1275da16..4c812b04 100644 --- a/NETWORKING.md +++ b/NETWORKING.md @@ -65,7 +65,7 @@ debug issues visually, and verify behavior — all without exposing VNC to the p `shell-pf.sh` runs during firstboot and: 1. **Detects ext_if** via `route -n get default` — no hardcoded interface names -2. **Creates agent bridge** named `${ASSISTANT_NAME}0` (e.g., `clawdie0`) at `192.168.100.1/24` +2. **Creates agent bridge** — currently `${ASSISTANT_NAME}0` (e.g., `clawdie0`), **must be `warden0`** (see alignment note below) 3. **Writes `/etc/pf.conf`** with block-all default, SSH protection, jail NAT 4. **Installs `pf_reload`** rc.d service — see cold boot race below 5. **Enables PF** via rc.conf @@ -74,10 +74,17 @@ debug issues visually, and verify behavior — all without exposing VNC to the p ## Agent bridge naming -> **Alignment note:** Clawdie-AI `AGENTS.md` mandates `warden0` as the canonical -> host bridge name. The per-agent naming below (`clawdie0`, `natasha0`) predates -> that decision. `shell-pf.sh` should be updated to use `warden0` to match the -> runtime expectation. The table below reflects current behavior, not the target. +> **CRITICAL ALIGNMENT BUG:** Clawdie-AI `src/jail-config.ts` expects the host +> bridge to be named `warden0`. `shell-pf.sh:32` currently creates +> `${ASSISTANT_NAME:-clawdie}0` instead. **Jails will fail to route on first +> boot** because the bridge name doesn't match what the runtime looks up. +> +> **Fix:** Change `shell-pf.sh:32` to `local BRIDGE="warden0"`. Single-tenant +> uses `warden0`. Multi-tenant adds `warden1`, etc. This matches `AGENTS.md` +> bridge naming convention and `shell-env.sh:198-200` which already writes +> `WARDEN_*` vars. +> +> The table below reflects current (broken) behavior, not the target. The bridge interface is named after the agent: `${ASSISTANT_NAME}0`. diff --git a/doc/ISO-LIVE-GUI-SETUP-PROPOSAL.md b/doc/ISO-LIVE-GUI-SETUP-PROPOSAL.md index dfada62f..d1599c54 100644 --- a/doc/ISO-LIVE-GUI-SETUP-PROPOSAL.md +++ b/doc/ISO-LIVE-GUI-SETUP-PROPOSAL.md @@ -358,6 +358,116 @@ before the next step is merged. - bhyve remains the primary repeatable regression path for first-boot testing; real-hardware validation is still required, but operator-cycle dependent. +## Phase 1 Implementation Review (12.maj.2026 — Claude, Linux) + +Phase 1 is implemented (commits `cc40f48`..`94e9958`). The core flow works: +live USB boots into Lumina, QML installer collects identity/network/access, +writes two-file handoff, invokes `bsdinstall script`, firstboot converges on +HDD boot. Setup token is rotated by `shell-deploy.sh` and written to +`/var/db/clawdie-installer/setup-token` with SSH tunnel instructions in +`/etc/motd`. + +### Implementation status vs. proposal + +| Proposal step | Status | Notes | +| ------------------------------------------ | ---------- | ---------------------------------------------------------------- | +| 1. Live image boots into installer session | Done | LightDM autologin → xprofile → QML | +| 2. QML installer as primary frontend | Done | Collects identity/network/access only | +| 3. Live validation in QML | Partial | Disk selection validates; no Tailscale/SSH key format validation | +| 4. Two-file handoff model | Done | `live-install.env` + `clawdie-handoff.sealed` | +| 5. Commit step | Done | `clawdie-live-commit.sh` → `bsdinstall script` | +| 6. Thin first HDD boot converge | Done | `firstboot.sh` sources handoff, runs shell modules | +| Acceptance criteria | Mostly met | See gaps below | + +### Gaps found in implementation + +#### Critical: bridge naming mismatch + +**File:** `firstboot/shell-pf.sh:32` + +```sh +local BRIDGE="${ASSISTANT_NAME:-clawdie}0" +``` + +Creates `clawdie0` bridge. Clawdie-AI `src/jail-config.ts` expects `warden0`. +`shell-env.sh` correctly writes `WARDEN_*` vars but PF creates the wrong bridge +name. **Jails will fail to route on first boot.** + +**Fix:** `shell-pf.sh:32` → `local BRIDGE="warden0"` + +This is the most critical remaining cross-repo misalignment. See `NETWORKING.md` +which already documents this as a known issue. + +#### High: live path drops feature flags + +**File:** `firstboot/gui/qml-installer/main.cpp:445-510` + +The live-session `startInstall()` path does not write `FEATURE_DESKTOP`, +`FEATURE_DEVTOOLS`, `FEATURE_NVIDIA`, or `LOCAL_LLM_PROVIDER`. Only the +non-live fallback path (lines 528-535) writes them. + +`firstboot.sh` dispatches `run_step_if` for GPU, SSH, system, desktop, and PF +modules. An empty `FEATURE_DESKTOP` is neither `"YES"` nor `"NO"` — behavior +depends on how each module handles unset vars. + +**Fix:** Add feature flag writes to the live handoff path. At minimum, write +safe defaults: + +``` +FEATURE_DESKTOP="NO" +FEATURE_DEVTOOLS="NO" +FEATURE_NVIDIA="NO" +LOCAL_LLM_PROVIDER="none" +``` + +Or add feature toggle UI to the QML installer (Phase 2). + +#### High: GPU_DRIVER_VERSION only in non-live path + +The non-live path writes `GPU_DRIVER_VERSION` (line 530). The live path does +not. `shell-gpu.sh` auto-detects via `pciconf`, so this likely works in +practice, but the two paths are inconsistent. + +#### Medium: CompletePage refers to setup token that doesn't exist yet + +**File:** `firstboot/gui/qml-installer/pages/CompletePage.qml:85` + +Tells the operator the setup token "lives" at `/var/db/clawdie-installer/setup-token`. +But the token is only written during `firstboot.sh` on first HDD boot, after +the controlplane starts. During the live session, the file doesn't exist. + +**Fix:** Update text to say the token will be available after first boot, or +remove the reference and rely on `/etc/motd`. + +#### Medium: hardcoded Slovenian locale + +**File:** `firstboot/gui/qml-installer/main.cpp:465-469` + +Live handoff always writes `SYSTEM_LOCALE=sl_SI.UTF-8` and `KEYMAP=sl.kbd`. +This is correct for the developer but wrong for other operators. The old shell +wizard asked for locale/keymap interactively. + +**Fix (short-term):** Default to `en_US.UTF-8` / `us.kbd`. +**Fix (Phase 2):** Add locale/keymap selection page to QML installer. + +#### Medium: stale README.md + +`README.md:79-88` still describes the old manual `bsdinstall` + wizard flow. +The live installer auto-boots into QML. Operators never see a shell wizard for +baremetal installs. + +#### Low: stale handoff doc + +`HANDOFF-v1.0.0-RELEASE.md` references old Phase 4 architecture and file paths. +Should be deleted per AGENTS.md handoff lifecycle rules. + +#### Low: `local` keyword in "POSIX-compliant" scripts + +`shell-pf.sh` and `shell-deploy.sh` claim POSIX compliance but use `local`. +Works on FreeBSD `/bin/sh` but the claim is misleading. + +--- + ## Recommendation Ship Phase 1 now. @@ -365,3 +475,6 @@ Ship Phase 1 now. It gives the project a meaningfully better install UX within the current repo shape, keeps the most dangerous low-level work inside `bsdinstall`, and makes Phase 2 an explicit future evolution instead of an abandoned idea. + +**But fix the bridge naming first.** Without `warden0`, jail networking is +broken on every fresh install.