Add implementation review to live GUI proposal; escalate bridge naming bug (Sam & Claude)

This commit is contained in:
Sam & Claude 2026-05-12 18:02:16 +02:00 committed by 123kupola
parent f2c9123a88
commit e690b49d27
2 changed files with 125 additions and 5 deletions

View file

@ -65,7 +65,7 @@ debug issues visually, and verify behavior — all without exposing VNC to the p
`shell-pf.sh` runs during firstboot and:
1. **Detects ext_if** via `route -n get default` — no hardcoded interface names
2. **Creates agent bridge** named `${ASSISTANT_NAME}0` (e.g., `clawdie0`) at `192.168.100.1/24`
2. **Creates agent bridge** — currently `${ASSISTANT_NAME}0` (e.g., `clawdie0`), **must be `warden0`** (see alignment note below)
3. **Writes `/etc/pf.conf`** with block-all default, SSH protection, jail NAT
4. **Installs `pf_reload`** rc.d service — see cold boot race below
5. **Enables PF** via rc.conf
@ -74,10 +74,17 @@ debug issues visually, and verify behavior — all without exposing VNC to the p
## Agent bridge naming
> **Alignment note:** Clawdie-AI `AGENTS.md` mandates `warden0` as the canonical
> host bridge name. The per-agent naming below (`clawdie0`, `natasha0`) predates
> that decision. `shell-pf.sh` should be updated to use `warden0` to match the
> runtime expectation. The table below reflects current behavior, not the target.
> **CRITICAL ALIGNMENT BUG:** Clawdie-AI `src/jail-config.ts` expects the host
> bridge to be named `warden0`. `shell-pf.sh:32` currently creates
> `${ASSISTANT_NAME:-clawdie}0` instead. **Jails will fail to route on first
> boot** because the bridge name doesn't match what the runtime looks up.
>
> **Fix:** Change `shell-pf.sh:32` to `local BRIDGE="warden0"`. Single-tenant
> uses `warden0`. Multi-tenant adds `warden1`, etc. This matches `AGENTS.md`
> bridge naming convention and `shell-env.sh:198-200` which already writes
> `WARDEN_*` vars.
>
> The table below reflects current (broken) behavior, not the target.
The bridge interface is named after the agent: `${ASSISTANT_NAME}0`.

View file

@ -358,6 +358,116 @@ before the next step is merged.
- bhyve remains the primary repeatable regression path for first-boot testing;
real-hardware validation is still required, but operator-cycle dependent.
## Phase 1 Implementation Review (12.maj.2026 — Claude, Linux)
Phase 1 is implemented (commits `cc40f48`..`94e9958`). The core flow works:
live USB boots into Lumina, QML installer collects identity/network/access,
writes two-file handoff, invokes `bsdinstall script`, firstboot converges on
HDD boot. Setup token is rotated by `shell-deploy.sh` and written to
`/var/db/clawdie-installer/setup-token` with SSH tunnel instructions in
`/etc/motd`.
### Implementation status vs. proposal
| Proposal step | Status | Notes |
| ------------------------------------------ | ---------- | ---------------------------------------------------------------- |
| 1. Live image boots into installer session | Done | LightDM autologin → xprofile → QML |
| 2. QML installer as primary frontend | Done | Collects identity/network/access only |
| 3. Live validation in QML | Partial | Disk selection validates; no Tailscale/SSH key format validation |
| 4. Two-file handoff model | Done | `live-install.env` + `clawdie-handoff.sealed` |
| 5. Commit step | Done | `clawdie-live-commit.sh``bsdinstall script` |
| 6. Thin first HDD boot converge | Done | `firstboot.sh` sources handoff, runs shell modules |
| Acceptance criteria | Mostly met | See gaps below |
### Gaps found in implementation
#### Critical: bridge naming mismatch
**File:** `firstboot/shell-pf.sh:32`
```sh
local BRIDGE="${ASSISTANT_NAME:-clawdie}0"
```
Creates `clawdie0` bridge. Clawdie-AI `src/jail-config.ts` expects `warden0`.
`shell-env.sh` correctly writes `WARDEN_*` vars but PF creates the wrong bridge
name. **Jails will fail to route on first boot.**
**Fix:** `shell-pf.sh:32``local BRIDGE="warden0"`
This is the most critical remaining cross-repo misalignment. See `NETWORKING.md`
which already documents this as a known issue.
#### High: live path drops feature flags
**File:** `firstboot/gui/qml-installer/main.cpp:445-510`
The live-session `startInstall()` path does not write `FEATURE_DESKTOP`,
`FEATURE_DEVTOOLS`, `FEATURE_NVIDIA`, or `LOCAL_LLM_PROVIDER`. Only the
non-live fallback path (lines 528-535) writes them.
`firstboot.sh` dispatches `run_step_if` for GPU, SSH, system, desktop, and PF
modules. An empty `FEATURE_DESKTOP` is neither `"YES"` nor `"NO"` — behavior
depends on how each module handles unset vars.
**Fix:** Add feature flag writes to the live handoff path. At minimum, write
safe defaults:
```
FEATURE_DESKTOP="NO"
FEATURE_DEVTOOLS="NO"
FEATURE_NVIDIA="NO"
LOCAL_LLM_PROVIDER="none"
```
Or add feature toggle UI to the QML installer (Phase 2).
#### High: GPU_DRIVER_VERSION only in non-live path
The non-live path writes `GPU_DRIVER_VERSION` (line 530). The live path does
not. `shell-gpu.sh` auto-detects via `pciconf`, so this likely works in
practice, but the two paths are inconsistent.
#### Medium: CompletePage refers to setup token that doesn't exist yet
**File:** `firstboot/gui/qml-installer/pages/CompletePage.qml:85`
Tells the operator the setup token "lives" at `/var/db/clawdie-installer/setup-token`.
But the token is only written during `firstboot.sh` on first HDD boot, after
the controlplane starts. During the live session, the file doesn't exist.
**Fix:** Update text to say the token will be available after first boot, or
remove the reference and rely on `/etc/motd`.
#### Medium: hardcoded Slovenian locale
**File:** `firstboot/gui/qml-installer/main.cpp:465-469`
Live handoff always writes `SYSTEM_LOCALE=sl_SI.UTF-8` and `KEYMAP=sl.kbd`.
This is correct for the developer but wrong for other operators. The old shell
wizard asked for locale/keymap interactively.
**Fix (short-term):** Default to `en_US.UTF-8` / `us.kbd`.
**Fix (Phase 2):** Add locale/keymap selection page to QML installer.
#### Medium: stale README.md
`README.md:79-88` still describes the old manual `bsdinstall` + wizard flow.
The live installer auto-boots into QML. Operators never see a shell wizard for
baremetal installs.
#### Low: stale handoff doc
`HANDOFF-v1.0.0-RELEASE.md` references old Phase 4 architecture and file paths.
Should be deleted per AGENTS.md handoff lifecycle rules.
#### Low: `local` keyword in "POSIX-compliant" scripts
`shell-pf.sh` and `shell-deploy.sh` claim POSIX compliance but use `local`.
Works on FreeBSD `/bin/sh` but the claim is misleading.
---
## Recommendation
Ship Phase 1 now.
@ -365,3 +475,6 @@ Ship Phase 1 now.
It gives the project a meaningfully better install UX within the current repo
shape, keeps the most dangerous low-level work inside `bsdinstall`, and makes
Phase 2 an explicit future evolution instead of an abandoned idea.
**But fix the bridge naming first.** Without `warden0`, jail networking is
broken on every fresh install.