Add implementation review to live GUI proposal; escalate bridge naming bug (Sam & Claude)
This commit is contained in:
parent
f2c9123a88
commit
e690b49d27
2 changed files with 125 additions and 5 deletions
|
|
@ -65,7 +65,7 @@ debug issues visually, and verify behavior — all without exposing VNC to the p
|
|||
`shell-pf.sh` runs during firstboot and:
|
||||
|
||||
1. **Detects ext_if** via `route -n get default` — no hardcoded interface names
|
||||
2. **Creates agent bridge** named `${ASSISTANT_NAME}0` (e.g., `clawdie0`) at `192.168.100.1/24`
|
||||
2. **Creates agent bridge** — currently `${ASSISTANT_NAME}0` (e.g., `clawdie0`), **must be `warden0`** (see alignment note below)
|
||||
3. **Writes `/etc/pf.conf`** with block-all default, SSH protection, jail NAT
|
||||
4. **Installs `pf_reload`** rc.d service — see cold boot race below
|
||||
5. **Enables PF** via rc.conf
|
||||
|
|
@ -74,10 +74,17 @@ debug issues visually, and verify behavior — all without exposing VNC to the p
|
|||
|
||||
## Agent bridge naming
|
||||
|
||||
> **Alignment note:** Clawdie-AI `AGENTS.md` mandates `warden0` as the canonical
|
||||
> host bridge name. The per-agent naming below (`clawdie0`, `natasha0`) predates
|
||||
> that decision. `shell-pf.sh` should be updated to use `warden0` to match the
|
||||
> runtime expectation. The table below reflects current behavior, not the target.
|
||||
> **CRITICAL ALIGNMENT BUG:** Clawdie-AI `src/jail-config.ts` expects the host
|
||||
> bridge to be named `warden0`. `shell-pf.sh:32` currently creates
|
||||
> `${ASSISTANT_NAME:-clawdie}0` instead. **Jails will fail to route on first
|
||||
> boot** because the bridge name doesn't match what the runtime looks up.
|
||||
>
|
||||
> **Fix:** Change `shell-pf.sh:32` to `local BRIDGE="warden0"`. Single-tenant
|
||||
> uses `warden0`. Multi-tenant adds `warden1`, etc. This matches `AGENTS.md`
|
||||
> bridge naming convention and `shell-env.sh:198-200` which already writes
|
||||
> `WARDEN_*` vars.
|
||||
>
|
||||
> The table below reflects current (broken) behavior, not the target.
|
||||
|
||||
The bridge interface is named after the agent: `${ASSISTANT_NAME}0`.
|
||||
|
||||
|
|
|
|||
|
|
@ -358,6 +358,116 @@ before the next step is merged.
|
|||
- bhyve remains the primary repeatable regression path for first-boot testing;
|
||||
real-hardware validation is still required, but operator-cycle dependent.
|
||||
|
||||
## Phase 1 Implementation Review (12.maj.2026 — Claude, Linux)
|
||||
|
||||
Phase 1 is implemented (commits `cc40f48`..`94e9958`). The core flow works:
|
||||
live USB boots into Lumina, QML installer collects identity/network/access,
|
||||
writes two-file handoff, invokes `bsdinstall script`, firstboot converges on
|
||||
HDD boot. Setup token is rotated by `shell-deploy.sh` and written to
|
||||
`/var/db/clawdie-installer/setup-token` with SSH tunnel instructions in
|
||||
`/etc/motd`.
|
||||
|
||||
### Implementation status vs. proposal
|
||||
|
||||
| Proposal step | Status | Notes |
|
||||
| ------------------------------------------ | ---------- | ---------------------------------------------------------------- |
|
||||
| 1. Live image boots into installer session | Done | LightDM autologin → xprofile → QML |
|
||||
| 2. QML installer as primary frontend | Done | Collects identity/network/access only |
|
||||
| 3. Live validation in QML | Partial | Disk selection validates; no Tailscale/SSH key format validation |
|
||||
| 4. Two-file handoff model | Done | `live-install.env` + `clawdie-handoff.sealed` |
|
||||
| 5. Commit step | Done | `clawdie-live-commit.sh` → `bsdinstall script` |
|
||||
| 6. Thin first HDD boot converge | Done | `firstboot.sh` sources handoff, runs shell modules |
|
||||
| Acceptance criteria | Mostly met | See gaps below |
|
||||
|
||||
### Gaps found in implementation
|
||||
|
||||
#### Critical: bridge naming mismatch
|
||||
|
||||
**File:** `firstboot/shell-pf.sh:32`
|
||||
|
||||
```sh
|
||||
local BRIDGE="${ASSISTANT_NAME:-clawdie}0"
|
||||
```
|
||||
|
||||
Creates `clawdie0` bridge. Clawdie-AI `src/jail-config.ts` expects `warden0`.
|
||||
`shell-env.sh` correctly writes `WARDEN_*` vars but PF creates the wrong bridge
|
||||
name. **Jails will fail to route on first boot.**
|
||||
|
||||
**Fix:** `shell-pf.sh:32` → `local BRIDGE="warden0"`
|
||||
|
||||
This is the most critical remaining cross-repo misalignment. See `NETWORKING.md`
|
||||
which already documents this as a known issue.
|
||||
|
||||
#### High: live path drops feature flags
|
||||
|
||||
**File:** `firstboot/gui/qml-installer/main.cpp:445-510`
|
||||
|
||||
The live-session `startInstall()` path does not write `FEATURE_DESKTOP`,
|
||||
`FEATURE_DEVTOOLS`, `FEATURE_NVIDIA`, or `LOCAL_LLM_PROVIDER`. Only the
|
||||
non-live fallback path (lines 528-535) writes them.
|
||||
|
||||
`firstboot.sh` dispatches `run_step_if` for GPU, SSH, system, desktop, and PF
|
||||
modules. An empty `FEATURE_DESKTOP` is neither `"YES"` nor `"NO"` — behavior
|
||||
depends on how each module handles unset vars.
|
||||
|
||||
**Fix:** Add feature flag writes to the live handoff path. At minimum, write
|
||||
safe defaults:
|
||||
|
||||
```
|
||||
FEATURE_DESKTOP="NO"
|
||||
FEATURE_DEVTOOLS="NO"
|
||||
FEATURE_NVIDIA="NO"
|
||||
LOCAL_LLM_PROVIDER="none"
|
||||
```
|
||||
|
||||
Or add feature toggle UI to the QML installer (Phase 2).
|
||||
|
||||
#### High: GPU_DRIVER_VERSION only in non-live path
|
||||
|
||||
The non-live path writes `GPU_DRIVER_VERSION` (line 530). The live path does
|
||||
not. `shell-gpu.sh` auto-detects via `pciconf`, so this likely works in
|
||||
practice, but the two paths are inconsistent.
|
||||
|
||||
#### Medium: CompletePage refers to setup token that doesn't exist yet
|
||||
|
||||
**File:** `firstboot/gui/qml-installer/pages/CompletePage.qml:85`
|
||||
|
||||
Tells the operator the setup token "lives" at `/var/db/clawdie-installer/setup-token`.
|
||||
But the token is only written during `firstboot.sh` on first HDD boot, after
|
||||
the controlplane starts. During the live session, the file doesn't exist.
|
||||
|
||||
**Fix:** Update text to say the token will be available after first boot, or
|
||||
remove the reference and rely on `/etc/motd`.
|
||||
|
||||
#### Medium: hardcoded Slovenian locale
|
||||
|
||||
**File:** `firstboot/gui/qml-installer/main.cpp:465-469`
|
||||
|
||||
Live handoff always writes `SYSTEM_LOCALE=sl_SI.UTF-8` and `KEYMAP=sl.kbd`.
|
||||
This is correct for the developer but wrong for other operators. The old shell
|
||||
wizard asked for locale/keymap interactively.
|
||||
|
||||
**Fix (short-term):** Default to `en_US.UTF-8` / `us.kbd`.
|
||||
**Fix (Phase 2):** Add locale/keymap selection page to QML installer.
|
||||
|
||||
#### Medium: stale README.md
|
||||
|
||||
`README.md:79-88` still describes the old manual `bsdinstall` + wizard flow.
|
||||
The live installer auto-boots into QML. Operators never see a shell wizard for
|
||||
baremetal installs.
|
||||
|
||||
#### Low: stale handoff doc
|
||||
|
||||
`HANDOFF-v1.0.0-RELEASE.md` references old Phase 4 architecture and file paths.
|
||||
Should be deleted per AGENTS.md handoff lifecycle rules.
|
||||
|
||||
#### Low: `local` keyword in "POSIX-compliant" scripts
|
||||
|
||||
`shell-pf.sh` and `shell-deploy.sh` claim POSIX compliance but use `local`.
|
||||
Works on FreeBSD `/bin/sh` but the claim is misleading.
|
||||
|
||||
---
|
||||
|
||||
## Recommendation
|
||||
|
||||
Ship Phase 1 now.
|
||||
|
|
@ -365,3 +475,6 @@ Ship Phase 1 now.
|
|||
It gives the project a meaningfully better install UX within the current repo
|
||||
shape, keeps the most dangerous low-level work inside `bsdinstall`, and makes
|
||||
Phase 2 an explicit future evolution instead of an abandoned idea.
|
||||
|
||||
**But fix the bridge naming first.** Without `warden0`, jail networking is
|
||||
broken on every fresh install.
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue