clawdie-iso/firstboot/MODULE-MANIFEST.md

623 lines
18 KiB
Markdown

# Clawdie Shell Module Manifest
**Purpose:** Explicit dependency graph for 12-module pipeline (Phase 1)
**Used by:** firstboot.sh wizard (Phase 2.1) for sequencing and recovery
---
## Module Dependencies Map
```
[0.0 setup-import] (optional) → setup.txt + system.env import
[0.1 zfs] (baremetal only) → CLAWDIE_BOOT_MODE
[User Input]
[1.1 env] → ASSISTANT_NAME, AGENT_DOMAIN, TZ, LLM_PROVIDER
├→ [1.2 pkg] (optional: can run any time)
├→ [1.3 gpu] (optional: hardware detection)
├→ [1.3b nvidia] ← REQUIRES: DETECTED_GPU from [1.3]
├→ [1.3c ssh] ← REQUIRES: ASSISTANT_NAME from [1.1]
├→ [1.4 system] ← REQUIRES: TZ, AGENT_DOMAIN from [1.1]
├→ [1.4.1 pf] ← REQUIRES: ASSISTANT_NAME from [1.1]
├→ [1.4.2 desktop] ← PREFERS: GPU config from [1.3]
├→ [1.5 tailscale] (optional) ← PREFERS: repos from [1.2]
← PREFERS: pf from [1.4.1]
├→ [1.5.1 npm-globals] ← PREFERS: repos from [1.2]
└→ [1.6 deploy] ← REQUIRES: .env from [1.1]
← REQUIRES: repos from [1.2]
← PREFERS: gpu config from [1.3]
```
---
## Module Specifications
### **0.0: setup-import.sh**
**Purpose:** Read `setup.txt` and `system.env` from the writable USB config partition before the wizard.
**Wizard Inputs:** None
**Outputs (Exports):**
- identity/substrate values: `ASSISTANT_NAME`, `HOSTNAME`, `AGENT_DOMAIN`, `TZ`
- optional pre-baked provider/channel values, if present for backward compatibility
- `ZFS_POOL`, `ZFS_LAYOUT`, `ZFS_DATA_DISKS`, `ZFS_HOT_SPARES`, `ZFS_PREFIX`
- optional hardware hints from `system.env`
**Checkpoint Name:** `setup-import`
**Skip Condition:**
- no `CLAWDIE` FAT32 partition found
- or `setup.txt` missing
- or identity/substrate values are incomplete for the non-interactive path
**Error Handling:**
- safe-fail back to wizard
- never blocks interactive install on a missing or partial config partition
### **0.1: shell-zfs.sh**
**Purpose:** Detect existing ZFS pools and select boot mode (install | upgrade | maintenance)
**Wizard Inputs:** None (automatic detection)
**Outputs (Exports):**
- `CLAWDIE_BOOT_MODE``install`, `upgrade`, or `maintenance`
- `POOL_NAME` — detected pool name (when present)
**Checkpoint Name:** `[ZFS]`
**Skip Condition:**
- If `TARGET=vps` (ZFS not used)
**Error Handling:**
- Safe-fail: if no pools found, defaults to `install`
- If maintenance mode selected, execs `maintenance-mode.sh` and exits
---
### **1.1: shell-env.sh**
**Purpose:** Generate a minimal `.env` seed (identity + feature flags)
**Wizard Inputs (Tier 1 - Required):**
- `ASSISTANT_NAME` — Human name (e.g., "Clawdie Smith")
- `AGENT_DOMAIN` — public or local domain zone (default: `${agentname}.home.arpa`)
- `TZ` — Timezone (e.g., "Europe/Ljubljana")
**Wizard Inputs (Tier 2 - Optional):**
- Provider/model values — normally configured after first boot in `/setup`
- Provider API keys — optional backward-compatible pre-bake only
- Telegram credentials — optional backward-compatible pre-bake only
**Outputs (Created):**
- `$ENV_FILE``/home/clawdie/.env` (chmod 600)
- Contains: identity + feature flags from the live installer handoff or fallback wizard
- Copied into: `/home/clawdie/clawdie-ai/.env` by shell-deploy (1.6)
- Completed by: Clawdie-AI onboarding (secrets + derived defaults)
**Exports (for downstream modules):**
- `AGENT_NAME` — derived from ASSISTANT_NAME (e.g., "clawdie-smith")
- `DETECTED_ABI` — optional (from `pkg config abi`)
**Checkpoint Name:** `[ENV]`
**Skip Condition:**
- If `.env` exists AND contains AGENT_DOMAIN: skip (idempotent)
**Error Handling:**
- Fails if ASSISTANT_NAME, AGENT_DOMAIN, or TZ missing
- Validates `.env` has the minimal required variables before marking complete
**Recovery Note:**
- If 1.1 fails, user runs `clawdie-firstboot --resume`
- Skips 1.1 if `[ENV] COMPLETE` in progress file
- Can re-run to update .env (idempotent via sysrc pattern)
---
### **1.2: shell-pkg.sh**
**Purpose:** Configure package repositories (online FreeBSD + offline USB)
**Wizard Inputs:** None (automatic from [1.1])
**Inputs from [1.1]:**
- `DETECTED_ABI` — optional (auto-detected if missing)
- Used for FreeBSD repo URL: `pkg+https://pkg.FreeBSD.org/${ABI}/latest`
**Outputs (Created):**
- `/etc/pkg/repos/FreeBSD.conf`
- Online repo, priority 10 (fallback)
- URL: `pkg+https://pkg.FreeBSD.org/FreeBSD:15:amd64/latest`
- Enable: yes
- `/etc/pkg/repos/Clawdie-USB.conf`
- Offline USB repo, priority 100 (preferred)
- URL: `file:///mnt/media/packages`
- Enable: yes
- `/var/cache/pkg/bastille/`
- Seeded with .pkg files from USB (for offline jail provisioning)
**Exports:** None
**Checkpoint Name:** `[PKG]`
**Skip Condition:**
- If both FreeBSD.conf AND Clawdie-USB.conf exist: skip
**Error Handling:**
- Safe to fail: If USB packages missing, continues (online only)
- pkg update may fail in chroot (expected, logged as warning)
- Cache seeding errors are non-fatal (safe-fail with `|| true`)
**Recovery Note:**
- Can re-run to update repo configs
- If jails fail to provision, user may re-run 1.2 to refresh cache
- Dependency-free: can run before or after 1.1 (but run 1.1 first for ABI)
---
### **1.3: shell-gpu.sh**
**Purpose:** Detect GPU hardware, select driver kernel module
**Wizard Inputs:** None (automatic detection)
**Inputs from [1.1]:**
- None (hardware detection is autonomous)
**Hardware Detection (via pciconf -lv):**
- Intel → i915kms
- AMD → amdgpu
- NVIDIA → nvidia (sets DETECTED_GPU="nvidia")
- VMware → vmwgfx
- Unknown → vesa (fallback)
**Outputs (Created):**
- `/etc/rc.conf` (append/update):
- `kld_list="i915kms"` (or amdgpu, vesa, etc.)
- Idempotent: updates if already exists
**Exports (for downstream):**
- `DETECTED_GPU` — "intel", "amd", "nvidia", "vmware", or "vesa"
- Used by [1.3b nvidia] to decide: run or skip
**Checkpoint Name:** `[GPU]`
**Skip Condition:**
- If `kld_list=` already in rc.conf: skip
**Error Handling:**
- pciconf may fail outside chroot (expected)
- Silently falls back to "vesa" if detection fails
- kldload live fails gracefully (expected in chroot)
**Recovery Note:**
- Can re-run to re-detect GPU
- No dependencies on [1.1], [1.2]
- MUST run before [1.4 system] (which reads rc.conf)
- REQUIRED before [1.3b nvidia]
---
### **1.3b: shell-nvidia.sh** ⭐ (NEW in Option B)
**Purpose:** NVIDIA driver version selection (PC-BSD heritage)
**Wizard Inputs (optional, Tier 2):**
- `NVIDIA_DRIVER_VERSION` — "590", "470", or "390" (env var for unattended)
- OR: interactive bsddialog menu if not set
**Inputs from [1.3 gpu]:**
-**REQUIRED:** `DETECTED_GPU` variable
- If DETECTED_GPU ≠ "nvidia": **skip gracefully**
- If DETECTED_GPU = "nvidia": show driver selection
**Outputs (Created):**
- `/etc/rc.conf` (append/update):
- `nvidia_driver_version="590"` (or "470", "390")
- Idempotent: updates if exists
- `/var/run/nvidia_driver_version.txt` (optional, safe-fail):
- Marker file for build system
- Non-critical; failure is ignored
**Exports (for downstream):**
- `NVIDIA_DRIVER_VERSION` — "590", "470", or "390"
- Used during package selection in 1.6
**Checkpoint Name:** `[NVIDIA]`
**Skip Condition:**
- If DETECTED_GPU ≠ "nvidia": skip (returns 0 without error)
- If `nvidia_driver_version=` already in rc.conf: skip
**Driver Mapping:**
- 590 → nvidia-driver-590 (Maxwell & newer: GTX 750 Ti+, RTX 20/30/40)
- 470 → nvidia-driver-470 (Kepler: GTX 600/700, Titan Black)
- 390 → nvidia-driver-390 (Fermi: GTX 400/500)
**Error Handling:**
- Invalid version: logs warning, uses default "590"
- bsddialog not available: silently uses default
- /var/run write fails: non-fatal, continues
**Recovery Note:**
- DEPENDS ON [1.3 gpu]: WILL FAIL if DETECTED_GPU not set
- Can run after [1.3], before [1.4], [1.6]
- **Unattended mode:** set `NVIDIA_DRIVER_VERSION` env var before wizard
---
### **1.3c: shell-ssh.sh**
**Purpose:** Configure SSH access and system passwords
**Wizard Inputs (Tier 1 - Required):**
- `ASSISTANT_NAME` — used for default user naming
- `ASSISTANT_PASSWORD` — operator password
- `ROOT_PASSWORD` — root password (optional if disabled)
**Outputs (Created):**
- `/etc/master.passwd` updates via `pw` (user + password hashes)
- `/home/clawdie/.ssh/authorized_keys` (if SSH key provided)
- `/etc/ssh/sshd_config` (hardened defaults)
**Checkpoint Name:** `[SSH]`
**Skip Condition:**
- If `[SSH] COMPLETE` in progress file
**Error Handling:**
- Password prompts must succeed; failure exits the step
- Missing SSH key is non-fatal (password login allowed)
---
### **1.4: shell-system.sh**
**Purpose:** System configuration (hostname, timezone, rc.conf, services)
**Wizard Inputs (from [1.1]):**
- `TZ` — Timezone (e.g., "Europe/Ljubljana")
- `AGENT_DOMAIN` — public or local domain zone (e.g., "home.arpa")
**Inputs from [1.3 gpu]:**
- rc.conf already updated by [1.3] (this module appends to it)
**Outputs (Created/Modified):**
- `/etc/rc.conf` (append/update):
- `timezone="Europe/Ljubljana"`
- `dbus_enable="YES"`
- `hald_enable="YES"`
- `seatd_enable="YES"`
- `display_manager="lightdm"`
- `lightdm_enable="YES"`
- Idempotent: uses sysrc pattern
- `/etc/hostname`:
- Contains single line: `home.arpa`
- `/etc/profile.d/clawdie.sh`:
- npm environment (PATH, npm_config_prefix)
**Exports:** None
**Checkpoint Name:** `[SYSTEM]`
**Skip Condition:**
- If `timezone=` AND `dbus_enable=` already in rc.conf: skip
**Error Handling:**
- hostname command may fail in chroot (safe-fail)
- service onestart fails gracefully (expected in chroot)
- Missing /etc/profile.d: creates directory
**Recovery Note:**
- DEPENDS ON [1.1]: requires TZ, AGENT_DOMAIN
- Can run after [1.3], before [1.4.2], [1.5]
- Idempotent: safe to re-run
---
### **1.4.2: shell-desktop.sh**
**Purpose:** Detect display hardware and enable desktop stack when appropriate
**Wizard Inputs:** None (automatic detection)
**Inputs from [1.3 gpu]:**
- `DETECTED_GPU` — used to decide which desktop packages to enable
**Outputs (Created/Modified):**
- `/etc/rc.conf` updates for desktop services (when enabled)
- Desktop flag file for firstboot progress tracking
**Checkpoint Name:** `[DESKTOP]`
**Skip Condition:**
- If headless/VPS scenario detected
- If desktop already enabled
**Error Handling:**
- Detection failures fall back to headless mode
---
### **1.4.1: shell-pf.sh** ⭐ (NEW in v0.5.0)
**Purpose:** Configure PF firewall with block-all default, SSH protection, jail NAT, glasspane VNC
**Wizard Inputs:** None (automatic)
**Inputs from [1.1]:**
- `ASSISTANT_NAME` — used for assistant identity, not bridge naming
- `AGENT_NET` — optional, default: 192.168.100.0/24
**Outputs (Created):**
- `/etc/pf.conf`
- Block-all default
- SSH brute-force protection (max-src-conn-rate 3/60)
- NAT for agent jails (192.168.0.0/16 supernet)
- Pass rules for HTTP/HTTPS
- Commented block for Tailscale + glasspane VNC (agent enables later)
- `/etc/rc.conf` (append):
- `cloned_interfaces="bridge0"`
- `ifconfig_bridge0_name="warden0"`
- `ifconfig_${BRIDGE}="inet 192.168.100.1/24 up"`
- `gateway_enable="YES"`
- `pf_enable="YES"`
- `pf_reload_enable="YES"`
- `/usr/local/etc/rc.d/pf_reload`
- rc.d service: `REQUIRE: tailscaled`
- Fixes PF/Tailscale cold boot race (resolves tailscale0 interface index)
**Exports:** None
**Checkpoint Name:** `[PF]`
**Skip Condition:**
- If `pf_enable="YES"` already in rc.conf: skip
**Error Handling:**
- Fails if external interface cannot be detected (no default route)
- Bridge creation may fail in chroot (expected, non-fatal)
- PF load fails in chroot (expected, will load on reboot)
**Recovery Note:**
- DEPENDS ON [1.1]: requires ASSISTANT_NAME
- Should run AFTER [1.4 system] (both write rc.conf, different vars)
- MUST run BEFORE [1.5 tailscale] (tailscale modifies PF rules)
- pf_reload pre-installed even without Tailscale — harmless now, essential later
**Glasspane VNC:**
- Port 5900 (wayvnc → cage → Chromium)
- Blocked on ext_if, passed on tailscale0
- Uncommented by agent when Tailscale is enabled
---
### **1.5: shell-tailscale.sh**
**Purpose:** Optional Tailscale install + enablement for secure remote access.
**Wizard Inputs (Tier 2 - Optional):**
- `FEATURE_TAILSCALE` — YES/NO toggle (default NO)
- `TAILSCALE_AUTHKEY` — optional device auth key (tskey-...) for unattended login; recommended reusable for host + jails
**Inputs from [1.1]:**
- `ASSISTANT_NAME` — used to derive a safe hostname if needed
- `AGENT_DOMAIN` — used to derive a hostname label if set to a public domain
**Outputs (Created):**
- `/etc/rc.conf` entry: `tailscaled_enable="YES"`
- Tailscale device login (if no auth key), logged to `/var/log/clawdie-firstboot.log`
**Checkpoint Name:** `[TAILSCALE]`
**Skip Condition:**
- If `FEATURE_TAILSCALE != YES`
**Error Handling:**
- Safe to fail: network or daemon start issues are logged and do not abort install
- If `pkg install tailscale` fails, logs a warning and continues
**Recovery Note:**
- Can be re-run manually: `service tailscaled start && tailscale up`
---
### **1.5.1: shell-npm-globals.sh**
**Purpose:** Install bundled npm CLI tools (claude, gemini, pi) from ISO cache
**Wizard Inputs:** None (automatic)
**Inputs from [1.2] — PREFERRED:**
- Offline repo cache (ensures fully offline install)
**Outputs (Created):**
- `/opt/clawdie/npm-global/bin/*` — bundled CLI binaries
- `/etc/profile.d/clawdie.sh` updates (PATH) if missing
**Checkpoint Name:** `[NPM-GLOBALS]`
**Skip Condition:**
- If npm globals already installed in `/opt/clawdie/npm-global`
**Error Handling:**
- Missing cache logs warning and skips (non-fatal)
---
### **1.6: shell-deploy.sh**
**Purpose:** Extract Clawdie-AI (offline tarball incl. node_modules), run installer, start services
**Wizard Inputs:** None (automatic)
**Inputs from [1.1] — REQUIRED:**
- `.env` file must exist (path: `$ENV_FILE`)
- Sourced to get: ASSISTANT_NAME, AGENT_DOMAIN, jail IPs, DB config
- Failure if missing: **hard exit**
**Inputs from [1.2] — PREFERRED (not required):**
- Repos configured (speeds up offline provisioning)
- Pkg cache seeded (enables offline jail setup)
- But: can run without [1.2] (falls back to online)
**Inputs from [1.3], [1.3b]** — OPTIONAL:**
- rc.conf kld_list (already applied to system)
- nvidia_driver_version (used for jail package selection)
**Outputs (Created):**
- `/home/clawdie/clawdie-ai/.env`**seeded from [1.1]**
- Clawdie-AI directory structure extracted
- `node_modules/` present (bundled into ISO tarball at build time)
- Jails created (worker, db, cms, optional mgmt)
- Services provisioned (just install)
**Exports:** None
**Checkpoint Name:** `[DEPLOY]`
**Skip Condition:**
- If Clawdie-AI/package.json AND node_modules/ exist: skip extraction
- But: always runs just install (jails may need provisioning)
**Error Handling:**
- Missing .env: **hard exit** (module exits with error)
- Missing tarball: logs warning, assumes already extracted
- If node_modules are missing: attempts `npm ci` (requires network) and fails if it cannot install deps
- Jail creation failures: logs warning, continues
- DB connectivity failures: deferred to runtime
**Recovery Note:**
- HARD DEPENDENCY ON [1.1]: will fail if .env missing
- Should run AFTER [1.1], [1.2], [1.3], [1.3c], [1.4], [1.4.2], [1.5.1]
- Can be re-run for idempotency (skips if already deployed)
- Failures here block full system setup
---
## Execution Order (from firstboot.sh)
```
User Input (wizard Tier 1 + optional Tier 2)
1. zfs ✓ CONDITIONAL (baremetal only, sets boot mode)
2. gpu ✓ RECOMMENDED (hardware detection)
3. nvidia ✓ CONDITIONAL (only if GPU=nvidia)
4. pkg ✓ OPTIONAL (run early for offline support)
5. ssh ✓ REQUIRED (user + password provisioning)
6. env ✓ REQUIRED (creates .env with identity)
7. system ✓ REQUIRED (system-level config)
8. desktop ✓ CONDITIONAL (display detection + enablement)
9. pf ✓ REQUIRED (PF firewall + jail NAT)
10. tailscale ✓ OPTIONAL (remote access, modifies PF rules)
11. npm-globals ✓ OPTIONAL (offline npm CLIs)
12. deploy ✓ REQUIRED (depends on .env from env)
SUCCESS: Clawdie-AI ready for first boot
```
---
## Resume/Recovery Rules (for Phase 2.1)
### Rule 1: Hard Dependencies
- **1.1 must complete before 1.6** (deploy sources .env)
- **1.3 must complete before 1.3b** (1.3b checks DETECTED_GPU)
- **1.4 should run before 1.4.2 and 1.4.1** (system config affects desktop + PF)
- **1.4.1 should run before 1.5** (PF rules updated by tailscale)
### Rule 2: Skip Logic on Resume
```bash
# Pseudo-code for firstboot.sh --resume logic:
if grep -q "[ENV] COMPLETE" $PROGRESS_FILE; then
skip_module env
fi
if grep -q "[PKG] COMPLETE" $PROGRESS_FILE; then
skip_module pkg
fi
if grep -q "[GPU] COMPLETE" $PROGRESS_FILE; then
skip_module gpu
fi
if grep -q "[NVIDIA] COMPLETE" OR DETECTED_GPU != "nvidia"; then
skip_module nvidia
fi
# ... etc
```
### Rule 3: Error Recovery
- **Soft failure (warning):** Module logs error but continues
- Examples: gpu live-load fails, pkg update fails
- Action on resume: re-run module (idempotent)
- **Hard failure (exit 1):** Module exits shell
- Examples: .env missing in 1.6, AGENT_DOMAIN not set
- Action on resume: fix input, re-run from beginning
---
## For Phase 2.1 Implementation
Use this manifest to:
1. **Sequence modules** in firstboot.sh:
```bash
. shell-zfs.sh
clawdie_shell_zfs_detect # [ZFS] COMPLETE (baremetal)
. shell-gpu.sh
clawdie_shell_gpu_detect # [GPU] COMPLETE
. shell-nvidia.sh
clawdie_shell_nvidia_detect # [NVIDIA] COMPLETE
# ... etc
```
2. **Implement resume logic:**
```bash
# If --resume, skip completed modules
grep -q "[ENV] COMPLETE" $PROGRESS && skip_env=1
```
3. **Handle wizard tiers:**
- Tier 1 (required): ASSISTANT_NAME, AGENT_DOMAIN, TZ
- Tier 2 (optional): provider/model and Telegram values for backward-compatible pre-bake
4. **Define checkpoint names** for progress tracking
---
## Version History
- **current dev ISO**: live QML installer, post-install setup token, 12 shell modules, ZFS/desktop/npm-globals, bundled npm CLIs
- **v0.9.0**: 8 modules, runtime GPU detection, no ZFS/desktop/npm-globals
- **v0.5.0**: 6 modules, PF firewall, glasspane VNC support
- **v1.1 (planned):** Add shell-gpu-passthrough.sh, add shell-upgrade.sh