clawdie-iso/firstboot/firstboot.sh
Sam & Claude 4c60ed81e3 fix(installer): Phase A — stable ZFS, safe upgrades, module matrix
Four critical fixes before v1.0.0 VM test, informed by PC-BSD failure
modes and GhostBSD's improvements:

1. shell-zfs.sh: zpool labelclear on fresh install
   Clear ZFS labels from every device that was in the old pool before
   bsdinstall writes new ones. Prevents the "can't find pool by GUID"
   boot failure that made PC-BSD reinstalls unreliable.

2. shell-zfs.sh: pre-upgrade snapshot
   When operator selects Upgrade, take zfs snapshot -r
   pool@pre-upgrade-{timestamp} before any changes. One reboot to
   roll back if the upgrade goes wrong. UPGRADE_SNAPSHOT exported for
   downstream modules to reference.

3. shell-env.sh: never overwrite secrets on upgrade
   clawdie_shell_env_generate() now checks CLAWDIE_BOOT_MODE. In
   upgrade mode it calls clawdie_shell_env_append_new_keys() instead
   of regenerating — reads existing .env and appends only keys that
   are absent. Existing DB passwords, JWT secrets, API keys are never
   touched. This fixes the root cause of the orphaned-database bug:
   new passwords that don't match the existing pool's data.

4. firstboot.sh: module execution matrix via run_step_if
   New run_step_if "<modes>" wrapper marks steps as done without
   running them when not applicable to the current boot mode.
   Upgrade skips: gpu, nvidia, ssh, system, desktop, pf, tailscale
   Upgrade runs: pkg, env (append-only), npm-globals, deploy
   Prevents SSH key resets, rc.conf overwrites, and firewall rewrites
   during upgrade — all of which undid operator customisations.

Also adds INSTALLER-PLAN.md: full architecture plan for unified
GUI/TUI installer with Fresh / Upgrade / Repair modes, boot
environment support, and a clear phase roadmap to v1.1.0.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-04 20:04:22 +02:00

319 lines
14 KiB
Bash

#!/bin/sh
# Clawdie-AI Firstboot Orchestrator
# Runs once on first boot (rc.d/clawdie-firstboot, REQUIRE: NETWORKING LOGIN)
# Dispatches to clawdie-shell-*.sh modules for cloud or baremetal path
#
# Usage:
# firstboot.sh — Normal run (wizard on baremetal, pre-baked on cloud)
# firstboot.sh --resume — Skip already-completed steps, continue from last failure
# firstboot.sh --reset — Clear progress and start over from the beginning
set -eu
SHARE="${SHARE:-/usr/local/share/clawdie-iso}"
LOG_FILE="${LOG_FILE:-/var/log/clawdie-firstboot.log}"
PROGRESS_FILE="${PROGRESS_FILE:-/var/log/clawdie-firstboot.progress}"
RC_CONF="${RC_CONF:-/etc/rc.conf}"
# ── Arg parsing ────────────────────────────────────────────────────────────
RESUME=0
case "${1:-}" in
--resume) RESUME=1 ;;
--reset)
rm -f "$PROGRESS_FILE"
echo "$(date '+%H:%M:%S') [firstboot] Progress reset — starting over" | tee -a "$LOG_FILE"
;;
--help|-h)
echo "Usage: firstboot.sh [--resume|--reset]"
echo " --resume Skip completed steps, continue from last failure"
echo " --reset Clear progress file and start from the beginning"
exit 0
;;
esac
log_msg() { echo "$(date '+%H:%M:%S') $1" | tee -a "$LOG_FILE"; }
# ── Checkpoint helpers ─────────────────────────────────────────────────────
# Mark a step done in the progress file
step_done() {
echo "$1" >> "$PROGRESS_FILE"
}
# Return 0 (true) if the step was already completed
step_completed() {
[ "$RESUME" -eq 1 ] && grep -qx "$1" "$PROGRESS_FILE" 2>/dev/null
}
# Run a module function with checkpoint guard.
# Usage: run_step <step_name> <function> [description] [step_num]
run_step() {
_step="$1"
_fn="$2"
_desc="${3:-$_fn}"
_step_num="${4:-0}" # Optional: step number for progress tracking
if step_completed "$_step"; then
log_msg "[firstboot] Skipping $_step (already completed)"
[ "$_step_num" -gt 0 ] && echo "PROGRESS=$_step_num" >> "$PROGRESS_FILE"
return 0
fi
log_msg "[firstboot] Running: $_desc"
"$_fn"
step_done "$_step"
[ "$_step_num" -gt 0 ] && echo "PROGRESS=$_step_num" >> "$PROGRESS_FILE"
}
# Run a step only when the current boot mode is in the allowed list.
# Usage: run_step_if "<modes>" <step_name> <function> [description] [step_num]
# Modes: space-separated list of CLAWDIE_BOOT_MODE values (install upgrade repair)
# Steps not applicable to the current mode are marked done (so --resume works).
run_step_if() {
_allowed="$1"; shift
_step="$1"
_mode="${CLAWDIE_BOOT_MODE:-install}"
if echo "$_allowed" | grep -qw "$_mode"; then
run_step "$@"
else
if ! step_completed "$_step"; then
log_msg "[firstboot] Skipping $_step (not applicable in $_mode mode)"
step_done "$_step"
fi
fi
}
# ── Set package path to bundled packages on HDD ──────────────────────────────
# After bsdinstall, packages live at SHARE/packages (not /mnt/media)
export USB_PKG_PATH="${SHARE}/packages"
# ── Prevent modules from auto-running when sourced ─────────────────────────
export SHELL_GPU_TEST=1
export SHELL_NVIDIA_TEST=1
export SHELL_PKG_TEST=1
export SHELL_ENV_TEST=1
export SHELL_DEPLOY_TEST=1
export SHELL_NPM_GLOBALS_TEST=1
export SHELL_TAILSCALE_TEST=1
export SHELL_ZFS_TEST=1
export SHELL_PF_TEST=1
export SHELL_DESKTOP_TEST=1
# shell-ssh.sh and shell-system.sh use case/${0##*/} — no flag needed
# ── Source modules (functions only, nothing runs yet) ─────────────────────────
. "${SHARE}/build.cfg"
. "${SHARE}/firstboot/shell-zfs.sh"
. "${SHARE}/firstboot/shell-gpu.sh"
. "${SHARE}/firstboot/shell-nvidia.sh"
. "${SHARE}/firstboot/shell-pkg.sh"
. "${SHARE}/firstboot/shell-ssh.sh"
. "${SHARE}/firstboot/shell-env.sh"
. "${SHARE}/firstboot/shell-system.sh"
. "${SHARE}/firstboot/shell-desktop.sh"
. "${SHARE}/firstboot/shell-pf.sh"
. "${SHARE}/firstboot/shell-tailscale.sh"
. "${SHARE}/firstboot/shell-npm-globals.sh"
. "${SHARE}/firstboot/shell-deploy.sh"
# ── Load GUI config if present ───────────────────────────────────────────────
if [ -f "/tmp/clawdie-install.conf" ]; then
log_msg "[firstboot] Loading GUI installer configuration"
. "/tmp/clawdie-install.conf"
step_done "wizard"
fi
log_msg "[firstboot] Starting — target: ${TARGET:-baremetal}${RESUME:+, resume mode}"
# ── ZFS pool detection (baremetal only) ───────────────────────────────────
# Runs early to decide boot mode: install | upgrade | maintenance
# Maintenance mode exec's away and never returns.
CLAWDIE_BOOT_MODE="${CLAWDIE_BOOT_MODE:-install}"
if [ "${TARGET:-baremetal}" != "vps" ]; then
run_step "zfs" clawdie_shell_zfs_detect "ZFS pool detection"
fi
export CLAWDIE_BOOT_MODE
# ── Collect configuration ──────────────────────────────────────────────────
if step_completed "wizard"; then
log_msg "[firstboot] Skipping wizard (already completed)"
elif [ "$CLAWDIE_BOOT_MODE" = "upgrade" ]; then
# Upgrade: import existing pool, load .env from previous install
log_msg "[firstboot] Upgrade mode — loading existing configuration"
kldload zfs 2>/dev/null || true
zpool import "$POOL_NAME" 2>/dev/null || true
_existing_env="/home/clawdie/clawdie-ai/.env"
if [ -f "$_existing_env" ]; then
ENV_FILE="${ENV_FILE:-/home/clawdie/.env}"
cp "$_existing_env" "$ENV_FILE"
log_msg "[firstboot] Loaded existing .env from previous install"
else
log_msg "[firstboot] WARNING: No existing .env found — falling through to wizard"
fi
step_done "wizard"
elif [ "${TARGET:-baremetal}" = "vps" ]; then
# VPS: all values must be pre-baked in build.cfg — validate
[ -z "${ASSISTANT_NAME:-}" ] && log_msg "ERROR: ASSISTANT_NAME not baked" && exit 1
[ -z "${AGENT_DOMAIN:-}" ] && log_msg "ERROR: AGENT_DOMAIN not baked" && exit 1
[ -z "${TZ:-}" ] && log_msg "ERROR: TZ not baked" && exit 1
log_msg "[firstboot] VPS — pre-baked config OK"
step_done "wizard"
else
# Baremetal: minimal wizard — identity, network, keys only
# All jails (db, git/forgejo, cms) are provisioned by default.
# API keys are deferred to web UI on first desktop login.
_dialog() { bsddialog --backtitle "Clawdie-AI Setup" "$@" 2>&1; }
_dialog --msgbox "\
EXPERIMENTAL BUILD
This is pre-release software.
Not recommended for production use.
Data loss or service interruption possible.
By continuing, you assume all risks." 12 60
# Tailscale (recommended, but optional)
if _dialog --yesno \
"Enable Tailscale for secure remote access?\n\n" \
"Tailscale creates a private network for SSH access.\n" \
"Without it, SSH will be exposed on public port 22.\n\n" \
"Recommended: Yes (you can add auth key later if needed)" 14 70; then
FEATURE_TAILSCALE="YES"
TAILSCALE_AUTHKEY=$(_dialog --passwordbox \
"Tailscale device auth key (tskey-...).\n\n" \
"Leave blank to skip auth (you can run 'tailscale up' later).\n" \
"Generate at: https://login.tailscale.com/admin/settings/keys" 13 72 "")
if [ -z "$TAILSCALE_AUTHKEY" ]; then
_dialog --msgbox "No auth key provided.\n\nTailscale will be installed but not authenticated.\nRun 'tailscale up' after first boot to connect." 10 60
fi
else
FEATURE_TAILSCALE="NO"
TAILSCALE_AUTHKEY=""
_dialog --msgbox "WARNING: SSH will be publicly accessible on port 22.\n\nYou are responsible for securing network access." 10 60
fi
ASSISTANT_NAME=$(_dialog --inputbox "Assistant name:" 8 50 "Clawdie")
# Derive default domain from assistant name (e.g., Clawdie → clawdie.home.arpa)
_agent_name_lower=$(echo "$ASSISTANT_NAME" | tr 'A-Z' 'a-z' | sed 's/[^a-z0-9]//g')
_default_domain="${_agent_name_lower}.home.arpa"
AGENT_DOMAIN=$(_dialog --inputbox \
"Domain zone (public or local):" 8 60 "$_default_domain")
if command -v route >/dev/null 2>&1 && command -v ifconfig >/dev/null 2>&1; then
HOST_IF="$(route -n get default 2>/dev/null | awk '/interface:/ { print $2; exit }')"
HOST_IPS=""
if [ -n "$HOST_IF" ]; then
HOST_IPS="$(ifconfig "$HOST_IF" 2>/dev/null | awk '/inet / { print $2 }')"
fi
if [ -z "$HOST_IPS" ]; then
HOST_IPS="$(ifconfig 2>/dev/null | awk '/inet / && $2 != "127.0.0.1" { print $2 }')"
fi
if [ -n "$HOST_IPS" ]; then
HOST_IPS_LINE="$(echo "$HOST_IPS" | tr '\n' ' ' | sed 's/ $//')"
_dialog --msgbox "\
DNS note: If you use *.home.arpa, create an A record for
${AGENT_DOMAIN} pointing to this host IP.
Detected IP(s): ${HOST_IPS_LINE}" 10 70
fi
fi
TZ=$(_dialog --inputbox \
"Timezone (e.g. Europe/Ljubljana):" 8 50 "UTC")
SYSTEM_LOCALE=$(_dialog --inputbox \
"Locale (e.g. sl_SI.UTF-8):" 8 50 "sl_SI.UTF-8")
KEYMAP=$(_dialog --inputbox \
"Console keymap (e.g. sl.kbd):" 8 50 "sl.kbd")
[ -z "${SYSTEM_LOCALE:-}" ] && SYSTEM_LOCALE="sl_SI.UTF-8"
DISPLAY_LOCALE="${SYSTEM_LOCALE}"
ASSISTANT_LOCALE="${SYSTEM_LOCALE}"
SSH_PUBLIC_KEY=$(_dialog --inputbox \
"SSH public key (optional — paste ssh-ed25519 or ssh-rsa):" 12 70 "")
ANTHROPIC_API_KEY=$(_dialog --passwordbox \
"Anthropic API Key (optional).\n\n" \
"Get from: console.anthropic.com\n" \
"Leave blank to configure later." 12 70 "")
CLAUDE_CODE_OAUTH_TOKEN=$(_dialog --passwordbox \
"Claude Code OAuth Token (optional).\n\n" \
"Run 'claude setup-token' elsewhere, paste token here.\n" \
"Leave blank to configure later." 12 70 "")
# Defaults: all jails enabled, no local LLM (can be enabled post-install)
: "${AGENT_GENDER:=f}"
FEATURE_GIT="YES"
FEATURE_GITEA="YES"
CODE_HOSTING_MODE="gitea"
LOCAL_LLM_PROVIDER="none"
FEATURE_OLLAMA="NO"
FEATURE_LLAMA_CPP="NO"
FEATURE_OLLAMA_HPP="NO"
# Summary screen
SUMMARY_MSG="Configuration Summary:\n\n"
SUMMARY_MSG+="Name: ${ASSISTANT_NAME}\n"
SUMMARY_MSG+="Domain: ${AGENT_DOMAIN}\n"
SUMMARY_MSG+="Timezone: ${TZ}\n"
SUMMARY_MSG+="Locale: ${SYSTEM_LOCALE}\n"
SUMMARY_MSG+="Keymap: ${KEYMAP}\n"
SUMMARY_MSG+="SSH key: $([ -n "$SSH_PUBLIC_KEY" ] && echo "✓ Provided" || echo "✗ None")\n"
SUMMARY_MSG+="Claude API: $([ -n "$ANTHROPIC_API_KEY" ] && echo "✓ Provided" || echo "✗ None")\n"
SUMMARY_MSG+="Claude OAuth: $([ -n "$CLAUDE_CODE_OAUTH_TOKEN" ] && echo "✓ Provided" || echo "✗ None")\n"
if [ "${FEATURE_TAILSCALE}" = "YES" ]; then
if [ -n "$TAILSCALE_AUTHKEY" ]; then
SUMMARY_MSG+="Tailscale: ✓ Enabled (auth key provided)\n"
else
SUMMARY_MSG+="Tailscale: ⚠ Enabled (no auth key - run 'tailscale up' later)\n"
fi
else
SUMMARY_MSG+="Tailscale: ✗ Disabled (SSH on public port 22)\n"
fi
SUMMARY_MSG+="\nProceed with installation?"
if ! _dialog --yesno "$SUMMARY_MSG" 16 70; then
_dialog --msgbox "Installation cancelled. Rebooting..." 6 40
reboot
fi
step_done "wizard"
fi
export CLAWDIE_BOOT_MODE POOL_NAME
export ASSISTANT_NAME AGENT_GENDER AGENT_DOMAIN TZ SSH_PUBLIC_KEY
export SYSTEM_LOCALE DISPLAY_LOCALE ASSISTANT_LOCALE KEYMAP
export PI_TUI_PROVIDER PI_TUI_MODEL ZAI_API_KEY OPENROUTER_API_KEY ANTHROPIC_API_KEY
export CLAUDE_CODE_OAUTH_TOKEN
export EMBED_BASE_URL EMBED_MODEL EMBED_API_KEY EMBED_DIMENSIONS
export TELEGRAM_BOT_TOKEN TELEGRAM_CHAT_ID FEATURE_TELEGRAM
export FEATURE_TAILSCALE TAILSCALE_AUTHKEY
export CODE_HOSTING_MODE FEATURE_GIT FEATURE_GITEA FORGEJO_DISK_ESTIMATE
export LOCAL_LLM_PROVIDER FEATURE_OLLAMA FEATURE_LLAMA_CPP FEATURE_OLLAMA_HPP
export OLLAMA_RAM_ESTIMATE OLLAMA_DISK_ESTIMATE LLAMA_CPP_RAM_ESTIMATE LLAMA_CPP_DISK_ESTIMATE
export USB_LLM_MODELS_PATH
# ── Run modules ────────────────────────────────────────────────────────────
log_msg "[firstboot] Running modules..."
# Module execution matrix — controls which modules run in each mode.
# fresh install: all modules
# upgrade: pkg + env (append-only) + npm-globals + deploy only
# repair: deploy only (targeted repair actions inside shell-deploy)
#
# GPU, SSH, system, desktop, PF, Tailscale are skipped on upgrade/repair —
# they would reset keys, overwrite rc.conf, and undo operator customisations.
run_step_if "install" "gpu" clawdie_shell_gpu_detect "GPU driver detection" 1
run_step_if "install" "nvidia" clawdie_shell_nvidia_detect "NVIDIA version selection" 2
run_step_if "install upgrade" "pkg" clawdie_shell_pkg_setup "Package repo configuration" 3
run_step_if "install" "ssh" clawdie_shell_ssh_setup "SSH keys + system passwords" 4
run_step_if "install upgrade" "env" clawdie_shell_env_generate "Generate .env with secrets" 5
run_step_if "install" "system" clawdie_shell_system_config "Hostname, rc.conf, services" 6
run_step_if "install" "desktop" clawdie_shell_desktop_detect "Desktop enablement" 7
run_step_if "install" "pf" clawdie_shell_pf "PF firewall + jail NAT" 8
run_step_if "install" "tailscale" clawdie_shell_tailscale_setup "Tailscale remote access" 8
run_step_if "install upgrade" "npm-globals" clawdie_shell_npm_globals_install "Install bundled npm CLIs (claude/gemini/pi)" 8
run_step_if "install upgrade" "deploy" clawdie_shell_deploy "Extract tarball + just install" 8
log_msg "[firstboot] Complete."
log_msg "[firstboot] Aider (primary harness): aider --help"
log_msg "[firstboot] Pi (primary harness): pi --help"
log_msg "[firstboot] Optional: Codex CLI (headless): codex login --device-auth"
log_msg "[firstboot] Optional: Codex CLI (API key): printenv OPENAI_API_KEY | codex login --with-api-key"