diff --git a/docs/FRESH-INSTALL-CHECKLIST.md b/docs/FRESH-INSTALL-CHECKLIST.md new file mode 100644 index 0000000..ee429b2 --- /dev/null +++ b/docs/FRESH-INSTALL-CHECKLIST.md 
@@ -0,0 +1,213 @@ +# Fresh Install Checklist + +Verification checklist for new Clawdie-AI installations (bare metal, bhyve VM, +or jail-based). Run after firstboot completes. Each check includes the exact +command and expected result. + +Designed to work with the tmux-screenshot skill — capture each section for the +installation record. + +## Timing milestones + +Record wall-clock timestamps at each stage. On bhyve, the serial console +shows boot messages with timestamps. + +| Milestone | Command / event | Record | +|-----------|----------------|--------| +| Boot start | First kernel message | `T0` | +| Firstboot wizard shown | `bsddialog` prompt appears | `T1 = T1 - T0` | +| Wizard complete | `[firstboot] Complete.` in log | `T2 = T2 - T0` | +| Desktop ready (Lumina) | `lightdm` login screen visible | `T3 = T3 - T0` | +| Agent responding | `/ping` on Telegram returns pong | `T4 = T4 - T0` | + +Check firstboot log for exact timestamps: + +```sh +head -5 /var/log/clawdie-firstboot.log +tail -5 /var/log/clawdie-firstboot.log +``` + +## 1. Jails running + +```sh +jls -N +``` + +Expected (agent name may vary): + +``` + JID IP Address Hostname Name + 1 10.0.X.2 {agent}-controlplane {agent}-controlplane + 2 10.0.X.3 db db + 3 10.0.X.4 cms cms + 4 10.0.X.5 llamacpp llamacpp +``` + +All four jails must be present and running. If any are missing: + +```sh +cat /var/log/clawdie-firstboot.log | grep -i 'fail\|error' +``` + +## 2. .env correctness + +```sh +grep -E '^(AGENT_NAME|AGENT_GENDER|AGENT_DOMAIN|AGENT_INTERNAL_DOMAIN|AGENT_TMP_DIR|PI_TUI_PROVIDER|PI_TUI_MODEL|EMBED_BASE_URL|TELEGRAM_BOT_TOKEN)=' .env +``` + +Verify: + +| Key | Expected | +|-----|----------| +| `AGENT_NAME` | Lowercase, no spaces (e.g. `clawdie`, `mevy`) | +| `AGENT_GENDER` | `f`, `m`, or `n` | +| `AGENT_DOMAIN` | Valid domain or `.internal` | +| `AGENT_INTERNAL_DOMAIN` | `{agent}.home.arpa` | +| `AGENT_TMP_DIR` | Writable path, not `/tmp` | +| `PI_TUI_PROVIDER` | `zai`, `openrouter`, `anthropic`, etc. 
| +| `PI_TUI_MODEL` | Valid model for the provider | +| `EMBED_BASE_URL` | URL ending in `/v1` | +| `TELEGRAM_BOT_TOKEN` | Non-empty if `FEATURE_TELEGRAM=true` | + +## 3. Watchdog IPC status + +```sh +# Check socket exists +ls -la "${AGENT_TMP_DIR:-tmp}/ipc/" + +# Query watchdog status +echo '{"cmd":"status"}' | nc -U "${AGENT_TMP_DIR:-tmp}/ipc/${AGENT_NAME}-watchdog.sock" +``` + +Expected: JSON response with `mode`, `throttle`, `memory`, `activeJails`. + +If socket is missing, check if the agent process is running: + +```sh +pgrep -f 'node.*dist/index.js' +``` + +## 4. Database connectivity + +```sh +# From host — test PostgreSQL in db jail +sudo bastille cmd db service postgresql status + +# Test connection (uses .env credentials) +npm run setup -- --step verify +``` + +Expected: `postgresql is running` and verify step exits 0. + +## 5. LLM provider connectivity + +```sh +# Quick inference test via pi +pi --provider "${PI_TUI_PROVIDER}" --model "${PI_TUI_MODEL}" -e "reply with OK" +``` + +Expected: Model responds. If using ZAI (GLM), verify the API key: + +```sh +grep '^ZAI_API_KEY=' .env | cut -c1-20 +``` + +## 6. Telegram bot + +```sh +# Check bot token is valid (should return bot info) +curl -s "https://api.telegram.org/bot$(grep '^TELEGRAM_BOT_TOKEN=' .env | cut -d= -f2)/getMe" | python3 -m json.tool +``` + +Expected: `"ok": true` with the bot username. + +## 7. Lumina desktop (baremetal only) + +```sh +service lightdm status +service dbus status +``` + +If Lumina fails to start, check: + +```sh +# X11 log +cat /var/log/Xorg.0.log | tail -30 + +# LightDM log +cat /var/log/lightdm/lightdm.log | tail -30 + +# GPU driver loaded? +sysctl kern.conftxt | grep -i gpu +pciconf -lv | grep -B3 'VGA' +``` + +## 8. Network and firewall + +```sh +# PF rules loaded +sudo pfctl -sr | head -10 + +# NAT working (from inside a jail) +sudo bastille cmd db ping -c1 1.1.1.1 + +# Bridge healthy +ifconfig warden0 | grep 'inet ' +``` + +## 9. 
ZFS health + +```sh +zpool status -x +zfs list -o name,used,avail -t filesystem | head -20 +``` + +Expected: `all pools are healthy`. + +## 10. Screenshot smoke test + +Capture the final state as proof of successful install: + +```sh +python3 .agent/skills/tmux-screenshot/tmux-screenshot.py \ + --session "${AGENT_NAME}" \ + --base-url "https://${AGENT_DOMAIN}/screenshots" \ + --publish +``` + +Verify the capture landed: + +```sh +ls -la /usr/local/www/clawdie/screenshots/*.png | tail -3 +``` + +## Log paths reference + +| Log | Path | +|-----|------| +| Firstboot orchestrator | `/var/log/clawdie-firstboot.log` | +| Firstboot progress | `/var/log/clawdie-firstboot.progress` | +| Agent (production) | `logs/klavdija.log` (relative to project) | +| Watchdog | Same as agent log (pino structured) | +| Preflight run | `logs/preflight-{runstamp}/` | +| LightDM | `/var/log/lightdm/lightdm.log` | +| X11 | `/var/log/Xorg.0.log` | +| PostgreSQL | `/var/log/postgresql.log` (inside db jail) | +| nginx | `/var/log/nginx/error.log` | + +## Running the full preflight + +The automated version of this checklist: + +```sh +# As root (for jail and firewall steps) +sudo npm run preflight + +# With onboarding wizard +sudo npm run preflight -- --with-onboarding + +# Stop on first failure +sudo npm run preflight -- --fail-fast +``` + +Results are written to `logs/preflight-{timestamp}/summary.json`.
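Review bot: Section 2's `grep -E '^(AGENT_NAME|...|TELEGRAM_BOT_TOKEN)=' .env` prints the full bot token to the terminal, while the introduction asks for each section to be captured with the tmux-screenshot skill and section 10 publishes captures under `https://${AGENT_DOMAIN}/screenshots`, so following the checklist as written puts the token into a published image. Drop `TELEGRAM_BOT_TOKEN` from that pattern and test it for presence without echoing the value, for example `grep -q '^TELEGRAM_BOT_TOKEN=.' .env && echo "token set"`, which also matches the table's "Non-empty" expectation. The same concern applies in section 5, where `cut -c1-20` on the `ZAI_API_KEY=` line still shows the first 8 characters of the key (the prefix is only 12 characters), and in section 6, where the token is expanded into the curl URL and therefore sits in curl's argument list while the request runs. For section 5, a presence check like the one above is enough. For section 6, consider feeding the URL to curl on stdin with `-K -` and documenting that the `getMe` output, not the command expansion, is what should be captured.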