

Clawdie-AI Refactor Plan: Lumina Desktop + NanoClaw Alignment

Date: 27.Mar.2026
Status: DRAFT (no code changes yet)
Context: Pivot from TUI-only jailed deployment to FreeBSD 15 + Lumina desktop
Companion repo: /home/clawdie/clawdie-iso (v0.9.0, builds the bootable USB)


Page 1: What Changes and Why

The pivot

Clawdie-AI was built as a headless FreeBSD jail system — agents run inside Bastille jails, no display server, everything via terminal. The new target is a FreeBSD 15 machine with Lumina desktop (delivered by clawdie-iso) where agents run in terminal windows (tmux/pi console), with a real browser available natively on the desktop.

This aligns clawdie-ai with clawdie-iso, which already boots into Lumina and runs a firstboot shell wizard to configure the machine. The two projects now share the same target environment.

What clawdie-iso delivers (v0.9.0, 26.Mar.2026)

  • 25GB bootable USB image (895 pre-built packages, offline install)
  • FreeBSD 15.0 + Lumina desktop + LightDM
  • Firstboot wizard (7 POSIX shell modules, ~1,754 lines)
  • GPU detection (Intel/AMD/NVIDIA), SSH key injection, .env generation
  • rc.d service that runs once and self-disables
  • Output: user logged into Lumina, ready to run claude and type /setup

What clawdie-ai needs to do differently

Stop assuming headless jails. Agents run as processes in Lumina terminal windows. The TUI wizard that mimicked bsdinstall is dead: clawdie-iso's firstboot wizard already does that job better, in plain POSIX shell.

What this eliminates

| Component | Lines | Why it's dead |
|---|---|---|
| TUI setup wizard (setup-wizard.ts) | 1,698 | clawdie-iso firstboot does this |
| Screenshot wizard (screenshot-wizard.ts) | 666 | Lumina has native screenshot |
| ANSI-to-HTML converter (ansi-to-html.ts) | 355 | Only existed for TUI screenshots |
| Browser VM jail profile + skill | ~200 | Lumina has native Chromium/Firefox |
| bhyve VMM detection in wizard | ~80 | No Linux VM needed |
| Warden-specific skills (5 skills) | ~1,500 (Markdown) | Jails no longer primary runtime |

Total removable: ~3,000 lines TS + ~1,500 lines skill docs

What stays

  • Core orchestrator (index.ts, group-queue.ts, task-scheduler.ts)
  • PostgreSQL memory (skills, memory, ops)
  • Telegram channel (grammy)
  • IPC system
  • Health monitoring (simplified)
  • Mount security (still useful for agent sandboxing)

Page 2: Align Install with NanoClaw

Current state (to be replaced)

Our install is a 1,698-line interactive TUI wizard that collects network config, jail profiles, SSH keys, feature flags — all for a jailed deployment model we're abandoning. clawdie-iso's firstboot already handles the machine setup (hostname, GPU, SSH keys, .env secrets). Clawdie-ai's /setup only needs to handle the agent layer on top.

Target state

Identical to NanoClaw: user sits at Lumina desktop, opens terminal, runs claude, types /setup. Claude walks through modular steps, fixing things as it goes. Most machine config is already done by firstboot.

Steps to align

  1. Delete setup/setup-wizard.ts, setup/screenshot-wizard.ts, setup/ansi-to-html.ts

  2. Port NanoClaw's /setup skill (.claude/skills/setup/SKILL.md) with FreeBSD adaptations:

    • Step 0: Git & fork → keep as-is
    • Step 1: Bootstrap → pkg install node24 npm (firstboot may have done this already)
    • Step 2: Environment check → read .env generated by firstboot's shell-env.sh
    • Step 3: Runtime check → verify tmux and pi are installed (no Docker/containers)
    • Step 4: Claude auth → check .env for ANTHROPIC_API_KEY or CLAUDE_CODE_OAUTH_TOKEN
    • Step 5: Channels → delegate to /add-telegram skill
    • Step 6: Mount allowlist → simplify (no container boundary)
    • Step 7: Service → FreeBSD rc.d script
    • Step 8: Verify → FreeBSD-specific checks
  3. Keep our setup modules that still apply:

    • setup/environment.ts — detect platform, auth state
    • setup/register.ts — group registration
    • setup/verify.ts — validation checks
    • setup/service.ts — rewrite for rc.d
  4. Delete setup modules that don't apply:

    • setup/network.ts — jail networking, handled by firstboot
    • setup/jail.ts — jail creation, not needed
  5. Rewrite setup/platform.ts — detect FreeBSD + Lumina instead of macOS/Linux
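
The platform rewrite in step 5 could start from something like the sketch below. It is only an illustration: `PlatformInfo` and `detectPlatform` are hypothetical names, and the check assumes the Lumina session exports `XDG_CURRENT_DESKTOP=Lumina`, which should be confirmed on a real clawdie-iso boot.

```typescript
// Hypothetical sketch for setup/platform.ts; names and heuristics are assumptions.
export interface PlatformInfo {
  isFreeBSD: boolean;
  hasLumina: boolean;
}

// Pure function: the caller passes process.platform and process.env,
// which keeps the check testable on any machine.
export function detectPlatform(
  platform: string,
  env: Record<string, string | undefined>,
): PlatformInfo {
  // XDG_CURRENT_DESKTOP may hold a colon-separated list of desktop names.
  const desktops = (env.XDG_CURRENT_DESKTOP ?? "").toLowerCase().split(":");
  return {
    // Node reports "freebsd" as process.platform on FreeBSD.
    isFreeBSD: platform === "freebsd",
    // Assumption: the Lumina session sets XDG_CURRENT_DESKTOP=Lumina.
    hasLumina: desktops.includes("lumina"),
  };
}
```

Typical use would be `detectPlatform(process.platform, process.env)` at the top of /setup, with a fallback check (for example, looking for the Lumina binaries) if the env var turns out to be unset.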

Integration with clawdie-iso firstboot

The shell-env.sh module in clawdie-iso generates .env with 65 variables including ZAI_API_KEY, TELEGRAM_BOT_TOKEN, etc. The /setup skill should check for this file first and skip asking for secrets already present.
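
A minimal sketch of that check, assuming the firstboot .env uses plain KEY=VALUE lines. The function names are hypothetical and the parser is deliberately small; it is not a full dotenv implementation.

```typescript
// Sketch only: a minimal KEY=VALUE parser, not a full dotenv implementation.
export function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // blank line or comment
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim();
    // Strip one pair of matching surrounding quotes.
    const doubleQuoted = value.startsWith('"') && value.endsWith('"');
    const singleQuoted = value.startsWith("'") && value.endsWith("'");
    if (value.length >= 2 && (doubleQuoted || singleQuoted)) {
      value = value.slice(1, -1);
    }
    vars.set(key, value);
  }
  return vars;
}

// Secrets /setup still has to ask for: those absent or empty in .env.
export function missingSecrets(vars: Map<string, string>, required: string[]): string[] {
  return required.filter((name) => (vars.get(name) ?? "") === "");
}

// Step 4 accepts either credential.
export function hasClaudeAuth(vars: Map<string, string>): boolean {
  return ["ANTHROPIC_API_KEY", "CLAUDE_CODE_OAUTH_TOKEN"].some(
    (name) => (vars.get(name) ?? "") !== "",
  );
}
```

/setup would call `missingSecrets` with the list of variables it cares about and prompt only for the names returned.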

Line count impact

  • Remove: ~2,700 lines (wizard + screenshot + ansi + network + jail)
  • Add: ~200 lines (FreeBSD rc.d service, platform detection)
  • Net: -2,500 lines

Page 3: Runtime Simplification

Current: 4-file jail runtime (~1,400 lines)

src/jail-runner.ts      786 lines  — spawn agents in jails
src/jail-runtime.ts     173 lines  — detect jail environment
src/jail-ops.ts          85 lines  — jail lifecycle
src/jail-config.ts      358 lines  — jail profiles

Target: 1-file process runner (~200 lines)

Replace with a single src/agent-runner.ts modeled on NanoClaw's container-runner.ts (707 lines → our simpler version ~200):

  • Spawns pi as a subprocess (no tmux pane needed for daemon mode)
  • Passes secrets via env vars directly (no jail stdin trick)
  • Same IPC file protocol (file-based, no changes to IPC layer)
  • Same output sentinel markers (---NANOCLAW_OUTPUT_START/END---)
  • Same timeout/idle logic
  • Per-group folder isolation via directory permissions
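
The sentinel handling from the list above might look roughly like this. The two marker strings are an assumed expansion of the shorthand `---NANOCLAW_OUTPUT_START/END---`, and `extractOutputs` is a hypothetical name; check the real markers in NanoClaw's container-runner.ts before relying on them.

```typescript
// Assumption: these strings expand the shorthand ---NANOCLAW_OUTPUT_START/END---.
const OUTPUT_START = "---NANOCLAW_OUTPUT_START---";
const OUTPUT_END = "---NANOCLAW_OUTPUT_END---";

export interface ParseResult {
  outputs: string[]; // complete payloads found in this buffer
  rest: string;      // unconsumed tail to prepend to the next stdout chunk
}

export function extractOutputs(buffer: string): ParseResult {
  const outputs: string[] = [];
  let cursor = 0;
  for (;;) {
    const start = buffer.indexOf(OUTPUT_START, cursor);
    if (start === -1) break;
    const end = buffer.indexOf(OUTPUT_END, start + OUTPUT_START.length);
    if (end === -1) {
      // Start marker seen but the block is incomplete: keep it for the next chunk.
      return { outputs, rest: buffer.slice(start) };
    }
    outputs.push(buffer.slice(start + OUTPUT_START.length, end).trim());
    cursor = end + OUTPUT_END.length;
  }
  // No open block: keep only enough tail to catch a start marker split across chunks.
  const tail = buffer.slice(cursor);
  return { outputs, rest: tail.slice(-(OUTPUT_START.length - 1)) };
}
```

The runner would append each stdout chunk to `rest`, call `extractOutputs`, forward `outputs`, and carry the new `rest` forward.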

What we keep from jail-runner

  • Secrets management (read .env, pass to subprocess)
  • IPC message/task file watching
  • Output streaming with sentinels
  • Per-group logging (groups/{folder}/logs/)
  • Health checkpoint calls

What we drop

  • All jexec, jls, bastille calls
  • Jail profile selection (controlPlane, worker, networkedWorker, browserVm)
  • Mount nullfs logic
  • devfs ruleset management
  • Linux emulation detection
  • Warden bridge configuration

Upstream potential

Our agent-runner.ts could be offered to NanoClaw as a "bare-metal FreeBSD" runtime option. NanoClaw's container-runtime.ts (127 lines) is the right model — a thin abstraction behind 5 functions. We do the same for processes:

// src/agent-runtime.ts — FreeBSD process variant
export const RUNTIME = 'process';
export function spawnAgent(input, groupFolder) { ... }
export function stopAgent(pid) { ... }
export function cleanupOrphans() { ... }
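
A slightly fuller sketch of the same surface, using only Node's `child_process`. `AgentInput` and `buildAgentEnv` are hypothetical, the command is assumed to be `pi`, and `cleanupOrphans` is left out because it depends on how PIDs are tracked.

```typescript
import { spawn, ChildProcess } from "node:child_process";

export const RUNTIME = "process";

// Hypothetical input shape; the real fields depend on the orchestrator.
export interface AgentInput {
  command: string;                 // assumption: "pi" on the target machine
  args: string[];
  secrets: Record<string, string>; // values read from .env
  timeoutMs: number;
}

type Env = Record<string, string | undefined>;

// Merge secrets over a base environment without mutating either input.
export function buildAgentEnv(base: Env, secrets: Record<string, string>): Env {
  return { ...base, ...secrets };
}

export function spawnAgent(input: AgentInput, groupFolder: string): ChildProcess {
  const child = spawn(input.command, input.args, {
    cwd: groupFolder, // per-group working directory
    env: buildAgentEnv(process.env, input.secrets),
    stdio: ["pipe", "pipe", "pipe"],
  });
  // Hard timeout: ask the agent to exit if it runs too long.
  const timer = setTimeout(() => child.kill("SIGTERM"), input.timeoutMs);
  const clear = () => clearTimeout(timer);
  child.once("exit", clear);
  child.once("error", clear); // e.g. command not found; callers should also log this
  return child;
}

export function stopAgent(pid: number): boolean {
  try {
    process.kill(pid, "SIGTERM");
    return true;
  } catch {
    return false; // process already gone, or not ours to signal
  }
}
```

Passing secrets through `env` replaces the jail stdin trick; the idle-timeout logic from jail-runner would sit on top of the hard timeout shown here.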

Page 4: Skills Audit — Keep, Drop, Upstream

Skills to DROP (jail/infra-specific, replaced by clawdie-iso)

| Skill | Reason |
|---|---|
| warden-bootstrap | clawdie-iso firstboot does this |
| warden-health | Jails gone |
| warden-pf | PF rules for jail networking |
| warden-zfs | ZFS for jail snapshots (see Appendix A) |
| bastille-network | Bastille jail manager |
| browser-vm | Lumina has native browser |
| nginx-glasspane | Rethink for Lumina |
| tmux-screenshot | Lumina has native screenshot |

Skills to KEEP (still relevant)

| Skill | Notes |
|---|---|
| ansible-freebsd | Host provisioning on top of clawdie-iso |
| coding-agent | Core: configures pi agent engine and API keys |
| nginx | Still need reverse proxy |
| ollama | Local LLM, runs on host (great upstream candidate) |
| postgres-memory | Supabase memory backend (great upstream candidate) |
| telegram-admin | Telegram bot management |
| freebsd-admin | Host sysadmin tasks |
| sanoid | ZFS snapshots; now must cover /home/clawdie too |

Skills to PORT FROM NanoClaw (we're missing)

| Skill | What it does |
|---|---|
| add-compact | Compact mode for cheaper API usage |
| add-image-vision | Image analysis via Claude vision |
| add-parallel | Parallel agent execution |
| add-pdf-reader | PDF ingestion |
| add-reactions | Emoji reactions to messages |
| add-voice-transcription | Voice message → text |
| use-local-whisper | Local Whisper for voice (aligns with our ollama skill) |
| update-skills | Pull latest skills from upstream |

Skills we could UPSTREAM to NanoClaw

| Skill | Value to upstream |
|---|---|
| postgres-memory | Persistent memory, high demand |
| ollama | Local LLM, popular request |
| freebsd-admin | Expands platform support |
| ansible-freebsd | Infrastructure-as-code pattern |
| coding-agent | Pi agent engine configuration |

Best upstream candidates: postgres-memory and ollama — platform-agnostic, add real value, no FreeBSD-specific code.


Page 5: Channel Architecture Decision

Current state

We hardcoded Telegram in src/channels/telegram.ts and removed NanoClaw's multi-channel registry (src/channels/registry.ts).

NanoClaw's approach

Channels are skill branches merged via git. Each channel:

  1. Adds src/channels/{name}.ts implementing Channel interface
  2. Self-registers via class-based pattern in src/channels/registry.ts
  3. Auto-enables when its env var is present (e.g., TELEGRAM_BOT_TOKEN)
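
The three points above can be sketched as follows. The `Channel` shape, `ChannelRegistry`, and the factory signature are hypothetical stand-ins; NanoClaw's actual registry.ts may be structured differently and should be ported rather than re-invented.

```typescript
// Hypothetical shapes for illustration; NanoClaw's real Channel interface may differ.
export interface Channel {
  readonly name: string;
  start(): Promise<void>;
}

type ChannelFactory = (secret: string) => Channel;

export class ChannelRegistry {
  private readonly entries = new Map<string, { envVar: string; create: ChannelFactory }>();

  // Each channel module calls this once at import time.
  register(name: string, envVar: string, create: ChannelFactory): void {
    this.entries.set(name, { envVar, create });
  }

  // Instantiate only the channels whose env var is present and non-empty.
  enabled(env: Record<string, string | undefined>): Channel[] {
    const channels: Channel[] = [];
    this.entries.forEach(({ envVar, create }) => {
      const secret = env[envVar];
      if (secret !== undefined && secret !== "") channels.push(create(secret));
    });
    return channels;
  }
}

export class TelegramChannel implements Channel {
  readonly name = "telegram";
  private readonly token: string;

  constructor(token: string) {
    this.token = token;
  }

  async start(): Promise<void> {
    // A real implementation would start the grammy bot with this.token.
    if (this.token === "") throw new Error("TELEGRAM_BOT_TOKEN is empty");
  }
}

export const registry = new ChannelRegistry();
registry.register("telegram", "TELEGRAM_BOT_TOKEN", (token) => new TelegramChannel(token));
```

With this shape, src/index.ts would only call `registry.enabled(process.env)` and start whatever comes back, with no Telegram-specific code.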

Recommendation

Restore the channel registry. Even if we only use Telegram now, the registry costs ~100 lines and keeps us upstream-compatible. Discord, Slack, or WhatsApp then become a single skill away (e.g. /add-discord).

Action items:

  1. Port src/channels/registry.ts from NanoClaw
  2. Refactor src/channels/telegram.ts to implement Channel interface
  3. Remove hardcoded Telegram init from src/index.ts
  4. Telegram auto-enables when TELEGRAM_BOT_TOKEN is in .env

Note: clawdie-iso's shell-env.sh already writes TELEGRAM_BOT_TOKEN to .env — so Telegram auto-enables out of the box after firstboot.


Page 6: Execution Order and Dependencies

Phase 0: Snapshot before anything (NEW — see Appendix A)

Add zroot/home/clawdie to sanoid config before touching anything:

[zroot/home/clawdie]
  use_template = production
  hourly = 24
  daily = 7
  monthly = 1

Result: Config changes are recoverable.

Phase 1: Clean house (no new features)

  1. Delete TUI wizard + screenshot wizard + ansi-to-html
  2. Delete jail-specific setup modules (network.ts, jail.ts)
  3. Delete jail runtime files (jail-runner, jail-runtime, jail-ops, jail-config)
  4. Delete dead skills (warden-*, bastille-*, browser-vm, tmux-screenshot)
  5. Delete container/ directory (legacy Docker)

Result: ~3,500 lines removed, project compiles but can't run agents

Phase 2: New runtime

  1. Create src/agent-runner.ts (~200 lines) — subprocess spawner
  2. Create src/agent-runtime.ts (~50 lines) — FreeBSD process runtime
  3. Update src/index.ts to use new runner
  4. Verify IPC still works (file-based, unchanged)

Result: Agents run as processes in Lumina terminal

Phase 3: Install alignment

  1. Port NanoClaw's /setup skill with FreeBSD/Lumina adaptations
  2. Rewrite setup/platform.ts for FreeBSD + Lumina detection
  3. Rewrite setup/service.ts for rc.d
  4. Integrate with clawdie-iso .env (skip already-configured secrets)
  5. Port missing skills from NanoClaw (add-compact, add-image-vision, etc.)

Result: /setup works, full install from fresh clawdie-iso boot

Phase 4: Channel registry

  1. Port src/channels/registry.ts from NanoClaw
  2. Adapt Telegram channel to implement Channel interface
  3. Test multi-channel registration flow

Result: Upstream-compatible channel system

Phase 5: Upstream contributions

  1. Package postgres-memory as standalone skill branch
  2. Package ollama as standalone skill branch
  3. Open PRs to NanoClaw upstream

Result: Give back to the community


Summary

| Metric | Before | After | Change |
|---|---|---|---|
| TS source lines | ~10,500 | ~6,500 | -38% |
| Setup lines | ~4,300 | ~800 | -81% |
| Runtime files | 4 (jail) | 2 (process) | -50% |
| Skills | 34 | ~22 | -35% |
| Dependencies | 8 | 7 | -12% |
| Install method | TUI wizard | /setup skill | Aligned with upstream |
| Machine setup | DIY jails | clawdie-iso firstboot | Done before agent starts |
| Agent isolation | FreeBSD jails | Process + permissions | Simpler |
| Browser | Future bhyve VM | Native Lumina | Done |

Appendix A: Incident Report — Pi TUI z.ai Key Loss (11.Mar.2026)

What happened

Pi TUI stopped connecting to z.ai GLM-5 between March 10 and 11. Session logs confirmed it worked on March 10 (provider: "zai", model: "glm-5").

Root cause

During a test install session, pi's config files at ~/.pi/agent/ were overwritten:

  • auth.json — zai entry dropped, only ollama key remained:
    { "ollama": { "type": "api_key", "key": "c1**ab**...sU*Z**V*Z*pz..." } }
    
  • settings.json — defaults switched to openrouter / moonshotai/kimi-k2.5
  • models.json — custom ollama-cloud provider added, overwrote prior config

The ZAI_API_KEY was still in .env but pi reads from auth.json first.

Why ZFS couldn't save us

Sanoid was snapshotting jail datasets only:

zroot/clawdie-runtime/jails/controlplane  — hourly ✓
zroot/clawdie-runtime/jails/db            — hourly ✓
zroot/home/clawdie                        — ZERO snapshots ✗

No rollback possible for ~/.pi/agent/. Diagnosis took ~20 min.

Fix applied (27.Mar.2026)

Restored auth.json with both keys:

{
  "ollama": { "type": "api_key", "key": "c1**ab**...sU*Z**V*..." },
  "zai": { "type": "api_key", "key": "9d**d8**c7*b4**...Jt*Q9*..." }
}

Restored settings.json defaults to zai / glm-4-plus.

z.ai is no longer the target provider. Balance hit zero and the dependency on a Chinese cloud API is undesirable for a self-hosted personal AI. Local llama-cpp replaces it — see LOCAL-LLM.md.

Current direction: pi points at http://10.0.0.5:8081/v1 (local llama-server chat instance). Model choice scales with available RAM:

  • 12 GB: dolphin3.0-phi4-mini Q4_K_M (~2.4 GB) — stepping stone
  • 32 GB: Qwen3-14B Q4_K_M (~9 GB) — target for reliable tool use + HEARTBEAT

Action items baked into Phase 0

Before any refactor work starts:

  1. Add zroot/home/clawdie to sanoid (daily + hourly snapshots)
  2. /setup skill must merge new entries into auth.json, never overwrite the file
  3. /setup skill must snapshot zroot/home/clawdie before touching pi config
  4. Point pi at local llama-server :8081 — remove z.ai from auth.json
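
A hedged sketch of how /setup could honor item 2 together with the back-up-first lesson. The `AuthFile` type mirrors the two-field entries shown in this appendix; `mergeAuth` and `updateAuthFile` are hypothetical names, and the `.bak-<timestamp>` suffix is an arbitrary choice.

```typescript
import { copyFileSync, existsSync, readFileSync, writeFileSync } from "node:fs";

// Shape taken from the auth.json entries shown in this appendix.
type AuthEntry = { type: string; key: string };
type AuthFile = Record<string, AuthEntry>;

// Existing providers win unless the caller explicitly opts into replacing them.
export function mergeAuth(existing: AuthFile, additions: AuthFile, replace = false): AuthFile {
  return replace ? { ...existing, ...additions } : { ...additions, ...existing };
}

export function updateAuthFile(path: string, additions: AuthFile): void {
  let existing: AuthFile = {};
  if (existsSync(path)) {
    // Back up before touching the file, so a bad run stays recoverable.
    copyFileSync(path, `${path}.bak-${Date.now()}`);
    existing = JSON.parse(readFileSync(path, "utf8")) as AuthFile;
  }
  const merged = mergeAuth(existing, additions);
  // mode only applies when the file is newly created.
  writeFileSync(path, JSON.stringify(merged, null, 2) + "\n", { mode: 0o600 });
}
```

The file-level backup complements the ZFS snapshot in item 3 rather than replacing it: the snapshot covers the whole home dataset, the backup covers the one file most likely to be clobbered.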

Lesson for the coding-agent skill

The coding-agent skill update should document that pi config lives at ~/.pi/agent/ and is fragile. Any automated setup that touches it must back up first.