clawdie-ai

Author	SHA1	Message	Date
Operator & Codex	bab4f76439	Reorder shared service IPs and switch docs to English root --- Build: pass \| Tests: FAIL — Tests 9 failed \| 2081 passed \| 4 skipped (2094)	2026-05-02 20:21:19 +02:00
Operator & Codex	24ccda6e47	Align root shared DB defaults and drop screenshot auth --- Build: pass \| Tests: FAIL — Tests 8 failed \| 2087 passed \| 4 skipped (2099)	2026-05-02 18:04:09 +02:00
Operator & Claude Code	eb8e05bfe6	docs: split brand identifier from platform namespace; add vocabulary Resolves the collision class where a tenant named `clawdie` would produce `clawdie_ops` clashing with the platform's shared ops DB. Two constants instead of one: - service name / brand / UNIX user: `clawdie` (one of them) - platform namespace prefix for shared resources: `system` Shared DBs become `system_ops` / `system_brain` / `system_skills`; shared dataset becomes `zroot/system-runtime`. `system` joins the reserved_host_labels list so the same collision cannot reappear at the FQDN layer. Also adds: - Vocabulary section distinguishing operator account, service account, service name, platform namespace, assistant display name, tenant id (six terms, one bug class each) - Install-paths section formalizing fresh-machine (ISO) vs existing-host flows; `just install` is the platform install, never the OS install - Service-account override field as bootstrap config, not an onboarding prompt; default stays `clawdie` - Operator-account treatment: existing-host path checks for it; Clawdie never renames or recreates it AGENTS.md "Multitenant Rules" updated to match. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: pass — Tests 2099 passed (2099)	2026-05-02 12:59:24 +02:00
Operator & Claude Code	02f7027f07	docs: collapse 8 multitenant/platform docs into one MULTITENANT.md Single source of truth at docs/internal/MULTITENANT.md (~430 lines) replaces the previous spread across NAMING-POLICY, ARCHITECTURE, HOST-REALITY, INTERNAL-ROLLOUT, ROADMAP, HANDOFF, AGENT-WORKFLOW, and PLATFORM-V2-MANIFESTO. Load-bearing content (vision, conceptual model, naming schema, surfaces, controlplane, publishing, conventions) is folded in; current-state runbooks, phased migration plans, and deployment-drift snapshots are dropped — design phase, fresh start. Identity decision: drop PLATFORM_ID / PLATFORM_SERVICE_NAME / PLATFORM_RUNTIME_USER. Platform identity is the constant 'clawdie' baked into code; ASSISTANT_NAME is display-only and never feeds infra names; TENANT_ID is for additive tenants only. AGENTS.md gains a short "Multitenant Rules" block carrying the day-to-day do/don't extract. Cross-references in AGENT-WORKFLOW-CHECKLIST, AGENT-WORKTREE-WORKFLOW, and the two freebsd-jail-implementation docs updated to point at the new file. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: pass — Tests 2099 passed (2099)	2026-05-02 12:16:58 +02:00
Operator & Claude Code	7c6d076b5c	docs(host-db): correct readiness review — placement was already right Earlier version claimed the readiness wait was "in the wrong place" — only running in the 5-min periodic check. That was wrong: runControlPlaneChecks() is called at src/index.ts:1087, before initDatabase / loadState / initMemoryPool. The wait already gates bootstrap. Trimmed the doc to the real follow-up scope: swap tcpReachable for pg_isready, add HOST_DB_READINESS_TIMEOUT_MS env (default 60s), minimal logging, one timeout-path test. No move, no restructuring. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 2081 passed (2084)	2026-05-01 11:51:58 +02:00
Operator & Claude Code	5ee8debfd5	docs(host-db): review of `371b237` — readiness wait needs a real probe Improvement over no-wait, but two follow-ups before §E is closed: - default probe is `tcpReachable` — pg opens its socket during WAL recovery while rejecting queries with "starting up", so TCP-open is not the same as accepting connections. Need a SELECT 1 / pg_isready check. - wait runs inside the 5-minute periodic controlplane check, not at Mevy bootstrap. If anything in startup touches DB before the first tick, the wait does not gate the actual race. Plus: 30s default may be tight post-incident, no logs during the wait, no env override, post-deadline extra probe makes timeout fuzzy, and the "3 failed tests" trailer is still present. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 2081 passed (2084)	2026-05-01 11:44:36 +02:00
Operator & Claude Code	f2358fcb80	docs(host-db): review of `b02746c` — request revisions before merge Implementation review of zai/Codex's "Harden host DB reboot path." Direction is right but three blockers: - snapshots are not atomic (two separate `zfs snapshot` calls reproduce the pgwal/pgdata skew that caused the incident) - `serviceMaybeStop` swallows real `onestatus` errors as "already stopped" — can proceed to checkpoint pg with mevy still running - committed with 3 failing tests Plus smells around missing readiness wait (§E), no spawnSync timeouts, duplicated pool resolution, and an unrelated bonus fix smuggled into the commit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 2 failed \| 2080 passed (2082)	2026-05-01 10:57:13 +02:00
Operator & Claude Code	2bc14c7040	docs(host-db): tighten recovery plan — atomic snapshots, readiness wait, failure semantics Addendum to `d456aa4`. Three gaps that would have left the plan implementable-but-unsafe: - snapshot step now mandates a single recursive ZFS snapshot of the common parent; two separate snapshots reproduce the pgwal/pgdata skew that caused the 30.apr.2026 incident - new §E: Mevy startup must poll for DB readiness (pg_isready or equivalent); rc.d REQUIRE only orders start invocations, not actual connect-ability - §A now specifies failure semantics for the maintenance-reboot op (each pre-reboot step aborts on failure; reboot only schedules after all prior steps succeed) - pg_resetwal explicitly demoted to non-recovery-path - note that CHECKPOINT before clean stop is belt-and-suspenders Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: pass — Tests 2080 passed (2080) --- Build: pass \| Tests: FAIL — Tests 2 failed \| 2080 passed (2082)	2026-05-01 10:27:08 +02:00
Operator & Codex	d456aa4be1	docs: add host DB recovery plan --- Build: pass \| Tests: FAIL — Tests 3 failed \| 2077 passed (2080)	2026-05-01 10:17:11 +02:00
Operator & Codex	bef38d218a	Add maintainer skills artifact builder --- Build: pass \| Tests: pass — Tests 2075 passed (2075)	2026-04-29 13:12:30 +02:00
Operator & Codex	b97e623e3a	Document tenant-site verify states --- Build: pass \| Tests: pass — Tests 2057 passed (2057)	2026-04-29 12:04:25 +02:00
Operator & Codex	6ed65e29c5	Tighten Astro handoff verification status --- Build: FAIL \| Tests: pass — Tests 2055 passed (2055)	2026-04-29 11:34:30 +02:00
Operator & Codex	aa026586a5	Rewrite localization docs for current Astro flow --- Build: FAIL \| Tests: pass — Tests 2055 passed (2055)	2026-04-29 11:31:50 +02:00
Operator & Codex	03a23a965b	Use ječa wording in Slovenian docs --- Build: pass \| Tests: pass — Tests 2044 passed (2044) --- Build: FAIL \| Tests: pass — Tests 2055 passed (2055)	2026-04-29 10:55:07 +02:00
Operator & Codex	d97b0531ff	Clarify host-vs-jail Astro publish commands --- Build: pass \| Tests: pass — Tests 2044 passed (2044) --- Build: FAIL \| Tests: pass — Tests 2055 passed (2055)	2026-04-29 10:54:39 +02:00
Operator & Codex	a81da587bd	Document current Astro publish workflows --- Build: pass \| Tests: pass — Tests 2044 passed (2044)	2026-04-29 10:07:23 +02:00
Operator & Claude Code	7dbbca7197	docs(handoff): rewrite for Codex on-host verification and simplification plan Replaces the decision-tree handoff with a concrete step-by-step test guide for Codex to run on the live host. Documents what Claude already shipped, the exact verification commands, the nginx pattern question (direct vs proxy), and a prioritized simplification assessment. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 8 failed \| 2009 passed (2017)	2026-04-28 14:10:06 +02:00
Operator & Codex	8aeedd54e9	docs(cms): align astro source paths and handoff --- Build: pass \| Tests: pass — Tests 2017 passed (2017)	2026-04-28 13:43:03 +02:00
Operator & Codex	d573daed76	docs(architecture): align docs with live naming and docs topology --- Build: pass \| Tests: pass — Tests 2017 passed (2017)	2026-04-27 21:27:44 +02:00
Operator & Claude Code	fdeaa39588	docs: add runtime-manifest architecture documentation - Update ARCHITECTURE.md Prompt Assembly section to document runtime-manifest as a new context layer injected per-message, explaining it answers coherence questions: 'what repo/branch/skills do I have?' - Update docs/internal/AGENT-HARNESS-V2.md Phase 5 to detail both System State and Runtime Manifest as complementary context blocks, explaining the coherence gap they solve together - New docs/internal/RUNTIME-MANIFEST-DESIGN.md: complete specification - Why: agents had infrastructure facts but couldn't see them - What: machine-generated inventory from .git/library.yaml/artifacts - How: fresh per-message, cheap local sources, compact XML-like format - Where: injected in system prompt alongside SOUL/IDENTITY files - Testing: coverage for git parsing, skills counting, specialist discovery The three-layer coherence system is now: 1. Hand-written identity (SOUL/USER/IDENTITY/MEMORY) — philosophy, stable 2. Machine-generated manifest (RUNTIME_MANIFEST) — inventory, fresh 3. Live system state (system-state.ts) — operations, current Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 8 failed \| 2009 passed (2017)	2026-04-27 21:21:30 +02:00
Operator & Codex	ae6a1e4ff9	fix(install): relax topology checks for upgrade --- Build: pass \| Tests: pass — Tests 2007 passed (2007)	2026-04-27 12:17:44 +02:00
Operator & Codex	c99a04d93f	docs(iso): rename setup import design doc --- Build: pass \| Tests: pass — Tests 2005 passed (2005)	2026-04-27 11:58:34 +02:00
Operator & Claude	fee881d458	docs(handoff): review notes on `7919327` install-identity Three issues: ZFS topology fields in mismatchKeys block valid storage expansion upgrades (fix: move layout/data-disks/hot-spares out of mismatch set); rescue mode silent on identity mismatches (fix: add warn log); setup.txt path assumption on ISO path (note for bridge phase, not a blocker). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 4 failed \| 1996 passed (2000) --- Build: pass \| Tests: FAIL — Tests 5 failed \| 2000 passed (2005)	2026-04-27 11:10:19 +02:00
Operator & Claude	d53a1e018d	docs(iso): design contract for shell-setup-txt.sh bridge module Two-phase validation model: POSIX sh reads setup.txt from FAT32 partition at boot (no Node), TypeScript validates fully post-deploy. Covers partition detection, parser, derivation layer (AGENT_DOMAIN, locale, mode translation), system.env passthrough, pool name fix, and Codex implementation checklist. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 4 failed \| 1996 passed (2000)	2026-04-27 10:36:29 +02:00
Operator & Claude	7ccf592fa0	docs(handoff): record 6 ISO-AI drift gaps for Codex Cross-repo analysis after `975f37f` landed. TypeScript setup layer is correct in isolation; gaps are all at the ISO firstboot boundary: no setup.txt reader, pool name mismatch, mode naming divergence, AGENT_DOMAIN derivation missing, Slovenian locale defaults, and system.env unknown to the ISO. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 4 failed \| 1996 passed (2000)	2026-04-27 10:31:03 +02:00
Operator & Codex	975f37f895	feat(install): add versioned setup and system contracts --- Build: pass \| Tests: pass — Tests 2000 passed (2000)	2026-04-27 10:06:44 +02:00
Operator & Codex	d5182ec480	docs+setup: clarify install mode names --- Build: pass \| Tests: pass — Tests 1992 passed (1992)	2026-04-27 09:07:18 +02:00
Operator & Codex	bcb27d4d56	feat(install): backfill setup from inspect output --- Build: pass \| Tests: FAIL — Tests 2 failed \| 1989 passed (1991)	2026-04-27 08:55:21 +02:00
Operator & Codex	7b14e27783	feat(install): add shell-based inspect mode --- Build: pass \| Tests: pass — Tests 1991 passed (1991)	2026-04-27 08:47:56 +02:00
Operator & Codex	2ab3fa050a	refactor(setup): unify operator auth entrypoints --- Build: pass \| Tests: pass — Tests 1991 passed (1991)	2026-04-27 08:13:36 +02:00
Operator & Codex	1425aa08eb	feat(setup): add first-boot install modes and storage contract --- Build: pass \| Tests: pass — Tests 1990 passed (1990) # Conflicts: # docs/internal/ISO-FIRST-BOOT-IMPLEMENTATION.md # docs/public/install/first-boot.md # Conflicts: # docs/internal/ISO-FIRST-BOOT-IMPLEMENTATION.md # docs/internal/ISO-FIRST-BOOT-SECRETS-HANDOFF.md	2026-04-27 08:02:27 +02:00
Operator & claude	a16838b772	docs(handoff): record adopt-mode decisions + flag operator-auth unification Round 5 in the handoff doc captures the five agreed adopt-mode decisions (INSTALL_MODE field, fill-blanks default, identity mismatch blocks, Telegram identity changes require explicit flag, fingerprint gate) so they survive into Codex's design doc. Implementation doc gets an "Adopt Mode (V1.1)" section with the proposed 4-task split + per-field freeze contract table, plus a task-4 followup subsection naming the legacy `operators` table sync gap and the unification plan with Codex's setup/operator-auth.ts. scripts/set-operator.ts gets a TODO(unify) header pointing at the same gap. first-boot.md notes adopt mode is V1.1 and to back up before reflashing until then. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1972 passed (1975)	2026-04-27 07:12:55 +02:00
Operator & claude	0e01ecc8ca	docs(install): align install/architecture pages with V1 first-boot Net -206 lines across install docs while making the V1 first-boot model the recommended path: - install/index: restructure to put first-boot + ISO as the recommended path; existing-host install demoted. - install/iso: collapse to image selection + USB write; defer the V1 setup.txt flow to first-boot.md (saves ~30 lines). - install/requirements: drop @Andy/Mac-launchd/personal-config sections and the duplicated memory/session/task model that lives in architecture docs (saves ~150 lines). - install/install: reframe the onboarding step as setup.txt-first with TUI as the explicit fallback. - install/fresh-install-checklist: replace bsddialog wizard milestone with setup.txt seed milestone, note TUI fallback case. - architecture/deployment-models: ISO model now says "setup.txt seed, TUI fallback". - architecture/admin-panel: note planned set-operator menu entry. - ISO-FIRST-BOOT-IMPLEMENTATION: sharpen task 4 reasoning — clawdie-admin exists but as a TUI launcher, not a CLI router. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1972 passed (1975)	2026-04-27 06:54:33 +02:00
Operator & claude	29fbb1e6c8	docs(install): draft skeletal first-boot V1 walkthrough Lands task 6 in skeleton form: docs/public/install/first-boot.md covers the four required lines, optional fields (profile, locale, dashboard credentials, SSH key, headless password), the post-install set-operator command, and how to switch off OpenRouter. Two TBD blocks remain: "Where setup.txt lives" (waits on task 5 delivery-mechanism validation) and "Troubleshooting" (waits on real failure traces from the ISO build). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1972 passed (1975)	2026-04-27 06:47:48 +02:00
Operator & claude	b9e771316d	feat(setup): add set-operator script for post-install dashboard credentials Lands task 4 from the ISO first-boot implementation split as a standalone scripts/set-operator.ts (matches existing scripts/ convention — no clawdie-admin umbrella). Reuses ensureControlplaneBootstrapOperator() for the Better Auth signUp path. Prompts password via stdin with echo suppressed; refuses non-TTY runs; updates OPERATOR_PASSWORD in .env (mode 0600). First-set only — rotation goes through the dashboard. Both planning docs updated to drop "notional" references and point at the real npm run set-operator -- <email> command. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1972 passed (1975)	2026-04-27 06:41:53 +02:00
Operator & Codex	1971a8075e	feat(setup): add first-boot config parser and profile bundles --- Build: pass \| Tests: pass — Tests 1975 passed (1975)	2026-04-27 06:32:33 +02:00
Operator & claude	8086135183	docs(handoff): round 4 on ISO first-boot — SSH key + headless credentials Drops ROOT_PASSWORD (root locked by default), adds SSH_AUTHORIZED_KEY as the preferred headless box-access path, adds CLAWDIE_USER_PASSWORD as fallback only. Parser warns visibly when plaintext passwords are present in setup.txt. Implementation doc task 1 (parser) and task 5 (delivery validation) extended to cover the new fields. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-27 06:15:16 +02:00
Operator & claude	e405c3df1a	docs(handoff): freeze ISO first-boot spec + add 6-task implementation split Folds three Round 3 clarifications inline (notional clawdie-admin surface, ASSISTANT_NAME blank → Clawdie, OpenRouter as bootstrap not commitment) and adds ISO-FIRST-BOOT-IMPLEMENTATION.md as the implementation contract with claim slots. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-27 05:22:03 +02:00
Operator & claude	2f2b5e5376	docs(handoff): round 3 on ISO first-boot secrets — final field shape Folds in Codex's three reservations on Round 2. Round 2 field list is now superseded by Round 3 below. Resolutions: - "First registration wins" registration window dropped. The mitigations Round 2 proposed (IP logging, post-install summary) were detection, not prevention — useless against an attacker on a shared LAN who registered first. Replaced with Option α: if dashboard credentials are missing from setup.txt, the dashboard waits until the operator runs `clawdie-admin set-operator <email>` post-install. Telegram remains the operator interface in the meantime. Option β (Telegram CONFIRM flow for registration requests) documented as the upgrade path if dashboard becomes load-bearing enough to justify the extra friction. - PROFILE=balanced moved from "required" to "prefilled." If it always defaults, calling it required misrepresents the operator's cognitive load. The line stays in setup.txt as visible documentation, not as a question the operator must answer. - ASSISTANT_NAME promoted to "recommended" tier; HOSTNAME demoted to "optional with derived default." The project currently conflates two distinct concepts (system-admin hostname vs emotional assistant identity); for first-boot, the emotional one is what the operator cares about. HOSTNAME defaults to lowercased ASSISTANT_NAME. Round 3 field list (authoritative): - 3 required (OpenRouter key, Telegram bot token, Telegram admin ID) - 1 recommended (ASSISTANT_NAME) - 1 prefilled (PROFILE=balanced) - 4 optional (TIMEZONE, HOSTNAME, OPERATOR_EMAIL, OPERATOR_PASSWORD) Cognitive bar before first boot: four lines the operator types into. Everything else has a sensible fallback. Doc split (Codex's recommendation to extract a V1 onboarding spec doc plus an implementation task breakdown) acknowledged as the right next move, but premature — two items remain open (seed delivery mechanism, clawdie-admin set-operator surface). Split happens once those resolve. Section explicitly lists what's now firmly decided vs still open after Round 3 so future readers don't re-litigate closed questions or silently commit open ones. No code changes. Pure planning convergence. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-26 21:21:57 +02:00
Operator & claude	2580eed76c	docs(handoff): round 2 on ISO first-boot secrets — credentials decision Captures the converged state after Codex's pushback on the Round 1 take. Three pushbacks accepted with resolutions; one open question (OPERATOR_EMAIL / OPERATOR_PASSWORD in setup.txt) resolved as a hybrid; final V1 field list locked. Resolutions: - Seed-partition specifics (256 MB FAT32, label "CLAWDIE-SEED") demoted from "spec" to "direction." Architectural commitment is file-based seed import; exact mechanism stays open until validated against the ISO repo and real flashers. - Auto-wipe of setup.txt on import is dropped. Replaced with an installer warning to the operator (immediate + post-install summary) telling them to reformat the media. Keeps multi-machine reflash working; treats credential hygiene as documented operator action, not silent destruction. - PROFILE explicitly sets all three (chat primary, fallback, compaction) as a coordinated bundle. Splitting them re-creates the configuration sprawl the profile is supposed to prevent. Advanced operators drop down to explicit lines that override the profile mapping. OPERATOR_EMAIL / OPERATOR_PASSWORD resolution: - Both optional in setup.txt. - If both present: installer pre-creates the operator account in Better Auth on first boot. Unattended-install path. - If either missing: Better Auth opens a "first registration wins" window (default 30 min, configurable) for local-network IPs only. First person to hit /dashboard registers through the normal sign-up form. Window auto-closes on success or timeout. - Bound to local-network IPs via existing CONTROLPLANE_AUTH_MODE semantics; full source IP logged; "operator account registered from <ip>" surfaced in post-install summary so hijacked registration is visible immediately. - Recovery via "clawdie-admin reopen-registration --minutes 30" CLI if window expires. Final V1 field list: 5 required (OPENROUTER_API_KEY, TELEGRAM_BOT_TOKEN, TELEGRAM_ADMIN_ID, PROFILE=balanced, HOSTNAME=clawdie) + 3 optional (TIMEZONE, OPERATOR_EMAIL, OPERATOR_PASSWORD). Anything else gets configured from the live system, not from setup.txt. Three items explicitly listed as still-open after Round 2 (seed mechanism, registration window default, post-install summary delivery channel) so they don't get silently committed. No code changes. The Claude Take + Round 2 sections are scoped to lift in the same commit that lands the actual seed-import implementation. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-26 21:11:39 +02:00
Operator & claude	59d9aaf020	docs(handoff): claude take on ISO first-boot secrets Answers all six review questions in the handoff doc with a single recommended V1 design (writable seed partition + profile indirection + TUI fallback), two realistic alternatives (post-bootstrap web/SSH config, two-USB), eight named risks, and a complete eight-field setup.txt template. Operator-facing rename folded in: setup.env → setup.txt. The .env extension is a developer convention; setup.txt opens cleanly in any text editor on Win/Mac/Linux without configuration, which removes one of the largest non-technical-operator friction points in the flow. Profile indirection (PROFILE=balanced/economy/quality) keeps model IDs out of operator hands at install time and lets the team change the validated mapping over time without breaking old setup.txt files. The installer resolves the profile to actual PI_TUI_PROVIDER/PI_TUI_MODEL/LLM_FALLBACK_* at install time. The take also flags the second onboarding cliff (Telegram BotFather flow, easily underestimated) and the V2 follow-up (web-based setup wizard) so the seed-file work isn't throwaway when the better UX ships later. No code changes. Pure handoff response in docs/internal/ISO-FIRST-BOOT-SECRETS-HANDOFF.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-26 20:59:19 +02:00
Operator & Codex	83c33a2080	docs(handoff): add ISO first-boot secrets review Adds a focused handoff doc for reviewing V1 ISO secrets onboarding, setup.env scope, seed-media import, and provider bootstrap choices. --- Build: pass \| Tests: pass — docs only --- Build: pass \| Tests: pass — Tests 1961 passed (1961)	2026-04-26 20:52:12 +02:00
Operator & claude	e672af354a	docs(plan): pivot ISO V1 plan from reactive to release-shaped Codex's 6-step plan from the 26.apr.2026 chat session (defaults policy → state policy → token burn → truth-surface polish → smoke checklist → ISO dry run) lands in BOOTABLE-ISO-PLAN-V1.md, with six refinements integrated: - Already-resolved snapshot section so reading-cold agents do not re-open closed questions (provider fallback works end-to-end, cooldown path normalized in `6983415`, Token Ledgers in /budgetreport, Telegram basics stable). - Step 1 explicitly absorbs the previously-separate Open Questions list (default primary, default fallback, free-tier policy, identity wording) so "freeze defaults" actually closes them rather than parallel-tracking. Identity wording is named as the same root cause as the `fe14fad` fixture failures, not a separate concern. - Step 1 notes the cost-amplification trap in "compaction follows primary" (when fallback is paid-stable, compaction follows there too — burn during cooldown can amplify). - Step 3 (token burn) promoted ahead of polish work because a fresh ISO install that quietly eats budget leaves a worse first impression than any of the polish items. - Step 5 smoke checklist gains per-item triage hints so the dry-run operator knows what to check first when an item fails. - Step 6 explicitly notes one dry run only catches one-time issues; two runs on different hardware is the post-1.0.0 bar, so nobody is surprised when the first prod install hits something the lone dry run missed. Each step is now tagged by which agent class can claim it (decision/docs / code / deploy) so independent claims do not stall on "only Codex can do this." Original Goal / Success Criteria / Non-Goals / Operator DoD / Working Rule preserved verbatim at the top and bottom of the doc. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1958 passed (1961)	2026-04-26 19:52:21 +02:00
Operator & Codex	6983415357	fix(runtime): normalize provider cooldown state path Provider cooldown persistence now follows AGENT_STATUS_DIR, then legacy CLAWDIE_VAR_DIR, and otherwise defaults to repo-local tmp/state instead of ~/.clawdie/state. Updates docs to match the live behavior. --- Build: pass \| Tests: pass — 28 passed (2 files) --- Build: pass \| Tests: pass — Tests 1961 passed (1961)	2026-04-26 16:37:21 +02:00
Operator & claude	32e671c802	docs(.env.example): record validated chat + compaction config Closes the .env drift Codex flagged on 26.apr.2026: the live deploy runtime had switched to openrouter/openai/o3 as chat default and unpinned AGENT_COMPACTION_PROVIDER, but neither change was reflected in .env.example. Effect: a fresh ISO build or reinstall would have started in the old (silent-no-reply) configuration. This commit does not change the current zai/glm-5-turbo example primary — some operators have working zAI keys with budget — but adds a clearly-marked "known-stable alternative primary" block that documents the openrouter/openai/o3 setup the operator validated today, with the rationale (zAI 5-hour cap → silent no-reply). The AGENT_COMPACTION_PROVIDER block now explains both modes: unset (compaction follows chat runtime, including fallback) is the validated default; pinning decouples compaction from chat fallback for cost or stability reasons. The previous one-liner left both pieces undocumented. provider-fallback.md gets a matching "Compaction interaction" note so the reading order from the operator guide ends up at the same answer. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 3 failed \| 1956 passed (1959)	2026-04-26 15:11:27 +02:00
Operator & claude	3e41cf6072	docs: handoff for fallback model + token ledger UX + ISO plan review Three findings from the operator's afternoon session, captured for Codex to act on or defer. One docs change pre-applied; rest are notes. Pre-applied in this commit: - docs/public/operate/provider-fallback.md: example fallback model changed from meta-llama/llama-3.3-70b-instruct:free to openai/o3 (paid, stable). New "Choosing a fallback model" subsection warns explicitly that free-tier models are unsafe as fallback targets — they rate-limit silently and the failure mode is indistinguishable from "agent dead." Operator hit this in production today. - .env.example: LLM_FALLBACK_PROVIDER, LLM_FALLBACK_MODEL, LLM_FALLBACK_DEFAULT_COOLDOWN_SECONDS now documented (were missing entirely), with the same free-tier warning inline. New session block in docs/internal/MULTITENANT-HANDOFF.md: - Finding 1 (V1-blocker): live .env on deploy host needs the same model swap; consider startup WARN if LLM_FALLBACK_MODEL ends with :free; decide whether silent rate-limit-no-output should bubble as a visible Telegram error. - Finding 2: token ledger views (/usage, /tokens, /policy) are arithmetically reconcilable but ask operator to mental-diff across three places. Recommended fix is a "Token Ledgers" section in /budgetreport showing quota + activity + reset-archived together. - Finding 3: verify whether the mevy 0→14054 spent_today snapshot was a reset or a recording-path bug in recordTokenSpend (`a73f211`). - Finding 4: review notes on BOOTABLE-ISO-PLAN-V1.md — promote identity wording from open question to Priority; split Priority A into regeneratable-status vs persistent-state; add synthetic-cap test path for fallback verification; add brief risk register. No code changes. Docs and a single .env.example block. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 2 failed \| 1949 passed (1951)	2026-04-26 14:10:41 +02:00
Operator & Codex	d8794983e0	docs(plan): add bootable ISO v1 target --- Build: pass \| Tests: pass — Tests 1951 passed (1951)	2026-04-26 13:26:58 +02:00
Operator & claude	3828e5ce83	docs: integrate operator observability + provider fallback work Brings the public docs in line with what shipped on multitenant over the last few days. Three new operator-facing pages, three updates to existing ones, and a CHANGELOG batch. New pages (docs/public/operate/): - operator-commands.md — single reference for all Telegram slash commands, grouped by purpose (status, structured reports, runtime, sessions, admin actions) with auth gating per command. Previously only in-bot /help text. - provider-fallback.md — operator guide for the cooldown layer: env vars, how cooldowns are detected and tracked, /policy surfacing, /clearcooldown for manual release, the configured/effective/actual observability triple. Includes a "path convention note" flagging that the cooldown file still uses the legacy $CLAWDIE_VAR_DIR resolution while test/build status files have moved to repo tmp/ — divergence to harmonize later in code. - structured-reports.md — explains the Observed/Interpretation/Operator Notes pattern, lists the six structured reports, documents the test/build pipeline contract (status JSON schema + new $AGENT_STATUS_DIR → $CLAWDIE_VAR_DIR → tmp/status precedence Codex landed in `1389e17`), and covers free-text routing (classifyReportIntent + isOpsFlavored). Updates: - monitoring.md: appended "Operator-Facing Reports" section pointing at the new structured-reports page, and "Provider Fallback Health" pointing at the fallback page. - operate/index.md: added the three new pages to the runbook list. - architecture/controlplane.md: added "Runtime Observability" section documenting the configured/effective/actual triple and linking to the new operate pages. - README.md: expanded the Telegram Commands table (was 10 rows, missing every structured report, /policy, /clearcooldown, /budgetreset) and added a pointer to operator-commands.md as the full reference. Also noted free-text routing. - CHANGELOG.md: appended an "operator observability + provider fallback, apr.2026" batch under [Unreleased] covering provider fallback, the reports family, the test/build wrapper pipeline, free-text routing, /clearcooldown, the observability triple, the Telegram setMyCommands menu, and the new "Verify Before Claiming Remote State" rule in AGENTS.md. No code changes. Slovenian sl/ mirror left untouched (out of localization scope). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: FAIL — Tests 8 failed \| 1940 passed (1948) --- Build: pass \| Tests: FAIL — Tests 2 failed \| 1949 passed (1951)	2026-04-26 13:01:43 +02:00
Operator & claude	9acbd1bfc3	docs: handoff for nanoclaw cleanup + verify-before-claim rule Two outputs from this session bundled for the next agent: - AGENTS.md gains "Verify Before Claiming Remote State" — durable rule born from the `1e87f34` vs `3d33482` confusion: don't speak about a remote without a fresh git fetch. When two agents disagree about a tip, both fetch before debugging. - MULTITENANT-HANDOFF.md gains a 26.apr session block telling Codex how to disable the nanoclaw upstream remote in each worktree without deleting the source code (setup/upstream.ts and the check_upstream_updates MCP tool both gracefully degrade and stay useful as a re-enable path). No code changes. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: pass \| Tests: pass — Tests 1944 passed (1944)	2026-04-26 10:43:58 +02:00
Operator & Codex	b4996f732f	chore(docs): salvage public hardening note and git setup test --- Build: FAIL \| Tests: FAIL	2026-04-26 08:44:45 +02:00


351 commits