Commit graph

351 commits

Author SHA1 Message Date
bab4f76439 Reorder shared service IPs and switch docs to English root
---
Build: pass | Tests: FAIL — Tests  9 failed | 2081 passed | 4 skipped (2094)
2026-05-02 20:21:19 +02:00
24ccda6e47 Align root shared DB defaults and drop screenshot auth
---
Build: pass | Tests: FAIL — Tests  8 failed | 2087 passed | 4 skipped (2099)
2026-05-02 18:04:09 +02:00
Operator & Claude Code
eb8e05bfe6 docs: split brand identifier from platform namespace; add vocabulary
Resolves the collision class where a tenant named `clawdie` would
produce `clawdie_ops` clashing with the platform's shared ops DB.

Two constants instead of one:
- service name / brand / UNIX user: `clawdie` (one of them)
- platform namespace prefix for shared resources: `system`

Shared DBs become `system_ops` / `system_brain` / `system_skills`;
shared dataset becomes `zroot/system-runtime`. `system` joins the
reserved_host_labels list so the same collision cannot reappear at
the FQDN layer.
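
As a rough illustration of the split described above, the sketch below derives shared and tenant database names from separate prefixes. The helper names (`sharedDbName`, `tenantDbName`, `isReservedHostLabel`) and the contents of the reserved set are assumptions made for this example; only the `system` prefix, the three shared DB names, and the `clawdie_ops` collision come from the commit message.

```typescript
// Platform namespace prefix for shared resources (from the commit message).
const PLATFORM_NAMESPACE = "system";

type DbKind = "ops" | "brain" | "skills";

// Shared DBs are named from the platform namespace, never from the brand.
function sharedDbName(kind: DbKind): string {
  return `${PLATFORM_NAMESPACE}_${kind}`;
}

// Tenant DBs follow the `<tenant>_<kind>` pattern implied by `clawdie_ops`.
function tenantDbName(tenantId: string, kind: DbKind): string {
  return `${tenantId}_${kind}`;
}

// Illustrative reserved set; the real reserved_host_labels list is not shown
// in the source, so this holds only the label the commit adds.
const RESERVED_HOST_LABELS: ReadonlySet<string> = new Set([PLATFORM_NAMESPACE]);

// A tenant may not claim a host label that the platform reserves.
function isReservedHostLabel(label: string): boolean {
  return RESERVED_HOST_LABELS.has(label.toLowerCase());
}
```

With this shape, a tenant named `clawdie` yields `clawdie_ops`, which can no longer collide with the shared `system_ops`.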

Also adds:
- Vocabulary section distinguishing operator account, service
  account, service name, platform namespace, assistant display name,
  tenant id (six terms, one bug class each)
- Install-paths section formalizing fresh-machine (ISO) vs
  existing-host flows; `just install` is the platform install, never
  the OS install
- Service-account override field as bootstrap config, not an
  onboarding prompt; default stays `clawdie`
- Operator-account treatment: existing-host path checks for it;
  Clawdie never renames or recreates it

AGENTS.md "Multitenant Rules" updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2099 passed (2099)
2026-05-02 12:59:24 +02:00
Operator & Claude Code
02f7027f07 docs: collapse 8 multitenant/platform docs into one MULTITENANT.md
Single source of truth at docs/internal/MULTITENANT.md (~430 lines)
replaces the previous spread across NAMING-POLICY, ARCHITECTURE,
HOST-REALITY, INTERNAL-ROLLOUT, ROADMAP, HANDOFF, AGENT-WORKFLOW, and
PLATFORM-V2-MANIFESTO. Load-bearing content (vision, conceptual model,
naming schema, surfaces, controlplane, publishing, conventions) is
folded in; current-state runbooks, phased migration plans, and
deployment-drift snapshots are dropped — design phase, fresh start.

Identity decision: drop PLATFORM_ID / PLATFORM_SERVICE_NAME /
PLATFORM_RUNTIME_USER. Platform identity is the constant 'clawdie'
baked into code; ASSISTANT_NAME is display-only and never feeds infra
names; TENANT_ID is for additive tenants only. AGENTS.md gains a short
"Multitenant Rules" block carrying the day-to-day do/don't extract.

Cross-references in AGENT-WORKFLOW-CHECKLIST, AGENT-WORKTREE-WORKFLOW,
and the two freebsd-jail-implementation docs updated to point at the
new file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2099 passed (2099)
2026-05-02 12:16:58 +02:00
Operator & Claude Code
7c6d076b5c docs(host-db): correct readiness review — placement was already right
Earlier version claimed the readiness wait was "in the wrong place" —
only running in the 5-min periodic check. That was wrong:
runControlPlaneChecks() is called at src/index.ts:1087, before
initDatabase / loadState / initMemoryPool. The wait already gates
bootstrap.

Trimmed the doc to the real follow-up scope: swap tcpReachable for
pg_isready, add HOST_DB_READINESS_TIMEOUT_MS env (default 60s),
minimal logging, one timeout-path test. No move, no restructuring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 2081 passed (2084)
2026-05-01 11:51:58 +02:00
Operator & Claude Code
5ee8debfd5 docs(host-db): review of 371b237 — readiness wait needs a real probe
Improvement over no-wait, but two follow-ups before §E is closed:

- default probe is `tcpReachable` — pg opens its socket during WAL
  recovery while rejecting queries with "starting up", so TCP-open
  is not the same as accepting connections. Need a SELECT 1 /
  pg_isready check.
- wait runs inside the 5-minute periodic controlplane check, not at
  Mevy bootstrap. If anything in startup touches DB before the
  first tick, the wait does not gate the actual race.
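
A minimal sketch of a readiness wait that polls a real probe rather than a TCP connect. The names `waitForReady`, `Probe`, and `WaitOptions` are hypothetical and this is not the project's `runControlPlaneChecks()`; the probe is injected so that a `pg_isready` call or a `SELECT 1` can be plugged in, and the clock and sleep are injectable so the loop can be exercised without a database.

```typescript
// A probe should return true only when the database answers a real readiness
// check (for example pg_isready exiting 0), not merely when the port is open.
type Probe = () => boolean;

interface WaitOptions {
  timeoutMs: number;
  intervalMs: number;
  now: () => number;           // injectable clock, in milliseconds
  sleep: (ms: number) => void; // injectable blocking sleep
}

// Polls the probe until it succeeds or the deadline is reached.
function waitForReady(probe: Probe, opts: WaitOptions): boolean {
  const deadline = opts.now() + opts.timeoutMs;
  for (;;) {
    if (probe()) return true;
    const remaining = deadline - opts.now();
    // Deadline reached: give up without a further probe.
    if (remaining <= 0) return false;
    // Never sleep past the deadline.
    opts.sleep(Math.min(opts.intervalMs, remaining));
  }
}
```

Capping the sleep at the remaining time means the last probe runs at the deadline, not after it, which keeps the timeout exact.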

Plus: 30s default may be tight post-incident, no logs during the
wait, no env override, post-deadline extra probe makes timeout
fuzzy, and the "3 failed tests" trailer is still present.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 2081 passed (2084)
2026-05-01 11:44:36 +02:00
Operator & Claude Code
f2358fcb80 docs(host-db): review of b02746c — request revisions before merge
Implementation review of zai/Codex's "Harden host DB reboot path."
Direction is right but three blockers:

- snapshots are not atomic (two separate `zfs snapshot` calls
  reproduce the pgwal/pgdata skew that caused the incident)
- `serviceMaybeStop` swallows real `onestatus` errors as
  "already stopped" — can proceed to checkpoint pg with mevy
  still running
- committed with 3 failing tests

Plus smells around missing readiness wait (§E), no spawnSync
timeouts, duplicated pool resolution, and an unrelated bonus fix
smuggled into the commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  2 failed | 2080 passed (2082)
2026-05-01 10:57:13 +02:00
Operator & Claude Code
2bc14c7040 docs(host-db): tighten recovery plan — atomic snapshots, readiness wait, failure semantics
Addendum to d456aa4. Three gaps that would have left the plan
implementable-but-unsafe:

- snapshot step now mandates a single recursive ZFS snapshot of the
  common parent; two separate snapshots reproduce the pgwal/pgdata skew
  that caused the 30.apr.2026 incident
- new §E: Mevy startup must poll for DB readiness (pg_isready or
  equivalent); rc.d REQUIRE only orders start invocations, not actual
  connect-ability
- §A now specifies failure semantics for the maintenance-reboot op
  (each pre-reboot step aborts on failure; reboot only schedules after
  all prior steps succeed)
- pg_resetwal explicitly demoted to non-recovery-path
- note that CHECKPOINT before clean stop is belt-and-suspenders
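
The failure semantics in §A can be sketched as a step runner that stops at the first failure and schedules the reboot only when every step has succeeded. `Step`, `Outcome`, and `runMaintenanceReboot` are hypothetical names for illustration; the real step list and its implementation are not shown in this log.

```typescript
interface Step {
  name: string;
  run: () => boolean; // true on success, false on failure
}

interface Outcome {
  rebootScheduled: boolean;
  completed: string[];
  failedStep?: string;
}

// Runs pre-reboot steps in order. The first failure aborts the operation,
// and the reboot is scheduled only after all prior steps succeed.
function runMaintenanceReboot(steps: Step[], scheduleReboot: () => void): Outcome {
  const completed: string[] = [];
  for (const step of steps) {
    if (!step.run()) {
      return { rebootScheduled: false, completed, failedStep: step.name };
    }
    completed.push(step.name);
  }
  scheduleReboot();
  return { rebootScheduled: true, completed };
}
```

Under this contract the snapshot step would wrap one recursive `zfs snapshot -r` of the common parent dataset, so pgwal and pgdata are captured together.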

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2080 passed (2080)

---
Build: pass | Tests: FAIL — Tests  2 failed | 2080 passed (2082)
2026-05-01 10:27:08 +02:00
d456aa4be1 docs: add host DB recovery plan
---
Build: pass | Tests: FAIL — Tests  3 failed | 2077 passed (2080)
2026-05-01 10:17:11 +02:00
bef38d218a Add maintainer skills artifact builder
---
Build: pass | Tests: pass — Tests  2075 passed (2075)
2026-04-29 13:12:30 +02:00
b97e623e3a Document tenant-site verify states
---
Build: pass | Tests: pass — Tests  2057 passed (2057)
2026-04-29 12:04:25 +02:00
6ed65e29c5 Tighten Astro handoff verification status
---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 11:34:30 +02:00
aa026586a5 Rewrite localization docs for current Astro flow
---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 11:31:50 +02:00
03a23a965b Use ječa wording in Slovenian docs
---
Build: pass | Tests: pass — Tests  2044 passed (2044)

---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 10:55:07 +02:00
d97b0531ff Clarify host-vs-jail Astro publish commands
---
Build: pass | Tests: pass — Tests  2044 passed (2044)

---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 10:54:39 +02:00
a81da587bd Document current Astro publish workflows
---
Build: pass | Tests: pass — Tests  2044 passed (2044)
2026-04-29 10:07:23 +02:00
Operator & Claude Code
7dbbca7197 docs(handoff): rewrite for Codex on-host verification and simplification plan
Replaces the decision-tree handoff with a concrete step-by-step test guide
for Codex to run on the live host. Documents what Claude already shipped,
the exact verification commands, the nginx pattern question (direct vs proxy),
and a prioritized simplification assessment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  8 failed | 2009 passed (2017)
2026-04-28 14:10:06 +02:00
8aeedd54e9 docs(cms): align astro source paths and handoff
---
Build: pass | Tests: pass — Tests  2017 passed (2017)
2026-04-28 13:43:03 +02:00
d573daed76 docs(architecture): align docs with live naming and docs topology
---
Build: pass | Tests: pass — Tests  2017 passed (2017)
2026-04-27 21:27:44 +02:00
Operator & Claude Code
fdeaa39588 docs: add runtime-manifest architecture documentation
- Update ARCHITECTURE.md Prompt Assembly section to document runtime-manifest
  as a new context layer injected per-message, explaining it answers
  coherence questions: 'what repo/branch/skills do I have?'

- Update docs/internal/AGENT-HARNESS-V2.md Phase 5 to detail both System State
  and Runtime Manifest as complementary context blocks, explaining the
  coherence gap they solve together

- New docs/internal/RUNTIME-MANIFEST-DESIGN.md: complete specification
  - Why: agents had infrastructure facts but couldn't see them
  - What: machine-generated inventory from .git/library.yaml/artifacts
  - How: fresh per-message, cheap local sources, compact XML-like format
  - Where: injected in system prompt alongside SOUL/IDENTITY files
  - Testing: coverage for git parsing, skills counting, specialist discovery

The three-layer coherence system is now:
  1. Hand-written identity (SOUL/USER/IDENTITY/MEMORY) — philosophy, stable
  2. Machine-generated manifest (RUNTIME_MANIFEST) — inventory, fresh
  3. Live system state (system-state.ts) — operations, current

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  8 failed | 2009 passed (2017)
2026-04-27 21:21:30 +02:00
ae6a1e4ff9 fix(install): relax topology checks for upgrade
---
Build: pass | Tests: pass — Tests  2007 passed (2007)
2026-04-27 12:17:44 +02:00
c99a04d93f docs(iso): rename setup import design doc
---
Build: pass | Tests: pass — Tests  2005 passed (2005)
2026-04-27 11:58:34 +02:00
fee881d458 docs(handoff): review notes on 7919327 install-identity
Three issues: ZFS topology fields in mismatchKeys block valid storage
expansion upgrades (fix: move layout/data-disks/hot-spares out of
mismatch set); rescue mode silent on identity mismatches (fix: add
warn log); setup.txt path assumption on ISO path (note for bridge
phase, not a blocker).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)

---
Build: pass | Tests: FAIL — Tests  5 failed | 2000 passed (2005)
2026-04-27 11:10:19 +02:00
d53a1e018d docs(iso): design contract for shell-setup-txt.sh bridge module
Two-phase validation model: POSIX sh reads setup.txt from FAT32
partition at boot (no Node), TypeScript validates fully post-deploy.
Covers partition detection, parser, derivation layer (AGENT_DOMAIN,
locale, mode translation), system.env passthrough, pool name fix,
and Codex implementation checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)
2026-04-27 10:36:29 +02:00
7ccf592fa0 docs(handoff): record 6 ISO-AI drift gaps for Codex
Cross-repo analysis after 975f37f landed. TypeScript setup layer is
correct in isolation; gaps are all at the ISO firstboot boundary:
no setup.txt reader, pool name mismatch, mode naming divergence,
AGENT_DOMAIN derivation missing, Slovenian locale defaults, and
system.env unknown to the ISO.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)
2026-04-27 10:31:03 +02:00
975f37f895 feat(install): add versioned setup and system contracts
---
Build: pass | Tests: pass — Tests  2000 passed (2000)
2026-04-27 10:06:44 +02:00
d5182ec480 docs+setup: clarify install mode names
---
Build: pass | Tests: pass — Tests  1992 passed (1992)
2026-04-27 09:07:18 +02:00
bcb27d4d56 feat(install): backfill setup from inspect output
---
Build: pass | Tests: FAIL — Tests  2 failed | 1989 passed (1991)
2026-04-27 08:55:21 +02:00
7b14e27783 feat(install): add shell-based inspect mode
---
Build: pass | Tests: pass — Tests  1991 passed (1991)
2026-04-27 08:47:56 +02:00
2ab3fa050a refactor(setup): unify operator auth entrypoints
---
Build: pass | Tests: pass — Tests  1991 passed (1991)
2026-04-27 08:13:36 +02:00
1425aa08eb feat(setup): add first-boot install modes and storage contract
---
Build: pass | Tests: pass — Tests  1990 passed (1990)

# Conflicts:
#	docs/internal/ISO-FIRST-BOOT-IMPLEMENTATION.md
#	docs/public/install/first-boot.md

# Conflicts:
#	docs/internal/ISO-FIRST-BOOT-IMPLEMENTATION.md
#	docs/internal/ISO-FIRST-BOOT-SECRETS-HANDOFF.md
2026-04-27 08:02:27 +02:00
Operator & claude
a16838b772 docs(handoff): record adopt-mode decisions + flag operator-auth unification
Round 5 in the handoff doc captures the five agreed adopt-mode
decisions (INSTALL_MODE field, fill-blanks default, identity
mismatch blocks, Telegram identity changes require explicit flag,
fingerprint gate) so they survive into Codex's design doc.

Implementation doc gets an "Adopt Mode (V1.1)" section with the
proposed 4-task split + per-field freeze contract table, plus a
task-4 followup subsection naming the legacy `operators` table
sync gap and the unification plan with Codex's
setup/operator-auth.ts. scripts/set-operator.ts gets a TODO(unify)
header pointing at the same gap.

first-boot.md notes adopt mode is V1.1 and to back up before
reflashing until then.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 07:12:55 +02:00
Operator & claude
0e01ecc8ca docs(install): align install/architecture pages with V1 first-boot
Net -206 lines across install docs while making the V1 first-boot
model the recommended path:

- install/index: restructure to put first-boot + ISO as the
  recommended path; existing-host install demoted.
- install/iso: collapse to image selection + USB write; defer the
  V1 setup.txt flow to first-boot.md (saves ~30 lines).
- install/requirements: drop @Andy/Mac-launchd/personal-config
  sections and the duplicated memory/session/task model that lives
  in architecture docs (saves ~150 lines).
- install/install: reframe the onboarding step as setup.txt-first
  with TUI as the explicit fallback.
- install/fresh-install-checklist: replace bsddialog wizard
  milestone with setup.txt seed milestone, note TUI fallback case.
- architecture/deployment-models: ISO model now says
  "setup.txt seed, TUI fallback".
- architecture/admin-panel: note planned set-operator menu entry.
- ISO-FIRST-BOOT-IMPLEMENTATION: sharpen task 4 reasoning —
  clawdie-admin exists but as a TUI launcher, not a CLI router.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 06:54:33 +02:00
Operator & claude
29fbb1e6c8 docs(install): draft skeletal first-boot V1 walkthrough
Lands task 6 in skeleton form: docs/public/install/first-boot.md
covers the four required lines, optional fields (profile, locale,
dashboard credentials, SSH key, headless password), the post-install
set-operator command, and how to switch off OpenRouter. Two
TBD blocks remain: "Where setup.txt lives" (waits on task 5
delivery-mechanism validation) and "Troubleshooting" (waits on real
failure traces from the ISO build).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 06:47:48 +02:00
Operator & claude
b9e771316d feat(setup): add set-operator script for post-install dashboard credentials
Lands task 4 from the ISO first-boot implementation split as a
standalone scripts/set-operator.ts (matches existing scripts/
convention — no clawdie-admin umbrella). Reuses
ensureControlplaneBootstrapOperator() for the Better Auth signUp
path. Prompts password via stdin with echo suppressed; refuses
non-TTY runs; updates OPERATOR_PASSWORD in .env (mode 0600).
First-set only — rotation goes through the dashboard.

Both planning docs updated to drop "notional" references and point
at the real npm run set-operator -- <email> command.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 06:41:53 +02:00
1971a8075e feat(setup): add first-boot config parser and profile bundles
---
Build: pass | Tests: pass — Tests  1975 passed (1975)
2026-04-27 06:32:33 +02:00
Operator & claude
8086135183 docs(handoff): round 4 on ISO first-boot — SSH key + headless credentials
Drops ROOT_PASSWORD (root locked by default), adds SSH_AUTHORIZED_KEY
as the preferred headless box-access path, adds CLAWDIE_USER_PASSWORD
as fallback only. Parser warns visibly when plaintext passwords are
present in setup.txt. Implementation doc task 1 (parser) and task 5
(delivery validation) extended to cover the new fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-27 06:15:16 +02:00
Operator & claude
e405c3df1a docs(handoff): freeze ISO first-boot spec + add 6-task implementation split
Folds three Round 3 clarifications inline (notional clawdie-admin
surface, ASSISTANT_NAME blank → Clawdie, OpenRouter as bootstrap not
commitment) and adds ISO-FIRST-BOOT-IMPLEMENTATION.md as the
implementation contract with claim slots.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-27 05:22:03 +02:00
Operator & claude
2f2b5e5376 docs(handoff): round 3 on ISO first-boot secrets — final field shape
Folds in Codex's three reservations on Round 2. Round 2 field list is
now superseded by Round 3 below.

Resolutions:

- "First registration wins" registration window dropped. The
  mitigations Round 2 proposed (IP logging, post-install summary)
  were detection, not prevention — useless against an attacker on a
  shared LAN who registered first. Replaced with Option α: if
  dashboard credentials are missing from setup.txt, the dashboard
  waits until the operator runs `clawdie-admin set-operator <email>`
  post-install. Telegram remains the operator interface in the
  meantime. Option β (Telegram CONFIRM flow for registration
  requests) documented as the upgrade path if dashboard becomes
  load-bearing enough to justify the extra friction.
- PROFILE=balanced moved from "required" to "prefilled." If it
  always defaults, calling it required misrepresents the operator's
  cognitive load. The line stays in setup.txt as visible
  documentation, not as a question the operator must answer.
- ASSISTANT_NAME promoted to "recommended" tier; HOSTNAME demoted
  to "optional with derived default." The project currently
  conflates two distinct concepts (system-admin hostname vs
  emotional assistant identity); for first-boot, the emotional one
  is what the operator cares about. HOSTNAME defaults to lowercased
  ASSISTANT_NAME.

Round 3 field list (authoritative):

- 3 required (OpenRouter key, Telegram bot token, Telegram admin ID)
- 1 recommended (ASSISTANT_NAME)
- 1 prefilled (PROFILE=balanced)
- 4 optional (TIMEZONE, HOSTNAME, OPERATOR_EMAIL, OPERATOR_PASSWORD)

Cognitive bar before first boot: four lines the operator types
into. Everything else has a sensible fallback.
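
The Round 3 field tiers can be sketched as a small resolver over parsed `setup.txt` content. This assumes `KEY=VALUE` lines with `#` comments, which the log does not specify. The function names are hypothetical, the key names for the three required fields are taken from the Round 2 field list, and the blank `ASSISTANT_NAME` fallback to `Clawdie` comes from the later spec freeze.

```typescript
type Fields = Record<string, string>;

// Assumed format: one KEY=VALUE per line; blank lines and '#' comments ignored.
function parseSetupTxt(text: string): Fields {
  const fields: Fields = {};
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // skip lines without a key
    fields[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }
  return fields;
}

const REQUIRED_KEYS = ["OPENROUTER_API_KEY", "TELEGRAM_BOT_TOKEN", "TELEGRAM_ADMIN_ID"];

// Reports missing required fields and fills the recommended, prefilled,
// and derived defaults described in Round 3.
function resolveSetup(fields: Fields): { missing: string[]; resolved: Fields } {
  const missing = REQUIRED_KEYS.filter((key) => !fields[key]);
  const assistantName = fields["ASSISTANT_NAME"] || "Clawdie";
  const resolved: Fields = {
    ...fields,
    ASSISTANT_NAME: assistantName,
    PROFILE: fields["PROFILE"] || "balanced",
    HOSTNAME: fields["HOSTNAME"] || assistantName.toLowerCase(),
  };
  return { missing, resolved };
}
```

An operator who fills in only the three required lines plus `ASSISTANT_NAME=Mevy` would end up with `HOSTNAME=mevy` and `PROFILE=balanced` under these assumptions.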

Doc split (Codex's recommendation to extract a V1 onboarding spec
doc plus an implementation task breakdown) acknowledged as the right
next move, but premature — two items remain open (seed delivery
mechanism, clawdie-admin set-operator surface). Split happens once
those resolve.

Section explicitly lists what's now firmly decided vs still open
after Round 3 so future readers don't re-litigate closed questions
or silently commit open ones.

No code changes. Pure planning convergence.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-26 21:21:57 +02:00
Operator & claude
2580eed76c docs(handoff): round 2 on ISO first-boot secrets — credentials decision
Captures the converged state after Codex's pushback on the Round 1
take. Three pushbacks accepted with resolutions; one open question
(OPERATOR_EMAIL / OPERATOR_PASSWORD in setup.txt) resolved as a
hybrid; final V1 field list locked.

Resolutions:

- Seed-partition specifics (256 MB FAT32, label "CLAWDIE-SEED")
  demoted from "spec" to "direction." Architectural commitment is
  file-based seed import; exact mechanism stays open until validated
  against the ISO repo and real flashers.
- Auto-wipe of setup.txt on import is dropped. Replaced with an
  installer warning to the operator (immediate + post-install
  summary) telling them to reformat the media. Keeps multi-machine
  reflash working; treats credential hygiene as documented operator
  action, not silent destruction.
- PROFILE explicitly sets all three (chat primary, fallback,
  compaction) as a coordinated bundle. Splitting them re-creates the
  configuration sprawl the profile is supposed to prevent. Advanced
  operators drop down to explicit lines that override the profile
  mapping.

OPERATOR_EMAIL / OPERATOR_PASSWORD resolution:

- Both optional in setup.txt.
- If both present: installer pre-creates the operator account in
  Better Auth on first boot. Unattended-install path.
- If either missing: Better Auth opens a "first registration wins"
  window (default 30 min, configurable) for local-network IPs only.
  First person to hit /dashboard registers through the normal sign-up
  form. Window auto-closes on success or timeout.
- Bound to local-network IPs via existing CONTROLPLANE_AUTH_MODE
  semantics; full source IP logged; "operator account registered
  from <ip>" surfaced in post-install summary so hijacked
  registration is visible immediately.
- Recovery via "clawdie-admin reopen-registration --minutes 30" CLI
  if window expires.

Final V1 field list: 5 required (OPENROUTER_API_KEY,
TELEGRAM_BOT_TOKEN, TELEGRAM_ADMIN_ID, PROFILE=balanced,
HOSTNAME=clawdie) + 3 optional (TIMEZONE, OPERATOR_EMAIL,
OPERATOR_PASSWORD). Anything else gets configured from the live
system, not from setup.txt.

Three items explicitly listed as still-open after Round 2 (seed
mechanism, registration window default, post-install summary
delivery channel) so they don't get silently committed.

No code changes. The Claude Take + Round 2 sections are scoped to
lift in the same commit that lands the actual seed-import
implementation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-26 21:11:39 +02:00
Operator & claude
59d9aaf020 docs(handoff): claude take on ISO first-boot secrets
Answers all six review questions in the handoff doc with a single
recommended V1 design (writable seed partition + profile indirection
+ TUI fallback), two realistic alternatives (post-bootstrap web/SSH
config, two-USB), eight named risks, and a complete eight-field
setup.txt template.

Operator-facing rename folded in: setup.env → setup.txt. The .env
extension is a developer convention; setup.txt opens cleanly in any
text editor on Win/Mac/Linux without configuration, which removes
one of the largest non-technical-operator friction points in the
flow.

Profile indirection (PROFILE=balanced/economy/quality) keeps model
IDs out of operator hands at install time and lets the team change
the validated mapping over time without breaking old setup.txt
files. The installer resolves the profile to actual
PI_TUI_PROVIDER/PI_TUI_MODEL/LLM_FALLBACK_* at install time.
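
A hedged sketch of what the profile indirection could look like. The provider and model strings in the table are placeholders, not the team's validated mapping, and `resolveProfile` is a hypothetical name. Only the profile names and the target variable names (`PI_TUI_PROVIDER`, `PI_TUI_MODEL`, `LLM_FALLBACK_*`) come from the log.

```typescript
type Profile = "balanced" | "economy" | "quality";

interface RuntimeEnv {
  PI_TUI_PROVIDER: string;
  PI_TUI_MODEL: string;
  LLM_FALLBACK_PROVIDER: string;
  LLM_FALLBACK_MODEL: string;
}

// Placeholder values only: the real mapping lives with the installer and can
// change over time without touching old setup.txt files.
const PROFILE_MAP: Record<Profile, RuntimeEnv> = {
  balanced: { PI_TUI_PROVIDER: "provider-a", PI_TUI_MODEL: "model-balanced", LLM_FALLBACK_PROVIDER: "provider-b", LLM_FALLBACK_MODEL: "model-fallback" },
  economy: { PI_TUI_PROVIDER: "provider-a", PI_TUI_MODEL: "model-economy", LLM_FALLBACK_PROVIDER: "provider-b", LLM_FALLBACK_MODEL: "model-fallback" },
  quality: { PI_TUI_PROVIDER: "provider-a", PI_TUI_MODEL: "model-quality", LLM_FALLBACK_PROVIDER: "provider-b", LLM_FALLBACK_MODEL: "model-fallback" },
};

function isProfile(value: string): value is Profile {
  return value === "balanced" || value === "economy" || value === "quality";
}

// Resolves a profile name to concrete settings at install time. Explicit
// overrides (an assumption here) replace individual entries of the bundle.
function resolveProfile(name: string, overrides: Partial<RuntimeEnv> = {}): RuntimeEnv {
  if (!isProfile(name)) {
    throw new Error(`unknown PROFILE: ${name}`);
  }
  return { ...PROFILE_MAP[name], ...overrides };
}
```

Because `setup.txt` stores only the profile name, changing the table changes what a fresh install resolves to while old seed files stay valid.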

The take also flags the second onboarding cliff (Telegram BotFather
flow, easily underestimated) and the V2 follow-up (web-based setup
wizard) so the seed-file work isn't throwaway when the better UX
ships later.

No code changes. Pure handoff response in
docs/internal/ISO-FIRST-BOOT-SECRETS-HANDOFF.md.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-26 20:59:19 +02:00
83c33a2080 docs(handoff): add ISO first-boot secrets review
Adds a focused handoff doc for reviewing V1 ISO secrets onboarding, setup.env scope, seed-media import, and provider bootstrap choices.

---
Build: pass | Tests: pass — docs only

---
Build: pass | Tests: pass — Tests  1961 passed (1961)
2026-04-26 20:52:12 +02:00
Operator & claude
e672af354a docs(plan): pivot ISO V1 plan from reactive to release-shaped
Codex's 6-step plan from the 26.apr.2026 chat session (defaults policy
→ state policy → token burn → truth-surface polish → smoke checklist →
ISO dry run) lands in BOOTABLE-ISO-PLAN-V1.md, with six refinements
integrated:

- Already-resolved snapshot section so reading-cold agents do not
  re-open closed questions (provider fallback works end-to-end,
  cooldown path normalized in 6983415, Token Ledgers in
  /budgetreport, Telegram basics stable).
- Step 1 explicitly absorbs the previously-separate Open Questions
  list (default primary, default fallback, free-tier policy, identity
  wording) so "freeze defaults" actually closes them rather than
  parallel-tracking. Identity wording is named as the same root cause
  as the fe14fad fixture failures, not a separate concern.
- Step 1 notes the cost-amplification trap in "compaction follows
  primary" (when fallback is paid-stable, compaction follows there
  too — burn during cooldown can amplify).
- Step 3 (token burn) promoted ahead of polish work because a fresh
  ISO install that quietly eats budget leaves a worse first
  impression than any of the polish items.
- Step 5 smoke checklist gains per-item triage hints so the dry-run
  operator knows what to check first when an item fails.
- Step 6 explicitly notes one dry run only catches one-time issues;
  two runs on different hardware is the post-1.0.0 bar, so nobody is
  surprised when the first prod install hits something the lone dry
  run missed.

Each step is now tagged by which agent class can claim it
(decision/docs / code / deploy) so independent claims do not stall on
"only Codex can do this."

Original Goal / Success Criteria / Non-Goals / Operator DoD / Working
Rule preserved verbatim at the top and bottom of the doc.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1958 passed (1961)
2026-04-26 19:52:21 +02:00
6983415357 fix(runtime): normalize provider cooldown state path
Provider cooldown persistence now follows AGENT_STATUS_DIR, then legacy CLAWDIE_VAR_DIR, and otherwise defaults to repo-local tmp/state instead of ~/.clawdie/state. Updates docs to match the live behavior.
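
The precedence described above can be sketched as a pure function. `resolveCooldownStateDir` is a hypothetical name, and whether the real code appends a subdirectory under the two environment-provided locations is not stated in the log, so this sketch returns them unchanged.

```typescript
type Env = Record<string, string | undefined>;

// Precedence: AGENT_STATUS_DIR, then legacy CLAWDIE_VAR_DIR, then the
// repo-local tmp/state default. Empty or whitespace-only values are skipped.
function resolveCooldownStateDir(env: Env, repoRoot: string): string {
  const statusDir = env["AGENT_STATUS_DIR"]?.trim();
  if (statusDir) return statusDir;

  const legacyVarDir = env["CLAWDIE_VAR_DIR"]?.trim();
  if (legacyVarDir) return legacyVarDir;

  // Strip trailing slashes so the joined path has exactly one separator.
  return `${repoRoot.replace(/\/+$/, "")}/tmp/state`;
}
```

In a Node process this would typically be called with `process.env` and the repository root.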

---
Build: pass | Tests: pass — 28 passed (2 files)

---
Build: pass | Tests: pass — Tests  1961 passed (1961)
2026-04-26 16:37:21 +02:00
Operator & claude
32e671c802 docs(.env.example): record validated chat + compaction config
Closes the .env drift Codex flagged on 26.apr.2026: the live deploy
runtime had switched to openrouter/openai/o3 as chat default and
unpinned AGENT_COMPACTION_PROVIDER, but neither change was reflected in
.env.example. Effect: a fresh ISO build or reinstall would have started
in the old (silent-no-reply) configuration.

This commit does not change the current zai/glm-5-turbo example primary
— some operators have working zAI keys with budget — but adds a
clearly-marked "known-stable alternative primary" block that documents
the openrouter/openai/o3 setup the operator validated today, with the
rationale (zAI 5-hour cap → silent no-reply).

The AGENT_COMPACTION_PROVIDER block now explains both modes: unset
(compaction follows chat runtime, including fallback) is the validated
default; pinning decouples compaction from chat fallback for cost or
stability reasons. The previous one-liner left both pieces undocumented.

provider-fallback.md gets a matching "Compaction interaction" note so
the reading order from the operator guide ends up at the same answer.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1956 passed (1959)
2026-04-26 15:11:27 +02:00
Operator & claude
3e41cf6072 docs: handoff for fallback model + token ledger UX + ISO plan review
Three findings from the operator's afternoon session, captured for Codex
to act on or defer. One docs change pre-applied; rest are notes.

Pre-applied in this commit:

- docs/public/operate/provider-fallback.md: example fallback model
  changed from meta-llama/llama-3.3-70b-instruct:free to openai/o3 (paid,
  stable). New "Choosing a fallback model" subsection warns explicitly
  that free-tier models are unsafe as fallback targets — they rate-limit
  silently and the failure mode is indistinguishable from "agent dead."
  Operator hit this in production today.
- .env.example: LLM_FALLBACK_PROVIDER, LLM_FALLBACK_MODEL,
  LLM_FALLBACK_DEFAULT_COOLDOWN_SECONDS now documented (were missing
  entirely), with the same free-tier warning inline.

New session block in docs/internal/MULTITENANT-HANDOFF.md:

- Finding 1 (V1-blocker): live .env on deploy host needs the same model
  swap; consider startup WARN if LLM_FALLBACK_MODEL ends with :free;
  decide whether silent rate-limit-no-output should bubble as a visible
  Telegram error.
- Finding 2: token ledger views (/usage, /tokens, /policy) are
  arithmetically reconcilable but ask operator to mental-diff across
  three places. Recommended fix is a "Token Ledgers" section in
  /budgetreport showing quota + activity + reset-archived together.
- Finding 3: verify whether the mevy 0→14054 spent_today snapshot was
  a reset or a recording-path bug in recordTokenSpend (a73f211).
- Finding 4: review notes on BOOTABLE-ISO-PLAN-V1.md — promote identity
  wording from open question to Priority; split Priority A into
  regeneratable-status vs persistent-state; add synthetic-cap test path
  for fallback verification; add brief risk register.
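
Finding 1 suggests a startup WARN when `LLM_FALLBACK_MODEL` ends with `:free`. A minimal sketch of that check follows; `fallbackModelWarning` is a hypothetical helper, and the message wording is illustrative, not the project's.

```typescript
// Returns a warning message when the configured fallback model is a
// free-tier model (suffix ":free"), or null when no warning applies.
function fallbackModelWarning(model: string | undefined): string | null {
  if (!model) return null; // no fallback configured: nothing to warn about
  const trimmed = model.trim();
  if (!trimmed.endsWith(":free")) return null;
  return `LLM_FALLBACK_MODEL=${trimmed} is a free-tier model; free-tier models can rate-limit silently and are unsafe as fallback targets`;
}
```

A startup path could log the returned string at WARN level and continue, leaving the choice of model to the operator.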

No code changes. Docs and a single .env.example block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  2 failed | 1949 passed (1951)
2026-04-26 14:10:41 +02:00
d8794983e0 docs(plan): add bootable ISO v1 target
---
Build: pass | Tests: pass — Tests  1951 passed (1951)
2026-04-26 13:26:58 +02:00
Operator & claude
3828e5ce83 docs: integrate operator observability + provider fallback work
Brings the public docs in line with what shipped on multitenant over the
last few days. Three new operator-facing pages, three updates to existing
ones, and a CHANGELOG batch.

New pages (docs/public/operate/):
- operator-commands.md — single reference for all Telegram slash commands,
  grouped by purpose (status, structured reports, runtime, sessions, admin
  actions) with auth gating per command. Previously only in-bot /help text.
- provider-fallback.md — operator guide for the cooldown layer: env vars,
  how cooldowns are detected and tracked, /policy surfacing, /clearcooldown
  for manual release, the configured/effective/actual observability triple.
  Includes a "path convention note" flagging that the cooldown file still
  uses the legacy $CLAWDIE_VAR_DIR resolution while test/build status
  files have moved to repo tmp/ — divergence to harmonize later in code.
- structured-reports.md — explains the Observed/Interpretation/Operator
  Notes pattern, lists the six structured reports, documents the
  test/build pipeline contract (status JSON schema + new $AGENT_STATUS_DIR
  → $CLAWDIE_VAR_DIR → tmp/status precedence Codex landed in 1389e17),
  and covers free-text routing (classifyReportIntent + isOpsFlavored).

Updates:
- monitoring.md: appended "Operator-Facing Reports" section pointing at
  the new structured-reports page, and "Provider Fallback Health" pointing
  at the fallback page.
- operate/index.md: added the three new pages to the runbook list.
- architecture/controlplane.md: added "Runtime Observability" section
  documenting the configured/effective/actual triple and linking to the
  new operate pages.
- README.md: expanded the Telegram Commands table (was 10 rows, missing
  every structured report, /policy, /clearcooldown, /budgetreset) and
  added a pointer to operator-commands.md as the full reference. Also
  noted free-text routing.
- CHANGELOG.md: appended an "operator observability + provider fallback,
  apr.2026" batch under [Unreleased] covering provider fallback, the
  reports family, the test/build wrapper pipeline, free-text routing,
  /clearcooldown, the observability triple, the Telegram setMyCommands
  menu, and the new "Verify Before Claiming Remote State" rule in
  AGENTS.md.

No code changes. Slovenian sl/ mirror left untouched (out of localization
scope).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  8 failed | 1940 passed (1948)

---
Build: pass | Tests: FAIL — Tests  2 failed | 1949 passed (1951)
2026-04-26 13:01:43 +02:00
Operator & claude
9acbd1bfc3 docs: handoff for nanoclaw cleanup + verify-before-claim rule
Two outputs from this session bundled for the next agent:

- AGENTS.md gains "Verify Before Claiming Remote State" — durable rule
  born from the 1e87f34 vs 3d33482 confusion: don't speak about a
  remote without a fresh git fetch. When two agents disagree about a
  tip, both fetch before debugging.
- MULTITENANT-HANDOFF.md gains a 26.apr session block telling Codex
  how to disable the nanoclaw upstream remote in each worktree
  without deleting the source code (setup/upstream.ts and the
  check_upstream_updates MCP tool both gracefully degrade and stay
  useful as a re-enable path).

No code changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  1944 passed (1944)
2026-04-26 10:43:58 +02:00
b4996f732f chore(docs): salvage public hardening note and git setup test
---
Build: FAIL | Tests: FAIL
2026-04-26 08:44:45 +02:00