Delivers the Claude-side items from EXPLANATION-GROUNDER-PROPOSAL.md,
"Claude Work Split": 10 coarse-grained domains with aliases, runtime
facts, curated sources, and exclusions; a flagged list of stale
grounding-source candidates per the recent audits; and a seed prompt
corpus covering single-domain, plain-language, mixed-subject, and
adversarial cases.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
Argues for retrieving canonical source files at runtime as grounding
context for explanation prompts, instead of writing one deterministic
responder per architecture topic. Includes a hybrid recommendation:
keep existing responders for stable high-volume topics, use the
grounder for the long tail.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
Documents the duplication and mixed-responsibility issues across the
four routing-related regex lists, and proposes a five-atom +
three-composition restructure with a truth-table proof of behavior
preservation.
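
A truth-table check of this kind can be sketched as an exhaustive
comparison over all 32 assignments of five boolean atoms. The two
example rules below are hypothetical stand-ins, not the actual routing
predicates; only the "five atoms, compare old against new" shape is
taken from the message above.

```typescript
// Five boolean atoms, e.g. "message matches regex list N" (assumed meaning).
type Atoms = [boolean, boolean, boolean, boolean, boolean];
type Predicate = (atoms: Atoms) => boolean;

// Returns the first assignment on which the two rules disagree,
// or null when they agree on every row of the truth table.
function findCounterexample(oldRule: Predicate, newRule: Predicate): Atoms | null {
  for (let bits = 0; bits < 32; bits++) {
    const atoms = [0, 1, 2, 3, 4].map((i) => ((bits >> i) & 1) === 1) as Atoms;
    if (oldRule(atoms) !== newRule(atoms)) {
      return atoms;
    }
  }
  return null;
}

// Hypothetical legacy rule with a duplicated atom, and its factored form.
const legacyRule: Predicate = ([a, b, c]) => (a && b) || (a && c);
const restructuredRule: Predicate = ([a, b, c]) => a && (b || c);

console.log(findCounterexample(legacyRule, restructuredRule)); // null
```

With five atoms the table is small enough to enumerate in full, so
agreement on every row is a proof rather than a sample.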
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
Reports canonical IP (10.0.1.5 from infra/jails.yaml) and groups every
stale reference in code/docs by error shape, plus the latent code-level
default mismatch in jail-schema.ts and config.ts. No patches yet.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
---
Build: FAIL | Tests: FAIL — 15 failed
Drafts replacement text for debug/SKILL.md (full rewrite), patches for
postgres-memory/SKILL.md and POSTGRES-MEMORY.md, and a seed routing test
corpus from real regressions. Proposals only — Codex retains authority
over the actual skill/doc edits per the routing handoff split.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
Reflects fixes landed in 8414953, 5c685f1, 7124c1c, 7acf771. Open
section now lists only what still needs attention; resolved items kept
as historical context with the commit that closed each.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 15 failed
Captures blind spots in the recent auth/bootstrap/controlplane batch so
the FreeBSD agent can triage without re-running the audit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: FAIL | Tests: FAIL — 35 failed
Resolves the collision class where a tenant named `clawdie` would
produce `clawdie_ops` clashing with the platform's shared ops DB.
Two constants instead of one:
- service name / brand / UNIX user: `clawdie` (one of them)
- platform namespace prefix for shared resources: `system`
Shared DBs become `system_ops` / `system_brain` / `system_skills`;
shared dataset becomes `zroot/system-runtime`. `system` joins the
reserved_host_labels list so the same collision cannot reappear at
the FQDN layer.
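
A minimal sketch of the naming split, assuming tenant databases follow
the `<tenant>_<kind>` pattern implied above. The function names and the
set name are hypothetical; only the literal values `system`, `ops`,
`brain`, and `skills` come from this message, and applying the reserved
list to tenant ids is an assumption.

```typescript
// Platform namespace prefix for shared resources (value from the commit).
const PLATFORM_NAMESPACE = "system";

type DbKind = "ops" | "brain" | "skills";

// Hypothetical stand-in for reserved_host_labels; "system" is the new entry.
const RESERVED_LABELS: ReadonlySet<string> = new Set([PLATFORM_NAMESPACE]);

// Shared platform databases always use the namespace prefix.
function sharedDbName(kind: DbKind): string {
  return `${PLATFORM_NAMESPACE}_${kind}`;
}

// Tenant databases use the tenant id as prefix; reserved ids are rejected
// so a tenant can never land on a shared database name.
function tenantDbName(tenantId: string, kind: DbKind): string {
  if (RESERVED_LABELS.has(tenantId)) {
    throw new Error(`tenant id "${tenantId}" is reserved`);
  }
  return `${tenantId}_${kind}`;
}
```

Under this split a tenant named `clawdie` maps to `clawdie_ops`, which
no longer clashes with the shared `system_ops`.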
Also adds:
- Vocabulary section distinguishing operator account, service
account, service name, platform namespace, assistant display name,
tenant id (six terms, one bug class each)
- Install-paths section formalizing fresh-machine (ISO) vs
existing-host flows; `just install` is the platform install, never
the OS install
- Service-account override field as bootstrap config, not an
onboarding prompt; default stays `clawdie`
- Operator-account treatment: existing-host path checks for it;
Clawdie never renames or recreates it
AGENTS.md "Multitenant Rules" updated to match.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: pass — Tests 2099 passed (2099)
Single source of truth at docs/internal/MULTITENANT.md (~430 lines)
replaces the previous spread across NAMING-POLICY, ARCHITECTURE,
HOST-REALITY, INTERNAL-ROLLOUT, ROADMAP, HANDOFF, AGENT-WORKFLOW, and
PLATFORM-V2-MANIFESTO. Load-bearing content (vision, conceptual model,
naming schema, surfaces, controlplane, publishing, conventions) is
folded in; current-state runbooks, phased migration plans, and
deployment-drift snapshots are dropped — design phase, fresh start.
Identity decision: drop PLATFORM_ID / PLATFORM_SERVICE_NAME /
PLATFORM_RUNTIME_USER. Platform identity is the constant 'clawdie'
baked into code; ASSISTANT_NAME is display-only and never feeds infra
names; TENANT_ID is for additive tenants only. AGENTS.md gains a short
"Multitenant Rules" block carrying the day-to-day do/don't extract.
Cross-references in AGENT-WORKFLOW-CHECKLIST, AGENT-WORKTREE-WORKFLOW,
and the two freebsd-jail-implementation docs updated to point at the
new file.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: pass — Tests 2099 passed (2099)
Earlier version claimed the readiness wait was "in the wrong place" —
only running in the 5-min periodic check. That was wrong:
runControlPlaneChecks() is called at src/index.ts:1087, before
initDatabase / loadState / initMemoryPool. The wait already gates
bootstrap.
Trimmed the doc to the real follow-up scope: swap tcpReachable for
pg_isready, add HOST_DB_READINESS_TIMEOUT_MS env (default 60s),
minimal logging, one timeout-path test. No move, no restructuring.
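
The follow-up scope could look roughly like the sketch below. The
function names, the 1s poll interval, and the 3s/5s probe limits are
assumptions; the env name and 60s default are from this message.
pg_isready exits 0 only when the server is accepting connections, so a
server that is still starting up counts as not ready.

```typescript
import { spawnSync } from "node:child_process";

// Reads HOST_DB_READINESS_TIMEOUT_MS, falling back to 60s when unset or invalid.
function readinessTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = Number(env.HOST_DB_READINESS_TIMEOUT_MS);
  return Number.isFinite(raw) && raw > 0 ? raw : 60_000;
}

// One probe: true only when pg_isready reports "accepting connections".
function pgIsReady(host: string, port: number): boolean {
  const result = spawnSync(
    "pg_isready",
    ["-h", host, "-p", String(port), "-t", "3"],
    { timeout: 5_000 },
  );
  return result.status === 0;
}

// Polls the probe until it succeeds or the deadline passes.
async function waitForDb(
  probe: () => boolean,
  timeoutMs: number,
  intervalMs = 1_000,
  log: (msg: string) => void = console.log,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  let attempt = 0;
  while (Date.now() < deadline) {
    attempt += 1;
    if (probe()) {
      log(`db ready after ${attempt} probe(s)`);
      return true;
    }
    const remaining = Math.max(0, deadline - Date.now());
    log(`db not ready (probe ${attempt}), ${remaining}ms left`);
    await new Promise<void>((resolve) =>
      setTimeout(resolve, Math.min(intervalMs, remaining)),
    );
  }
  log(`db readiness timed out after ${timeoutMs}ms`);
  return false;
}
```

Passing the probe as a parameter keeps the timeout path testable with a
stub that always returns false, e.g.
`waitForDb(() => pgIsReady(host, port), readinessTimeoutMs(process.env))`
in production and `waitForDb(() => false, 50, 10)` in the test.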
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 3 failed | 2081 passed (2084)
Improvement over no-wait, but two follow-ups before §E is closed:
- default probe is `tcpReachable` — pg opens its socket during WAL
recovery while rejecting queries with "starting up", so TCP-open
is not the same as accepting connections. Need a SELECT 1 /
pg_isready check.
- wait runs inside the 5-minute periodic controlplane check, not at
Mevy bootstrap. If anything in startup touches DB before the
first tick, the wait does not gate the actual race.
Plus: 30s default may be tight post-incident, no logs during the
wait, no env override, post-deadline extra probe makes timeout
fuzzy, and the "3 failed tests" trailer is still present.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 3 failed | 2081 passed (2084)
Implementation review of zai/Codex's "Harden host DB reboot path."
Direction is right but three blockers:
- snapshots are not atomic (two separate `zfs snapshot` calls
reproduce the pgwal/pgdata skew that caused the incident)
- `serviceMaybeStop` swallows real `onestatus` errors as
"already stopped" — can proceed to checkpoint pg with mevy
still running
- committed with 3 failing tests
Plus smells around missing readiness wait (§E), no spawnSync
timeouts, duplicated pool resolution, and an unrelated bonus fix
smuggled into the commit.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 2 failed | 2080 passed (2082)
Addendum to d456aa4. Three gaps that would have left the plan
implementable-but-unsafe:
- snapshot step now mandates a single recursive ZFS snapshot of the
common parent; two separate snapshots reproduce the pgwal/pgdata skew
that caused the 30 Apr 2026 incident
- new §E: Mevy startup must poll for DB readiness (pg_isready or
equivalent); rc.d REQUIRE only orders start invocations, not actual
connect-ability
- §A now specifies failure semantics for the maintenance-reboot op
(each pre-reboot step aborts on failure; reboot only schedules after
all prior steps succeed)
- pg_resetwal explicitly demoted to non-recovery-path
- note that CHECKPOINT before clean stop is belt-and-suspenders
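
The snapshot and failure-semantics points above can be sketched as
follows. The dataset and step names are hypothetical; `zfs snapshot -r`
is the real recursive form, which snapshots the parent and all
descendants together. This is an illustration of the intended
semantics, not the implemented op.

```typescript
// Arguments for one recursive snapshot of the common parent dataset,
// instead of two separate per-dataset snapshots.
function recursiveSnapshotArgs(parentDataset: string, label: string): string[] {
  return ["snapshot", "-r", `${parentDataset}@${label}`];
}

interface Step {
  name: string;
  run: () => boolean; // true on success
}

// Runs pre-reboot steps in order. The first failure aborts the sequence
// and its name is returned; the reboot is scheduled only when every
// step succeeded, in which case null is returned.
function runPreRebootSteps(steps: Step[], scheduleReboot: () => void): string | null {
  for (const step of steps) {
    if (!step.run()) {
      return step.name;
    }
  }
  scheduleReboot();
  return null;
}
```

For a hypothetical parent `zroot/pg` holding both pgdata and pgwal,
`recursiveSnapshotArgs("zroot/pg", "pre-reboot")` yields the arguments
for `zfs snapshot -r zroot/pg@pre-reboot`.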
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---
Build: pass | Tests: pass — Tests 2080 passed (2080)
---
Build: pass | Tests: FAIL — Tests 2 failed | 2080 passed (2082)
Replaces the decision-tree handoff with a concrete step-by-step test guide
for Codex to run on the live host. Documents what Claude already shipped,
the exact verification commands, the nginx pattern question (direct vs proxy),
and a prioritized simplification assessment.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 8 failed | 2009 passed (2017)
- Update ARCHITECTURE.md Prompt Assembly section to document runtime-manifest
as a new context layer injected per-message, explaining it answers
coherence questions: 'what repo/branch/skills do I have?'
- Update docs/internal/AGENT-HARNESS-V2.md Phase 5 to detail both System State
and Runtime Manifest as complementary context blocks, explaining the
coherence gap they solve together
- New docs/internal/RUNTIME-MANIFEST-DESIGN.md: complete specification
  - Why: agents had infrastructure facts but couldn't see them
  - What: machine-generated inventory from .git/library.yaml/artifacts
  - How: fresh per-message, cheap local sources, compact XML-like format
  - Where: injected in system prompt alongside SOUL/IDENTITY files
  - Testing: coverage for git parsing, skills counting, specialist discovery
The three-layer coherence system is now:
1. Hand-written identity (SOUL/USER/IDENTITY/MEMORY) — philosophy, stable
2. Machine-generated manifest (RUNTIME_MANIFEST) — inventory, fresh
3. Live system state (system-state.ts) — operations, current
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 8 failed | 2009 passed (2017)
Cross-repo analysis after 975f37f landed. TypeScript setup layer is
correct in isolation; gaps are all at the ISO firstboot boundary:
no setup.txt reader, pool name mismatch, mode naming divergence,
AGENT_DOMAIN derivation missing, Slovenian locale defaults, and
system.env unknown to the ISO.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
---
Build: pass | Tests: FAIL — Tests 4 failed | 1996 passed (2000)