Commit graph

372 commits

Author SHA1 Message Date
Operator & Claude Code
67541f93f2 Review explanation grounder jails pilot
---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-06 10:30:37 +02:00
c4239b2b11 Align jail policy and add system update path
---
Build: pass | Tests: pass — 51 passed (3 files)

---
Build: pass | Tests: pass — 2189 passed (648 files)
2026-05-06 09:43:08 +02:00
Operator & Claude Code
eef58906a6 Propose initial explanation-domain registry
Delivers Claude-side deliverables from EXPLANATION-GROUNDER-PROPOSAL.md
"Claude Work Split": 10 coarse-grained domains with aliases, runtime
facts, curated sources, and exclusions; a flagged list of stale
grounding-source candidates per the recent audits; and a seed prompt
corpus covering single-domain, plain-language, mixed-subject, and
adversarial cases.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-06 00:16:14 +02:00
1cf2325749 Clarify generic explanation grounding direction
---
Build: pass | Tests: pass — 2176 passed (640 files)
2026-05-06 00:12:16 +02:00
Operator & Claude Code
c0ab09c55b Add explanation-grounder counter-proposal
Argues for retrieving canonical source files at runtime as grounding
context for explanation prompts, instead of writing one deterministic
responder per architecture topic. Includes a hybrid recommendation:
keep existing responders for stable high-volume topics, use the
grounder for the long tail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 23:56:22 +02:00
Operator & Claude Code
aee5a90a6e Add report-intent.ts refactor proposal
Documents the duplication and mixed-responsibility issues across the
four routing-related regex lists, and proposes a five-atom +
three-composition restructure with a truth-table proof of behavior
preservation.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 22:52:27 +02:00
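The truth-table proof mentioned in aee5a90a6e can be illustrated with an exhaustive check. This is a minimal sketch, assuming each of the five atoms reduces to a boolean (one regex list either matches or it does not). The two rules shown are hypothetical stand-ins, not the actual compositions in report-intent.ts.

```typescript
// One boolean per atom: did that regex list match the prompt?
type Atoms = readonly [boolean, boolean, boolean, boolean, boolean];
type Composition = (atoms: Atoms) => boolean;

const bit = (mask: number, index: number): boolean => (mask & (1 << index)) !== 0;

// Enumerate all 2^5 = 32 atom assignments and collect every row of the
// truth table where the old and new rule disagree.
function findCounterexamples(before: Composition, after: Composition): Atoms[] {
  const diffs: Atoms[] = [];
  for (let mask = 0; mask < 32; mask++) {
    const atoms: Atoms = [bit(mask, 0), bit(mask, 1), bit(mask, 2), bit(mask, 3), bit(mask, 4)];
    if (before(atoms) !== after(atoms)) diffs.push(atoms);
  }
  return diffs;
}

// Hypothetical legacy rule with a duplicated exclusion...
const legacyRule: Composition = ([a, b, c]) => (a && !c) || (b && !c);
// ...and a hypothetical restructured form that factors the exclusion out.
const refactoredRule: Composition = ([a, b, c]) => (a || b) && !c;

// An empty result means the refactor preserves behavior on every input row.
const counterexamples = findCounterexamples(legacyRule, refactoredRule);
```

With only five atoms the table has 32 rows, so checking all of them in a unit test is cheap and leaves no untested combination.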
49ce3c703f Clarify controlplane jail IP examples
---
Build: pass | Tests: not run — docs-only change

---
Build: pass | Tests: pass — 2163 passed (630 files)
2026-05-05 22:43:52 +02:00
aabc403648 Align subnet defaults and public jail docs
---
Build: pass | Tests: pass — 71 passed (3 files)

---
Build: pass | Tests: pass — 2163 passed (630 files)
2026-05-05 22:23:42 +02:00
c888d83cec Sweep stale host DB and PostgreSQL docs
---
Build: pass | Tests: pass — 2163 passed (630 files)
2026-05-05 21:46:48 +02:00
Operator & Claude Code
293e589c6f Add db-jail IP source-of-truth verification
Reports canonical IP (10.0.1.5 from infra/jails.yaml) and groups every
stale reference in code/docs by error shape, plus the latent code-level
default mismatch in jail-schema.ts and config.ts. No patches yet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 21:45:12 +02:00
ae8a4b4e9e Rewrite stale PostgreSQL specialist guidance
---
Build: pass | Tests: pass — 2162 passed (630 files)
2026-05-05 21:30:15 +02:00
Operator & Claude Code
93d35ad95f Add stale specialist rewrite proposals + routing seed corpus
Drafts replacement text for debug/SKILL.md (full rewrite), patches for
postgres-memory/SKILL.md and POSTGRES-MEMORY.md, and a seed routing test
corpus from real regressions. Proposals only — Codex retains authority
over the actual skill/doc edits per the routing handoff split.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 20:50:36 +02:00
Operator & Claude Code
4a63f59641 Add Claude addendum to stale-specialist audit
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 20:44:47 +02:00
485286604a Audit stale specialist runtime guidance
---
Build: pass | Tests: pass — 2162 passed (630 files)
2026-05-05 20:00:33 +02:00
Operator & Claude Code
5a25622d15 Add Claude review to routing handoff
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 19:47:51 +02:00
0c4c246e48 Stop replaying stale architecture memories
---
Build: pass | Tests: pass — 2160 passed (629 files)
2026-05-05 19:44:49 +02:00
71f9ff577f Keep explanation prompts in main chat
---
Build: pass | Tests: pass — 2157 passed (628 files)
2026-05-05 19:35:33 +02:00
38b71eaf00 Add routing source-of-truth split handoff
---
Build: pass | Tests: pass — 2153 passed (627 files)
2026-05-05 19:05:22 +02:00
Operator & Claude Code
4adb9100d0 Add glasspane FreeBSD handoff doc
---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-05 16:14:56 +02:00
Operator & Claude Code
6f03f0d5b0 Update system-namespace review with resolved findings
Reflects fixes landed in 8414953, 5c685f1, 7124c1c, 7acf771. Open
section now lists only what still needs attention; resolved items kept
as historical context with the commit that closed each.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 15 failed
2026-05-03 21:01:53 +02:00
Operator & Claude Code
1f4ec5ca94 Add system-namespace branch review notes
Captures blind spots in the recent auth/bootstrap/controlplane batch so
the FreeBSD agent can triage without re-running the audit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 35 failed
2026-05-03 17:43:07 +02:00
bab4f76439 Reorder shared service IPs and switch docs to English root
---
Build: pass | Tests: FAIL — Tests  9 failed | 2081 passed | 4 skipped (2094)
2026-05-02 20:21:19 +02:00
24ccda6e47 Align root shared DB defaults and drop screenshot auth
---
Build: pass | Tests: FAIL — Tests  8 failed | 2087 passed | 4 skipped (2099)
2026-05-02 18:04:09 +02:00
Operator & Claude Code
eb8e05bfe6 docs: split brand identifier from platform namespace; add vocabulary
Resolves the collision class where a tenant named `clawdie` would
produce `clawdie_ops` clashing with the platform's shared ops DB.

Two constants instead of one:
- service name / brand / UNIX user: `clawdie` (one of them)
- platform namespace prefix for shared resources: `system`

Shared DBs become `system_ops` / `system_brain` / `system_skills`;
shared dataset becomes `zroot/system-runtime`. `system` joins the
reserved_host_labels list so the same collision cannot reappear at
the FQDN layer.

Also adds:
- Vocabulary section distinguishing operator account, service
  account, service name, platform namespace, assistant display name,
  tenant id (six terms, one bug class each)
- Install-paths section formalizing fresh-machine (ISO) vs
  existing-host flows; `just install` is the platform install, never
  the OS install
- Service-account override field as bootstrap config, not an
  onboarding prompt; default stays `clawdie`
- Operator-account treatment: existing-host path checks for it;
  Clawdie never renames or recreates it

AGENTS.md "Multitenant Rules" updated to match.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2099 passed (2099)
2026-05-02 12:59:24 +02:00
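The brand/namespace split described in eb8e05bfe6 can be sketched as follows. This is an illustration only: the literal values (`clawdie`, `system`, the three shared DB suffixes) come from the commit message, while the function names and the idea of checking tenant ids against the reserved list are assumptions.

```typescript
// Two constants instead of one (values from the commit message).
const SERVICE_NAME = "clawdie"; // brand, service name, UNIX user
const PLATFORM_NAMESPACE = "system"; // prefix for shared platform resources

// The commit adds `system` to reserved_host_labels; any other entries are not shown here.
const RESERVED_HOST_LABELS: ReadonlySet<string> = new Set([PLATFORM_NAMESPACE]);

type DbKind = "ops" | "brain" | "skills";

// Shared platform databases: system_ops, system_brain, system_skills.
function sharedDbName(kind: DbKind): string {
  return `${PLATFORM_NAMESPACE}_${kind}`;
}

// Hypothetical tenant naming helper: rejects ids that would collide with the namespace.
function tenantDbName(tenantId: string, kind: DbKind): string {
  if (RESERVED_HOST_LABELS.has(tenantId)) {
    throw new Error(`tenant id "${tenantId}" is reserved for the platform`);
  }
  return `${tenantId}_${kind}`;
}

// A tenant that happens to be named after the brand no longer clashes:
const brandTenantDb = tenantDbName(SERVICE_NAME, "ops"); // "clawdie_ops"
const platformDb = sharedDbName("ops"); // "system_ops"
```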
Operator & Claude Code
02f7027f07 docs: collapse 8 multitenant/platform docs into one MULTITENANT.md
Single source of truth at docs/internal/MULTITENANT.md (~430 lines)
replaces the previous spread across NAMING-POLICY, ARCHITECTURE,
HOST-REALITY, INTERNAL-ROLLOUT, ROADMAP, HANDOFF, AGENT-WORKFLOW, and
PLATFORM-V2-MANIFESTO. Load-bearing content (vision, conceptual model,
naming schema, surfaces, controlplane, publishing, conventions) is
folded in; current-state runbooks, phased migration plans, and
deployment-drift snapshots are dropped — design phase, fresh start.

Identity decision: drop PLATFORM_ID / PLATFORM_SERVICE_NAME /
PLATFORM_RUNTIME_USER. Platform identity is the constant 'clawdie'
baked into code; ASSISTANT_NAME is display-only and never feeds infra
names; TENANT_ID is for additive tenants only. AGENTS.md gains a short
"Multitenant Rules" block carrying the day-to-day do/don't extract.

Cross-references in AGENT-WORKFLOW-CHECKLIST, AGENT-WORKTREE-WORKFLOW,
and the two freebsd-jail-implementation docs updated to point at the
new file.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2099 passed (2099)
2026-05-02 12:16:58 +02:00
Operator & Claude Code
7c6d076b5c docs(host-db): correct readiness review — placement was already right
Earlier version claimed the readiness wait was "in the wrong place" —
only running in the 5-min periodic check. That was wrong:
runControlPlaneChecks() is called at src/index.ts:1087, before
initDatabase / loadState / initMemoryPool. The wait already gates
bootstrap.

Trimmed the doc to the real follow-up scope: swap tcpReachable for
pg_isready, add HOST_DB_READINESS_TIMEOUT_MS env (default 60s),
minimal logging, one timeout-path test. No move, no restructuring.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 2081 passed (2084)
2026-05-01 11:51:58 +02:00
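The follow-up scope in 7c6d076b5c (a pg_isready probe plus a HOST_DB_READINESS_TIMEOUT_MS override) might look roughly like this. The exit statuses are the ones documented for pg_isready; the function names and the fallback behavior for invalid values are assumptions, not the project's code.

```typescript
type DbReadiness = "accepting" | "rejecting" | "no-response" | "no-attempt";

// Map a pg_isready exit status to a readiness state.
// 0 = accepting connections, 1 = rejecting (for example during startup),
// 2 = no response, 3 = no attempt made (for example invalid parameters).
function interpretPgIsReady(exitCode: number | null): DbReadiness {
  switch (exitCode) {
    case 0:
      return "accepting";
    case 1:
      return "rejecting";
    case 2:
      return "no-response";
    default:
      return "no-attempt"; // status 3, or the process could not be run at all
  }
}

const DEFAULT_READINESS_TIMEOUT_MS = 60_000; // 60s default from the commit message

// Read the override from the environment; assumed to fall back to the
// default when the value is missing, non-numeric, or not positive.
function readinessTimeoutMs(env: Record<string, string | undefined>): number {
  const raw = env["HOST_DB_READINESS_TIMEOUT_MS"];
  if (raw === undefined) return DEFAULT_READINESS_TIMEOUT_MS;
  const parsed = Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_READINESS_TIMEOUT_MS;
}
```

Only the "accepting" state should release the wait; a TCP-level check cannot tell "accepting" apart from "rejecting".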
Operator & Claude Code
5ee8debfd5 docs(host-db): review of 371b237 — readiness wait needs a real probe
Improvement over no-wait, but two follow-ups before §E is closed:

- default probe is `tcpReachable` — pg opens its socket during WAL
  recovery while rejecting queries with "starting up", so TCP-open
  is not the same as accepting connections. Need a SELECT 1 /
  pg_isready check.
- wait runs inside the 5-minute periodic controlplane check, not at
  Mevy bootstrap. If anything in startup touches DB before the
  first tick, the wait does not gate the actual race.

Plus: 30s default may be tight post-incident, no logs during the
wait, no env override, post-deadline extra probe makes timeout
fuzzy, and the "3 failed tests" trailer is still present.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 2081 passed (2084)
2026-05-01 11:44:36 +02:00
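Several points raised in 5ee8debfd5 (no logs during the wait, a fuzzy timeout caused by a post-deadline probe) concern the shape of the polling loop itself. A minimal sketch of a loop with a hard deadline follows; all names are hypothetical, and the probe is injected so that a real query-level check can replace a TCP check without touching the loop.

```typescript
type Probe = () => Promise<boolean>;

interface WaitOptions {
  timeoutMs: number;
  intervalMs: number;
  log?: (message: string) => void;
  now?: () => number; // injectable clock, useful in tests
  sleep?: (ms: number) => Promise<void>; // injectable delay, useful in tests
}

// Poll `probe` until it reports ready or the deadline passes.
// The deadline is fixed once; no probe runs after it has expired.
async function waitForDbReady(probe: Probe, options: WaitOptions): Promise<boolean> {
  const now: () => number = options.now ?? (() => Date.now());
  const sleep: (ms: number) => Promise<void> =
    options.sleep ?? ((ms) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  const log: (message: string) => void = options.log ?? (() => {});
  const deadline = now() + options.timeoutMs;

  for (let attempt = 1; now() < deadline; attempt++) {
    if (await probe()) {
      log(`database ready after ${attempt} attempt(s)`);
      return true;
    }
    const remaining = deadline - now();
    log(`database not ready (attempt ${attempt}), ${Math.max(remaining, 0)} ms left`);
    if (remaining <= 0) break;
    // Never sleep past the deadline.
    await sleep(Math.min(options.intervalMs, remaining));
  }
  return false;
}
```

Injecting the clock and the delay keeps a timeout-path test fast and deterministic, since the test can advance a fake clock instead of actually waiting.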
Operator & Claude Code
f2358fcb80 docs(host-db): review of b02746c — request revisions before merge
Implementation review of zai/Codex's "Harden host DB reboot path."
Direction is right but three blockers:

- snapshots are not atomic (two separate `zfs snapshot` calls
  reproduce the pgwal/pgdata skew that caused the incident)
- `serviceMaybeStop` swallows real `onestatus` errors as
  "already stopped" — can proceed to checkpoint pg with mevy
  still running
- committed with 3 failing tests

Plus smells around missing readiness wait (§E), no spawnSync
timeouts, duplicated pool resolution, and an unrelated bonus fix
smuggled into the commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  2 failed | 2080 passed (2082)
2026-05-01 10:57:13 +02:00
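The `serviceMaybeStop` blocker in f2358fcb80 comes down to treating every non-zero `onestatus` result as "already stopped". One way to separate the cases is sketched below. It assumes the service's status command exits 0 when running and 1 when not running, which is an assumption to verify against the actual rc.d script; the function and type names are hypothetical.

```typescript
type ServiceState = "running" | "stopped";

// Mirrors the fields of interest on the object returned by a synchronous spawn.
interface StatusResult {
  status: number | null; // null when the process was killed or never ran
  error?: Error; // set when the command could not be executed
}

// Classify an `onestatus` result. Anything that is not a clean
// "running" or "stopped" answer is surfaced as an error instead of
// being silently read as "already stopped".
function classifyOnestatus(result: StatusResult): ServiceState {
  if (result.error) {
    throw new Error(`onestatus could not run: ${result.error.message}`);
  }
  if (result.status === 0) return "running";
  if (result.status === 1) return "stopped"; // ASSUMPTION: 1 means "not running"
  throw new Error(`onestatus returned unexpected status ${String(result.status)}`);
}
```

With this split, the caller only proceeds to checkpoint PostgreSQL after an explicit "stopped" answer, and a failed status query aborts the operation.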
Operator & Claude Code
2bc14c7040 docs(host-db): tighten recovery plan — atomic snapshots, readiness wait, failure semantics
Addendum to d456aa4. Three gaps that would have left the plan
implementable-but-unsafe:

- snapshot step now mandates a single recursive ZFS snapshot of the
  common parent; two separate snapshots reproduce the pgwal/pgdata skew
  that caused the 2026-04-30 incident
- new §E: Mevy startup must poll for DB readiness (pg_isready or
  equivalent); rc.d REQUIRE only orders start invocations, not actual
  connect-ability
- §A now specifies failure semantics for the maintenance-reboot op
  (each pre-reboot step aborts on failure; reboot only schedules after
  all prior steps succeed)
- pg_resetwal explicitly demoted to non-recovery-path
- note that CHECKPOINT before clean stop is belt-and-suspenders

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2080 passed (2080)

---
Build: pass | Tests: FAIL — Tests  2 failed | 2080 passed (2082)
2026-05-01 10:27:08 +02:00
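The single recursive snapshot mandated in 2bc14c7040 can be sketched as an argument builder for `zfs snapshot -r`. The dataset names in the usage line are hypothetical, and the helper names are not from the repository.

```typescript
// Longest shared dataset path of two ZFS dataset names.
function commonParentDataset(a: string, b: string): string {
  const partsA = a.split("/");
  const partsB = b.split("/");
  const shared: string[] = [];
  for (let i = 0; i < Math.min(partsA.length, partsB.length); i++) {
    const segment = partsA[i];
    if (segment === undefined || segment !== partsB[i]) break;
    shared.push(segment);
  }
  if (shared.length === 0) {
    // Different pools: one recursive snapshot cannot cover both.
    throw new Error(`${a} and ${b} share no parent dataset`);
  }
  return shared.join("/");
}

// Arguments for one `zfs snapshot -r parent@label` call, so that both
// datasets are captured at the same point in time instead of by two calls.
function recursiveSnapshotArgs(datasets: readonly [string, string], label: string): string[] {
  const parent = commonParentDataset(datasets[0], datasets[1]);
  return ["snapshot", "-r", `${parent}@${label}`];
}

// Hypothetical dataset layout, for illustration only.
const snapshotArgs = recursiveSnapshotArgs(["zroot/db/pgdata", "zroot/db/pgwal"], "pre-reboot");
```

Note that a recursive snapshot covers every descendant of the common parent, not only the two named datasets. If the arguments are passed to Node's `spawnSync`, its `timeout` option bounds how long the call may block.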
d456aa4be1 docs: add host DB recovery plan
---
Build: pass | Tests: FAIL — Tests  3 failed | 2077 passed (2080)
2026-05-01 10:17:11 +02:00
bef38d218a Add maintainer skills artifact builder
---
Build: pass | Tests: pass — Tests  2075 passed (2075)
2026-04-29 13:12:30 +02:00
b97e623e3a Document tenant-site verify states
---
Build: pass | Tests: pass — Tests  2057 passed (2057)
2026-04-29 12:04:25 +02:00
6ed65e29c5 Tighten Astro handoff verification status
---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 11:34:30 +02:00
aa026586a5 Rewrite localization docs for current Astro flow
---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 11:31:50 +02:00
03a23a965b Use ječa wording in Slovenian docs
---
Build: pass | Tests: pass — Tests  2044 passed (2044)

---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 10:55:07 +02:00
d97b0531ff Clarify host-vs-jail Astro publish commands
---
Build: pass | Tests: pass — Tests  2044 passed (2044)

---
Build: FAIL | Tests: pass — Tests  2055 passed (2055)
2026-04-29 10:54:39 +02:00
a81da587bd Document current Astro publish workflows
---
Build: pass | Tests: pass — Tests  2044 passed (2044)
2026-04-29 10:07:23 +02:00
Operator & Claude Code
7dbbca7197 docs(handoff): rewrite for Codex on-host verification and simplification plan
Replaces the decision-tree handoff with a concrete step-by-step test guide
for Codex to run on the live host. Documents what Claude already shipped,
the exact verification commands, the nginx pattern question (direct vs proxy),
and a prioritized simplification assessment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  8 failed | 2009 passed (2017)
2026-04-28 14:10:06 +02:00
8aeedd54e9 docs(cms): align astro source paths and handoff
---
Build: pass | Tests: pass — Tests  2017 passed (2017)
2026-04-28 13:43:03 +02:00
d573daed76 docs(architecture): align docs with live naming and docs topology
---
Build: pass | Tests: pass — Tests  2017 passed (2017)
2026-04-27 21:27:44 +02:00
Operator & Claude Code
fdeaa39588 docs: add runtime-manifest architecture documentation
- Update ARCHITECTURE.md Prompt Assembly section to document runtime-manifest
  as a new context layer injected per-message, explaining it answers
  coherence questions: 'what repo/branch/skills do I have?'

- Update docs/internal/AGENT-HARNESS-V2.md Phase 5 to detail both System State
  and Runtime Manifest as complementary context blocks, explaining the
  coherence gap they solve together

- New docs/internal/RUNTIME-MANIFEST-DESIGN.md: complete specification
  - Why: agents had infrastructure facts but couldn't see them
  - What: machine-generated inventory from .git/library.yaml/artifacts
  - How: fresh per-message, cheap local sources, compact XML-like format
  - Where: injected in system prompt alongside SOUL/IDENTITY files
  - Testing: coverage for git parsing, skills counting, specialist discovery

The three-layer coherence system is now:
  1. Hand-written identity (SOUL/USER/IDENTITY/MEMORY) — philosophy, stable
  2. Machine-generated manifest (RUNTIME_MANIFEST) — inventory, fresh
  3. Live system state (system-state.ts) — operations, current

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  8 failed | 2009 passed (2017)
2026-04-27 21:21:30 +02:00
ae6a1e4ff9 fix(install): relax topology checks for upgrade
---
Build: pass | Tests: pass — Tests  2007 passed (2007)
2026-04-27 12:17:44 +02:00
c99a04d93f docs(iso): rename setup import design doc
---
Build: pass | Tests: pass — Tests  2005 passed (2005)
2026-04-27 11:58:34 +02:00
fee881d458 docs(handoff): review notes on 7919327 install-identity
Three issues: ZFS topology fields in mismatchKeys block valid storage
expansion upgrades (fix: move layout/data-disks/hot-spares out of
mismatch set); rescue mode silent on identity mismatches (fix: add
warn log); setup.txt path assumption on ISO path (note for bridge
phase, not a blocker).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)

---
Build: pass | Tests: FAIL — Tests  5 failed | 2000 passed (2005)
2026-04-27 11:10:19 +02:00
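The first two issues in fee881d458 (topology fields blocking valid upgrades, rescue mode staying silent) can be sketched together. Everything here is illustrative: the key names, the mode names, and the function signatures are assumptions, since the commit message only names layout, data-disks, and hot-spares as the fields to move out of the mismatch set.

```typescript
type InstallIdentity = Readonly<Record<string, string>>;
type InstallMode = "upgrade" | "rescue";

// ZFS topology fields may legitimately change during storage expansion,
// so they are kept out of the identity comparison (key names assumed).
const TOPOLOGY_KEYS: ReadonlySet<string> = new Set(["layout", "data-disks", "hot-spares"]);

// Identity keys whose recorded and requested values differ, ignoring topology.
function mismatchedKeys(recorded: InstallIdentity, requested: InstallIdentity): string[] {
  const keys = new Set([...Object.keys(recorded), ...Object.keys(requested)]);
  return Array.from(keys)
    .filter((key) => !TOPOLOGY_KEYS.has(key) && recorded[key] !== requested[key])
    .sort();
}

// Returns true when the install may proceed. Rescue mode tolerates a
// mismatch but reports it through `warn` instead of staying silent.
function checkIdentity(
  recorded: InstallIdentity,
  requested: InstallIdentity,
  mode: InstallMode,
  warn: (message: string) => void,
): boolean {
  const mismatches = mismatchedKeys(recorded, requested);
  if (mismatches.length === 0) return true;
  if (mode === "rescue") {
    warn(`rescue mode: identity mismatch on ${mismatches.join(", ")}`);
    return true;
  }
  return false;
}
```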
d53a1e018d docs(iso): design contract for shell-setup-txt.sh bridge module
Two-phase validation model: POSIX sh reads setup.txt from FAT32
partition at boot (no Node), TypeScript validates fully post-deploy.
Covers partition detection, parser, derivation layer (AGENT_DOMAIN,
locale, mode translation), system.env passthrough, pool name fix,
and Codex implementation checklist.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)
2026-04-27 10:36:29 +02:00
7ccf592fa0 docs(handoff): record 6 ISO-AI drift gaps for Codex
Cross-repo analysis after 975f37f landed. TypeScript setup layer is
correct in isolation; gaps are all at the ISO firstboot boundary:
no setup.txt reader, pool name mismatch, mode naming divergence,
AGENT_DOMAIN derivation missing, Slovenian locale defaults, and
system.env unknown to the ISO.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  4 failed | 1996 passed (2000)
2026-04-27 10:31:03 +02:00
975f37f895 feat(install): add versioned setup and system contracts
---
Build: pass | Tests: pass — Tests  2000 passed (2000)
2026-04-27 10:06:44 +02:00
d5182ec480 docs+setup: clarify install mode names
---
Build: pass | Tests: pass — Tests  1992 passed (1992)
2026-04-27 09:07:18 +02:00
bcb27d4d56 feat(install): backfill setup from inspect output
---
Build: pass | Tests: FAIL — Tests  2 failed | 1989 passed (1991)
2026-04-27 08:55:21 +02:00
7b14e27783 feat(install): add shell-based inspect mode
---
Build: pass | Tests: pass — Tests  1991 passed (1991)
2026-04-27 08:47:56 +02:00