Commit graph

101 commits

Author SHA1 Message Date
21818e4eb0 Refine glasspane tmux launcher
---
Build: pass | Tests: pass — 2147 passed (625 files)
2026-05-05 15:37:20 +02:00
0be5a169ac Fix tmux glasspane session handling
---
Build: pass | Tests: pass — 2147 passed (625 files)
2026-05-05 15:15:54 +02:00
3383afad9c Fix operator report automation and Telegram command scopes
---
Build: pass | Tests: FAIL — 31 failed (full-suite baseline from status writer; focused validation passed)

---
Build: pass | Tests: FAIL — 31 failed
2026-05-03 09:31:44 +02:00
c1560e108d Harden hostd auth and operator password hashing
---
Build: pass | Tests: FAIL — 4 failed (pre-existing controlplane-api tenant fixture cases)
2026-05-03 06:50:06 +02:00
Operator & Claude Code
75009dcb7f refactor(identity): remove PLATFORM_ID/SERVICE_NAME/RUNTIME_USER env vars
Step 5 of the system-namespace cutover: complete the env-var removal that
step 4 set up. All consumers now import SERVICE_NAME from
src/platform-identity.ts directly; the deprecated PLATFORM_*
re-exports in src/config.ts are gone.

src/config.ts:
- PLATFORM_ID, PLATFORM_SERVICE_NAME, PLATFORM_RUNTIME_USER exports
  removed.
- PLATFORM_RUNTIME_HOME stays (derived from SERVICE_NAME, used by
  ~10 consumers for path construction).
- Env-var allowlist drops PLATFORM_ID / PLATFORM_SERVICE_NAME /
  PLATFORM_RUNTIME_USER / PLATFORM_RUNTIME_HOME entries.
- CONTROLPLANE_AIDER_TMUX_SESSION uses SERVICE_NAME directly.

setup/onboarding.ts:
- writeIdentity() simplified to write only ASSISTANT_NAME (display).
  PLATFORM_ID / PLATFORM_SERVICE_NAME / PLATFORM_RUNTIME_USER are no
  longer written to .env. Fresh installs have no PLATFORM_* keys.
- Status emission switched from PLATFORM_ID to SERVICE_NAME.

setup/env-audit.ts:
- Audit lists SERVICE_NAME instead of PLATFORM_ID; the env-file
  PLATFORM_ID read is gone.

24 source files (src/*.ts, setup/*.ts, scripts/dashboard.ts):
- Bare PLATFORM_ID / PLATFORM_SERVICE_NAME / PLATFORM_RUNTIME_USER
  references replaced with SERVICE_NAME.
- Imports rewired: SERVICE_NAME comes from
  ../{src/}platform-identity.js, not from config.js.
- Imports deduped where the sed sweep produced collisions.

Shell scripts (scripts/bhyve-evidence.sh, glass.sh, inspect-system.sh):
- Hardcoded SERVICE_NAME='clawdie' and SERVICE_USER='clawdie'.
  No more grep-the-.env fallbacks; the constants are the source.

Tests (middle path):
- Mechanical fixes (import path, renamed assertion text):
  src/hostd/privileged-commands.test.ts, src/startup-report.test.ts,
  setup/env-audit.test.ts, setup/install-mode.test.ts.
- Skipped with `// system-namespace:` markers (they pinned the removed
  env-driven override behavior; Codex will rewrite them once the
  bootstrap-config service-user override path lands):
    setup/verify.test.ts > 'uses the platform service name for PID candidates'
    setup/service.test.ts > 'resolves a platform runtime separately from the tenant'

Test files still containing PLATFORM_* strings in vi.mock contents,
ENV_KEYS arrays, or comments are left untouched — they are test
artifacts that don't affect runtime; mock contents resolve to
'clawdie' which still equals SERVICE_NAME.

tsc clean. 2095 tests pass, 4 skipped, 0 fail.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2095 passed | 4 skipped (2099)
2026-05-02 14:49:19 +02:00
Operator & Claude Code
00a908306d Honor configured ZFS pool everywhere
Codex caught zroot hardcodes in setup/sanoid.ts and setup/db.ts; the same
pattern remained in three more shipping locations:

- scripts/backup.ts: jail and shared dataset paths
- src/tenant-registry.ts: default tenant dataset list
- setup/sanoid.ts: npm-global retention candidate

Add zfsPool() helper to maintenance-snapshots.ts (where the analogous
buildHostDbDatasets reads ZFS_POOL) and use it in all three. Operators
running on non-default pools no longer get silently wrong dataset paths
in backup, tenant provisioning, or sanoid retention.
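A minimal sketch of the helper pattern described above, assuming `ZFS_POOL` is read from an environment map and that `zroot` is the fallback when it is unset. The `env` parameter and the `datasetPath` helper are hypothetical additions for illustration; the real `zfsPool()` in maintenance-snapshots.ts may take no arguments and read the environment directly.

```typescript
// Hypothetical sketch; not the actual maintenance-snapshots.ts implementation.
// Assumption: ZFS_POOL names the pool, and "zroot" is the default when unset or blank.
type Env = Record<string, string | undefined>;

function zfsPool(env: Env): string {
  const pool = env.ZFS_POOL?.trim();
  return pool ? pool : "zroot";
}

// Hypothetical helper: join a pool-relative dataset name onto the configured pool
// instead of hardcoding a "zroot/..." prefix in each caller.
function datasetPath(relative: string, env: Env): string {
  return `${zfsPool(env)}/${relative.replace(/^\/+/, "")}`;
}
```

Callers would pass `process.env`; with `ZFS_POOL=tank`, `datasetPath("jails/web", process.env)` yields `tank/jails/web`, and with the variable unset it yields `zroot/jails/web`.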

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  2099 passed (2099)
2026-05-02 08:20:32 +02:00
8c4b8a88ef Remove hardcoded mevy runtime identity
Replace remaining executable-code mevy assumptions with config-derived values. This updates operator messaging, runtime prompts, agent-task role defaults, inspect-system fallbacks, and OpenRouter metadata headers to follow PLATFORM_SERVICE_NAME, PLATFORM_ID, TENANT_ID, and PROJECT_ROOT instead of the live example tenant.

---
Build: pass | Tests: FAIL — Tests  3 failed | 2089 passed (2092)
2026-05-02 07:16:30 +02:00
bef38d218a Add maintainer skills artifact builder
---
Build: pass | Tests: pass — Tests  2075 passed (2075)
2026-04-29 13:12:30 +02:00
975f37f895 feat(install): add versioned setup and system contracts
---
Build: pass | Tests: pass — Tests  2000 passed (2000)
2026-04-27 10:06:44 +02:00
d5182ec480 docs+setup: clarify install mode names
---
Build: pass | Tests: pass — Tests  1992 passed (1992)
2026-04-27 09:07:18 +02:00
bcb27d4d56 feat(install): backfill setup from inspect output
---
Build: pass | Tests: FAIL — Tests  2 failed | 1989 passed (1991)
2026-04-27 08:55:21 +02:00
7b14e27783 feat(install): add shell-based inspect mode
---
Build: pass | Tests: pass — Tests  1991 passed (1991)
2026-04-27 08:47:56 +02:00
2ab3fa050a refactor(setup): unify operator auth entrypoints
---
Build: pass | Tests: pass — Tests  1991 passed (1991)
2026-04-27 08:13:36 +02:00
Operator & claude
a16838b772 docs(handoff): record adopt-mode decisions + flag operator-auth unification
Round 5 in the handoff doc captures the five agreed adopt-mode
decisions (INSTALL_MODE field, fill-blanks default, identity
mismatch blocks, Telegram identity changes require explicit flag,
fingerprint gate) so they survive into Codex's design doc.

Implementation doc gets an "Adopt Mode (V1.1)" section with the
proposed 4-task split + per-field freeze contract table, plus a
task-4 followup subsection naming the legacy `operators` table
sync gap and the unification plan with Codex's
setup/operator-auth.ts. scripts/set-operator.ts gets a TODO(unify)
header pointing at the same gap.

first-boot.md notes that adopt mode is planned for V1.1 and that operators
should back up before reflashing until then.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 07:12:55 +02:00
Operator & claude
b9e771316d feat(setup): add set-operator script for post-install dashboard credentials
Lands task 4 from the ISO first-boot implementation split as a
standalone scripts/set-operator.ts (matches existing scripts/
convention — no clawdie-admin umbrella). Reuses
ensureControlplaneBootstrapOperator() for the Better Auth signUp
path. Prompts for the password via stdin with echo suppressed; refuses
non-TTY runs; updates OPERATOR_PASSWORD in .env (mode 0600).
First-set only — rotation goes through the dashboard.

Both planning docs updated to drop "notional" references and point
at the real npm run set-operator -- <email> command.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: FAIL — Tests  3 failed | 1972 passed (1975)
2026-04-27 06:41:53 +02:00
1389e17ec4 fix(runtime): align startup brief and test status paths
---
Build: pass | Tests: pass — Tests  1951 passed (1951)
2026-04-26 12:48:47 +02:00
1e87f34121 feat(dashboard): expand operator tenant and publish view
---
Build: FAIL | Tests: FAIL
2026-04-26 08:49:24 +02:00
af2648be87 fix(reports): keep test status artifacts in repo tmp
---
Build: FAIL | Tests: FAIL
2026-04-26 07:48:43 +02:00
Operator & claude
1759a8bd85 feat(reports): add structured test/build report
Reads JSON status files written by scripts/write-test-build-status.sh
so /testreport reflects the last real build/test run instead of model
memory. Missing or stale status degrades to "unknown" with an action
note rather than fabricating success.
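The "degrade to unknown instead of fabricating success" rule can be sketched as a pure function over the raw status file contents. The JSON field names (`build`, `tests`, `writtenAt`), the staleness threshold parameter, and the note wording are assumptions; the schema actually written by scripts/write-test-build-status.sh is not shown in this commit.

```typescript
// Hypothetical sketch of the degrade-to-unknown rule; field names are assumptions.
type RunState = "pass" | "fail" | "unknown";

interface ReportLine {
  build: RunState;
  tests: RunState;
  note?: string; // action note shown when the status cannot be trusted
}

function asRunState(value: unknown): RunState {
  return value === "pass" || value === "fail" ? value : "unknown";
}

function unknownWith(note: string): ReportLine {
  return { build: "unknown", tests: "unknown", note };
}

function summarizeStatus(raw: string | undefined, now: Date, maxAgeMs: number): ReportLine {
  if (raw === undefined) return unknownWith("No status file found; rerun build and tests.");

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return unknownWith("Status file is not valid JSON; rerun build and tests.");
  }
  if (parsed === null || typeof parsed !== "object") {
    return unknownWith("Status file has an unexpected shape; rerun build and tests.");
  }

  const record = parsed as Record<string, unknown>;
  const writtenAt = typeof record.writtenAt === "string" ? Date.parse(record.writtenAt) : NaN;
  if (Number.isNaN(writtenAt) || now.getTime() - writtenAt > maxAgeMs) {
    return unknownWith("Status is missing a timestamp or is stale; rerun build and tests.");
  }

  // Report only what the last real run recorded; unrecognized values become "unknown".
  return { build: asRunState(record.build), tests: asRunState(record.tests) };
}
```

Keeping this logic free of file I/O lets the report command read the file however it likes and hand the string (or `undefined` when the file is missing) to one testable function.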

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: pass | Tests: pass — Tests  1914 passed (1914)

---
Build: pass | Tests: pass — Tests  1917 passed (1917)

---
Build: pass | Tests: pass — Tests  1921 passed (1921)
2026-04-26 07:44:21 +02:00
0d9ad52922 fix(controlplane): stop git push token burn in jail
---
Build: FAIL | Tests: FAIL
2026-04-25 19:37:54 +02:00
d8cbd5ca70 chore(multitenant): harden agent workflow and README sync
Move the multitenant agent-workflow decision into repo docs, enforce effective author/committer identities in the pre-commit hook, and replace the shell-based README version rewrite with a reusable Node helper.

---
Build: pass | Tests: pass — node scripts/update-readme-version.mjs --check; sh -n hooks/pre-commit

---
Build: FAIL | Tests: FAIL — Tests  58 failed | 1109 passed (1167)

---
Build: FAIL | Tests: FAIL — Tests  58 failed | 1107 passed (1165)
2026-04-25 07:58:18 +02:00
9605c7ad81 refactor(multitenant): collapse planTenantApply allowedResources duplication
Drop the allowedResources field from TenantApplyPlan — it was derived
field-for-field from resourceChecklist already, which was exactly the
"triplicate representation" flagged in the handoff's consolidation list.
Update scripts/tenant-lifecycle.ts to compute the same lists from the
checklist when it prints, and drop the tautological equality assertions
from the test (resourceChecklist is now the single source).
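Deriving the printed lists from the checklist, rather than storing a parallel field, can be sketched as follows. The entry shape, the kind names (modeled on the databases, worker jails, and datasets the checklist covers), and `allowedByKind` are hypothetical; the actual TenantApplyPlan types are not shown in these commits.

```typescript
// Hypothetical shapes; the real resourceChecklist entries may carry more fields.
type ResourceKind = "database" | "worker-jail" | "dataset";

interface ChecklistEntry {
  kind: ResourceKind;
  name: string;
  status: "present" | "future-create" | "manual";
}

// Derive per-kind name lists from the checklist at print time, instead of
// storing a parallel allowedResources field that must be kept in sync.
function allowedByKind(checklist: readonly ChecklistEntry[]): Record<ResourceKind, string[]> {
  const out: Record<ResourceKind, string[]> = { database: [], "worker-jail": [], dataset: [] };
  for (const entry of checklist) {
    out[entry.kind].push(entry.name);
  }
  return out;
}
```

Because the lists are recomputed from one source, a test no longer needs equality assertions between two representations of the same data.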

---
Build: pass | Tests: pass — 33 passed (1 file)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-24 19:12:12 +02:00
d8f43fc4a0 Clean up controlplane naming consumers
Fix the remaining operator-surface drift after the naming cutover. This aligns controlplane defaults around ai.<base>, makes the dashboard use the shared display-date helper and approved controlplane host, reuses the derived code-service hostname in Forgejo config, and fixes local-host syncing so underscore-form tenant jails are no longer skipped.

---
Build: pass | Tests: pass — 67 passed (5 files)
2026-04-24 16:50:08 +02:00
9fea739140 Finish controlplane naming propagation sweep
---
Build: pass | Tests: pass — 122 passed (7 files)
2026-04-24 16:03:47 +02:00
0c690d2065 Surface tenant naming overrides in apply plan
2026-04-24 11:06:20 +02:00
3992503522 Clarify tenant apply normalization hints
Explain in tenant-apply output when an existing tenant still carries declared non-default state, so operators can distinguish current tenant-specific carryover from the smaller V2 default for new tenants.

---
Build: pass | Tests: pass — 31 passed (1 file)
2026-04-24 10:17:41 +02:00
ae7a109da4 Consolidate tenant apply contract shape
Reduce duplication in planTenantApply by treating the resource checklist as the canonical resource list, deriving blockers from preflight state, and trimming redundant action-policy payloads.

---
Build: pass | Tests: pass — 31 passed (1 file)
2026-04-24 10:10:44 +02:00
daf29fa332 Add tenant apply resource checklist
Refine tenant-apply dry runs with per-resource status entries so databases, worker jails, and datasets are reported as explicit future-create candidates instead of only appearing inside summary sections.

---
Build: pass | Tests: pass — 31 passed (1 file)
2026-04-24 09:40:04 +02:00
253cdcecb6 Classify tenant apply policy actions
Refine tenant-apply planning so future automatic candidates, manual-only steps, and permanent out-of-scope actions are reported explicitly instead of being implied by generic prose.

---
Build: pass | Tests: pass — 31 passed (1 file)
2026-04-24 09:36:31 +02:00
2d3f2253c9 Define tenant apply preflight policy
Turn tenant-apply into a structured preflight contract that marks what already passes in the declarative model, what remains manual, and what still blocks any future automatic host mutation.

---
Build: pass | Tests: pass — 31 passed (1 file)
2026-04-24 09:32:10 +02:00
36827ab478 Add dry-run tenant apply planning
Introduce a separate tenant-apply contract that describes what a future live apply would be allowed to touch, what prerequisites it would require, and what stays explicitly manual or out of scope.

---
Build: pass | Tests: pass — 28 passed (1 file)
2026-04-24 09:14:37 +02:00
59c4006938 refactor(multitenant): platform domain config, richer CLI, comment-safe registry
- platform record now accepts internal_domain and internal_base; tenant
  internal-domain derivation honors platform.internal_base instead of
  hard-coding home.arpa
- validateTenantRecord now rejects a tenant whose internal_domain
  collides with the platform internal_domain
- tenant-lifecycle CLI now accepts --internal-domain, --service, and
  repeatable --dataset flags; tenant-list now prints
  id\tservice\tinternal-domain\tdisplay-name
- writeTenantRegistry preserves YAML comments and key order via the
  yaml Document API instead of parse/stringify round-tripping
- platformHostd{SocketPath,PidFile} now use normalizeResourceId
  directly so platform-side helpers stop calling normalizeTenantId
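The two internal-domain rules in the first bullets can be sketched as follows: derive a tenant's domain from `platform.internal_base` (falling back to `home.arpa`), then reject a tenant whose domain equals the platform's. The `<tenant-id>.<base>` composition, the record shapes, and both function names are assumptions; the real derivation and `validateTenantRecord` may differ.

```typescript
// Hypothetical record shapes; the real registry schema may differ.
interface PlatformRecord {
  internal_domain: string;
  internal_base?: string;
}

interface TenantRecord {
  id: string;
  internal_domain?: string; // explicit override, e.g. from --internal-domain
}

// Prefer an explicit tenant domain, then platform.internal_base; "home.arpa" is only a fallback.
function tenantInternalDomain(tenant: TenantRecord, platform: PlatformRecord): string {
  if (tenant.internal_domain) return tenant.internal_domain;
  const base = platform.internal_base ?? "home.arpa";
  return `${tenant.id}.${base}`;
}

// Collect errors instead of throwing so callers can report every problem at once.
// DNS names are case-insensitive, so compare in lower case.
function validateTenantDomain(tenant: TenantRecord, platform: PlatformRecord): string[] {
  const errors: string[] = [];
  const domain = tenantInternalDomain(tenant, platform).toLowerCase();
  if (domain === platform.internal_domain.toLowerCase()) {
    errors.push(`tenant ${tenant.id}: internal_domain ${domain} collides with the platform internal_domain`);
  }
  return errors;
}
```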

Build: pass | Tests: pass — 1783 passed (114 files); two failing
suites (vision.test.ts, controlplane-api.test.ts) are pre-existing
on origin/multitenant and unrelated to this change.
2026-04-24 09:13:20 +02:00
b48e073848 Define tenant provisioning contract
Turn tenant planning into an explicit declarative contract that states which logical resources belong to a tenant and which host-level concerns remain intentionally out of scope.

---
Build: pass | Tests: pass — 20 passed (1 file)
2026-04-24 08:48:48 +02:00
56fbddb616 Define tenant removal safety boundaries
Make tenant removal planning distinguish declarative registry changes from protected platform resources, and block removal when a tenant overlaps platform identity or shared services.

---
Build: pass | Tests: pass — 18 passed (1 file)
2026-04-24 08:41:25 +02:00
311f663523 Harden tenant lifecycle validation
Reject empty tenant input, normalize read-path lookups, and treat shared platform resource aliases as reserved so lifecycle validation catches underscore and hyphen collisions consistently.
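The collision rule can be illustrated with a small sketch that folds hyphens into underscores before comparing ids. The folding rule, the reserved alias list, and `validateNewTenantId` are assumptions for illustration; only the goals (reject empty input, reserve shared platform aliases, catch underscore and hyphen collisions) come from the commit message.

```typescript
// Hypothetical normalization rule; the real normalizeResourceId may differ.
// Hyphens and underscores are treated as equivalent so "acme-corp" and "acme_corp" collide.
function normalizeResourceId(raw: string): string {
  return raw.trim().toLowerCase().replace(/-/g, "_");
}

// Illustrative reserved platform aliases; the actual reserved set is not listed in the commit.
const RESERVED_IDS = new Set(["controlplane", "hostd", "shared-db"].map(normalizeResourceId));

// Returns every problem found, so lifecycle commands can report them together.
function validateNewTenantId(raw: string, existingIds: readonly string[]): string[] {
  const id = normalizeResourceId(raw);
  if (id === "") return ["tenant id must not be empty"];

  const errors: string[] = [];
  if (RESERVED_IDS.has(id)) {
    errors.push(`"${raw}" collides with a reserved platform resource`);
  }
  if (existingIds.some((existing) => normalizeResourceId(existing) === id)) {
    errors.push(`"${raw}" collides with an existing tenant`);
  }
  return errors;
}
```

Normalizing on both sides of every comparison, including read-path lookups, is what keeps `shared_db` and `shared-db` from being treated as distinct resources.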

---
Build: pass | Tests: pass — 25 passed (2 files)
2026-04-24 08:38:29 +02:00
e040f5cfcc Add tenant lifecycle removal planning
Keep tenants as logical platform identities, preserve human display names while normalizing system ids, and add a dry-run removal path plus stronger registry validation.

---
Build: pass | Tests: pass — 28 passed (3 files)
2026-04-24 08:32:45 +02:00
ac160ea7f0 Refine tenant model to logical platform identities
Drop tenant home/repo workspace fields from the registry and lifecycle planning so tenants remain logical identities inside one platform deployment.

---
Build: pass | Tests: pass — 13 passed (2 files)
2026-04-24 08:13:58 +02:00
b8fd655f02 Refactor V2 identity and platform ownership model
Make the multitenant branch use a clean PLATFORM_*/TENANT_* model, remove active AGENT_NAME runtime usage, collapse hostd ownership into the shared platform, add operator audit surfaces, and add read-only tenant lifecycle commands.

---
Build: pass | Tests: pass — 151 passed (14 files)
2026-04-24 07:49:09 +02:00
c65c289f08 refactor(multitenant): make tenant and platform identity explicit
Replace ambiguous AGENT_NAME usage across runtime, setup, and helper scripts with explicit TENANT_ID or platform runtime identity where appropriate. Keep AGENT_NAME as a compatibility boundary instead of the primary source for shared runtime naming.

---
Build: pass | Tests: pass — 138 passed (10 files)
2026-04-23 21:41:42 +02:00
66a36a6548 refactor(multitenant): centralize controlplane session paths
Introduce a shared controlplane paths helper and use it in runtime plus operator tooling. This removes another tenant-derived path assumption and aligns controlplane session logs with the actual tmp-based layout used by the platform.

---
Build: pass | Tests: pass — 105 passed (7 files)
2026-04-23 10:11:40 +02:00
c8cfa898de refactor(multitenant): platform-scope worker jail naming
Move controlplane worker jail naming off tenant identity and onto the shared platform service identity. Also update operator-facing controlplane scripts so their error messages describe the platform service instead of implying tenant ownership.

---
Build: pass | Tests: pass — 103 passed (6 files)
2026-04-23 09:52:15 +02:00
0d801e6ecf refactor(multitenant): move shared runtime names to platform scope
Continue the platform runtime split by moving shared watchdog and controlplane defaults off tenant-derived names. Operator-facing dashboard and controlplane defaults now use the platform service identity, with tests covering the new config and socket behavior.

---
Build: pass | Tests: pass — 103 passed (6 files)
2026-04-23 09:26:36 +02:00
42393a2f99 fix(system-health): use display date format
Render the Generated timestamp using src/display-date.ts (DD.mmm.YYYY HH:MM:SS) instead of ISO 8601.
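A sketch of the DD.mmm.YYYY HH:MM:SS pattern, assuming "mmm" means an English three-letter month abbreviation and using UTC for determinism. Both choices are assumptions; the real src/display-date.ts may use local time or a different month spelling, and `formatDisplayDateUtc` is a hypothetical name.

```typescript
// Hypothetical re-implementation of the DD.mmm.YYYY HH:MM:SS display pattern.
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

const pad2 = (n: number): string => String(n).padStart(2, "0");

// Formats in UTC so the output does not depend on the host's time zone.
function formatDisplayDateUtc(d: Date): string {
  const date = `${pad2(d.getUTCDate())}.${MONTHS[d.getUTCMonth()]}.${d.getUTCFullYear()}`;
  const time = `${pad2(d.getUTCHours())}:${pad2(d.getUTCMinutes())}:${pad2(d.getUTCSeconds())}`;
  return `${date} ${time}`;
}
```

For example, `formatDisplayDateUtc(new Date("2026-04-22T07:56:08Z"))` returns `"22.Apr.2026 07:56:08"`.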

---

Build: pass | Tests: not run
2026-04-22 09:56:08 +02:00
374b3a8982 chore(harness): remove duplicate types and note local parser
Remove the duplicate JailRow type in scripts/dashboard.ts and clarify that the pi extension keeps a local bastille parser copy. Also trim an extra blank line.

---

Build: pass | Tests: pass — agent-runner
2026-04-22 09:49:19 +02:00
fb4104e0c3 fix(cli): correct jail-list columns (Sam & Codex)
Use the shared bastille list parser so IP, name, and state are parsed correctly from wide bastille output.

---
Build: pass | Tests: pass — 1683 passed (104 files)
2026-04-21 23:45:26 +02:00
2765bae250 refactor(bastille): unify list parsing (Sam & Codex)
- Add src/bastille-list.ts parser supporting wide + legacy formats (Up/Down/Stopped)
- Use the shared parser in the dashboard and system-health scripts
- Extend pi harness extension tests to cover wide table

---
Build: pass | Tests: pass — 1683 passed (104 files)
2026-04-21 23:36:37 +02:00
5a052718f5 fix(controlplane): repair agent-task scripts
- Add required Authorization header (CONTROLPLANE_SHARED_SECRET)
- Support selecting assigned role via `just agent-task "..." db-admin`
- Update agent-task-status to understand `task_id` and list recent tasks
- Update harness handoff Phase 7e example

---
Build: pass | Tests: pass — 103 files, 1680 tests
2026-04-21 22:23:02 +02:00
f3b2c0189a fix: restore harness validation commands (Sam & Codex)
- Fix `just dashboard`: correct hostd socket default, default output dir `html/dashboard`, mkdir output dir
- Fix `just system-health`: parse `bastille list` correctly + call hostd `service-status` with `name`
- Update harness validation handoff checkboxes + results

---
Build: pass | Tests: pass — 1674 passed (102 files)
2026-04-21 21:07:06 +02:00
e97a1dec2c chore: commit backfill-embeddings maintenance script
Repairs memory_chunks rows missing vector embeddings — useful after
embedding API outages, provider switches, or fresh installs. Has
dry-run mode and rate-limit backoff. Not a skill; run manually when
semantic memory search degrades.
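The two behaviors mentioned above, finding rows without embeddings (what a dry run would report) and backing off when the embedding API rate-limits, can be sketched like this. The row shape, the doubling schedule, the 500 ms base and 30 s cap, and all function names are assumptions; the commit does not describe the script's actual retry policy.

```typescript
// Hypothetical row shape; the real memory_chunks schema is not shown in the commit.
interface ChunkRow {
  id: number;
  embedding: number[] | null;
}

// Rows that still need an embedding; a dry run would only report these ids.
function rowsMissingEmbeddings(rows: readonly ChunkRow[]): number[] {
  return rows.filter((r) => r.embedding === null || r.embedding.length === 0).map((r) => r.id);
}

// Exponential backoff: baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 30_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Retry only errors the caller classifies as rate limits; rethrow everything else immediately.
async function withBackoff<T>(
  call: () => Promise<T>,
  isRateLimited: (e: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (e) {
      if (!isRateLimited(e) || attempt + 1 >= maxAttempts) throw e;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```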

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 09:01:23 +02:00
7a0d3888d5 fix: update all stale PostgreSQL 17 references to 18
data17 path and postgresql17 package refs were never updated when PG was
upgraded to 18. Fixes setup scripts, skills, docs, tests, and archived
playbooks to match the running system (PG 18.3, /var/db/postgres/data).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-18 09:12:48 +00:00