From 6f03f0d5b08083641ca7c8bbf2f11b10ba534c2f Mon Sep 17 00:00:00 2001 From: Operator & Claude Code Date: Sun, 3 May 2026 21:01:53 +0200 Subject: [PATCH] Update system-namespace review with resolved findings MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reflects fixes landed in 8414953, 5c685f1, 7124c1c, 7acf771. Open section now lists only what still needs attention; resolved items kept as historical context with the commit that closed each. Co-Authored-By: Claude Opus 4.7 --- Build: FAIL | Tests: FAIL — 15 failed --- .../SYSTEM-NAMESPACE-BRANCH-REVIEW.md | 166 +++++++----------- 1 file changed, 65 insertions(+), 101 deletions(-) diff --git a/docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md b/docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md index 3706579..8c0d362 100644 --- a/docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md +++ b/docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md @@ -1,112 +1,68 @@ # system-namespace branch review -**Date:** 2026-05-03 +**Original audit date:** 2026-05-03 +**Last updated:** 2026-05-03 (post-`7acf771`) **Branch:** `system-namespace` -**Commit range audited:** `aacfa16..16400ea` (19 commits, ~2.4k insertions / ~787 deletions, 58 files) -**Reviewer:** Claude Opus 4.7 (Linux/Node 22 dev box) +**Original commit range audited:** `aacfa16..16400ea` (19 commits) +**Reviewer:** Claude Opus 4.7 **Production target:** FreeBSD + Node 24 -This review enumerates blind spots and risks in the recent batch of commits. -It is not a fix list — it is a punch list for a follow-up agent (running on -the FreeBSD host with Node 24) to triage and address. +This doc was a punch list for the FreeBSD/Node 24 follow-up agent. +Most Critical/Important findings have since been addressed by commits +`8414953`, `5c685f1`, `7124c1c`, `7acf771`. The "Open" section below +reflects what still needs attention. Resolved findings are kept in the +"Resolved" section as historical context. ## Context the next agent needs - Production runs **FreeBSD with Node 24**. The original auditor was on - **Linux Node 22**, which caused one false alarm — see "Ruled out" below. -- `package.json` declares `engines.node: ">=24"`; `.nvmrc` says `22`. The - mismatch is a dev-ergonomics issue, not a prod bug. -- Themes of the commit batch: - - root-admin/tenant bootstrap on fresh installs - (`ae7183e`, `8f7836c`, `3b4fb67`) - - hostd + Telegram auth hardening (`c1560e1`, `843d87b`) - - per-agent controlplane API tokens (`0d6414a`, `b43e318`) - - dashboard password reset via Telegram (`2874dba`) - - CMS jail routing for root domains (`f4cb61b`, `591d96a`) -- New files most relevant to this review: - - `src/controlplane-agent-keys.ts` + **Linux Node 22**. `package.json` `engines.node: ">=24"`; `.nvmrc` + says `22` (dev-ergonomics issue, not prod). +- Test suite is currently **green** on the right runtime (2137 passed, + 622 files) per `7acf771`. Trailers like "Tests: FAIL — 26 failed" on + earlier commits in the chain were fixture issues, not real + regressions. +- New files most relevant to credential/auth surface: + - `src/controlplane-agent-keys.ts` (+ test) - `src/hostd/auth.ts` (+ test) - `src/operator-auth-reset.ts` - `src/root-admin-bootstrap.ts` (+ test) - - `src/telegram-newpass-command.test.ts` + - `src/telegram-auth.ts`, `src/telegram-command-registry.ts`, + `src/telegram-types.ts` (introduced by `defc8df` — not audited yet) ## Ruled out (don't re-audit these) -- **`crypto.argon2` API existence.** Verified Node 24 ships - `crypto.argon2` / `crypto.argon2Sync` natively — see Node 24 docs. - The calls at `src/controlplane-db.ts:556` and `:644` only fail on - Node 22 dev boxes. On FreeBSD/Node 24 they're correct. +- **`crypto.argon2` API existence.** Node 24 ships `crypto.argon2` / + `crypto.argon2Sync` natively. Calls in `src/controlplane-db.ts` only + fail on Node 22 dev boxes; correct on FreeBSD/Node 24. - **Body-size limit coverage.** `readBody` is only called from - `readJsonBody`, and every POST handler in `src/controlplane-api.ts` - routes through `readJsonBody` (verified: lines 971, 1012, 1112, 1242). - The `CONTROLPLANE_MAX_BODY_BYTES` cap (`b43e318`) covers all current - POST endpoints. Re-check if a new handler is added. + `readJsonBody`; every POST handler in `src/controlplane-api.ts` + routes through it. `CONTROLPLANE_MAX_BODY_BYTES` (`b43e318`) covers + all current POST endpoints. Re-check if a new handler is added. -## Findings - -### Critical - -1. **`getAgentByApiKey` is an O(N) argon2id verify per request — DoS amplifier.** - `src/controlplane-db.ts` (commit `0d6414a`) loads all agents and runs - argon2id verify against each on every authenticated agent call. - Native Node 24 argon2id is still memory-hard (m=64MiB, t=3, p=4 by - default in the helper). Unauthenticated callers can submit any - 48-char string to force the full-table scan; cost scales with N. - Body-size limit doesn't help — the cost is in the bearer token - itself. - *Fix direction:* index lookup by an HMAC of the token, or by a - token-id prefix; verify only one row. - -2. **`/newpass` flow leaks plaintext + clobbers `.env` non-atomically.** - `src/telegram-commands.ts` `handleNewPassCommand` / - `handleNewPassCallback` (commit `2874dba`): - - Plaintext password echoed back into the Telegram chat (chat history - becomes a credential store). - - Plaintext also written to `.env` via - `src/operator-auth-reset.ts` `writeOperatorAuthEnv` - using `fs.writeFileSync` (no atomic temp+rename, partial-write risk). - - No rate limit on the command — admin can spam reset. - - No audit trail. - - No "log out other sessions." - - Multi-step reset (legacy operators table → migrations → - better-auth user → reset → `.env`) has no transaction. Failure - between DB update and `.env` write desyncs the two; user sees - "failed" while DB has new password and `.env` has old. - -3. **Hostd auth uses `!==`, not `timingSafeEqual`.** - `src/hostd/auth.ts` (~line 50): `req.auth.token !== expectedToken` - leaks length / common-prefix via timing. - Worse: `caller` (`operator` vs `tenant-agent`) and `tenantId` come - from the *same untrusted JSON body* that carries the token. Anyone - holding `CONTROLPLANE_SHARED_SECRET` can claim any `tenantId`. - Per-tenant authorization in `authorizeHostdOperation` is the only - barrier; if a tenant agent ever holds the shared secret, it can - impersonate any other tenant by setting the field. - No rotation/revocation primitive — there's only one shared secret. +## Open findings ### Important -4. **No tests for `operator-auth-reset.ts` or `controlplane-agent-keys.ts`.** - Both touch credential state on disk + DB. Specifically untested: - - `writeOperatorAuthEnv` (`.env` clobber, partial-write). - - `resetControlplaneBootstrapOperatorPassword` "existing→reset" - fallback path. - - Agent-key file-vs-DB sync. - `src/telegram-newpass-command.test.ts` mocks - `resetOperatorAuthCredentials`, so it tests the Telegram surface, - not the reset itself. +1. **No direct tests for `operator-auth-reset.ts`.** + `controlplane-agent-keys.ts` got tests in `8414953`, but + `operator-auth-reset.ts` still has no direct test file. Tested only + transitively via `telegram-newpass-command.test.ts` (which mocks + `resetOperatorAuthCredentials`) and indirectly by setup tests. + Specifically untested: `writeOperatorAuthEnv` atomic write + + chmod, `resetControlplaneBootstrapOperatorPassword` + "existing→reset" fallback, multi-step rollback semantics. -5. **Agent API key file persistence.** +2. **Agent API key file write race.** `src/controlplane-agent-keys.ts:50–58`: - `fs.writeFileSync` then `chmodSync` to `0o600` — file briefly - visible at default umask between create and chmod (observable on - a multi-user host). - - No atomic rename. + visible at default umask between create and chmod. + - No atomic temp+rename. (Note: `operator-auth-reset.ts` was fixed + in `5c685f1` to use temp+rename; the agent-key writer wasn't.) - No rotation/revocation path other than DB edit + `fs.unlink`. - - In-memory `syncedApiKeys` map (line ~20) caches the key forever - per process. + - In-memory `syncedApiKeys` map caches the key forever per process. -6. **Bootstrap idempotency / partial-failure gaps.** +3. **Bootstrap idempotency / partial-failure gaps.** - `decideRootAdminBootstrapGroup` (`src/root-admin-bootstrap.ts`) keys off `registeredChatJids.length === 0`. No concurrency guard if `main()` runs twice (systemd restart race). @@ -115,33 +71,41 @@ the FreeBSD host with Node 24) to triage and address. recreate (`controlplane-db.ts:~576`). Operator-edited skill files are silently nuked on rerun. -7. **`TENANT_ID` trim inconsistency.** +4. **`TENANT_ID` trim inconsistency.** `src/controlplane-agent-keys.ts` trims `TENANT_ID`; `src/index.ts:~1131` does not before passing to `decideRootAdminBootstrapGroup`. A misconfigured `' '` (whitespace) gets treated as non-root in one path and root-ish in another. -8. **mevy fixture drift in tests.** - `infra/tenants.yaml` removed mevy (commit `39d3752`), but - `src/tenant-registry.test.ts` still uses mevy as the canonical - fixture (lines 33–92, 131, 202, 226–235, …). Tests pass but reach - into a synthetic tenant that doesn't match the real registry shape. +5. **`telegram-auth.ts` / `telegram-command-registry.ts` not yet audited.** + Introduced by `defc8df` (Telegram refactor). Worth confirming the + admin-check used to gate `/newpass` is centralized there and not + bypassable, and that command registration doesn't expose any + handler without an auth guard. ### Nits -- `parsePasswordHash` (`controlplane-db.ts:586`) rejects `p<2` / `t<2` — - single-core hosts or future tuning that legitimately uses `p=1` - silently fail verification rather than reporting "outdated params." +- `parsePasswordHash` (`controlplane-db.ts:~586`) rejects `p<2` / `t<2` + — single-core hosts or future tuning that legitimately uses `p=1` + silently fails verification rather than reporting "outdated params." - `getDefaultAgents` (commit `8f7836c`) silently substitutes `SERVICE_NAME` when `agentName` is empty. A log line on substitution would help debug mis-typed env. -## Files most worth a careful human pass +## Resolved findings -- `src/controlplane-db.ts` — agent-by-key scan, password-hash parser -- `src/controlplane-agent-keys.ts` — no tests, no rotation, atomic-write -- `src/operator-auth-reset.ts` — no tests, multi-step non-transactional, `.env` clobber -- `src/telegram-commands.ts` — newpass handlers (rate limit, audit, plaintext echo) -- `src/hostd/auth.ts` — timing-safe compare, tenantId trust -- `src/tenant-registry.test.ts` — mevy fixture drift -- `package.json` + `.nvmrc` — engines vs nvmrc mismatch +| # | Finding | Resolved by | +|---|---------|-------------| +| C1 | `getAgentByApiKey` O(N) argon2id scan — DoS amplifier | `8414953` — added `api_key_lookup_hash` (sha256 base64url) column + unique index; lookup is single-row + one verify. Backfill via `syncControlplaneAgentApiKeysFromDisk` on boot. | +| C2 | `/newpass` plaintext echo + non-atomic `.env` clobber + no rate limit | `5c685f1` — `OPERATOR_PASSWORD` removed from `.env` entirely; remaining `.env` writes use temp+rename+chmod; private DM delivery only with self-deleting message; 60s per-admin cooldown; graceful pre-DB error path. | +| C3 | hostd `!==` token compare + tenantId trusted from request body | `7124c1c` — `crypto.timingSafeEqual` over sha256 digests; `caller` restricted to `'operator'` only (tenant-agent path removed); `tenantId` hardcoded to `''`. Behavioral change: tenant agents can no longer call hostd directly — confirm no caller relied on this. | +| I1a | No tests for `controlplane-agent-keys.ts` | `8414953` — added `controlplane-agent-keys.test.ts` (~124 lines). | +| I8 | mevy fixture drift in `tenant-registry.test.ts` | `7acf771` — formalized as synthetic test fixture (`src/test-fixtures/platform-registry.ts` + explicit `MEVY_TENANT_BLOCK` constant). Tests now decoupled from real `infra/tenants.yaml` shape. | + +## Files most worth a careful human pass (open items) + +- `src/controlplane-agent-keys.ts` — atomic-write race, no rotation +- `src/operator-auth-reset.ts` — no direct test file +- `src/root-admin-bootstrap.ts` — concurrency on systemd restart race +- `src/telegram-auth.ts` + `src/telegram-command-registry.ts` — not yet audited +- `package.json` + `.nvmrc` — engines `>=24` vs nvmrc `22` mismatch