Add system-namespace branch review notes

Captures blind spots in the recent auth/bootstrap/controlplane batch so
the FreeBSD agent can triage without re-running the audit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---
Build: FAIL | Tests: FAIL — 35 failed
This commit is contained in:
Operator & Claude Code 2026-05-03 17:43:07 +02:00
parent 16400eab70
commit 1f4ec5ca94

View file

@ -0,0 +1,147 @@
# system-namespace branch review
**Date:** 2026-05-03
**Branch:** `system-namespace`
**Commit range audited:** `aacfa16..16400ea` (19 commits, ~2.4k insertions / ~787 deletions, 58 files)
**Reviewer:** Claude Opus 4.7 (Linux/Node 22 dev box)
**Production target:** FreeBSD + Node 24
This review enumerates blind spots and risks in the recent batch of commits.
It is not a fix list — it is a punch list for a follow-up agent (running on
the FreeBSD host with Node 24) to triage and address.
## Context the next agent needs
- Production runs **FreeBSD with Node 24**. The original auditor was on
**Linux Node 22**, which caused one false alarm — see "Ruled out" below.
- `package.json` declares `engines.node: ">=24"`; `.nvmrc` says `22`. The
mismatch is a dev-ergonomics issue, not a prod bug.
- Themes of the commit batch:
- root-admin/tenant bootstrap on fresh installs
(`ae7183e`, `8f7836c`, `3b4fb67`)
- hostd + Telegram auth hardening (`c1560e1`, `843d87b`)
- per-agent controlplane API tokens (`0d6414a`, `b43e318`)
- dashboard password reset via Telegram (`2874dba`)
- CMS jail routing for root domains (`f4cb61b`, `591d96a`)
- New files most relevant to this review:
- `src/controlplane-agent-keys.ts`
- `src/hostd/auth.ts` (+ test)
- `src/operator-auth-reset.ts`
- `src/root-admin-bootstrap.ts` (+ test)
- `src/telegram-newpass-command.test.ts`
## Ruled out (don't re-audit these)
- **`crypto.argon2` API existence.** Verified Node 24 ships
`crypto.argon2` / `crypto.argon2Sync` natively — see Node 24 docs.
The calls at `src/controlplane-db.ts:556` and `:644` only fail on
Node 22 dev boxes. On FreeBSD/Node 24 they're correct.
- **Body-size limit coverage.** `readBody` is only called from
`readJsonBody`, and every POST handler in `src/controlplane-api.ts`
routes through `readJsonBody` (verified: lines 971, 1012, 1112, 1242).
The `CONTROLPLANE_MAX_BODY_BYTES` cap (`b43e318`) covers all current
POST endpoints. Re-check if a new handler is added.
## Findings
### Critical
1. **`getAgentByApiKey` is an O(N) argon2id verify per request — DoS amplifier.**
`src/controlplane-db.ts` (commit `0d6414a`) loads all agents and runs
argon2id verify against each on every authenticated agent call.
Native Node 24 argon2id is still memory-hard (m=64MiB, t=3, p=4 by
default in the helper). Unauthenticated callers can submit any
48-char string to force the full-table scan; cost scales with N.
Body-size limit doesn't help — the cost is in the bearer token
itself.
*Fix direction:* index lookup by an HMAC of the token, or by a
token-id prefix; verify only one row.
2. **`/newpass` flow leaks plaintext + clobbers `.env` non-atomically.**
`src/telegram-commands.ts` `handleNewPassCommand` /
`handleNewPassCallback` (commit `2874dba`):
- Plaintext password echoed back into the Telegram chat (chat history
becomes a credential store).
- Plaintext also written to `.env` via
`src/operator-auth-reset.ts` `writeOperatorAuthEnv`
using `fs.writeFileSync` (no atomic temp+rename, partial-write risk).
- No rate limit on the command — admin can spam reset.
- No audit trail.
- No "log out other sessions."
- Multi-step reset (legacy operators table → migrations →
better-auth user → reset → `.env`) has no transaction. Failure
between DB update and `.env` write desyncs the two; user sees
"failed" while DB has new password and `.env` has old.
3. **Hostd auth uses `!==`, not `timingSafeEqual`.**
`src/hostd/auth.ts` (~line 50): `req.auth.token !== expectedToken`
leaks length / common-prefix via timing.
Worse: `caller` (`operator` vs `tenant-agent`) and `tenantId` come
from the *same untrusted JSON body* that carries the token. Anyone
holding `CONTROLPLANE_SHARED_SECRET` can claim any `tenantId`.
Per-tenant authorization in `authorizeHostdOperation` is the only
barrier; if a tenant agent ever holds the shared secret, it can
impersonate any other tenant by setting the field.
No rotation/revocation primitive — there's only one shared secret.
### Important
4. **No tests for `operator-auth-reset.ts` or `controlplane-agent-keys.ts`.**
Both touch credential state on disk + DB. Specifically untested:
- `writeOperatorAuthEnv` (`.env` clobber, partial-write).
- `resetControlplaneBootstrapOperatorPassword` "existing→reset"
fallback path.
- Agent-key file-vs-DB sync.
`src/telegram-newpass-command.test.ts` mocks
`resetOperatorAuthCredentials`, so it tests the Telegram surface,
not the reset itself.
5. **Agent API key file persistence.**
`src/controlplane-agent-keys.ts:5058`:
- `fs.writeFileSync` then `chmodSync` to `0o600` — file briefly
visible at default umask between create and chmod (observable on
a multi-user host).
- No atomic rename.
- No rotation/revocation path other than DB edit + `fs.unlink`.
- In-memory `syncedApiKeys` map (line ~20) caches the key forever
per process.
6. **Bootstrap idempotency / partial-failure gaps.**
- `decideRootAdminBootstrapGroup` (`src/root-admin-bootstrap.ts`)
keys off `registeredChatJids.length === 0`. No concurrency guard
if `main()` runs twice (systemd restart race).
- Commit `8f7836c` does
`fs.rmSync(skillDest, { recursive: true, force: true })` before
recreate (`controlplane-db.ts:~576`). Operator-edited skill files
are silently nuked on rerun.
7. **`TENANT_ID` trim inconsistency.**
`src/controlplane-agent-keys.ts` trims `TENANT_ID`;
`src/index.ts:~1131` does not before passing to
`decideRootAdminBootstrapGroup`. A misconfigured `' '` (whitespace)
gets treated as non-root in one path and root-ish in another.
8. **mevy fixture drift in tests.**
`infra/tenants.yaml` removed mevy (commit `39d3752`), but
`src/tenant-registry.test.ts` still uses mevy as the canonical
fixture (lines 3392, 131, 202, 226235, …). Tests pass but reach
into a synthetic tenant that doesn't match the real registry shape.
### Nits
- `parsePasswordHash` (`controlplane-db.ts:586`) rejects `p<2` / `t<2`
single-core hosts or future tuning that legitimately uses `p=1`
silently fail verification rather than reporting "outdated params."
- `getDefaultAgents` (commit `8f7836c`) silently substitutes
`SERVICE_NAME` when `agentName` is empty. A log line on substitution
would help debug mis-typed env.
## Files most worth a careful human pass
- `src/controlplane-db.ts` — agent-by-key scan, password-hash parser
- `src/controlplane-agent-keys.ts` — no tests, no rotation, atomic-write
- `src/operator-auth-reset.ts` — no tests, multi-step non-transactional, `.env` clobber
- `src/telegram-commands.ts` — newpass handlers (rate limit, audit, plaintext echo)
- `src/hostd/auth.ts` — timing-safe compare, tenantId trust
- `src/tenant-registry.test.ts` — mevy fixture drift
- `package.json` + `.nvmrc` — engines vs nvmrc mismatch