Add system-namespace branch review notes
Captures blind spots in the recent auth/bootstrap/controlplane batch so the FreeBSD agent can triage without re-running the audit. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: FAIL | Tests: FAIL — 35 failed
This commit is contained in:
parent
16400eab70
commit
1f4ec5ca94
1 changed files with 147 additions and 0 deletions
147
docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md
Normal file
147
docs/internal/SYSTEM-NAMESPACE-BRANCH-REVIEW.md
Normal file
|
|
@ -0,0 +1,147 @@
|
|||
# system-namespace branch review
|
||||
|
||||
**Date:** 2026-05-03
|
||||
**Branch:** `system-namespace`
|
||||
**Commit range audited:** `aacfa16..16400ea` (19 commits, ~2.4k insertions / ~787 deletions, 58 files)
|
||||
**Reviewer:** Claude Opus 4.7 (Linux/Node 22 dev box)
|
||||
**Production target:** FreeBSD + Node 24
|
||||
|
||||
This review enumerates blind spots and risks in the recent batch of commits.
|
||||
It is not a fix list — it is a punch list for a follow-up agent (running on
|
||||
the FreeBSD host with Node 24) to triage and address.
|
||||
|
||||
## Context the next agent needs
|
||||
|
||||
- Production runs **FreeBSD with Node 24**. The original auditor was on
|
||||
**Linux Node 22**, which caused one false alarm — see "Ruled out" below.
|
||||
- `package.json` declares `engines.node: ">=24"`; `.nvmrc` says `22`. The
|
||||
mismatch is a dev-ergonomics issue, not a prod bug.
|
||||
- Themes of the commit batch:
|
||||
- root-admin/tenant bootstrap on fresh installs
|
||||
(`ae7183e`, `8f7836c`, `3b4fb67`)
|
||||
- hostd + Telegram auth hardening (`c1560e1`, `843d87b`)
|
||||
- per-agent controlplane API tokens (`0d6414a`, `b43e318`)
|
||||
- dashboard password reset via Telegram (`2874dba`)
|
||||
- CMS jail routing for root domains (`f4cb61b`, `591d96a`)
|
||||
- New files most relevant to this review:
|
||||
- `src/controlplane-agent-keys.ts`
|
||||
- `src/hostd/auth.ts` (+ test)
|
||||
- `src/operator-auth-reset.ts`
|
||||
- `src/root-admin-bootstrap.ts` (+ test)
|
||||
- `src/telegram-newpass-command.test.ts`
|
||||
|
||||
## Ruled out (don't re-audit these)
|
||||
|
||||
- **`crypto.argon2` API existence.** Verified Node 24 ships
|
||||
`crypto.argon2` / `crypto.argon2Sync` natively — see Node 24 docs.
|
||||
The calls at `src/controlplane-db.ts:556` and `:644` only fail on
|
||||
Node 22 dev boxes. On FreeBSD/Node 24 they're correct.
|
||||
- **Body-size limit coverage.** `readBody` is only called from
|
||||
`readJsonBody`, and every POST handler in `src/controlplane-api.ts`
|
||||
routes through `readJsonBody` (verified: lines 971, 1012, 1112, 1242).
|
||||
The `CONTROLPLANE_MAX_BODY_BYTES` cap (`b43e318`) covers all current
|
||||
POST endpoints. Re-check if a new handler is added.
|
||||
|
||||
## Findings
|
||||
|
||||
### Critical
|
||||
|
||||
1. **`getAgentByApiKey` is an O(N) argon2id verify per request — DoS amplifier.**
|
||||
`src/controlplane-db.ts` (commit `0d6414a`) loads all agents and runs
|
||||
argon2id verify against each on every authenticated agent call.
|
||||
Native Node 24 argon2id is still memory-hard (m=64MiB, t=3, p=4 by
|
||||
default in the helper). Unauthenticated callers can submit any
|
||||
48-char string to force the full-table scan; cost scales with N.
|
||||
Body-size limit doesn't help — the cost is in the bearer token
|
||||
itself.
|
||||
*Fix direction:* index lookup by an HMAC of the token, or by a
|
||||
token-id prefix; verify only one row.
|
||||
|
||||
2. **`/newpass` flow leaks plaintext + clobbers `.env` non-atomically.**
|
||||
`src/telegram-commands.ts` `handleNewPassCommand` /
|
||||
`handleNewPassCallback` (commit `2874dba`):
|
||||
- Plaintext password echoed back into the Telegram chat (chat history
|
||||
becomes a credential store).
|
||||
- Plaintext also written to `.env` via
|
||||
`src/operator-auth-reset.ts` `writeOperatorAuthEnv`
|
||||
using `fs.writeFileSync` (no atomic temp+rename, partial-write risk).
|
||||
- No rate limit on the command — admin can spam reset.
|
||||
- No audit trail.
|
||||
- No "log out other sessions."
|
||||
- Multi-step reset (legacy operators table → migrations →
|
||||
better-auth user → reset → `.env`) has no transaction. Failure
|
||||
between DB update and `.env` write desyncs the two; user sees
|
||||
"failed" while DB has new password and `.env` has old.
|
||||
|
||||
3. **Hostd auth uses `!==`, not `timingSafeEqual`.**
|
||||
`src/hostd/auth.ts` (~line 50): `req.auth.token !== expectedToken`
|
||||
leaks length / common-prefix via timing.
|
||||
Worse: `caller` (`operator` vs `tenant-agent`) and `tenantId` come
|
||||
from the *same untrusted JSON body* that carries the token. Anyone
|
||||
holding `CONTROLPLANE_SHARED_SECRET` can claim any `tenantId`.
|
||||
Per-tenant authorization in `authorizeHostdOperation` is the only
|
||||
barrier; if a tenant agent ever holds the shared secret, it can
|
||||
impersonate any other tenant by setting the field.
|
||||
No rotation/revocation primitive — there's only one shared secret.
|
||||
|
||||
### Important
|
||||
|
||||
4. **No tests for `operator-auth-reset.ts` or `controlplane-agent-keys.ts`.**
|
||||
Both touch credential state on disk + DB. Specifically untested:
|
||||
- `writeOperatorAuthEnv` (`.env` clobber, partial-write).
|
||||
- `resetControlplaneBootstrapOperatorPassword` "existing→reset"
|
||||
fallback path.
|
||||
- Agent-key file-vs-DB sync.
|
||||
`src/telegram-newpass-command.test.ts` mocks
|
||||
`resetOperatorAuthCredentials`, so it tests the Telegram surface,
|
||||
not the reset itself.
|
||||
|
||||
5. **Agent API key file persistence.**
|
||||
`src/controlplane-agent-keys.ts:50–58`:
|
||||
- `fs.writeFileSync` then `chmodSync` to `0o600` — file briefly
|
||||
visible at default umask between create and chmod (observable on
|
||||
a multi-user host).
|
||||
- No atomic rename.
|
||||
- No rotation/revocation path other than DB edit + `fs.unlink`.
|
||||
- In-memory `syncedApiKeys` map (line ~20) caches the key forever
|
||||
per process.
|
||||
|
||||
6. **Bootstrap idempotency / partial-failure gaps.**
|
||||
- `decideRootAdminBootstrapGroup` (`src/root-admin-bootstrap.ts`)
|
||||
keys off `registeredChatJids.length === 0`. No concurrency guard
|
||||
if `main()` runs twice (systemd restart race).
|
||||
- Commit `8f7836c` does
|
||||
`fs.rmSync(skillDest, { recursive: true, force: true })` before
|
||||
recreate (`controlplane-db.ts:~576`). Operator-edited skill files
|
||||
are silently nuked on rerun.
|
||||
|
||||
7. **`TENANT_ID` trim inconsistency.**
|
||||
`src/controlplane-agent-keys.ts` trims `TENANT_ID`;
|
||||
`src/index.ts:~1131` does not before passing to
|
||||
`decideRootAdminBootstrapGroup`. A misconfigured `' '` (whitespace)
|
||||
gets treated as non-root in one path and root-ish in another.
|
||||
|
||||
8. **mevy fixture drift in tests.**
|
||||
`infra/tenants.yaml` removed mevy (commit `39d3752`), but
|
||||
`src/tenant-registry.test.ts` still uses mevy as the canonical
|
||||
fixture (lines 33–92, 131, 202, 226–235, …). Tests pass but reach
|
||||
into a synthetic tenant that doesn't match the real registry shape.
|
||||
|
||||
### Nits
|
||||
|
||||
- `parsePasswordHash` (`controlplane-db.ts:586`) rejects `p<2` / `t<2` —
|
||||
single-core hosts or future tuning that legitimately uses `p=1`
|
||||
silently fail verification rather than reporting "outdated params."
|
||||
- `getDefaultAgents` (commit `8f7836c`) silently substitutes
|
||||
`SERVICE_NAME` when `agentName` is empty. A log line on substitution
|
||||
would help debug mis-typed env.
|
||||
|
||||
## Files most worth a careful human pass
|
||||
|
||||
- `src/controlplane-db.ts` — agent-by-key scan, password-hash parser
|
||||
- `src/controlplane-agent-keys.ts` — no tests, no rotation, atomic-write
|
||||
- `src/operator-auth-reset.ts` — no tests, multi-step non-transactional, `.env` clobber
|
||||
- `src/telegram-commands.ts` — newpass handlers (rate limit, audit, plaintext echo)
|
||||
- `src/hostd/auth.ts` — timing-safe compare, tenantId trust
|
||||
- `src/tenant-registry.test.ts` — mevy fixture drift
|
||||
- `package.json` + `.nvmrc` — engines vs nvmrc mismatch
|
||||
Loading…
Add table
Reference in a new issue