Update system-namespace review with resolved findings
Reflects fixes landed in8414953,5c685f1,7124c1c,7acf771. Open section now lists only what still needs attention; resolved items kept as historical context with the commit that closed each. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --- Build: FAIL | Tests: FAIL — 15 failed
This commit is contained in:
parent
7acf771a3b
commit
6f03f0d5b0
1 changed files with 65 additions and 101 deletions
|
|
@ -1,112 +1,68 @@
|
|||
# system-namespace branch review
|
||||
|
||||
**Date:** 2026-05-03
|
||||
**Original audit date:** 2026-05-03
|
||||
**Last updated:** 2026-05-03 (post-`7acf771`)
|
||||
**Branch:** `system-namespace`
|
||||
**Commit range audited:** `aacfa16..16400ea` (19 commits, ~2.4k insertions / ~787 deletions, 58 files)
|
||||
**Reviewer:** Claude Opus 4.7 (Linux/Node 22 dev box)
|
||||
**Original commit range audited:** `aacfa16..16400ea` (19 commits)
|
||||
**Reviewer:** Claude Opus 4.7
|
||||
**Production target:** FreeBSD + Node 24
|
||||
|
||||
This review enumerates blind spots and risks in the recent batch of commits.
|
||||
It is not a fix list — it is a punch list for a follow-up agent (running on
|
||||
the FreeBSD host with Node 24) to triage and address.
|
||||
This doc was a punch list for the FreeBSD/Node 24 follow-up agent.
|
||||
Most Critical/Important findings have since been addressed by commits
|
||||
`8414953`, `5c685f1`, `7124c1c`, `7acf771`. The "Open" section below
|
||||
reflects what still needs attention. Resolved findings are kept in the
|
||||
"Resolved" section as historical context.
|
||||
|
||||
## Context the next agent needs
|
||||
|
||||
- Production runs **FreeBSD with Node 24**. The original auditor was on
|
||||
**Linux Node 22**, which caused one false alarm — see "Ruled out" below.
|
||||
- `package.json` declares `engines.node: ">=24"`; `.nvmrc` says `22`. The
|
||||
mismatch is a dev-ergonomics issue, not a prod bug.
|
||||
- Themes of the commit batch:
|
||||
- root-admin/tenant bootstrap on fresh installs
|
||||
(`ae7183e`, `8f7836c`, `3b4fb67`)
|
||||
- hostd + Telegram auth hardening (`c1560e1`, `843d87b`)
|
||||
- per-agent controlplane API tokens (`0d6414a`, `b43e318`)
|
||||
- dashboard password reset via Telegram (`2874dba`)
|
||||
- CMS jail routing for root domains (`f4cb61b`, `591d96a`)
|
||||
- New files most relevant to this review:
|
||||
- `src/controlplane-agent-keys.ts`
|
||||
**Linux Node 22**. `package.json` `engines.node: ">=24"`; `.nvmrc`
|
||||
says `22` (dev-ergonomics issue, not prod).
|
||||
- Test suite is currently **green** on the right runtime (2137 passed,
|
||||
622 files) per `7acf771`. Trailers like "Tests: FAIL — 26 failed" on
|
||||
earlier commits in the chain were fixture issues, not real
|
||||
regressions.
|
||||
- New files most relevant to credential/auth surface:
|
||||
- `src/controlplane-agent-keys.ts` (+ test)
|
||||
- `src/hostd/auth.ts` (+ test)
|
||||
- `src/operator-auth-reset.ts`
|
||||
- `src/root-admin-bootstrap.ts` (+ test)
|
||||
- `src/telegram-newpass-command.test.ts`
|
||||
- `src/telegram-auth.ts`, `src/telegram-command-registry.ts`,
|
||||
`src/telegram-types.ts` (introduced by `defc8df` — not audited yet)
|
||||
|
||||
## Ruled out (don't re-audit these)
|
||||
|
||||
- **`crypto.argon2` API existence.** Verified Node 24 ships
|
||||
`crypto.argon2` / `crypto.argon2Sync` natively — see Node 24 docs.
|
||||
The calls at `src/controlplane-db.ts:556` and `:644` only fail on
|
||||
Node 22 dev boxes. On FreeBSD/Node 24 they're correct.
|
||||
- **`crypto.argon2` API existence.** Node 24 ships `crypto.argon2` /
|
||||
`crypto.argon2Sync` natively. Calls in `src/controlplane-db.ts` only
|
||||
fail on Node 22 dev boxes; correct on FreeBSD/Node 24.
|
||||
- **Body-size limit coverage.** `readBody` is only called from
|
||||
`readJsonBody`, and every POST handler in `src/controlplane-api.ts`
|
||||
routes through `readJsonBody` (verified: lines 971, 1012, 1112, 1242).
|
||||
The `CONTROLPLANE_MAX_BODY_BYTES` cap (`b43e318`) covers all current
|
||||
POST endpoints. Re-check if a new handler is added.
|
||||
`readJsonBody`; every POST handler in `src/controlplane-api.ts`
|
||||
routes through it. `CONTROLPLANE_MAX_BODY_BYTES` (`b43e318`) covers
|
||||
all current POST endpoints. Re-check if a new handler is added.
|
||||
|
||||
## Findings
|
||||
|
||||
### Critical
|
||||
|
||||
1. **`getAgentByApiKey` is an O(N) argon2id verify per request — DoS amplifier.**
|
||||
`src/controlplane-db.ts` (commit `0d6414a`) loads all agents and runs
|
||||
argon2id verify against each on every authenticated agent call.
|
||||
Native Node 24 argon2id is still memory-hard (m=64MiB, t=3, p=4 by
|
||||
default in the helper). Unauthenticated callers can submit any
|
||||
48-char string to force the full-table scan; cost scales with N.
|
||||
Body-size limit doesn't help — the cost is in the bearer token
|
||||
itself.
|
||||
*Fix direction:* index lookup by an HMAC of the token, or by a
|
||||
token-id prefix; verify only one row.
|
||||
|
||||
2. **`/newpass` flow leaks plaintext + clobbers `.env` non-atomically.**
|
||||
`src/telegram-commands.ts` `handleNewPassCommand` /
|
||||
`handleNewPassCallback` (commit `2874dba`):
|
||||
- Plaintext password echoed back into the Telegram chat (chat history
|
||||
becomes a credential store).
|
||||
- Plaintext also written to `.env` via
|
||||
`src/operator-auth-reset.ts` `writeOperatorAuthEnv`
|
||||
using `fs.writeFileSync` (no atomic temp+rename, partial-write risk).
|
||||
- No rate limit on the command — admin can spam reset.
|
||||
- No audit trail.
|
||||
- No "log out other sessions."
|
||||
- Multi-step reset (legacy operators table → migrations →
|
||||
better-auth user → reset → `.env`) has no transaction. Failure
|
||||
between DB update and `.env` write desyncs the two; user sees
|
||||
"failed" while DB has new password and `.env` has old.
|
||||
|
||||
3. **Hostd auth uses `!==`, not `timingSafeEqual`.**
|
||||
`src/hostd/auth.ts` (~line 50): `req.auth.token !== expectedToken`
|
||||
leaks length / common-prefix via timing.
|
||||
Worse: `caller` (`operator` vs `tenant-agent`) and `tenantId` come
|
||||
from the *same untrusted JSON body* that carries the token. Anyone
|
||||
holding `CONTROLPLANE_SHARED_SECRET` can claim any `tenantId`.
|
||||
Per-tenant authorization in `authorizeHostdOperation` is the only
|
||||
barrier; if a tenant agent ever holds the shared secret, it can
|
||||
impersonate any other tenant by setting the field.
|
||||
No rotation/revocation primitive — there's only one shared secret.
|
||||
## Open findings
|
||||
|
||||
### Important
|
||||
|
||||
4. **No tests for `operator-auth-reset.ts` or `controlplane-agent-keys.ts`.**
|
||||
Both touch credential state on disk + DB. Specifically untested:
|
||||
- `writeOperatorAuthEnv` (`.env` clobber, partial-write).
|
||||
- `resetControlplaneBootstrapOperatorPassword` "existing→reset"
|
||||
fallback path.
|
||||
- Agent-key file-vs-DB sync.
|
||||
`src/telegram-newpass-command.test.ts` mocks
|
||||
`resetOperatorAuthCredentials`, so it tests the Telegram surface,
|
||||
not the reset itself.
|
||||
1. **No direct tests for `operator-auth-reset.ts`.**
|
||||
`controlplane-agent-keys.ts` got tests in `8414953`, but
|
||||
`operator-auth-reset.ts` still has no direct test file. Tested only
|
||||
transitively via `telegram-newpass-command.test.ts` (which mocks
|
||||
`resetOperatorAuthCredentials`) and indirectly by setup tests.
|
||||
Specifically untested: `writeOperatorAuthEnv` atomic write +
|
||||
chmod, `resetControlplaneBootstrapOperatorPassword`
|
||||
"existing→reset" fallback, multi-step rollback semantics.
|
||||
|
||||
5. **Agent API key file persistence.**
|
||||
2. **Agent API key file write race.**
|
||||
`src/controlplane-agent-keys.ts:50–58`:
|
||||
- `fs.writeFileSync` then `chmodSync` to `0o600` — file briefly
|
||||
visible at default umask between create and chmod (observable on
|
||||
a multi-user host).
|
||||
- No atomic rename.
|
||||
visible at default umask between create and chmod.
|
||||
- No atomic temp+rename. (Note: `operator-auth-reset.ts` was fixed
|
||||
in `5c685f1` to use temp+rename; the agent-key writer wasn't.)
|
||||
- No rotation/revocation path other than DB edit + `fs.unlink`.
|
||||
- In-memory `syncedApiKeys` map (line ~20) caches the key forever
|
||||
per process.
|
||||
- In-memory `syncedApiKeys` map caches the key forever per process.
|
||||
|
||||
6. **Bootstrap idempotency / partial-failure gaps.**
|
||||
3. **Bootstrap idempotency / partial-failure gaps.**
|
||||
- `decideRootAdminBootstrapGroup` (`src/root-admin-bootstrap.ts`)
|
||||
keys off `registeredChatJids.length === 0`. No concurrency guard
|
||||
if `main()` runs twice (systemd restart race).
|
||||
|
|
@ -115,33 +71,41 @@ the FreeBSD host with Node 24) to triage and address.
|
|||
recreate (`controlplane-db.ts:~576`). Operator-edited skill files
|
||||
are silently nuked on rerun.
|
||||
|
||||
7. **`TENANT_ID` trim inconsistency.**
|
||||
4. **`TENANT_ID` trim inconsistency.**
|
||||
`src/controlplane-agent-keys.ts` trims `TENANT_ID`;
|
||||
`src/index.ts:~1131` does not before passing to
|
||||
`decideRootAdminBootstrapGroup`. A misconfigured `' '` (whitespace)
|
||||
gets treated as non-root in one path and root-ish in another.
|
||||
|
||||
8. **mevy fixture drift in tests.**
|
||||
`infra/tenants.yaml` removed mevy (commit `39d3752`), but
|
||||
`src/tenant-registry.test.ts` still uses mevy as the canonical
|
||||
fixture (lines 33–92, 131, 202, 226–235, …). Tests pass but reach
|
||||
into a synthetic tenant that doesn't match the real registry shape.
|
||||
5. **`telegram-auth.ts` / `telegram-command-registry.ts` not yet audited.**
|
||||
Introduced by `defc8df` (Telegram refactor). Worth confirming the
|
||||
admin-check used to gate `/newpass` is centralized there and not
|
||||
bypassable, and that command registration doesn't expose any
|
||||
handler without an auth guard.
|
||||
|
||||
### Nits
|
||||
|
||||
- `parsePasswordHash` (`controlplane-db.ts:586`) rejects `p<2` / `t<2` —
|
||||
single-core hosts or future tuning that legitimately uses `p=1`
|
||||
silently fail verification rather than reporting "outdated params."
|
||||
- `parsePasswordHash` (`controlplane-db.ts:~586`) rejects `p<2` / `t<2`
|
||||
— single-core hosts or future tuning that legitimately uses `p=1`
|
||||
silently fails verification rather than reporting "outdated params."
|
||||
- `getDefaultAgents` (commit `8f7836c`) silently substitutes
|
||||
`SERVICE_NAME` when `agentName` is empty. A log line on substitution
|
||||
would help debug mis-typed env.
|
||||
|
||||
## Files most worth a careful human pass
|
||||
## Resolved findings
|
||||
|
||||
- `src/controlplane-db.ts` — agent-by-key scan, password-hash parser
|
||||
- `src/controlplane-agent-keys.ts` — no tests, no rotation, atomic-write
|
||||
- `src/operator-auth-reset.ts` — no tests, multi-step non-transactional, `.env` clobber
|
||||
- `src/telegram-commands.ts` — newpass handlers (rate limit, audit, plaintext echo)
|
||||
- `src/hostd/auth.ts` — timing-safe compare, tenantId trust
|
||||
- `src/tenant-registry.test.ts` — mevy fixture drift
|
||||
- `package.json` + `.nvmrc` — engines vs nvmrc mismatch
|
||||
| # | Finding | Resolved by |
|
||||
|---|---------|-------------|
|
||||
| C1 | `getAgentByApiKey` O(N) argon2id scan — DoS amplifier | `8414953` — added `api_key_lookup_hash` (sha256 base64url) column + unique index; lookup is single-row + one verify. Backfill via `syncControlplaneAgentApiKeysFromDisk` on boot. |
|
||||
| C2 | `/newpass` plaintext echo + non-atomic `.env` clobber + no rate limit | `5c685f1` — `OPERATOR_PASSWORD` removed from `.env` entirely; remaining `.env` writes use temp+rename+chmod; private DM delivery only with self-deleting message; 60s per-admin cooldown; graceful pre-DB error path. |
|
||||
| C3 | hostd `!==` token compare + tenantId trusted from request body | `7124c1c` — `crypto.timingSafeEqual` over sha256 digests; `caller` restricted to `'operator'` only (tenant-agent path removed); `tenantId` hardcoded to `''`. Behavioral change: tenant agents can no longer call hostd directly — confirm no caller relied on this. |
|
||||
| I1a | No tests for `controlplane-agent-keys.ts` | `8414953` — added `controlplane-agent-keys.test.ts` (~124 lines). |
|
||||
| I8 | mevy fixture drift in `tenant-registry.test.ts` | `7acf771` — formalized as synthetic test fixture (`src/test-fixtures/platform-registry.ts` + explicit `MEVY_TENANT_BLOCK` constant). Tests now decoupled from real `infra/tenants.yaml` shape. |
|
||||
|
||||
## Files most worth a careful human pass (open items)
|
||||
|
||||
- `src/controlplane-agent-keys.ts` — atomic-write race, no rotation
|
||||
- `src/operator-auth-reset.ts` — no direct test file
|
||||
- `src/root-admin-bootstrap.ts` — concurrency on systemd restart race
|
||||
- `src/telegram-auth.ts` + `src/telegram-command-registry.ts` — not yet audited
|
||||
- `package.json` + `.nvmrc` — engines `>=24` vs nvmrc `22` mismatch
|
||||
|
|
|
|||
Loading…
Add table
Reference in a new issue