diff --git a/CHANGELOG.md b/CHANGELOG.md index d2496e1..8f5bc39 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -54,6 +54,35 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - `README.md`, `CLAWDIE-ISO.md`, `AGENTS.md` synced to mention the agent-CLI prereq gate and the npm-globals bundle path - `AGENTS.md` + nginx/freebsd-admin skills updated with controlplane dashboard build notes (Paperclip UI) and Tailscale proxy/PF pointers +### Added (operator observability + provider fallback, apr.2026) + +- **Provider fallback layer** (`src/provider-fallback.ts`) — automatically swaps the configured LLM provider for an operator-defined fallback when the primary hits a usage cap. Detects `429 Usage limit reached` from pi stderr/stdout, parses `Your limit will reset at YYYY-MM-DD HH:MM:SS`, and marks a cooldown until the reset timestamp passes. Cooldowns are in-memory plus persisted to `$CLAWDIE_VAR_DIR/provider-cooldowns.json` (default `$HOME/.clawdie/state/`) so a restart inside the cap window does not re-trip the cap. Wired into `agent-runner.ts` (main chat) and `controlplane-heartbeat.ts` (specialists). Per-chat overrides (`group.jailConfig.provider`) are unchanged — only the spawn-time effective values are swapped while the cooldown is live. +- `LLM_FALLBACK_PROVIDER`, `LLM_FALLBACK_MODEL`, `LLM_FALLBACK_DEFAULT_COOLDOWN_SECONDS` config — operator picks the fallback (e.g. `openrouter` + a free-tier model). Default cooldown (3600s) is used only when the cap message has no parseable reset stamp. +- `getLlmKeyForProvider(provider)` (`src/env.ts`) — provider-aware secret resolution so the right API key is injected when fallback swaps providers; falls back to first-available when the requested key is absent. +- Startup validation: when `LLM_FALLBACK_PROVIDER` is set, the matching API key is now in the `criticalConfig` warn list. Warns separately when `LLM_FALLBACK_PROVIDER` is set without `LLM_FALLBACK_MODEL`. 
+- `/clearcooldown` admin command (ops-chat-gated) — lists active cooldowns when called without args; takes `` or `all`. Persists immediately so cleared state survives restart. +- `/policy` now shows a `Provider cooldown: until → fallback ` line for each active cooldown. +- Activity payload now records `effective_provider` / `effective_model` next to `actual_*` so for any run you can read configured vs effective vs actual. +- **Structured operator reports family** with consistent `Observed` / `Interpretation` / `Operator Notes` sections — `src/reports/{system,disk,tasks,budget,publish,test}-report.ts`. Each report is a pure builder + renderer fed by raw inputs (DB rows, command output, JSON status files), tested independently of the wiring layer. +- `/report`, `/disk`, `/tasks`, `/budgetreport`, `/publishreport`, `/testreport` Telegram commands — the structured-report surfaces. +- **Test/build status pipeline** — `scripts/write-test-build-status.sh` runs the project's `npm run build` and `npx vitest run --reporter=json --outputFile=...`, then writes `build-status.json` and `test-status.json` to the status directory: `$AGENT_STATUS_DIR` (primary) → `$CLAWDIE_VAR_DIR` (legacy) → `/tmp/status` (default). `/testreport` reads these files; missing or stale (>6h) files degrade to `unknown` with an action note rather than fabricating success. Pre-commit/post-commit hooks append the latest status to commit messages so reviewers see what was passing at commit time. +- **Free-text ops routing** (`src/report-intent.ts`) — bot-addressed phrasings like "disk usage", "are the tests passing", "what tasks do we have", "budget report" are classified by `classifyReportIntent()` and routed to the matching structured builder instead of the LLM path. Keeps memory/narrative recall from overriding a fresh probe. +- `isOpsFlavored()` — broader pattern matcher used to suppress stale memory injection on ops-flavored prompts so the LLM answers from live tools rather than narrative recall. 
+- **Specialist capability gate** (`src/agent-capabilities.ts`) — pre-flight check that compares the requested skill (and task description) against the assigned jail's installed tools, refusing the run with a clear reason when the agent cannot perform it. +- Telegram bot now publishes a proper command menu via `setMyCommands` with separate command lists for private chats vs the ops chat (`src/channels/telegram.ts`). +- `AGENTS.md` § "Verify Before Claiming Remote State" — convention requiring `git fetch` before reporting on any remote ref. Born from a real two-agent confusion on 26.apr where stale `origin/multitenant` refs in two worktrees produced contradictory "no new remote work" claims. + +### Changed (operator observability) + +- Many Telegram commands moved from `requireRegistered(ctx)` gate to direct chat resolution; per-handler `requireAdmin` / `requireOpsChat` still enforce auth. Effect: admins can run read-only ops commands from any chat without registering it first. +- `/status` ZFS section caps at 8 lines with a "… N more dataset(s) hidden" footer. +- `parseBastilleList` consolidated to use the shared `bastille-list.ts` parser. `summarizeZfsRows` extracted as a pure exportable helper. + +### Fixed (operator observability) + +- `/report` controlplane probe: when `CONTROLPLANE_BIND_HOST=0.0.0.0`, `getControlplaneProbeHost()` now derives a reachable host from `BETTER_AUTH_URL` instead of probing the wildcard address. Previously the report would say "controlplane unreachable" even when controlplane was healthy. +- Test artifacts now write to repo-local `tmp/` instead of system `/tmp` (per `AGENTS.md` § "Temporary File Storage"). 
+ ## [0.10.0] - 2026-04-07 ### Paperclip Control Plane Integration diff --git a/README.md b/README.md index 742f59d..7e85c1d 100644 --- a/README.md +++ b/README.md @@ -450,18 +450,36 @@ From the main channel (your self-chat), you can manage groups and tasks: ## Telegram Commands -| Command | Description | Auth | -| ------------- | -------------------------------------------------- | ------ | -| `/status` | System status: jails, ZFS, PF, budget, model | anyone | -| `/usage` | Per-agent token budget breakdown | anyone | -| `/compact` | Compact session (summarize old, keep recent turns) | admin | -| `/new` | Hard reset session, start fresh | admin | -| `/resume` | Unpause a budget-paused chat | admin | -| `/stop` | Kill running agent mid-response | admin | -| `/tts` | Toggle voice replies (on/off/status/default) | admin | -| `/activation` | Set trigger mode (always/mention) | admin | -| `/whoami` | Show your Telegram identity | anyone | -| `/help` | List available commands | anyone | +A short selection — for the full reference (status, structured reports, +runtime, sessions, admin actions, free-text routing) see +[Operator Commands](docs/public/operate/operator-commands.md). 
+ +| Command | Description | Auth | +| ---------------- | -------------------------------------------------------------- | --------- | +| `/status` | System summary: jails, ZFS, PF, budget, model | anyone | +| `/report` | Structured system + auth report | admin | +| `/disk` | Structured ZFS pool + snapshot report | admin | +| `/tasks` | Structured controlplane task report | admin | +| `/budgetreport` | Structured budget + token analytics | admin | +| `/publishreport` | Structured tenant publish/content report | admin | +| `/testreport` | Structured build + test status (from wrapper-written JSON) | admin | +| `/policy` | Default runtime, per-chat overrides, fallback cooldowns | anyone | +| `/usage` | Per-agent token budget breakdown | anyone | +| `/clearcooldown` | Clear a [provider fallback](docs/public/operate/provider-fallback.md) cooldown | ops chat | +| `/budgetreset` | Reset agent token budget | ops chat | +| `/compact` | Compact session (summarize old, keep recent turns) | admin | +| `/new` | Hard reset session, start fresh | admin | +| `/resume` | Unpause a budget-paused chat | admin | +| `/stop` | Kill running agent mid-response | admin | +| `/tts` | Toggle voice replies (on/off/status/default) | admin | +| `/activation` | Set trigger mode (always/mention) | admin | +| `/whoami` | Show your Telegram identity | anyone | +| `/help` | List available commands | anyone | + +The bot also routes **free-text ops phrasings** ("disk usage", "are the +tests passing", "task report", etc.) to the matching structured report +instead of the LLM path — see +[Structured Reports → Free-Text Routing](docs/public/operate/structured-reports.md#free-text-routing). 
### Session Compaction diff --git a/docs/public/architecture/controlplane.md b/docs/public/architecture/controlplane.md index 1716b38..cf109e7 100644 --- a/docs/public/architecture/controlplane.md +++ b/docs/public/architecture/controlplane.md @@ -136,9 +136,30 @@ just setup-controlplane --- +## Runtime Observability + +Every agent run (orchestrator main chat or specialist heartbeat) records +three provider/model values in `agent_activity.payload`: + +| Field | Meaning | +| -------------- | --------------------------------------------------------- | +| `configured_*` | What `.env` says (`PI_TUI_PROVIDER` / `PI_TUI_MODEL`) | +| `effective_*` | What was actually passed to pi (after fallback swap) | +| `actual_*` | What pi reports having used (parsed from session JSONL) | + +`configured_*` and `effective_*` differ when [provider fallback](../operate/provider-fallback/) +is active (cooldown is live, runtime is using the operator's chosen +fallback). `actual_*` should match `effective_*` for a successful run; a +divergence suggests pi rewrote the model selection internally. + +`/budgetreport` and `/tokens` surface these values; `/policy` shows the +fallback cooldown line when one is active. 
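A minimal TypeScript sketch of how the three values relate. The `ProviderModel`, `RuntimeTriple`, `resolveEffective()` and `describeTriple()` names are hypothetical. The cooldown map shape (provider name to epoch-ms expiry) is also an assumption. The real swap logic lives in `src/provider-fallback.ts` and may be structured differently.

```ts
// Hypothetical shapes for illustration; the payload described above uses
// configured_* / effective_* / actual_* field prefixes instead of nesting.
interface ProviderModel {
  provider: string;
  model: string;
}

interface RuntimeTriple {
  configured: ProviderModel; // from .env (PI_TUI_PROVIDER / PI_TUI_MODEL)
  effective: ProviderModel; // what was passed to pi
  actual: ProviderModel | null; // parsed from pi's session JSONL; null if absent
}

// Assumed cooldown store: provider name -> epoch ms when the cap window ends.
type CooldownMap = Map<string, number>;

// Swap to the fallback only while the configured provider has a live cooldown
// AND a fallback is configured; otherwise pass the configured values through.
function resolveEffective(
  configured: ProviderModel,
  cooldowns: CooldownMap,
  fallback: ProviderModel | null,
  nowMs: number,
): ProviderModel {
  const until = cooldowns.get(configured.provider);
  const inCooldown = until !== undefined && until > nowMs;
  return inCooldown && fallback !== null ? fallback : configured;
}

function sameRuntime(a: ProviderModel, b: ProviderModel): boolean {
  return a.provider === b.provider && a.model === b.model;
}

// Reads one activity row the way an operator would.
function describeTriple(t: RuntimeTriple): string[] {
  const notes: string[] = [];
  if (!sameRuntime(t.configured, t.effective)) {
    notes.push("fallback active: configured and effective differ");
  }
  if (t.actual !== null && !sameRuntime(t.actual, t.effective)) {
    notes.push("actual differs from effective: pi may have rewritten the model");
  }
  return notes;
}
```

Keeping the swap decision pure means it can be unit-tested with fixed timestamps, without spawning pi.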
+ ## References - `doc/CONTROLPLANE-ARCHITECTURE.md` — detailed service layout - `doc/CONTROLPLANE-MESSAGE-CONTRACT.md` — API contracts (what agents query and post) - `doc/CONTROLPLANE-AGENT-ROLES.md` — role definitions, skill mappings, budgets - `SOUL.md`, `SYSADMIN_AGENT.md`, `DB_ADMIN_AGENT.md`, `GIT_ADMIN_AGENT.md` — agent identity files +- [Provider Fallback](../operate/provider-fallback/) — automatic provider switching when the primary hits a usage cap +- [Structured Reports](../operate/structured-reports/) — operator-facing report family + free-text routing diff --git a/docs/public/operate/index.md b/docs/public/operate/index.md index 3b29782..1b3d4c3 100644 --- a/docs/public/operate/index.md +++ b/docs/public/operate/index.md @@ -7,5 +7,8 @@ Runbooks for day-to-day operation and recovery. - [Security](./security/) - [Monitoring](./monitoring/) +- [Operator Commands](./operator-commands/) +- [Structured Reports](./structured-reports/) +- [Provider Fallback](./provider-fallback/) - [DB disaster recovery](./db-disaster-recovery/) - [Git storage](./git-storage/) diff --git a/docs/public/operate/monitoring.md b/docs/public/operate/monitoring.md index ad20d71..700b2fe 100644 --- a/docs/public/operate/monitoring.md +++ b/docs/public/operate/monitoring.md @@ -145,3 +145,37 @@ Bastille monitor and Clawdie doctor solve different problems: - **Clawdie doctor** — application, pipeline, and control plane health Use both; don't confuse them. + +## Operator-Facing Reports + +Beyond the runtime health files above, the agent exposes a family of +**structured reports** for operator inspection on demand. Each report has a +matching Telegram slash command and follows the same `Observed` / +`Interpretation` / `Operator Notes` template — see +[Structured Reports](./structured-reports/) for the design and the full list. 
+ +| Report | Command | What it answers | +| ---------- | ----------------- | --------------------------------------------------- | +| System | `/report` | Are services + jails + controlplane healthy? | +| Disk | `/disk` | What is consuming ZFS pool space and snapshots? | +| Tasks | `/tasks` | What is in the controlplane task queue? | +| Budget | `/budgetreport` | Token budgets and burn analytics | +| Publish | `/publishreport` | Tenant publish/content state | +| Test/Build | `/testreport` | Was the last build/test run green? | + +`/testreport` is fed by `scripts/write-test-build-status.sh`, not by the +running process — invoke the wrapper from CI, a hook, or by hand to refresh +its status files. The pre-commit and post-commit hooks run it automatically +so each commit message footer reflects what was passing at commit time. + +For the full operator command reference (status, sessions, admin actions, +free-text routing), see [Operator Commands](./operator-commands/). + +## Provider Fallback Health + +When the configured LLM provider is in cooldown (e.g. zAI usage cap), the +agent transparently routes to the operator-defined fallback. Active +cooldowns are visible in `/policy` and as structured `logger.warn` lines on +every fallback-active run. See [Provider Fallback](./provider-fallback/) for +configuration, manual release (`/clearcooldown`), and the +configured / effective / actual observability triple. diff --git a/docs/public/operate/operator-commands.md b/docs/public/operate/operator-commands.md new file mode 100644 index 0000000..f9e1873 --- /dev/null +++ b/docs/public/operate/operator-commands.md @@ -0,0 +1,118 @@ +--- +title: 'Operator Commands' +description: Reference for the Telegram slash commands operators use to inspect and control the running agent. +--- + +The agent exposes its operational surface as Telegram slash commands. This +page is the single reference for what each command does, who can run it, +and which underlying surface it inspects. 
The Telegram bot also publishes a +native command menu via `setMyCommands` — start typing `/` in any chat for +the live in-app list. + +## Authorization Layers + +Two per-handler gates decide who may run a command; a third layer, per-chat +overrides, changes how runs behave in a registered chat rather than who may +start them: + +| Layer | Where | Effect | +| ------------------- | -------------------------------------- | --------------------------------------------- | +| `requireAdmin` | Per-handler | Only operators on the admin allow-list run it | +| `requireOpsChat` | Per-handler (write/destructive only) | Only the configured ops chat may invoke it | +| Per-chat overrides | `group.jailConfig` (registered groups) | Sets the model/provider for runs in that chat | + +Read-only report commands (`/report`, `/disk`, `/testreport`, etc.) are +admin-gated but not ops-chat-gated — admins can run them from any chat. +Destructive commands (`/budgetreset`, `/clearcooldown`) require the ops chat. + +## Status & Identity + +| Command | Purpose | Surface | +| ------------- | ------------------------------------------------------------- | --------------------------------------------- | +| `/ping` | Confirm the bot process is responsive | Direct reply | +| `/chatid` | Print the current chat's JID | Useful for `.env` registration | +| `/whoami` | Show your Telegram identity | Confirms admin-allowlist match | +| `/status` | Compact system summary (jails, ZFS pools, PF, budget) | `src/system-state.ts` snapshot | + +## Structured Reports + +All structured reports follow the same `Observed` / `Interpretation` / +`Operator Notes` template. See [Structured Reports](./structured-reports/) for +the design pattern.
+ +| Command | Report | Source | +| ---------------- | ----------------------------------------------------- | ----------------------------------------------------------------------- | +| `/report` | System & auth — services, jails, PF, controlplane | `hostd` probes + `probeControlplaneAuth()` | +| `/disk` | ZFS pools and snapshots | `zpool list -H` + `zfs list -H -o name,usedsnap` | +| `/tasks` | Controlplane task queue | `getAllTasks()` (Postgres) | +| `/budgetreport` | Token budgets and burn analytics | `getAllBudgets()` + `getAgentTokenAnalytics()` | +| `/publishreport` | Tenant publish/content state | `loadTenantRegistry()` + webroot inspection | +| `/testreport` | Build and test pass/fail | `tmp/status/build-status.json` + `tmp/status/test-status.json` | + +`/testreport` is fed by `scripts/write-test-build-status.sh` — see +[Structured Reports](./structured-reports/#test-build-pipeline) for the +write/read contract. + +## Runtime & Policy + +| Command | Purpose | +| ----------------- | ------------------------------------------------------------------------- | +| `/policy` | Active runtime policy (default model, overrides, cooldowns, budget state) | +| `/budget` | Alias for `/policy` | +| `/usage` | Token budget per agent | +| `/tokens` | Runtime token burn per agent (last-N analytics) | +| `/model` | Set provider/model for this chat (per-chat override) | +| `/activation` | Set trigger mode (always-respond vs mention-only) | +| `/tts` | Toggle voice replies (`on` / `off` / `status` / `default`) | + +`/policy` shows the [Provider fallback](./provider-fallback/) cooldown line +when one is active.
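A small sketch of how such a cooldown line could be assembled so that it matches the example output in [Provider Fallback](./provider-fallback/). `PolicyCooldown`, `FallbackTarget` and `formatCooldownLines()` are hypothetical names, not the bot's actual code. In this sketch, expired entries produce no line, and when no fallback is configured the line ends after the reset stamp.

```ts
// Hypothetical types for illustration only.
interface PolicyCooldown {
  provider: string;
  untilStamp: string; // reset stamp as displayed, e.g. "2026-04-25T19:00:59"
  untilMs: number; // the same instant in epoch ms, used for the expiry check
}

interface FallbackTarget {
  provider: string;
  model?: string; // LLM_FALLBACK_MODEL may be unset
}

function formatCooldownLines(
  cooldowns: PolicyCooldown[],
  fallback: FallbackTarget | null,
  nowMs: number,
): string[] {
  return cooldowns
    .filter((c) => c.untilMs > nowMs) // no line for expired cooldowns
    .map((c) => {
      const base = `Provider cooldown: ${c.provider} until ${c.untilStamp}`;
      if (fallback === null) return base; // cooldown tracked, but no fallback target
      const target = fallback.model ? `${fallback.provider}/${fallback.model}` : fallback.provider;
      return `${base} → fallback ${target}`;
    });
}
```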
+ +## Sessions + +| Command | Purpose | +| ------------- | ------------------------------------------------------------------ | +| `/new` | Reset this chat's session | +| `/compact` | Compact the session (summarize old, keep recent) | +| `/stop` | Stop a running agent for this chat | +| `/resume` | Resume a budget-paused chat | + +## Admin Actions (Ops-Chat Only) + +| Command | Purpose | +| ---------------------- | ------------------------------------------------------------------------ | +| `/budgetreset ` | Reset an agent's token budget. `all` requires `confirm` second arg. | +| `/clearcooldown [id]` | Clear a [provider fallback](./provider-fallback/) cooldown | +| `/audit` | Platform ownership audit (which jail/dataset/service belongs to which) | +| `/snapshots [dataset]` | List ZFS snapshots | +| `/scrub [op]` | ZFS scrub controls (`status` / `start` / `stop`) | +| `/updates` | FreeBSD base + ports update status | +| `/schedule` | Manage scheduled agent tasks (list / add / cancel / done) | + +## Free-Text Routing + +The bot recognizes **bot-addressed** ops-flavored phrasings without requiring +a slash command. Examples that route to structured reports instead of the LLM +path: + +| Phrase | Routed to | +| -------------------------------------- | --------------- | +| `disk usage`, `how much disk` | `/disk` | +| `task report`, `active tasks` | `/tasks` | +| `budget report`, `how many tokens` | `/budgetreport` | +| `are the tests passing`, `build status`| `/testreport` | +| `system report`, `report please` | `/report` | + +This keeps memory or narrative recall from drifting into a stale answer when +fresh structured data is available. The full pattern set lives in +`classifyReportIntent()` in `src/report-intent.ts`. 
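The hard-route step can be sketched roughly as follows, using only the example phrases from the table above. `classifyReportIntentSketch()` and its regexes are illustrative assumptions. The real rule set in `src/report-intent.ts` is larger, and its ordering and mention handling may differ.

```ts
type ReportCommand = "/disk" | "/tasks" | "/budgetreport" | "/testreport" | "/report";

// Illustrative rules built from the example phrases; order matters, so more
// specific phrasings ("budget report") are checked before generic ones.
const REPORT_RULES: Array<[RegExp, ReportCommand]> = [
  [/\b(?:disk usage|how much disk)\b/i, "/disk"],
  [/\b(?:task report|active tasks)\b/i, "/tasks"],
  [/\b(?:budget report|how many tokens)\b/i, "/budgetreport"],
  [/\b(?:are the tests passing|build status)\b/i, "/testreport"],
  [/\b(?:system report|report please)\b/i, "/report"],
];

function classifyReportIntentSketch(text: string): ReportCommand | null {
  const trimmed = text.trim();
  if (trimmed.startsWith("/")) return null; // slash commands are routed by grammY, not here
  const stripped = trimmed.replace(/@assistant\b/gi, " "); // drop the bot mention before matching
  for (const [pattern, command] of REPORT_RULES) {
    if (pattern.test(stripped)) return command;
  }
  return null; // no confident match: the message takes the normal LLM path
}
```

Returning `null` on anything short of a confident match keeps false positives rare; a missed phrasing just falls through to the LLM.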
+ +A broader `isOpsFlavored()` matcher also suppresses memory injection on any +ops-flavored prompt (services, jails, deploy, auth, controlplane terms), +even when no specific report matches — so the LLM answers from live tools +rather than narrative recall. + +## Help + +`/help` prints the in-bot command list. The list is generated from the same +constants that drive the Telegram menu publication, so it reflects whatever +is currently registered. diff --git a/docs/public/operate/provider-fallback.md b/docs/public/operate/provider-fallback.md new file mode 100644 index 0000000..3086b27 --- /dev/null +++ b/docs/public/operate/provider-fallback.md @@ -0,0 +1,141 @@ +--- +title: 'Provider Fallback' +description: Automatic LLM provider switching when the primary provider hits a usage cap. +--- + +When the primary LLM provider returns a "usage cap reached" error, the agent +keeps replying instead of looping on 429s — it transparently switches to a +configured fallback until the cap window passes, then automatically returns to +the primary. + +## In Plain Language + +- Some LLM providers (notably zAI) impose rolling 5-hour usage caps. When you + hit one, every request fails until the reset. +- Without fallback, the bot would retry the capped provider on every message + and stay broken for hours. +- Fallback puts the capped provider in a "cooldown" until the reset timestamp, + routes new runs through your operator-chosen alternative (e.g. OpenRouter + with a free-tier model), and resumes the primary the moment the cooldown + expires. +- The cooldown survives a process restart so a quick service bounce inside the + cap window does not re-trip the cap. 
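The behavior summarized above can be sketched as follows, assuming the zAI message format quoted on this page and a JSON array as the on-disk cooldown layout. `parseCapErrorSketch()`, the file layout, and the helper names are hypothetical; the real matcher is `parseProviderCapError()` in `src/provider-fallback.ts`.

```ts
interface CapCooldown {
  provider: string;
  until: Date;
  reason: string;
}

// Only the specific usage-cap signature counts; generic 429s are ignored.
const CAP_SIGNATURE = /429\s+Usage limit reached/;
const RESET_STAMP = /Your limit will reset at (\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})/;

function parseCapErrorSketch(
  provider: string,
  output: string, // combined pi stderr/stdout
  now: Date,
  defaultCooldownSeconds = 3600, // mirrors LLM_FALLBACK_DEFAULT_COOLDOWN_SECONDS
): CapCooldown | null {
  if (!CAP_SIGNATURE.test(output)) return null;
  const m = RESET_STAMP.exec(output);
  const until = m
    ? // The stamp carries no zone, so it is read as local time.
      new Date(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6])
    : new Date(now.getTime() + defaultCooldownSeconds * 1000);
  const capLine = output.split("\n").find((line) => CAP_SIGNATURE.test(line)) ?? output;
  return { provider, until, reason: capLine.trim() };
}

function isCoolingDown(entry: CapCooldown | undefined, now: Date): boolean {
  return entry !== undefined && entry.until.getTime() > now.getTime();
}

// Persist as a JSON array so a restart inside the cap window keeps the cooldown.
function serializeCooldowns(map: Map<string, CapCooldown>): string {
  return JSON.stringify(
    Array.from(map.values()).map((e) => ({
      provider: e.provider,
      until: e.until.toISOString(),
      reason: e.reason,
    })),
  );
}

function loadCooldowns(json: string, now: Date): Map<string, CapCooldown> {
  const out = new Map<string, CapCooldown>();
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return out; // unreadable file: start with no cooldowns
  }
  if (!Array.isArray(raw)) return out;
  for (const item of raw) {
    const until = new Date(item?.until);
    if (typeof item?.provider !== "string" || Number.isNaN(until.getTime())) continue;
    if (until.getTime() <= now.getTime()) continue; // expired entries are dropped on load
    out.set(item.provider, { provider: item.provider, until, reason: String(item?.reason ?? "") });
  }
  return out;
}
```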
+ +## Configuration + +Set in `.env`: + +| Variable | Required | Example | +| --------------------------------------- | ---------------------------------------- | -------------------------------------------- | +| `LLM_FALLBACK_PROVIDER` | yes (when fallback is desired) | `openrouter` | +| `LLM_FALLBACK_MODEL` | recommended | `meta-llama/llama-3.3-70b-instruct:free` | +| `LLM_FALLBACK_DEFAULT_COOLDOWN_SECONDS` | optional (default `3600`) | `1800` | + +The default cooldown is used **only** when the cap message has no parseable +reset stamp. Real zAI cap errors include the reset timestamp and the cooldown +matches the reset exactly. + +The fallback provider's API key (`OPENROUTER_API_KEY` for openrouter, +`ZAI_API_KEY` for zai, etc.) must also be set. The agent verifies this at +startup and warns in the logs if it is missing — the warning is the only +notice you will get before the fallback fails for real. + +## How Cooldowns Work + +1. A run fails with `429 Usage limit reached for 5 hour. Your limit will reset + at YYYY-MM-DD HH:MM:SS`. +2. The runner parses the reset timestamp (treated as local time) and stores + `{ provider: 'zai', until: , reason: }` in memory and on + disk. +3. Every subsequent run consults the cooldown map *before* spawning pi. If the + configured provider is in cooldown, the spawn args swap to the fallback + provider/model. +4. The cooldown auto-expires at the reset timestamp. Next run uses the primary + again. + +The cooldown file lives at `$CLAWDIE_VAR_DIR/provider-cooldowns.json` (default +`$HOME/.clawdie/state/provider-cooldowns.json`). Expired entries are dropped +on load. + +> **Path convention note.** The cooldown file currently uses the legacy +> `$CLAWDIE_VAR_DIR` / `$HOME/.clawdie/state/` resolution. The newer +> [test/build status files](./structured-reports/#test-build-pipeline) +> moved to repo-local `tmp/` to align with `AGENTS.md` § "Temporary File +> Storage". 
A future code change should harmonize provider-fallback to the +> same precedence (`AGENT_STATUS_DIR` → `CLAWDIE_VAR_DIR` → `tmp/state/`). +> Until then, if you set `AGENT_STATUS_DIR`, also set `CLAWDIE_VAR_DIR` to +> the same path so both subsystems agree. + +## Inspecting State + +`/policy` shows active cooldowns under the runtime line: + +``` +Default runtime: zai / glm-4.6 +Provider cooldown: zai until 2026-04-25T19:00:59 → fallback openrouter/meta-llama/llama-3.3-70b-instruct:free +``` + +When no cooldowns are active, the line is omitted — runtime looks normal. + +Logs include structured warnings on every fallback-active run: + +``` +{ originalProvider: 'zai', fallbackProvider: 'openrouter', cooldownUntil: '...' } Provider fallback active — preferred provider is in cooldown +``` + +And on the run that *trips* the cooldown: + +``` +{ provider: 'zai', until: '2026-04-25T19:00:59', reason: '429 Usage limit reached; resets ...' } Provider cap detected — marking cooldown +``` + +## Manual Release + +If you know the cap was lifted early or want to retry the primary before the +parsed reset time, clear the cooldown manually: + +``` +/clearcooldown # lists active cooldowns and prints usage +/clearcooldown zai # clears one +/clearcooldown all # clears every active cooldown +``` + +The command is admin-only and ops-chat-gated. It persists immediately so the +cleared state survives restart. + +## Observability Triple + +Every agent activity row now records three provider/model values: + +| Field | Meaning | +| -------------------- | -------------------------------------------------------- | +| `configured_*` | What `.env` says (`PI_TUI_PROVIDER` / `PI_TUI_MODEL`) | +| `effective_*` | What was actually passed to pi (after fallback swap) | +| `actual_*` | What pi reports having used (parsed from session JSONL) | + +When fallback is active, `configured_*` and `effective_*` differ. 
+`actual_*` should match `effective_*` for a successful run; a divergence +suggests pi rewrote the model selection internally. + +## Behavior That Stays The Same + +- **Per-chat overrides** (`group.jailConfig.provider` / `.model`) are not + touched by the cooldown layer. If you have explicitly set a chat to a + specific provider, only that provider's cooldowns affect it. +- **Cap detection is conservative** — the parser only matches the specific + zAI cap signature, not generic 429s, transport errors, or rate-limit + responses from other providers. This is intentional to avoid false + positives. If you need the same behavior for another provider, the + pattern lives in `parseProviderCapError()` in `src/provider-fallback.ts`. + +## When Fallback Is Not Configured + +If a primary provider hits its cap and `LLM_FALLBACK_PROVIDER` is unset: + +- The cooldown is still tracked. +- Runs continue to use the primary and continue to fail until reset. +- Logs include a clear warning: `Provider in cooldown but no fallback configured; passing through`. +- `/policy` will show the cooldown line without a fallback target. + +This is intentional — the fallback is opt-in. Without it, you fail visibly +rather than silently routing to a wrong provider. diff --git a/docs/public/operate/structured-reports.md b/docs/public/operate/structured-reports.md new file mode 100644 index 0000000..06331c6 --- /dev/null +++ b/docs/public/operate/structured-reports.md @@ -0,0 +1,168 @@ +--- +title: 'Structured Reports' +description: The Observed / Interpretation / Operator Notes pattern, the report family, and the free-text routing layer. +--- + +The agent's operator-facing reports follow a single template so an operator +or a peer agent can read any of them at a glance and know what is observed +fact, what is interpretation, and what action (if any) is suggested. 
+ +## In Plain Language + +- A **structured report** is a deterministic snapshot of one slice of the + system (disk, services, tasks, budget, publish state, build/test status). +- Reports are built from **raw inputs** — DB rows, command output, JSON + status files — by a **pure builder function**. The builder has no side + effects and is unit-tested independently of how the report is delivered. +- The result is rendered to HTML for Telegram and could equally be rendered + to JSON for a dashboard or to plain text for a CLI. +- When the agent answers an ops question, it reads the structured report + rather than narrating from memory. This matters because memory drifts; + ZFS pool capacity does not. + +## The Three-Section Template + +Every structured report has the same three top-level sections: + +### Observed + +What the report measured, with no interpretation. ZFS shows pool A at 87% +capacity. Build status file says `status: "fail"`. The last task in the +queue was created at 10:23. + +This section is the source of truth for the rest of the report. If +`Observed` is empty, the underlying probe failed and the report says so. + +### Interpretation + +A handful of `findings` extracted from `Observed`, each tagged `info`, +`warn`, or `error`. "Pool A is at 87% capacity." "Tests last run failed. +12 failing tests." "No active controlplane tasks are queued right now." + +Findings are short, factual, and avoid recommending action. Their job is +to reduce a wall of data to the few signals that matter. + +### Operator Notes + +Suggestions, conditional and labeled `note` or `action`. "Largest +snapshot: `tank/data@2026-04-20-weekly` (4.2 GB). Remove only if that +rollback point is no longer needed." "Re-run the test wrapper before +relying on this as evidence the branch is green." + +Notes are *suggestions*, not commands. They include the **conditional** +that makes the action correct ("only if X"), so an operator can decide +without re-deriving the context. 
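A minimal sketch of the template applied to pool capacity. It assumes tab-separated `name` and `capacity` columns (for example from `zpool list -H -o name,capacity`, which is narrower than the report's actual `zpool list -H` call). The 80% / 90% thresholds are made-up values. `buildDiskReportSketch()` and its types are hypothetical and do not reproduce `src/reports/disk-report.ts`.

```ts
type Severity = "info" | "warn" | "error";

interface Finding {
  severity: Severity;
  code: string; // stable identifier, so "why was this flagged?" has an answer
  text: string;
}

interface OperatorNote {
  kind: "note" | "action";
  text: string;
}

interface DiskReportSketch {
  observed: { pool: string; capacityPct: number }[]; // raw facts only
  interpretation: Finding[]; // short signals derived from observed
  operatorNotes: OperatorNote[]; // conditional suggestions
}

const WARN_PCT = 80; // assumed threshold
const ERROR_PCT = 90; // assumed threshold

function buildDiskReportSketch(zpoolOutput: string): DiskReportSketch {
  const observed = zpoolOutput
    .split("\n")
    .map((line) => line.trim().split("\t"))
    .filter((cols) => cols.length >= 2 && cols[0] !== "")
    .map((cols) => ({ pool: cols[0], capacityPct: Number.parseInt(cols[1], 10) })) // "87%" -> 87
    .filter((row) => !Number.isNaN(row.capacityPct));

  const interpretation: Finding[] = [];
  const operatorNotes: OperatorNote[] = [];

  if (observed.length === 0) {
    // An empty Observed section means the probe failed; say so instead of guessing.
    interpretation.push({ severity: "error", code: "probe_empty", text: "zpool returned no pools." });
    return { observed, interpretation, operatorNotes };
  }

  for (const row of observed) {
    if (row.capacityPct < WARN_PCT) continue;
    const severity: Severity = row.capacityPct >= ERROR_PCT ? "error" : "warn";
    interpretation.push({
      severity,
      code: "pool_capacity_high",
      text: `Pool ${row.pool} is at ${row.capacityPct}% capacity.`,
    });
    operatorNotes.push({
      kind: "action",
      text: `Review the largest snapshots on ${row.pool}; remove one only if that rollback point is no longer needed.`,
    });
  }

  if (interpretation.length === 0) {
    interpretation.push({ severity: "info", code: "pools_ok", text: `All pools are below ${WARN_PCT}% capacity.` });
  }
  return { observed, interpretation, operatorNotes };
}
```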
+ +## The Report Family + +| Report | Module | Slash command | Source | +| ---------- | ------------------------------------- | ---------------- | -------------------------------------------------------- | +| System | `src/reports/system-report.ts` | `/report` | `hostd` probes + controlplane auth probe | +| Disk | `src/reports/disk-report.ts` | `/disk` | `zpool list -H` + `zfs list -H -o name,usedsnap` | +| Tasks | `src/reports/tasks-report.ts` | `/tasks` | `getAllTasks()` (Postgres) | +| Budget | `src/reports/budget-report.ts` | `/budgetreport` | `getAllBudgets()` + `getAgentTokenAnalytics()` | +| Publish | `src/reports/publish-report.ts` | `/publishreport` | tenant registry + webroot inspection | +| Test/Build | `src/reports/test-report.ts` | `/testreport` | `tmp/status/build-status.json` + `test-status.json` | + +Each module exports two functions: + +```ts +// "Xxx" stands for the report name: System, Disk, Tasks, Budget, Publish, Test. +declare function buildXxxReport(inputs: XxxReportInputs): XxxReport // pure: raw inputs in, typed report out +declare function renderXxxReport(report: XxxReport): string // pure: report in, HTML string out +``` + +The split lets you unit-test the analysis without touching IO and reuse +the builder against a JSON sink later. + +## Test/Build Pipeline + +`/testreport` is the only report whose source of truth is a file the agent +does not write itself. The contract: + +1. `scripts/write-test-build-status.sh` runs `npm run build` and + `npx vitest run --reporter=json --outputFile=...` (or only one of them, + selected with a `build` or `tests` argument). +2. The wrapper writes two JSON files into the **status directory**: + - `/build-status.json` + - `/test-status.json` + + The status directory resolves with this precedence (matched by both the + wrapper and `getDefaultStatusDir()` in `src/reports/test-report.ts`): + + 1. `$AGENT_STATUS_DIR` if set + 2. `$CLAWDIE_VAR_DIR` if set (legacy) + 3.
`/tmp/status` (default) + + Per `AGENTS.md` § "Temporary File Storage", artifact paths under repo + `tmp/` are the preferred default — point `$AGENT_STATUS_DIR` elsewhere + only if you have a reason to. + +3. `/testreport` reads both files, builds the report, and renders it. + +The schema for each file is intentionally narrow: + +```json +{ + "status": "ok", + "completedAt": "2026-04-26T10:00:00Z", + "command": "npx vitest run", + "exitCode": 0, + "durationMs": 12345, + "totalTests": 1934, + "failingTests": 0, + "skippedTests": 0, + "failingTestNames": ["..."], + "summary": "..." +} +``` + +`status` is one of `"ok"`, `"fail"`, or `"unknown"`. Only `status` and +`completedAt` are required; everything else degrades gracefully. Files older +than 6 hours surface as `stale` with a warn finding. +Missing or malformed files surface as `status: "unknown"` with an action +note rather than fabricating success. + +The pre-commit and post-commit hooks call this wrapper so commit messages +include a `Build: pass | Tests: 12 failed | 1936 passed (1948)` footer +visible in `git log`. + +## Free-Text Routing + +When the agent receives a bot-addressed message, `classifyReportIntent()` in +`src/report-intent.ts` checks a set of conservative regexes and routes to +the matching structured report instead of the LLM path. This means an +operator typing "how much disk?" gets a fresh `/disk` snapshot, not a +half-remembered narrative from a session three days ago. + +The routing rules are intentionally **narrow** (false negatives are fine, +false positives are not). For broader detection of "this prompt smells +operational", a separate `isOpsFlavored()` matcher catches a wider net of +phrasings (services, jails, deploy, controlplane terms, etc.) — and is +used to **suppress memory injection** on those prompts so the LLM answers +from live tools rather than narrative recall.
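A rough sketch of the soft signal, assuming a simple keyword regex. `isOpsFlavoredSketch()`, `selectMemoryContext()` and the term list are hypothetical; the real `isOpsFlavored()` has its own, wider pattern set. The point is the decision shape: a match only drops memory injection and never bypasses the LLM.

```ts
// Illustrative term list drawn from the categories named above.
const OPS_TERMS = /\b(?:services?|jails?|deploy(?:s|ed|ment)?|auth|controlplane|zfs|pools?|snapshots?)\b/i;

function isOpsFlavoredSketch(text: string): boolean {
  const trimmed = text.trim();
  if (trimmed.startsWith("/")) return false; // slash commands are handled elsewhere
  return OPS_TERMS.test(trimmed.replace(/@assistant\b/gi, " "));
}

// Soft signal: ops-flavored prompts still go to the LLM, just without recalled
// memory, so the model reaches for live tools instead of narrative recall.
function selectMemoryContext(text: string, memory: string[]): string[] {
  return isOpsFlavoredSketch(text) ? [] : memory;
}
```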
+ +| Function | Use | +| ---------------------------- | -------------------------------------------------------------------- | +| `classifyReportIntent(text)` | Hard route → structured report. Only fires on confident phrasings. | +| `isOpsFlavored(text)` | Soft signal → drop memory injection. Wider net, lower bar. | + +Both functions ignore slash-command messages (grammY routes those), and both +strip `@assistant` mentions before matching. + +## Why Pure Builders + +The pure builder pattern was a deliberate choice over a one-shot +"render-to-HTML now" approach. Three reasons: + +- **Testable** — unit tests exercise the analysis logic with synthetic + inputs, with no Postgres or pi running. +- **Reusable** — the same `buildDiskReport()` could feed a dashboard widget + or a daily email digest later. We are not committed to Telegram as the + only sink. +- **Inspectable** — when an operator asks "why did the report flag this?", + the answer is a `findings[]` array with explicit codes, not opaque text + generation. + +If you add a new report, follow the same shape: a `Report` interface, a +`buildXxxReport()` function with `findings: XxxReportFinding[]` and +`operatorNotes: XxxReportOperatorNote[]`, a `renderXxxReport()` HTML +renderer, and a `*.test.ts` covering the builder independently.
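As an illustration of that shape, here is a sketch of a test/build builder and renderer over the status-file schema from the Test/Build Pipeline section. All names are hypothetical and do not reproduce `src/reports/test-report.ts`. The 6-hour staleness window comes from this page; the finding codes and wording are made up.

```ts
type RunStatus = "ok" | "fail" | "unknown";

interface TestReportFinding {
  severity: "info" | "warn" | "error";
  code: string;
  text: string;
}

interface TestReportOperatorNote {
  kind: "note" | "action";
  text: string;
}

interface TestReport {
  observed: { status: RunStatus; completedAt: string | null; stale: boolean };
  findings: TestReportFinding[];
  operatorNotes: TestReportOperatorNote[];
}

const STALE_AFTER_MS = 6 * 60 * 60 * 1000; // 6 hours

// Pure builder: raw file contents (null when the file is missing) plus "now".
function buildTestReport(raw: string | null, now: Date): TestReport {
  let parsed: { status: RunStatus; completedAt: string; failingTests?: number } | null = null;
  if (raw !== null) {
    try {
      const data = JSON.parse(raw);
      const validStatus = data?.status === "ok" || data?.status === "fail" || data?.status === "unknown";
      if (validStatus && typeof data.completedAt === "string") parsed = data;
    } catch {
      // Malformed JSON is treated exactly like a missing file.
    }
  }

  if (parsed === null) {
    return {
      observed: { status: "unknown", completedAt: null, stale: false },
      findings: [{ severity: "warn", code: "status_unreadable", text: "No readable test status file." }],
      operatorNotes: [{ kind: "action", text: "Run scripts/write-test-build-status.sh to refresh it." }],
    };
  }

  const completedMs = Date.parse(parsed.completedAt);
  const stale = Number.isNaN(completedMs) || now.getTime() - completedMs > STALE_AFTER_MS;
  const findings: TestReportFinding[] = [];
  const operatorNotes: TestReportOperatorNote[] = [];

  if (parsed.status === "fail") {
    const count = parsed.failingTests ?? 0;
    findings.push({ severity: "error", code: "tests_failing", text: `Tests last run failed. ${count} failing tests.` });
  } else if (parsed.status === "ok") {
    findings.push({ severity: "info", code: "tests_green", text: "Last recorded run passed." });
  } else {
    findings.push({ severity: "warn", code: "status_unknown", text: "Wrapper recorded an unknown result." });
  }
  if (stale) {
    findings.push({ severity: "warn", code: "status_stale", text: "Status file is older than 6 hours." });
    operatorNotes.push({ kind: "action", text: "Re-run the test wrapper before relying on this as evidence the branch is green." });
  }
  return { observed: { status: parsed.status, completedAt: parsed.completedAt, stale }, findings, operatorNotes };
}

// Telegram HTML mode needs &, < and > escaped in dynamic text.
function escapeHtml(s: string): string {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Pure renderer: report in, HTML string out.
function renderTestReport(r: TestReport): string {
  const staleTag = r.observed.stale ? " (stale)" : "";
  return [
    "<b>Observed</b>",
    `status: ${r.observed.status}${staleTag}, completedAt: ${escapeHtml(r.observed.completedAt ?? "n/a")}`,
    "<b>Interpretation</b>",
    ...r.findings.map((f) => `[${f.severity}] ${escapeHtml(f.text)}`),
    "<b>Operator Notes</b>",
    ...r.operatorNotes.map((n) => `${n.kind}: ${escapeHtml(n.text)}`),
  ].join("\n");
}
```

Because both functions are pure, the matching `*.test.ts` only needs string fixtures and a fixed `now`.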