Clarify generic explanation grounding direction

--- Build: pass | Tests: pass — 2176 passed (640 files)
2026-05-06 00:12:16 +02:00 · 2026-05-06 00:12:16 +02:00 · 1cf2325749
commit 1cf2325749
parent c0ab09c55b
1 changed file with 156 additions and 0 deletions
--- a/docs/internal/EXPLANATION-GROUNDER-PROPOSAL.md
+++ b/docs/internal/EXPLANATION-GROUNDER-PROPOSAL.md
@@ -298,3 +298,159 @@ file is authoritative; any prose paraphrase is a copy that can drift).

 The proposal is to make that distinction explicit instead of choosing
 the same answer for both.
+
+## Plain English Decision
+
+The current "write a custom explainer for each question" path does not
+scale. There are too many subjects:
+
+- `pi`
+- `bash`
+- `sudo`
+- `pkg`
+- `vector embeddings`
+- `aider`
+- `zfs`
+- `snapshots`
+- `git`
+- and many more combinations
+
+We should stop thinking in terms of:
+
+- question variant -> hand-written final answer
+
+And switch to:
+
+- topic/domain -> source-of-truth inputs
+
+In simple terms:
+
+1. Detect that the user is asking for an explanation.
+2. Extract the subject or subjects from the prompt.
+3. Map those subjects to a small number of explanation domains.
+4. Build a grounding pack for those domains from:
+   - live runtime facts
+   - a curated set of source files/docs
+5. Let the LLM write the explanation from that grounding pack.
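The five steps above can be sketched roughly as follows. This is a minimal illustration, not the repo's implementation: the alias table, the `docs/example/...` paths, the keyword regex, and the `groundExplanation` and `mapSubjectsToDomains` helpers are all assumptions made for the example. Step 5 (the LLM call) is left to the caller, which would receive the returned grounding text.

```typescript
// Illustrative sketch only: aliases, paths, and fact keys are assumptions,
// not the repo's actual registry.
type Domain = "zfs" | "git" | "jails";

// Subject words mapped to a small number of explanation domains (step 3).
const SUBJECT_TO_DOMAIN = new Map<string, Domain>([
  ["zfs", "zfs"],
  ["zpool", "zfs"],
  ["git", "git"],
  ["commit", "git"],
  ["jail", "jails"],
  ["jails", "jails"],
]);

// Placeholder curated sources per domain (step 4); the paths are hypothetical.
const DOMAIN_SOURCES: Record<Domain, string[]> = {
  zfs: ["docs/example/zfs.md"],
  git: ["docs/example/git.md"],
  jails: ["docs/example/jails.md"],
};

// Step 1: crude keyword check; a real detector would likely be more careful.
function isExplanationPrompt(prompt: string): boolean {
  return /\b(explain|how does|how do|what is|what are|why)\b/i.test(prompt);
}

// Step 2: pull known subject words out of the prompt, without duplicates.
function extractExplanationSubjects(prompt: string): string[] {
  const words = prompt.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  return Array.from(new Set(words.filter((w) => SUBJECT_TO_DOMAIN.has(w))));
}

// Step 3: collapse subjects into distinct domains.
function mapSubjectsToDomains(subjects: string[]): Domain[] {
  const domains = new Set<Domain>();
  for (const subject of subjects) {
    const domain = SUBJECT_TO_DOMAIN.get(subject);
    if (domain !== undefined) domains.add(domain);
  }
  return Array.from(domains);
}

// Step 4: combine live runtime facts with the curated source list.
function buildExplanationGrounding(
  domains: Domain[],
  runtimeFacts: Record<string, string>,
): string {
  const lines: string[] = ["Runtime facts:"];
  for (const key of Object.keys(runtimeFacts)) {
    lines.push(`- ${key}=${runtimeFacts[key]}`);
  }
  lines.push("Sources:");
  for (const domain of domains) {
    for (const path of DOMAIN_SOURCES[domain]) {
      lines.push(`- [${domain}] ${path}`);
    }
  }
  return lines.join("\n");
}

// Returns grounding text for step 5, or null when the prompt is not an
// explanation request or names no known subject.
function groundExplanation(
  prompt: string,
  runtimeFacts: Record<string, string>,
): string | null {
  if (!isExplanationPrompt(prompt)) return null;
  const domains = mapSubjectsToDomains(extractExplanationSubjects(prompt));
  return domains.length > 0
    ? buildExplanationGrounding(domains, runtimeFacts)
    : null;
}
```

In this sketch, adding a new subject means adding an alias and a source entry. No answer prose is written by hand.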
+
+The maintained unit should be:
+
+- `domain -> trusted sources`
+
+Not:
+
+- `domain -> prewritten prose module`
+
+That is the architectural direction this repo should move toward.
+
+## Refined Recommendation
+
+Do not extend the deterministic-responder pattern beyond the two modules
+already landed:
+
+- `src/memory-architecture.ts`
+- `src/database-architecture.ts`
+
+Treat those as temporary guardrails for already-broken high-volume
+questions, not as the long-term answer.
+
+The long-term default should be one generic explanation pipeline with:
+
+- `isExplanationPrompt(...)`
+- `extractExplanationSubjects(...)`
+- `EXPLANATION_DOMAINS`
+- `buildExplanationGrounding(...)`
+- one normal LLM call using that grounding
+
+The registry should be domain-based, not sentence-based. Example domains:
+
+- `memory`
+- `database`
+- `jails`
+- `controlplane`
+- `git`
+- `zfs`
+- `snapshots`
+- `pi`
+- `aider`
+- `embeddings`
+
+Each domain should define:
+
+- aliases / subject words
+- live runtime fact builders
+- canonical repo sources
+- trim budget / priority
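One possible shape for a registry entry carrying the four fields listed above is sketched here. The field names, the array-based registry, the `docs/example/...` paths, the numeric budgets, and the `domainsForSubjects` and `collectRuntimeFacts` helpers are hypothetical; only the `EXPLANATION_DOMAINS` name and the `DB_RUNTIME` / `DB_HOST` fact names come from this proposal.

```typescript
// Hypothetical registry shape; field names, aliases, paths, and budgets are
// assumptions made for illustration.
type RuntimeEnv = Record<string, string | undefined>;

interface ExplanationDomain {
  name: string;
  // Subject words that select this domain.
  aliases: string[];
  // Live runtime fact builders, evaluated at request time.
  runtimeFacts: Array<(env: RuntimeEnv) => string>;
  // Canonical repo sources to draw snippets from.
  sources: string[];
  // Maximum characters of grounding this domain may contribute.
  trimBudget: number;
  // Higher-priority domains are considered first.
  priority: number;
}

const EXPLANATION_DOMAINS: ExplanationDomain[] = [
  {
    name: "snapshots",
    aliases: ["snapshot", "snapshots"],
    runtimeFacts: [],
    sources: ["docs/example/snapshots.md"],
    trimBudget: 2000,
    priority: 1,
  },
  {
    name: "database",
    aliases: ["database", "db"],
    runtimeFacts: [
      (env) => `DB_RUNTIME=${env["DB_RUNTIME"] ?? "unknown"}`,
      (env) => `DB_HOST=${env["DB_HOST"] ?? "unknown"}`,
    ],
    sources: ["docs/example/database.md"],
    trimBudget: 4000,
    priority: 2,
  },
];

// Resolve subject words to matching domains, highest priority first.
function domainsForSubjects(subjects: string[]): ExplanationDomain[] {
  const wanted = new Set(subjects.map((s) => s.toLowerCase()));
  return EXPLANATION_DOMAINS
    .filter((d) => d.aliases.some((a) => wanted.has(a)))
    .sort((a, b) => b.priority - a.priority);
}

// Evaluate a domain's fact builders against the current environment.
function collectRuntimeFacts(domain: ExplanationDomain, env: RuntimeEnv): string[] {
  return domain.runtimeFacts.map((build) => build(env));
}
```

Keeping fact builders as functions means the values are read when the question is asked, so they cannot drift the way a prose paraphrase can.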
+
+## Preferred Grounding Shape
+
+Do not inject large raw files by default.
+
+Prefer a smaller "grounding pack" per domain:
+
+- runtime facts
+  - `DB_RUNTIME`
+  - `DB_HOST`
+  - current DB names
+  - current jail shape
+  - current embedding endpoint/model
+- curated snippets
+  - selected code exports
+  - selected canonical docs
+  - small extracted sections rather than full files, where avoidable
+
+This keeps:
+
+- token cost lower
+- answers closer to current truth
+- debugging simpler
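A grounding pack of this shape could be rendered under a character budget along these lines. This is a sketch under stated assumptions: the `GroundingPack` type, the `renderGroundingPack` helper, and the character-based budget are hypothetical, and a real implementation might count tokens instead.

```typescript
// Hypothetical grounding-pack shape and renderer; names and the
// character-based budget are assumptions made for illustration.
interface GroundingSnippet {
  source: string; // where the snippet was extracted from
  text: string;   // the extracted section, not the whole file
}

interface GroundingPack {
  runtimeFacts: Record<string, string>;
  snippets: GroundingSnippet[]; // ordered most important first
}

// Runtime facts are always emitted. Snippets are appended in order until
// maxChars is reached; the last snippet that fits is truncated.
function renderGroundingPack(pack: GroundingPack, maxChars: number): string {
  const factLines: string[] = ["## Runtime facts"];
  for (const key of Object.keys(pack.runtimeFacts)) {
    factLines.push(`${key}=${pack.runtimeFacts[key]}`);
  }
  let out = factLines.join("\n");

  for (const snippet of pack.snippets) {
    const header = `\n\n## ${snippet.source}\n`;
    const room = maxChars - out.length - header.length;
    if (room <= 0) break; // no space left for even a partial snippet
    out += header + snippet.text.slice(0, room);
  }
  return out;
}
```

Because facts come first and snippets are cut in priority order, a tight budget drops the least important prose before it drops any live values.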
+
+## Claude Work Split
+
+Claude is useful here, but should help on the design and curation side,
+not own the runtime wiring.
+
+Claude-side deliverables:
+
+1. Propose the initial domain registry.
+   - start with 8-12 domains
+   - include aliases and recommended canonical sources
+
+2. Propose source-of-truth inputs per domain.
+   - runtime facts needed
+   - source files/docs needed
+   - what should be trimmed or excluded
+
+3. Draft a seed corpus of explanation prompts.
+   - broad prompts
+   - plain-language prompts
+   - mixed-subject prompts
+   - ambiguous prompts
+
+4. Flag stale or dangerous sources that should never be used for
+   grounding without cleanup.
+
+Codex-side deliverables:
+
+1. Implement the generic explanation pipeline.
+2. Build the grounding-pack loader.
+3. Wire it into `composeSystemContext`.
+4. Keep the two existing deterministic responders as temporary
+   guardrails.
+5. Stop adding new per-topic responder modules unless there is an
+   exceptional reason.
+
+## Immediate Next Step
+
+The next implementation should not be another `*-architecture.ts` file.
+
+Use the next uncovered topic (for example `jails`) to pilot the generic
+explanation grounder instead:
+
+- detect explanation prompt
+- map `jails` to the right domain
+- assemble runtime facts + curated sources
+- let the LLM answer from grounding
+
+If that works well, the project should formally freeze the per-topic
+responder (whack-a-mole) path and use the generic explanation pipeline
+for the long tail.