2026-06-21 14:19:23 +02:00
5 changed files with 170 additions and 140 deletions
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -2,7 +2,7 @@

 Instructions for AI coding assistants and developers working on the hermes-agent codebase.

-**Never give up on the right solution.**
+**Persist until the right solution is found.**

 ## What Hermes Is

@@ -19,8 +19,8 @@ reviewing any change:
 - **Per-conversation prompt caching is sacred.** A long-lived conversation
  reuses a cached prefix every turn. Anything that mutates past context,
  swaps toolsets, or rebuilds the system prompt mid-conversation invalidates
-  that cache and multiplies the user's cost. We do not do it (the one
-  exception is context compression).
+  that cache and multiplies the user's cost. We preserve cache stability
+  throughout every conversation (the one exception is context compression).
 - **The core is a narrow waist; capability lives at the edges.** Every model
  tool we add is sent on every API call, so the bar for a new *core* tool is
  high. Most new capability should arrive as a CLI command + skill, a
@@ -37,8 +37,8 @@ This is the project's intent layer. Use it two ways:
   `cannot_reproduce`, `incoherent`) and, just as important, **when NOT to
   close** one. Taste-based "we don't want this / out of scope" closes are NOT
   an automated decision — those stay with a human maintainer. The sweeper's
-   job here is to recognize design intent and *avoid wrongly closing a
-   legitimate contribution*, not to make the won't-implement call itself.
+   job here is to recognize design intent and *keep legitimate contributions
+   open for human review*, not to make the won't-implement call itself.

 Read the balance right: Hermes ships a **lot** — most merges are bug fixes to
 real reported behavior, and the product surface (platforms, channels,
@@ -86,43 +86,46 @@ conservative at the waist.
  backends, or file/network I/O, exercise the real path with real imports
  against a temp `HERMES_HOME`. Mocks hide integration bugs.
 - **Cache-, alternation-, and invariant-safe.** Preserve prompt caching, strict
-  message role alternation (never two same-role messages in a row; never a
-  synthetic user message injected mid-loop), and a system prompt that is
+  message role alternation (adjacent messages always differ in role; synthetic
+  user messages are never injected mid-loop), and a system prompt that is
  byte-stable for the life of a conversation.
 - **Contributor credit preserved.** Salvage external work by cherry-picking
-  (rebase-merge) so authorship survives in git history; don't reimplement from
-  scratch when you can build on top.
+  (rebase-merge) so authorship survives in git history; build on top of the
+  existing work rather than reimplementing it from scratch.
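The alternation rule above can be checked mechanically. This is an illustrative sketch, not Hermes code: `check_alternation` is a hypothetical helper, it assumes OpenAI-style message dicts with a `role` key, it lets consecutive `tool` messages through (one result per parallel tool call), and it treats a `user` message directly after a `tool` result as a synthetic mid-loop injection.

```python
from typing import Any, Dict, List, Optional


def check_alternation(messages: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: list role-alternation problems (empty list means OK).

    Assumes OpenAI-style dicts with a "role" key. Consecutive "tool" messages are
    allowed because parallel tool calls return one result message each.
    """
    problems: List[str] = []
    prev: Optional[str] = None
    for i, msg in enumerate(messages):
        role = msg["role"]
        if i == 0 and role == "system":
            # A leading system prompt sits outside the user/assistant exchange.
            prev = role
            continue
        if role == prev and role != "tool":
            problems.append(f"message {i}: two consecutive {role!r} messages")
        if role == "user" and prev == "tool":
            # After tool results the model speaks next, not a synthetic user turn.
            problems.append(f"message {i}: user message injected mid tool loop")
        prev = role
    return problems
```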

 ### What we don't want (rejected even when well-built)

 - **Speculative infrastructure.** Hooks, callbacks, or extension points with no
  concrete consumer. Adding a hook is easy; removing one after plugins depend
-  on it is hard. A hook is NOT speculative if a contributor has a real, stated
-  use case — even if the consumer ships separately.
+  on it is hard — add hooks only with a concrete consumer. A hook is NOT
+  speculative if a contributor has a real, stated use case — even if the
+  consumer ships separately.
 - **New `HERMES_*` env vars for non-secret config.** `.env` is for secrets
  only (API keys, tokens, passwords). All behavioral settings — timeouts,
  thresholds, feature flags, display prefs — go in `config.yaml`. Bridge to an
  internal env var if the mechanism needs one, but user-facing docs point to
-  `config.yaml`. Reject PRs that tell users to "set X in your .env" unless X
-  is a credential.
+  `config.yaml`. Send back PRs that tell users to set a non-credential value
+  in `.env`; credentials go in `.env`, everything else in `config.yaml`.
 - **A new core tool when terminal + file already do the job, or when a skill
  would.** If the only barrier is file visibility on a remote backend, fix the
  mount, not the toolset.
-- **Lazy-reading escape hatches on instructional tools.** No `offset`/`limit`
-  pagination on tools that load content the agent must read fully (skills,
-  prompts, playbooks). Models will read page 1 and skip the rest.
+- **Lazy-reading escape hatches on instructional tools.** Tools that load
+  content the agent must read fully (skills, prompts, playbooks) return the
+  whole document, with no `offset`/`limit` pagination. Given pagination,
+  models read page 1 and skip the rest.
 - **"Fixes" that destroy the feature they secure.** A mitigation that kills the
  feature's purpose is the wrong mitigation. Read the original commit's intent
  (`git log -p -S`) before restricting behavior; find a fix that preserves the
  feature.
-- **Outbound telemetry / usage attribution without opt-in gating.** No new
-  analytics, third-party identifier tagging, or attribution tags until a
-  generic user-facing opt-in (config gate + setup prompt + `hermes tools`
-  toggle) exists. Park behind a label, do not merge.
+- **Outbound telemetry / usage attribution without opt-in gating.** Park new
+  analytics, third-party identifier tagging, or attribution tags behind a
+  label until a generic user-facing opt-in exists (config gate, setup prompt,
+  and `hermes tools` toggle). Merge only after that opt-in infrastructure
+  ships.
 - **Change-detector tests, cache-breaking mid-conversation, dead code wired in
  without E2E proof, and plugins that touch core files.** Plugins live in their
  own directory and work within the ABCs/hooks we provide; if a plugin needs
-  more, widen the generic plugin surface, don't special-case it in core.
+  more, widen the generic plugin surface instead of special-casing it in core.

 ### Before you call it a bug — verify the premise (and when NOT to close)

@@ -165,8 +168,8 @@ doubt, leave it open for a human). They are distilled from real closes.
 The throughline: **verify the claim AND the intent against the codebase before
 writing or merging a fix.** A confirmed reproduction on current `main` plus a
 line-level account of where the fix acts beats a plausible-sounding rationale
-every time. When in doubt about intent, it is cheaper to ask than to ship a
-fix that fights the design.
+every time. When intent is unclear, ask — it's faster and cheaper than
+shipping a fix that fights the design.

 ### The Footprint Ladder (new capability decision)

@@ -195,9 +198,9 @@ Each rung adds more permanent surface than the one above. Choose the highest
   browser_navigate.

 When 3+ open PRs try to integrate the same *category* of thing (memory
-backends, providers, notifiers), don't merge them one at a time — design an
-ABC + orchestrator, wrap the existing built-in as the first provider, and turn
-the competing PRs into plugins against that interface.
+backends, providers, notifiers), design an ABC + orchestrator first, wrap
+the existing built-in as the first provider, and turn the competing PRs
+into plugins against that interface.
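The ABC-plus-orchestrator pattern described above can be sketched in a few lines. Every name here (`Notifier`, `BuiltinNotifier`, `NotifierRegistry`) is hypothetical and illustrative, not a Hermes class; the point is that the existing built-in becomes the first registered provider and each competing PR becomes another subclass.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class Notifier(ABC):
    """Hypothetical provider interface for one category of integration."""

    name: str = ""

    @abstractmethod
    def send(self, message: str) -> bool:
        """Deliver a message; return True on success."""


class BuiltinNotifier(Notifier):
    """The existing built-in behavior, wrapped as the first provider."""

    name = "builtin"

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send(self, message: str) -> bool:
        self.sent.append(message)
        return True


class NotifierRegistry:
    """Orchestrator: core code talks to this, never to a specific provider."""

    def __init__(self) -> None:
        self._providers: Dict[str, Notifier] = {}

    def register(self, provider: Notifier) -> None:
        if provider.name in self._providers:
            raise ValueError(f"provider {provider.name!r} already registered")
        self._providers[provider.name] = provider

    def send(self, provider_name: str, message: str) -> bool:
        return self._providers[provider_name].send(message)
```

A provider from one of the competing PRs would then subclass `Notifier` and call `register()` from its own plugin directory, so core never special-cases it.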

 ## Development Environment

@@ -212,9 +215,8 @@ main checkout).

 ## Project Structure

-File counts shift constantly — don't treat the tree below as exhaustive.
-The canonical source is the filesystem. The notes call out the load-bearing
-entry points you'll actually edit.
+File counts shift and the tree below is not exhaustive; the filesystem is canonical.
+The notes call out the load-bearing entry points you'll actually edit.

 ```
 hermes-agent/
@@ -270,10 +272,11 @@ Applies to TypeScript across Hermes: desktop, TUI, website, and future TS packag
 - Prefer small nanostores over component state when state is shared, reused, or read by distant UI.
 - Let each feature own its atoms. Chat state belongs near chat, shell state near shell, shared state in `src/store`.
 - Components that render from an atom should use `useStore`. Non-rendering actions should read with `$atom.get()`.
-- Do not pass state through three components when the leaf can subscribe to the atom.
+- Let the leaf component subscribe to the atom instead of threading state
+  through intermediate components.
 - Keep persistence beside the atom that owns it.
 - Keep route roots thin. They compose routes and shell; they should not become controllers.
-- No monolithic hooks. A hook should own one narrow job.
+- Keep hooks focused on one narrow job. Avoid god hooks that serve multiple concerns.
 - Prefer colocated action modules over hidden god hooks.
 - If a callback is pure side effect, use the terse void form:
  `onState={st => void setGatewayState(st)}`.
@@ -474,7 +477,12 @@ The dashboard embeds the real `hermes --tui` — **not** a rewrite.  See `hermes
 - The server spawns whatever `hermes --tui` would spawn, through `ptyprocess` (POSIX PTY — WSL works, native Windows does not).
 - Frames: raw PTY bytes each direction; resize via `\x1b[RESIZE:<cols>;<rows>]` intercepted on the server and applied with `TIOCSWINSZ`.

-**Do not re-implement the primary chat experience in React.** The main transcript, composer/input flow (including slash-command behavior), and PTY-backed terminal belong to the embedded `hermes --tui` — anything new you add to Ink shows up in the dashboard automatically. If you find yourself rebuilding the transcript or composer for the dashboard, stop and extend Ink instead.
+**Extend Ink rather than re-implementing the chat experience in React.** The main
+transcript, composer/input flow (including slash-command behavior), and
+PTY-backed terminal belong to the embedded `hermes --tui` — anything new you
+add to Ink shows up in the dashboard automatically. If you find yourself
+rebuilding the transcript or composer for the dashboard, stop and extend Ink
+instead.

 **Structured React UI around the TUI is allowed when it is not a second chat surface.** Sidebar widgets, inspectors, summaries, status panels, and similar supporting views (e.g. `ChatSidebar`, `ModelPickerDialog`, `ToolCall`) are fine when they complement the embedded TUI rather than replacing the transcript / composer / terminal. Keep their state independent of the PTY child's session and surface their failures non-destructively so the terminal pane keeps working unimpaired.

@@ -498,9 +506,9 @@ A **separate** chat surface from both the classic CLI and the dashboard's embedd
 ## Adding New Tools

 Before adding any tool, settle the footprint question first (see "The
-Footprint Ladder" in the Contribution Rubric): most capabilities should NOT
-be core tools. For custom or local-only tools, do **not** edit Hermes core.
-Use the plugin route instead: create `~/.hermes/plugins/<name>/plugin.yaml`
+Footprint Ladder" in the Contribution Rubric): most capabilities belong in
+skills or plugins, not core tools. For custom or local-only tools, write a
+plugin rather than editing Hermes core: create `~/.hermes/plugins/<name>/plugin.yaml`
 and `~/.hermes/plugins/<name>/__init__.py`, then register tools with
 `ctx.register_tool(...)`. Plugin toolsets are discovered automatically and can be
 enabled or disabled without touching `tools/` or `toolsets.py`.
@@ -539,7 +547,7 @@ The registry handles schema collection, dispatch, availability checking, and err

 **Path references in tool schemas**: If the schema description mentions file paths (e.g. default output directories), use `display_hermes_home()` to make them profile-aware. The schema is generated at import time, which is after `_apply_profile_override()` sets `HERMES_HOME`.

-**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory — never `Path.home() / ".hermes"`. This ensures each profile gets its own state.
+**State files**: If a tool stores persistent state (caches, logs, checkpoints), use `get_hermes_home()` for the base directory, never `Path.home() / ".hermes"`. This ensures each profile gets its own state.

 **Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `tools/todo_tool.py` for the pattern.

@@ -561,7 +569,8 @@ reinforced after the Mini Shai-Hulud worm campaign (May 2026).
 **When adding a new dependency to `pyproject.toml`:**
 1. Pin to `>=current_version,<next_major` for post-1.0 (e.g. `>=1.5.0,<2`).
 2. For pre-1.0 packages, use `<0.(current_minor + 2)` (e.g. `>=0.29,<0.32`).
-3. Never commit a bare `>=X.Y.Z` without a ceiling — CI and reviewers will reject it.
+3. Always include an upper bound — a bare `>=X.Y.Z` without a ceiling will
+   be rejected by CI and reviewers.
 4. Run `uv lock` to regenerate `uv.lock` with hashes.

 Reference: #2810 (bounds pass), #9801 (SHA pinning + audit CI).
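Rule 3 can be spot-checked locally before pushing. The helper below is a hypothetical sketch, not the project's CI check: it strips the package name, extras, and environment markers from a dependency string, then treats any clause starting with `<`, `<=`, `==`, or `~=` as a ceiling (an assumption about what reviewers accept).

```python
import re

# Package name plus optional extras, e.g. "rich[jupyter]".
_NAME = re.compile(r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?")


def missing_ceiling(dependency: str) -> bool:
    """Return True if a pyproject-style dependency string has no upper bound."""
    spec = dependency.split(";", 1)[0]   # drop environment markers
    spec = _NAME.sub("", spec, count=1)  # drop the package name and extras
    if not spec.strip():
        return True                      # completely unpinned
    # "<" also covers "<="; "==" and "~=" imply a ceiling.
    return not any(c.strip().startswith(("<", "==", "~=")) for c in spec.split(","))
```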
@@ -765,16 +774,16 @@ framework only exposes CLI commands for the **currently active** memory
 provider (read from `memory.provider` in config.yaml), so disabled
 providers don't clutter `hermes --help`.

-**Rule (Teknium, May 2026):** plugins MUST NOT modify core files
-(`run_agent.py`, `cli.py`, `gateway/run.py`, `hermes_cli/main.py`, etc.).
-If a plugin needs a capability the framework doesn't expose, expand the
-generic plugin surface (new hook, new ctx method) — never hardcode
-plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
-honcho argparse from `main.py` for exactly this reason.
+**Rule (Teknium, May 2026):** plugins stay within their own directory and
+never modify core files (`run_agent.py`, `cli.py`, `gateway/run.py`,
+`hermes_cli/main.py`, etc.). If a plugin needs a capability the framework
+doesn't expose, expand the generic plugin surface (new hook, new ctx method)
+and keep plugin-specific logic out of core. PR #5295 removed 95 lines of
+hardcoded honcho argparse from `main.py` for exactly this reason.

-**No new in-tree memory providers (policy, May 2026):** the set of
-built-in memory providers under `plugins/memory/` is closed. New memory
-backends must ship as **standalone plugin repos** that users install
+**In-tree memory providers are closed (policy, May 2026):** the set of
+built-in memory providers under `plugins/memory/` is complete. Publish new
+memory backends as **standalone plugin repos** that users install
 into `~/.hermes/plugins/` (or via pip entry points) — they implement
 the same `MemoryProvider` ABC, register through the same discovery
 path, and integrate via `hermes memory setup` / `post_setup()` without
@@ -875,14 +884,15 @@ violate them.
   capability, point at the proper tool by name in backticks
   (`` `terminal` ``, `` `web_extract` ``, `` `read_file` ``,
   `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``,
-   `` `browser_navigate` ``, `` `delegate_task` ``, etc.). Do NOT
-   name shell utilities the agent already has wrapped — `grep` →
-   `search_files`, `cat`/`head`/`tail` → `read_file`, `sed`/`awk` →
-   `patch`, `find`/`ls` → `search_files target='files'`. If the skill
-   depends on an MCP server, name the MCP server and document the
-   expected setup in `## Prerequisites`. Anything else (third-party
-   CLIs, shell pipelines, etc.) is fair game inside script files but
-   should not be the headline interaction surface in the prose.
+   `` `browser_navigate` ``, `` `delegate_task` ``, etc.). Name the
+   wrapped Hermes tool, not the shell utility: `search_files` for grep,
+   `read_file` for cat/head/tail, `patch` for sed/awk,
+   `search_files target='files'` for find/ls, `web_extract` for curl
+   content extraction, `write_file` for echo/cat heredocs. If the skill
+   depends on an MCP server, name the MCP server and document the
+   expected setup in `## Prerequisites`. Anything else (third-party CLIs,
+   shell pipelines, etc.) is fair game inside script files but should not
+   be the headline interaction surface in the prose.

 3. **`platforms:` gating audited against actual script imports.**
   Skills that use POSIX-only primitives (`fcntl`, `termios`,
@@ -921,9 +931,9 @@ violate them.
   `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`.

 8. **`.env.example` additions are isolated to a clearly delimited
-   block.** Don't touch the surrounding file — contributor-supplied
-   `.env.example` versions are usually stale and edits outside the
-   skill's own block must be dropped during salvage.
+   block.** Keep changes within the skill's own delimited block —
+   contributor-supplied `.env.example` versions are usually stale and
+   edits outside the skill's own block will be dropped during salvage.

 The full salvage / modernization checklist for external skill PRs
 lives in the `hermes-agent-dev` skill at
@@ -1097,12 +1107,12 @@ Full user-facing docs: `website/docs/user-guide/features/kanban.md`.

 ## Important Policies

-### Prompt Caching Must Not Break
+### Prompt Caching Must Stay Valid

-Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
-- Alter past context mid-conversation
-- Change toolsets mid-conversation
-- Reload memories or rebuild system prompts mid-conversation
+Hermes-Agent preserves caching throughout a conversation. **Keep these invariants:**
+- Past context stays immutable mid-conversation
+- Toolsets stay fixed mid-conversation
+- Memories and system prompts load once and stay stable mid-conversation

 Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
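One way to catch violations of these invariants during development is to pin a digest of the system prompt on the first turn. `SystemPromptGuard` is a hypothetical helper, not part of Hermes; its `reset()` method marks the one sanctioned exception, context compression.

```python
import hashlib
from typing import Optional


class SystemPromptGuard:
    """Hypothetical dev-time check that the system prompt stays byte-stable."""

    def __init__(self) -> None:
        self._digest: Optional[str] = None

    def check(self, system_prompt: str) -> None:
        digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
        if self._digest is None:
            self._digest = digest  # first turn pins the cached prefix
        elif digest != self._digest:
            raise RuntimeError(
                "system prompt changed mid-conversation; the prompt cache is invalidated"
            )

    def reset(self) -> None:
        """Call only after context compression, the one sanctioned rebuild point."""
        self._digest = None
```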

@@ -1137,13 +1147,14 @@ automatically scope to the active profile.
 ### Rules for profile-safe code

 1. **Use `get_hermes_home()` for all HERMES_HOME paths.** Import from `hermes_constants`.
-   NEVER hardcode `~/.hermes` or `Path.home() / ".hermes"` in code that reads/writes state.
+   Always resolve state paths through `get_hermes_home()`; a hardcoded `~/.hermes`
+   or `Path.home() / ".hermes"` bypasses the active profile.
   ```python
   # GOOD
   from hermes_constants import get_hermes_home
   config_path = get_hermes_home() / "config.yaml"

-   # BAD — breaks profiles
+   # BROKEN: ignores the active profile
   config_path = Path.home() / ".hermes" / "config.yaml"
   ```

@@ -1183,26 +1194,31 @@ automatically scope to the active profile.

 ## Known Pitfalls

-### DO NOT hardcode `~/.hermes` paths
+### Hardcoding `~/.hermes` paths breaks profiles
 Use `get_hermes_home()` from `hermes_constants` for code paths. Use `display_hermes_home()`
-for user-facing print/log messages. Hardcoding `~/.hermes` breaks profiles — each profile
-has its own `HERMES_HOME` directory. This was the source of 5 bugs fixed in PR #3575.
+for user-facing print/log messages. Each profile has its own `HERMES_HOME` directory —
+hardcoding `~/.hermes` bypasses profile isolation and caused 5 bugs fixed in PR #3575.

-### DO NOT introduce new `simple_term_menu` usage
+### Use `hermes_cli/curses_ui.py` for new interactive menus
 Existing call sites in `hermes_cli/main.py` remain for legacy fallback only;
 the preferred UI is curses (stdlib) because `simple_term_menu` has
 ghost-duplication rendering bugs in tmux/iTerm2 with arrow keys. New
 interactive menus must use `hermes_cli/curses_ui.py` — see
 `hermes_cli/tools_config.py` for the canonical pattern.

-### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
-Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
+### Use space-padding instead of ANSI erase-to-EOL in spinner/display code
+`\033[K` (ANSI erase-to-EOL) leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
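The space-padding technique can be wrapped in a small helper that remembers how long the previous frame was. `render_status` is a hypothetical name; the sketch assumes single-line, single-width (ASCII) status text, since wide or combining characters would need a display-width measure instead of `len()`.

```python
from typing import Tuple


def render_status(line: str, prev_len: int) -> Tuple[str, int]:
    """Return (frame, new_len): a carriage return, the text, and enough spaces
    to blank out whatever remained of a longer previous frame."""
    pad = max(0, prev_len - len(line))
    return f"\r{line}{' ' * pad}", len(line)
```

Callers write the frame with `sys.stdout.write()` and keep the returned length for the next update; no escape sequence is emitted, so nothing can leak through `patch_stdout`.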

 ### `_last_resolved_tool_names` is a process-global in `model_tools.py`
 `_run_single_child()` in `delegate_tool.py` saves and restores this global around subagent execution. If you add new code that reads this global, be aware it may be temporarily stale during child agent runs.

-### DO NOT hardcode cross-tool references in schema descriptions
-Tool schema descriptions must not mention tools from other toolsets by name (e.g., `browser_navigate` saying "prefer web_search"). Those tools may be unavailable (missing API keys, disabled toolset), causing the model to hallucinate calls to non-existent tools. If a cross-reference is needed, add it dynamically in `get_tool_definitions()` in `model_tools.py` — see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
+### Add cross-tool references dynamically in `get_tool_definitions()`
+Static schema descriptions name only tools from their own toolset; a
+`browser_navigate` description saying "prefer web_search" is the failure mode.
+Tools from other toolsets may be unavailable (missing API keys, disabled
+toolset), causing the model to hallucinate calls to non-existent tools. Add
+cross-references dynamically in `get_tool_definitions()` in `model_tools.py`;
+see the `browser_navigate` / `execute_code` post-processing blocks for the pattern.
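A minimal sketch of the dynamic approach, assuming OpenAI-style tool schemas (`{"type": "function", "function": {"name": ..., "description": ...}}`). `add_cross_references` and the hint text are hypothetical, not the actual post-processing in `model_tools.py`; the property that matters is that the hint appears only when the referenced tool is present in the final list.

```python
import copy
from typing import Any, Dict, List


def add_cross_references(tools: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return copies of the schemas with hints that name only tools that are present."""
    available = {t["function"]["name"] for t in tools}
    result = copy.deepcopy(tools)  # never mutate the caller's schemas
    for tool in result:
        fn = tool["function"]
        if fn["name"] == "browser_navigate" and "web_search" in available:
            fn["description"] += " For simple lookups, prefer web_search."
    return result
```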

 ### The gateway has TWO message guards — both must bypass approval/control commands
 When an agent is running, messages pass through two sequential guards:
@@ -1223,13 +1239,13 @@ file will silently overwrite recent fixes on main when squashed. Verify
 with `git diff HEAD~1..HEAD` after merging — unexpected deletions are a
 red flag.

-### Don't wire in dead code without E2E validation
+### Validate unused code before wiring it into live paths
 Unused code that was never shipped was dead for a reason. Before wiring an
 unused module into a live code path, E2E test the real resolution chain
 with actual imports (not mocks) against a temp `HERMES_HOME`.

-### Tests must not write to `~/.hermes/`
-The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
+### Tests stay inside the temp `HERMES_HOME` set by `_isolate_hermes_home`
+The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir for every test automatically. Never hardcode `~/.hermes/` paths in tests: they bypass the redirect and write to the user's real state directory.

 **Profile tests**: When testing profile features, also mock `Path.home()` so that
 `_get_profiles_root()` and `_get_default_hermes_home()` resolve within the temp dir.
@@ -1248,7 +1264,7 @@ def profile_env(tmp_path, monkeypatch):

 ## Testing

-**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
+**Always use `scripts/run_tests.sh`** rather than calling `pytest` directly. The script enforces
 hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
 `-n auto` xdist workers, in-tree subprocess-isolation plugin). Direct `pytest`
 on a 16+ core developer machine with API keys set diverges from CI in ways
@@ -1319,15 +1335,15 @@ python -m pytest tests/agent/test_foo.py -q --no-isolate

 Always run the full suite before pushing changes.

-### Don't write change-detector tests
+### Write relational contracts, not data snapshots

 A test is a **change-detector** if it fails whenever data that is **expected
 to change** gets updated — model catalogs, config version numbers,
 enumeration counts, hardcoded lists of provider models. These tests add no
 behavioral coverage; they just guarantee that routine source updates break
-CI and cost engineering time to "fix."
+CI and cost engineering time to fix.

-**Do not write:**
+**Avoid data snapshots like these:**

 ```python
 # catalog snapshot — breaks every model release
@@ -1341,7 +1357,7 @@ assert DEFAULT_CONFIG["_config_version"] == 21
 assert len(_PROVIDER_MODELS["huggingface"]) == 8
 ```

-**Do write:**
+**Write relational contracts:**

 ```python
 # behavior: does the catalog plumbing work at all?
@@ -1365,5 +1381,5 @@ When a PR adds a new provider/model and you want a test, make the test
 assert the relationship (e.g. "catalog entries all have context lengths"),
 not the specific names.

-Reviewers should reject new change-detector tests; authors should convert
-them into invariants before re-requesting review.
+Reviewers should send back new change-detector tests; authors should convert
+them into invariants before re-requesting review.
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -43,7 +43,7 @@ Bundled skills (in `skills/`) ship with every Hermes install. They should be **b
 - Document handling, web research, common dev workflows, system administration
 - Used regularly by a wide range of people

-If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, built-in trust).
+If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`**: it ships with the repo but is inactive by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (built-in trust, no third-party warning).

 If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a skills registry and share it in the [Nous Research Discord](https://discord.gg/NousResearch). Users can install it with `hermes skills install`.

@@ -51,7 +51,7 @@ If your skill is specialized, community-contributed, or niche, it's better suite

 ## Memory Providers: Ship as a Standalone Plugin

-**We are no longer accepting new memory providers into this repo.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).
+**We are accepting new memory providers only as standalone plugin repos.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is complete. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).

 Standalone memory plugins:

@@ -63,7 +63,7 @@ Standalone memory plugins:

 PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.

-This isn't a quality bar — it's a coupling-and-maintenance decision. Memory providers are the most common plugin type and they shouldn't all live in this tree.
+This isn't a quality bar. Memory providers are the most common plugin type, and keeping them all in this tree would create unsustainable coupling and maintenance load.

 ---

@@ -240,7 +240,7 @@ User message → AIAgent._run_agent_loop()
 - **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
 - **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
 - **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. Per-session JSON snapshots in `~/.hermes/sessions/` were superseded by the SQLite store and are off by default; opt back in with `sessions.write_json_snapshots: true` if you have external tooling that consumes the JSON files directly.
-- **Ephemeral injection**: System prompts and prefill messages are injected at API call time, never persisted to the database or logs.
+- **Ephemeral injection**: System prompts and prefill messages are injected only at API call time and are never persisted to the database or logs.
 - **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at init time (Nous Portal OAuth, OpenRouter API key, or custom endpoint).
 - **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection (sort by throughput/latency/price, allow/ignore specific providers, data retention policies). These are injected as `extra_body.provider` in API requests.

@@ -248,10 +248,10 @@ User message → AIAgent._run_agent_loop()

 ## Code Style

-- **PEP 8** with practical exceptions (we don't enforce strict line length)
-- **Comments**: Only when explaining non-obvious intent, trade-offs, or API quirks. Don't narrate what the code does — `# increment counter` adds nothing
+- **PEP 8** with practical exceptions (line length is not strictly enforced)
+- **Comments**: Explain non-obvious intent, trade-offs, or API quirks. Skip narration of what the code already says — `# increment counter` adds nothing
 - **Error handling**: Catch specific exceptions. Log with `logger.warning()`/`logger.error()` — use `exc_info=True` for unexpected errors so stack traces appear in logs
-- **Cross-platform**: Never assume Unix. See [Cross-Platform Compatibility](#cross-platform-compatibility)
+- **Cross-platform**: Write code that works across operating systems rather than assuming Unix. See [Cross-Platform Compatibility](#cross-platform-compatibility)

 ---

@@ -469,7 +469,7 @@ Gateway and messaging sessions never collect secrets in-band; they instruct the
 - The skill uses an API key or token that should be collected securely at load time
 - The skill can still be useful if the user skips setup, but may degrade gracefully

 **When to declare command prerequisites:**
 - The skill relies on a CLI tool that may not be installed (e.g., `himalaya`, `openhue`, `ddgs`)
 - Treat command checks as guidance, not discovery-time hiding

@@ -479,7 +479,7 @@ See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.

 Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.

-1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
+1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. Use precise, functional language and skip marketing words ("powerful", "comprehensive", "seamless", "advanced"). Omit the skill name. Verify with:
   ```python
   import re, pathlib
   m = re.search(r'^description: (.*)$',
@@ -506,11 +506,11 @@ Every new or modernized skill — bundled, optional, or contributed — must mee

   If the skill depends on an MCP server, name the MCP server and document its setup in `## Prerequisites`. Third-party CLIs (e.g. `ffmpeg`, `gh`, a specific SDK) are fine to invoke from inside script files, but the prose should frame the interaction as "invoke through the `terminal` tool", not as a manual shell session.

-3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to fix it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).
+3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to make it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).

 4. **`author` credits the human contributor first.** For external contributions, the contributor's real name + GitHub handle goes first (`Jane Doe (jane-doe)`); "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author because they used Hermes to draft the skill, replace it with their actual name — credit the human, not the tool.

-5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and what it doesn't do, then:
+5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and its scope boundaries, then:
   - `## When to Use` — trigger conditions
   - `## Prerequisites` — env vars, install steps, MCP setup, API key sourcing
   - `## How to Run` — canonical invocation through the `terminal` tool
@ -521,17 +521,17 @@ Every new or modernized skill — bundled, optional, or contributed — must mee

   Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already documented in `## Prerequisites`.

-6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference scripts from SKILL.md by path relative to the skill directory.
+6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Ship helper scripts instead of expecting the model to inline-write parsers, XML walkers, or non-trivial logic every call. Reference scripts from SKILL.md by path relative to the skill directory.

 7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls. Run via `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`. Must pass under the hermetic CI env (no API keys leaking through). Use `monkeypatch` and `tmp_path` for any env-var or filesystem dependencies.

-8. **`.env.example` additions are isolated to a clearly delimited block.** Don't touch the surrounding file — contributor-supplied `.env.example` versions are usually stale, and edits outside the skill's own block will be dropped during salvage. Comment all values with `#` (it's documentation, not live config).
+8. **`.env.example` additions are isolated to a clearly delimited block.** Keep changes inside the skill's own block; contributor-supplied `.env.example` versions are usually stale, and edits outside that block will be dropped during salvage. Comment all values with `#` (it's documentation, not live config).

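Rule 3's audit can be partly scripted. The sketch below is a minimal example under stated assumptions: it parses each Python file under a skill's `scripts/` directory with `ast` and reports imports of `fcntl`/`termios`, uses of `os.setsid` or `signal.SIGKILL`, and `os.kill(pid, 0)` calls. The deny-lists and function names (`audit_source`, `audit_skill`) are hypothetical, and it cannot see bash heredocs, `osascript`, or hardcoded `/tmp` strings, so a clean result is a hint rather than proof of portability.

```python
import ast
from pathlib import Path

# Hypothetical deny-lists built from rule 3's examples; deliberately not exhaustive.
POSIX_ONLY_MODULES = {"fcntl", "termios"}
POSIX_ONLY_ATTRS = {("os", "setsid"), ("signal", "SIGKILL")}


def _is_os_kill_zero(node: ast.AST) -> bool:
    """Match calls shaped like os.kill(<pid>, 0)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and isinstance(node.func.value, ast.Name)
        and (node.func.value.id, node.func.attr) == ("os", "kill")
        and len(node.args) == 2
        and isinstance(node.args[1], ast.Constant)
        and type(node.args[1].value) is int
        and node.args[1].value == 0
    )


def audit_source(source: str) -> list[str]:
    """Return findings (sorted by line) for POSIX-only usage in one Python file."""
    findings: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in POSIX_ONLY_MODULES:
                    findings.append((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.module:
            if node.module.split(".")[0] in POSIX_ONLY_MODULES:
                findings.append((node.lineno, f"from {node.module} import ..."))
        elif isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if (node.value.id, node.attr) in POSIX_ONLY_ATTRS:
                findings.append((node.lineno, f"{node.value.id}.{node.attr}"))
        elif _is_os_kill_zero(node):
            findings.append((node.lineno, "os.kill(pid, 0) liveness check"))
    return [f"line {lineno}: {msg}" for lineno, msg in sorted(findings)]


def audit_skill(skill_dir: Path) -> dict[str, list[str]]:
    """Audit every .py file under <skill_dir>/scripts/ and keep only files with hits."""
    results: dict[str, list[str]] = {}
    for path in sorted((skill_dir / "scripts").rglob("*.py")):
        findings = audit_source(path.read_text(encoding="utf-8"))
        if findings:
            results[path.relative_to(skill_dir).as_posix()] = findings
    return results
```

For example, `audit_skill(Path("skills/my-skill"))` (a hypothetical path) returns a mapping from script path to findings; each entry calls for either a cross-platform fix or a `platforms:` gate.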
 ### Skill guidelines

- **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
+- **Prefer stdlib and built-in Hermes tools over external dependencies.** Use stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`); add an external dependency only when it is genuinely necessary.
 - **Progressive disclosure.** Put the most common workflow first. Edge cases and advanced usage go at the bottom.
- **Include helper scripts** for XML/JSON parsing or complex logic — don't expect the LLM to write parsers inline every time.
+- **Include helper scripts** for XML/JSON parsing or complex logic, so the model doesn't have to write parsers inline on every call.
 - **Test it.** Run `hermes --toolsets skills -q "Use the X skill to do Y"` and verify the agent follows the instructions correctly.

 ---
@ -597,7 +597,7 @@ that touches the OS, assume *any* platform can hit your code path.

 ### Critical rules

-1. **Never call `os.kill(pid, 0)` for liveness checks.** `os.kill(pid, 0)`
+1. **Use `psutil.pid_exists()` instead of `os.kill(pid, 0)` for liveness checks.** `os.kill(pid, 0)`
   is a standard POSIX idiom to check "is this PID alive" — the signal 0
   is a no-op permission check. **On Windows it is NOT a no-op.** Python's
   Windows `os.kill` maps `sig=0` to `CTRL_C_EVENT` (they collide at the
@ -625,10 +625,9 @@ that touches the OS, assume *any* platform can hit your code path.
   Audit grep for new callsites: `rg "os\.kill\([^,]+,\s*0\s*\)"`. Any hit
   in non-test code is presumptively a Windows silent-kill bug.

-2. **Use `shutil.which()` before shelling out — don't assume Windows has
-   tools Linux has.** `wmic` was removed in Windows 10 21H1 and later. `ps`,
+2. **Use `shutil.which()` to validate tool availability before shelling out, on every platform.** `wmic` has been deprecated since Windows 10 21H1 and is missing from many current installs. `ps`,
   `kill`, `grep`, `awk`, `fuser`, `lsof`, `pgrep`, and most POSIX CLI tools
-   simply don't exist on Windows. Test availability with
+   are not available on a stock Windows install. Test availability with
   `shutil.which("tool")` and fall back to a Windows-native equivalent —
   usually PowerShell via `subprocess.run(["powershell", "-NoProfile",
   "-Command", ...])`.
@ -687,7 +686,7 @@ that touches the OS, assume *any* platform can hit your code path.
       pass
   ```

-6. **Signals that don't exist on Windows: `SIGALRM`, `SIGCHLD`, `SIGHUP`,
+6. **Unix-only signals: `SIGALRM`, `SIGCHLD`, `SIGHUP`,
   `SIGUSR1`, `SIGUSR2`, `SIGPIPE`, `SIGQUIT`, `SIGKILL`.** Python's
   `signal` module raises `AttributeError` at import time if you reference
   them on Windows. Use `getattr(signal, "SIGKILL", signal.SIGTERM)` or
@ -725,7 +724,7 @@ that touches the OS, assume *any* platform can hit your code path.
    Win32 application"`. Use `shutil.which("agent-browser", path=local_bin)`
    which honors PATHEXT and picks the `.CMD` variant on Windows.

-12. **Don't use shell shebangs as a way to run Python.** `#!/usr/bin/env
+12. **Invoke Python explicitly instead of relying on shebangs.** `#!/usr/bin/env
    python` only works when the file is executed through a Unix shell.
    `subprocess.run(["./myscript.py"])` on Windows fails even if the file
    has a shebang line. Always invoke Python explicitly:
@ -741,7 +740,8 @@ that touches the OS, assume *any* platform can hit your code path.
    enabled is `%USERPROFILE%\OneDrive\Desktop` (etc.), NOT
    `%USERPROFILE%\Desktop` (which exists as an empty husk). Resolve the
    real location via `ctypes` + `SHGetKnownFolderPath` or by reading the
-    `Shell Folders` registry key — never assume `~/Desktop`.
+    `Shell Folders` registry key rather than assuming `~/Desktop`.

 15. **CRLF vs LF in generated scripts.** Windows `cmd.exe` and `schtasks`
    parse line-by-line; mixed or LF-only line endings can break multi-line
@ -794,7 +794,7 @@ Hermes has terminal access. Security matters.

 - **Always use `shlex.quote()`** when interpolating user input into shell commands
 - **Resolve symlinks** with `os.path.realpath()` before path-based access control checks
- **Don't log secrets.** API keys, tokens, and passwords should never appear in log output
+- **Keep secrets out of logs.** API keys, tokens, and passwords must always be redacted from log output
 - **Catch broad exceptions** around tool execution so a single failure doesn't crash the agent loop
 - **Test on all platforms** if your change touches file paths, process management, or shell commands
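A stdlib-only sketch of the first three bullets. The helper names (`grep_command`, `is_within`, `RedactSecrets`) and the secret-matching regex are hypothetical; `shlex.quote()` produces POSIX-shell quoting, and a production redactor would need patterns for each provider's key format.

```python
import logging
import os
import re
import shlex


def grep_command(pattern: str, path: str) -> str:
    """Build a shell command string with every user-supplied value quoted."""
    return f"grep -n -- {shlex.quote(pattern)} {shlex.quote(path)}"


def is_within(base: str, target: str) -> bool:
    """Resolve symlinks first, then check that target stays inside base."""
    base_real = os.path.realpath(base)
    target_real = os.path.realpath(target)
    try:
        return os.path.commonpath([base_real, target_real]) == base_real
    except ValueError:
        # Raised on Windows when the paths are on different drives.
        return False


class RedactSecrets(logging.Filter):
    """Mask credential-looking values before a record reaches any output."""

    # Hypothetical pattern: "<secret-name>=<value>" or "<secret-name>: <value>".
    PATTERN = re.compile(r"(?i)\b(api[_-]?key|token|password)\b(\s*[=:]\s*)\S+")

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = self.PATTERN.sub(r"\1\2[REDACTED]", record.getMessage())
        record.args = ()  # the message was already formatted by getMessage()
        return True
```

Attaching `RedactSecrets` to each handler (or to the logger) masks values before formatting; `is_within` must run on the resolved path, since checking the raw string would let a symlink inside `base` point anywhere.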

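The last bullet's cross-platform concern pairs with the `shutil.which()` rule from the Windows section above. A minimal sketch, assuming the goal is listing processes: resolve `ps` with `shutil.which()` and fall back to PowerShell's `Get-Process`. The function name `list_processes_cmd` and the exact commands are illustrative, and the injectable `which` parameter exists only so the fallback path can be exercised on any OS.

```python
import shutil
from typing import Callable, Optional

Which = Callable[[str], Optional[str]]


def list_processes_cmd(which: Which = shutil.which) -> list[str]:
    """Build a command that lists PIDs and process names without assuming ps exists."""
    ps = which("ps")
    if ps:
        # POSIX path: use the resolved executable rather than a bare name.
        return [ps, "-eo", "pid,comm"]
    powershell = which("powershell") or which("pwsh")
    if powershell:
        # Windows-native fallback; -NoProfile keeps user profiles from altering output.
        return [
            powershell,
            "-NoProfile",
            "-Command",
            "Get-Process | Select-Object Id, ProcessName",
        ]
    raise RuntimeError("no process-listing tool found (tried ps, powershell, pwsh)")
```

Callers would then run it with `subprocess.run(list_processes_cmd(), capture_output=True, text=True, check=True)`.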
@ -811,7 +811,7 @@ After the [litellm supply chain compromise](https://github.com/BerriAI/litellm/i
 | **GitHub Actions** | Full commit SHA + version comment | Action tags are mutable refs (e.g. tj-actions/changed-files March 2025). Pin as `uses: owner/action@<sha>  # vX.Y.Z` |
 | **CI-only pip installs** | `==exact` | Hermetic CI builds; churn is acceptable. |

-**Every new PyPI dependency in a PR must have a `<next_major` upper bound.** PRs adding unbounded `>=X.Y.Z` specs will be rejected by reviewers. The `supply-chain-audit.yml` CI workflow also flags dependency manifest changes for manual review.
+**Every new PyPI dependency in a PR must have a `<next_major` upper bound.** PRs with unbounded `>=X.Y.Z` specs will be rejected by reviewers. The `supply-chain-audit.yml` CI workflow also flags dependency manifest changes for manual review.

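A rough, stdlib-only check for the upper-bound rule in the paragraph above, assuming specifiers are plain strings like `name>=X.Y.Z,<N`. It is a regex sketch rather than a PEP 508 parser (`packaging.requirements.Requirement` is the robust option), the function names are hypothetical, and it only derives a ceiling for packages at 1.x or later, refusing to guess for 0.x.

```python
import re

# One comparison clause such as ">=2.31.0" or "<3"; longer operators come first.
_CLAUSE = re.compile(r"(<=|>=|==|!=|~=|<|>)\s*([0-9][0-9A-Za-z.*+!-]*)")


def has_upper_bound(requirement: str) -> bool:
    """True when the specifier has a '<' / '<=' ceiling or an exact '==' pin."""
    spec = requirement.split(";", 1)[0]  # drop environment markers
    ops = {op for op, _ in _CLAUSE.findall(spec)}
    return bool(ops & {"<", "<=", "=="})


def next_major_ceiling(version: str) -> str:
    """Return the '<next_major' bound for a release at major version 1 or later."""
    major = int(version.split(".")[0])
    if major < 1:
        raise ValueError("0.x packages need the project's separate 0.x rule")
    return f"<{major + 1}"
```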
 **How to determine the ceiling:**
 - If the package is at version `1.x.y`, use `<2`.
@ -860,7 +860,7 @@ refactor/description   # Code restructuring
 1. **Run tests**: `scripts/run_tests.sh` (recommended; same as CI) or `pytest tests/ -v` with the project venv activated
 2. **Test manually**: Run `hermes` and exercise the code path you changed
 3. **Check cross-platform impact**: If you touch file I/O, process management, or terminal handling, consider macOS, Linux, and WSL2
-4. **Keep PRs focused**: One logical change per PR. Don't mix a bug fix with a refactor with a new feature.
+4. **Keep PRs focused**: One logical change per PR. Keep bug fixes, refactors, and new features in separate PRs.

 ### PR description

--- a/README-FreeBSD.md
+++ b/README-FreeBSD.md
@ -2,6 +2,12 @@

 This is a clean-room FreeBSD compatibility layer for [Hermes Agent](https://github.com/NousResearch/hermes-agent), built from the MIT-licensed upstream. No LGPL code, no Autolycus dependency.

+**Role in the Clawdie collective:** hermes-bsd is the **agent harness** — the
+layer you talk to (CLI, Telegram, skills, memory, cron). Colibri is the
+**control plane** — the layer that supervises agents, runs the task board, and
+tracks cost. They connect via Colibri's glasspane observation and MCP bridge;
+they do not duplicate each other.
+
 ## What's patched

 Three targeted changes for FreeBSD native support:
@ -68,9 +74,9 @@ sudo service hermes_daemon start
 sudo service hermes_daemon status
 ```

-In this lane, do not create a separate `hermes` account and do not move the
-runtime state out of the operator's existing `HERMES_HOME`. The account's shell
-stays whatever the operator already uses; the rc.d service launches `hermes
+In this lane, keep the runtime state in the operator's existing `HERMES_HOME`
+without creating a separate `hermes` account. The account's shell stays
+whatever the operator already uses; the rc.d service launches `hermes
 gateway run` via `daemon -u`, not via an interactive login shell.

 The service logs to `/var/log/hermes/gateway.log`. Runtime pidfiles live under
--- a/apps/desktop/DESIGN.md
+++ b/apps/desktop/DESIGN.md
@ -8,15 +8,16 @@ there's already a primitive for it.

 ## Principles

-1. **Flat, not boxed.** No card-in-card, no divider borders inside a panel.
-   Group with whitespace and a single hairline, never nested rounded boxes.
+1. **Flat, not boxed.** Group with whitespace and a single hairline; avoid
+   card-in-card layouts, nested rounded boxes, and divider borders inside a panel.
 2. **Borderless + shadow for elevation.** Overlays float on `shadow-nous` + a
   `--stroke-nous` hairline, not hard borders.
 3. **One primitive per concern.** One `Button`, one set of control variants,
-   one `SearchField`, one `Loader`, one `ErrorState`. Migrate onto them; don't
-   fork.
+   one `SearchField`, one `Loader`, one `ErrorState`. Migrate onto them
+   instead of forking them.
 4. **Tokens, not literals.** Reference CSS vars (`--ui-*`, `--shadow-nous`,
-   `--theme-*`), never raw hex / ad-hoc rgba in components.
+   `--theme-*`) for all colors and shadows — avoid raw hex and ad-hoc rgba
+   in components.
 5. **Style lives in the primitive.** Variants and sizes own padding, radius,
   color, chrome. Call sites pass a `variant`/`size`, not `className` overrides
   that re-specify those.
@ -32,8 +33,9 @@ border-(--stroke-nous) /* currentColor hairline, theme-adaptive */
 ```

 Both are CSS vars in `src/styles.css` — tune in one place, everything inherits.
-Don't add per-overlay `shadow-[…]` or `border-(--ui-stroke-secondary)`
-one-offs; if elevation needs to change, change the token.
+Keep overlays uniform: use the shared tokens for all elevation instead of
+per-overlay `shadow-[…]` or `border-(--ui-stroke-secondary)` one-offs. If
+elevation needs to change, change the token.

 ## Stroke & color tokens

@ -47,7 +49,8 @@ one-offs; if elevation needs to change, change the token.
 | `--chrome-action-hover` | hover fill for quiet controls |
 | `--theme-primary`, `--ui-accent` | brand/accent |

-Never hardcode `border-gray-*`, `bg-white`, `text-black`, etc. The white tile in
+Use the tokens above for all text, background, and border colors instead of
+hardcoded utilities like `border-gray-*`, `bg-white`, or `text-black`. The white tile in
 `BrandMark` is the one sanctioned literal (the mark needs a fixed backdrop).

 ## Buttons — one component
@ -68,7 +71,7 @@ family `icon` / `icon-xs` / `icon-sm` / `icon-lg` / `icon-titlebar`.
 Notes:
 - Text buttons are square (no radius) and sized by padding + line-height (no
  fixed heights). Only icon buttons carry the shared 4px radius.
- SVGs inherit `size-3.5` (`size-3` at `xs`). Don't re-set icon size.
+- SVGs inherit `size-3.5` (`size-3` at `xs`). Let the button component control icon sizing.
 - Polymorph with `asChild` when the button must render as a link/Slot.

 ## Form controls
@ -76,7 +79,8 @@ Notes:
 - **`controlVariants`** (`src/components/ui/control.ts`) is the shared shape for
  `Input` / `Textarea` / `SelectTrigger`. New text-entry controls compose it.
 - **`SearchField`** — borderless, underline-on-focus, auto-width. The only
-  search input. Don't build boxed search bars; don't wrap it in a bordered tile.
+  search input. Use this single search component instead of building boxed
+  search bars or wrapping it in a bordered tile.
  Empty lists hide their search field.
 - **`SegmentedControl`** — the choice control for small mutually-exclusive sets
  (color mode, tool-call display, usage period). Replaces radio piles and
@ -86,11 +90,11 @@ Notes:
 ## Layout

 - **Gutters:** `PAGE_INSET_X` (`src/app/layout-constants.ts`) for page side
-  padding; `PAGE_INSET_NEG_X` to bleed a child to the edge. Don't hardcode
-  `px-6`/`px-8` on pages.
+  padding; `PAGE_INSET_NEG_X` to bleed a child to the edge. Use these
+  constants instead of hardcoding `px-6`/`px-8` on pages.
 - **Master/detail overlays:** `OverlaySplitLayout` + `OverlaySidebar` /
-  `OverlayMain`. Cron, profiles, etc. ride this — don't rebuild a titlebar
-  shell.
+  `OverlayMain`. Cron, profiles, etc. ride this — reuse the existing shell
+  instead of rebuilding a titlebar for each overlay.
 - **Rows:** `ListRow` (settings `primitives.tsx`) for label/description/action
  rows. Flat, flush-left; no per-row indentation that fights flush headers.
 - **No dividers between rows** unless the list genuinely needs them; prefer
@ -99,15 +103,16 @@ Notes:
 ## Feedback & empty/error/loading states

 - **Loading:** `Loader` (`src/components/ui/loader.tsx`) — animated math/ascii
-  curves (`lemniscate-bloom` for long ops). Never ship the literal text
-  "Loading…".
+  curves (`lemniscate-bloom` for long ops). Use this instead of the literal
+  text "Loading…".
 - **Errors:** `ErrorState` + the canonical `ErrorIcon` (no bg chip). One look
  for the React boundary, in-dialog errors, and the boot-failure banner. Pass
  nodes for title/description so Radix `DialogTitle`/`Description` can flow
  through for a11y.
 - **Logs:** `LogView` — no bg, hairline border, tight padding, small mono.
  Every place we surface raw logs uses it.
- **Empty:** `EmptyState` / `EmptyPanel` — don't hand-roll centered empties.
+- **Empty:** `EmptyState` / `EmptyPanel` — use the shared components for all
+  empty states rather than hand-rolling centered empties.

 ## Iconography & brand

@ -115,7 +120,8 @@ Notes:
 - **`BrandMark`** (`src/components/brand-mark.tsx`) is the brand glyph — the
  `nous-girl` mark on a white tile, softly rounded, identical in light/dark.
  It replaced scattered Sparkles glyphs in updates / onboarding / about. Use it
-  for hero/brand moments; don't reintroduce decorative star/sparkle icons.
+  for hero/brand moments instead of reintroducing decorative star/sparkle
+  icons.

 ## Motion

@ -123,7 +129,8 @@ Notes:
  `prefers-reduced-motion` for anything beyond a fade.
 - Choreographed exits (e.g. onboarding's "matrix" fade-down) stagger per-element
  then settle the surface — the outer container's fade is *delayed* so it
-  doesn't swallow the inner animation. Don't let a global fade race the detail.
+  doesn't swallow the inner animation. Start any global fade only after the
+  per-element detail has settled, so the two never race.

 ## i18n

@ -148,8 +155,8 @@ Mirrors the repo TS style (see root `AGENTS.md`):

 ## Affordances

- `cursor-pointer` at the primitive level (Button, dropdown/select) — don't
-  hardcode it per call site.
+- `cursor-pointer` at the primitive level (Button, dropdown/select) — let
+  the primitive own cursor behavior rather than hardcoding it per call site.
 - Global focus-ring reset; titlebar actions have no active-background state.
 - `Esc` closes every dismissable overlay/dialog (install/onboarding excluded);
  close is an x-icon, not the word "Close".
--- a/docs/observability/README.md
+++ b/docs/observability/README.md
@ -7,12 +7,12 @@ as Langfuse, OpenTelemetry-style collectors, and NeMo Relay.

 Observer hooks are intentionally backend-neutral. They expose stable lifecycle
 events, correlation IDs, sanitized payloads, timing, status, and error fields.
-They do not replace Hermes' planner, model providers, memory, tool registry,
-approval UX, CLI, gateway behavior, or execution semantics.
+They complement, rather than replace, Hermes' planner, model providers, memory,
+tool registry, approval UX, CLI, gateway behavior, and execution semantics.

 Behavior-changing request or execution wrappers are outside this observer
-contract. Observer hooks should report what happened; they should not replace
-provider requests, tool arguments, or execution callbacks.
+contract. Observer hooks report what happened; they leave provider
+requests, tool arguments, and execution callbacks unchanged.

 ## Contract

@ -199,9 +199,9 @@ Common fields include `command`, `description`, `pattern_key`,
 `post_approval_response` also includes `choice`, with values such as `once`,
 `session`, `always`, `deny`, and `timeout`.

-Approval hooks are observer-only. Plugins cannot pre-answer or veto approvals
-from these hooks. To prevent a tool from reaching approval, use
-`pre_tool_call` blocking.
+Approval hooks are observer-only: plugins cannot pre-answer or veto
+approvals from these hooks. To keep a tool from reaching approval, use
+`pre_tool_call` blocking.

 ### Subagent Lifecycle

@ -237,8 +237,9 @@ large payloads, redacts sensitive keys, and avoids exposing raw response
 objects in sanitized fields.

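To make the sanitized-payload idea concrete, here is a hypothetical sketch of a sanitizer that redacts sensitive keys, truncates long strings, and reduces raw objects to a type name. It is not Hermes' implementation: the `sanitize` name, the key list, and the 2,000-character limit are all assumptions.

```python
from typing import Any

# Assumed values for illustration only.
SENSITIVE_KEYS = {"api_key", "authorization", "password", "token", "secret"}
MAX_STR_LEN = 2000
JSON_SCALARS = (str, int, float, bool, type(None))


def sanitize(value: Any, max_len: int = MAX_STR_LEN) -> Any:
    """Return a JSON-safe copy with secrets masked and long strings truncated."""
    if isinstance(value, dict):
        out = {}
        for key, item in value.items():
            if str(key).lower() in SENSITIVE_KEYS:
                out[str(key)] = "[REDACTED]"
            else:
                out[str(key)] = sanitize(item, max_len)
        return out
    if isinstance(value, (list, tuple)):
        return [sanitize(item, max_len) for item in value]
    if isinstance(value, str):
        if len(value) > max_len:
            return value[:max_len] + f"...[truncated {len(value) - max_len} chars]"
        return value
    if isinstance(value, JSON_SCALARS):
        return value
    # Raw SDK/response objects never pass through; record only their type name.
    return f"<{type(value).__name__}>"
```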
 Legacy compatibility fields such as `request_messages`, `conversation_history`,
-and `assistant_message` may still be present for existing plugins. New
-observability consumers should prefer the sanitized payloads.
+and `assistant_message` may still be present for existing plugins. New
+observability consumers should prefer the sanitized payloads for their
+safety and structure.

 ## Performance