hermes-bsd/hermes_cli
Rob Moen 0dd373ec43 fix(context): honor model.context_length for Ollama num_ctx and all display paths
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:
  - context_compressor.context_length = config value (e.g. 65536) ✓
  - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)
   When model.context_length is explicitly set and no explicit
   ollama_num_ctx override exists, cap the auto-detected GGUF value
   to the user's context_length. This is the core fix — it prevents
   Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites
   Several paths called get_model_context_length() without the config
   override, falling through to the 256K default fallback:
   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)
   _normalize_root_model_keys() now migrates root-level context_length
   into the model section, matching existing behavior for provider and
   base_url. Users who wrote `context_length: 65536` at the YAML root
   instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)
   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
   as two comments stated.

Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
2026-04-30 04:31:23 -07:00
..
__init__.py
_parser.py refactor(cli): derive relaunch flag table from argparse introspection 2026-04-29 20:33:29 -07:00
auth.py chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664 2026-04-29 21:56:51 -07:00
auth_commands.py feat(cli): add minimax-oauth provider with PKCE browser flow 2026-04-29 09:53:42 -07:00
azure_detect.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
backup.py feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup (#16911) 2026-04-28 01:50:23 -07:00
banner.py fix(banner): show correct update status on nix-built hermes (#17550) 2026-04-30 07:03:00 +05:30
browser_connect.py fix(browser): address Copilot review on /browser connect 2026-04-28 22:11:10 -07:00
callbacks.py
claw.py feat(claw-migrate): harden OpenClaw import with plan-first apply, redaction, and pre-migration backup (#16911) 2026-04-28 01:50:23 -07:00
cli_output.py
clipboard.py
codex_models.py
colors.py
commands.py chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664 2026-04-29 21:56:51 -07:00
completion.py
config.py fix(context): honor model.context_length for Ollama num_ctx and all display paths 2026-04-30 04:31:23 -07:00
copilot_auth.py
cron.py
curator.py feat(curator): per-run reports — run.json + REPORT.md under logs/curator/ (#17307) 2026-04-28 23:23:11 -07:00
curses_ui.py
debug.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
default_soul.py
dingtalk_auth.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
doctor.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
dump.py refactor(redact): canonical mask_secret helper; fix status.py DIM drift (#17207) 2026-04-28 21:04:35 -07:00
env_loader.py refactor: consolidate symlink-safe atomic replace into shared helper 2026-04-28 04:58:22 -07:00
fallback_cmd.py feat(cli): add 'hermes fallback' command to manage fallback providers (#16052) 2026-04-26 06:19:04 -07:00
gateway.py fix: handle gateway Ctrl+C shutdown cleanly 2026-04-30 03:29:57 -07:00
hooks.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
logs.py
main.py fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868) 2026-04-30 02:46:01 -07:00
mcp_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
memory_setup.py
model_catalog.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
model_normalize.py feat(minimax-oauth): full integration with peer OAuth providers 2026-04-29 09:53:42 -07:00
model_switch.py fix(context): honor model.context_length for Ollama num_ctx and all display paths 2026-04-30 04:31:23 -07:00
models.py fix(anthropic): reactive recovery for OAuth 1M-context beta rejection (#17752) 2026-04-29 21:56:54 -07:00
nous_subscription.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
oneshot.py fix(tui): honor launch toolsets (#17623) 2026-04-29 16:55:27 -07:00
pairing.py
platforms.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
plugins.py feat(plugins): bundled platform plugins auto-load by default 2026-04-29 21:56:51 -07:00
plugins_cmd.py feat(gateway): unify setup flows, load platforms dynamically from registry 2026-04-29 21:56:51 -07:00
profiles.py fix(cli): exclude profiles/ from profile create --clone-all 2026-04-29 14:21:35 -07:00
providers.py feat(minimax-oauth): full integration with peer OAuth providers 2026-04-29 09:53:42 -07:00
pty_bridge.py
relaunch.py remove relaunch_chat 2026-04-29 20:33:29 -07:00
runtime_provider.py fix(runtime_provider): _get_named_custom_provider must honour transport field on v12+ providers dict 2026-04-30 03:29:48 -07:00
setup.py feat(plugins): bundled platform plugins auto-load by default 2026-04-29 21:56:51 -07:00
skills_config.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00
skills_hub.py feat(skills): install skills from a direct HTTP(S) URL (#16323) 2026-04-26 20:57:10 -07:00
skin_engine.py fix(tui): restore macOS copy behavior and theme polish (#17131) 2026-04-28 18:47:14 -05:00
slack_cli.py feat(slack): register every gateway command as a native slash (Discord/Telegram parity) (#16164) 2026-04-26 11:38:32 -07:00
status.py feat: complete plugin platform parity — all 12 integration points 2026-04-29 21:56:51 -07:00
timeouts.py refactor(timeouts): drop redundant ImportError in except clause 2026-04-26 20:48:20 -07:00
tips.py feat(tips): add cost-saving tips from April 30 tip-of-the-day (#17841) 2026-04-30 02:30:36 -07:00
tools_config.py feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885) 2026-04-30 02:53:20 -07:00
uninstall.py
vercel_auth.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
voice.py
web_server.py fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868) 2026-04-30 02:46:01 -07:00
webhook.py refactor(config): migrate remaining 33 cfg_get call sites (#17311) 2026-04-29 04:03:03 -07:00