hermes-bsd/tools
Teknium 3800972dd0
feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)
When the active main model has native vision and the provider supports
multimodal tool results (Anthropic, OpenAI Chat, Codex Responses, Gemini
3, OpenRouter, Nous), vision_analyze loads the image bytes and returns
them to the model as a multimodal tool-result envelope. The model then
sees the pixels directly on its next turn instead of receiving a lossy
text description from an auxiliary LLM.

Falls back to the legacy aux-LLM text path for non-vision models and
unverified providers.

Mirrors the architecture used in OpenCode, Claude Code, Codex CLI, and
Cline. All four converge on the same pattern: tool results carry image
content blocks for vision-capable provider/model combinations.

Changes
- tools/vision_tools.py: _vision_analyze_native fast path + provider
  capability table (_supports_media_in_tool_results). Schema description
  updated to reflect new behaviour.
- agent/codex_responses_adapter.py: function_call_output.output now
  accepts the array form for multimodal tool results (was string-only).
  Preflight validates input_text/input_image parts.
- agent/auxiliary_client.py: _RUNTIME_MAIN_PROVIDER/_MODEL globals so
  tools see the live CLI/gateway override, not the stale config.yaml
  default. set_runtime_main()/clear_runtime_main() helpers.
- run_agent.py: AIAgent.run_conversation calls set_runtime_main at turn
  start so vision_analyze's fast-path check sees the actual runtime.
- tests/conftest.py: clear runtime-main override between tests.

Tests
- tests/tools/test_vision_native_fast_path.py: provider capability
  table, envelope shape, fast-path gating (vision-capable model uses
  fast path; non-vision model falls through to aux).
- tests/run_agent/test_codex_multimodal_tool_result.py: list tool
  content becomes function_call_output.output array; preflight
  preserves arrays and drops unknown part types.

Live verified
- Opus 4.6 + Sonnet 4.6 on OpenRouter: model calls vision_analyze on a
  typed filepath, gets pixels back, reads exact text from images that
  no aux description could capture (font color irony, multi-line
  fruit-count list, etc.).

PR replaces the closed prior efforts (#16506 shipped the inbound user-
attached path; this PR closes the gap for tool-discovered images).
2026-05-09 21:06:19 -07:00
..
browser_providers
computer_use fix(tools): install cua-driver when Computer Use is enabled via 'hermes tools' (#22765) 2026-05-09 13:02:25 -07:00
environments feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
neutts_samples
web_providers docs(web): fix SearXNG env configuration 2026-05-07 17:54:47 -07:00
__init__.py
ansi_strip.py
approval.py fix(approval): cron jobs must not be treated as gateway context 2026-05-08 07:30:14 -07:00
binary_extensions.py
browser_camofox.py
browser_camofox_state.py
browser_cdp_tool.py fix(async): replace get_event_loop() with get_running_loop() in async contexts 2026-05-09 02:34:19 -07:00
browser_dialog_tool.py
browser_supervisor.py
browser_tool.py fix(browser_tool): fall through to autodetect on config read failure 2026-05-09 13:35:39 -07:00
budget_config.py
checkpoint_manager.py fix(checkpoint): guard _touch_project against non-dict project metadata 2026-05-09 17:53:13 -07:00
clarify_tool.py
code_execution_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
computer_use_tool.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
credential_files.py fix(gateway): translate inbound document host paths to container paths for Docker backend 2026-05-07 05:02:26 -07:00
cronjob_tools.py fix(cron): allow quoted URL in github auth-header allowlist 2026-05-09 11:11:45 -07:00
debug_helpers.py
delegate_tool.py feat(openrouter): wire Pareto Code router with min_coding_score knob (#22838) 2026-05-09 14:47:00 -07:00
discord_tool.py feat: add Discord message deletion action 2026-05-07 05:11:09 -07:00
env_passthrough.py
feishu_doc_tool.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
feishu_drive_tool.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
file_operations.py fix(windows): %1 install error, patch CRLF false-negative, SOUL.md BOM 2026-05-08 14:27:40 -07:00
file_state.py
file_tools.py fix(patch-tool): advertise per-mode required params in schema descriptions 2026-05-08 16:59:24 -07:00
fuzzy_match.py
homeassistant_tool.py
image_generation_tool.py perf(image_gen): defer fal_client import to first generation request (#22859) 2026-05-09 17:45:09 -07:00
interrupt.py
kanban_tools.py fix(security): drop caller-controlled author override in kanban_comment 2026-05-09 02:32:16 -07:00
managed_tool_gateway.py
mcp_oauth.py fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226) 2026-05-07 05:35:33 -07:00
mcp_oauth_manager.py fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226) 2026-05-07 05:35:33 -07:00
mcp_tool.py fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper 2026-05-08 14:27:40 -07:00
memory_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
microsoft_graph_auth.py feat(msgraph): add auth and client foundation 2026-05-08 09:27:26 -07:00
microsoft_graph_client.py fix(msgraph): stream download_to_file body instead of buffering 2026-05-08 09:27:26 -07:00
mixture_of_agents_tool.py
neutts_synth.py
openrouter_client.py
osv_check.py
patch_parser.py
path_security.py
process_registry.py fix(process_registry): kill orphaned Popen on post-spawn setup failure 2026-05-09 17:53:24 -07:00
registry.py feat(delegate): show user's actual concurrency / spawn-depth limits in tool description (#22694) 2026-05-09 11:07:53 -07:00
rl_training_tool.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
schema_sanitizer.py fix: strip Codex-hostile top-level schema combinators 2026-05-07 07:03:21 -07:00
send_message_tool.py feat(plugins): add standalone_sender_fn for out-of-process cron delivery 2026-05-09 02:56:29 -07:00
session_search_tool.py fix: make session search initialize session db 2026-05-09 14:36:58 -07:00
skill_manager_tool.py fix: exclude hidden and archive dirs from _find_skill rglob 2026-05-07 05:15:28 -07:00
skill_provenance.py fix(curator): only mark agent-created for background-review sediment (#19621) 2026-05-04 02:42:16 -07:00
skill_usage.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
skills_guard.py
skills_hub.py fix(skills-hub): cover remaining SSRF fetch paths after #10029 2026-05-09 17:52:12 -07:00
skills_sync.py
skills_tool.py fix(skills): support category-qualified local skill names 2026-05-05 10:15:31 -07:00
slash_confirm.py
terminal_tool.py fix(terminal): bridge docker_env config to TERMINAL_DOCKER_ENV 2026-05-09 17:53:35 -07:00
tirith_security.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
todo_tool.py
tool_backend_helpers.py
tool_output_limits.py
tool_result_storage.py fix(tool-result-storage): persist via stdin to bypass 128 KB exec-arg cap (#22913) 2026-05-09 18:44:58 -07:00
transcription_tools.py
tts_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
url_safety.py fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228) 2026-05-07 05:38:05 -07:00
vision_tools.py feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955) 2026-05-09 21:06:19 -07:00
voice_mode.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
web_tools.py perf(cli): cut ~19s from 'hermes' cold start (skills cache + lazy Feishu + no Nous HTTP) (#22138) 2026-05-08 16:39:32 -07:00
website_policy.py
xai_http.py
yuanbao_tools.py