#!/usr/bin/env python3
"""
Code Execution Tool -- Programmatic Tool Calling (PTC)

Lets the LLM write a Python script that calls Hermes tools via RPC,
collapsing multi-step tool chains into a single inference turn.

Architecture (two transports):

  **Local backend (UDS):**
  1. Parent generates a `hermes_tools.py` stub module with UDS RPC functions
  2. Parent opens a Unix domain socket and starts an RPC listener thread
  3. Parent spawns a child process that runs the LLM's script
  4. Tool calls travel over the UDS back to the parent for dispatch

  **Remote backends (file-based RPC):**
  1. Parent generates `hermes_tools.py` with file-based RPC stubs
  2. Parent ships both files to the remote environment
  3. Script runs inside the terminal backend (Docker/SSH/Modal/Daytona/etc.)
  4. Tool calls are written as request files; a polling thread on the parent
     reads them via env.execute(), dispatches, and writes response files
  5. The script polls for response files and continues

In both cases, only the script's stdout is returned to the LLM; intermediate
tool results never enter the context window.

Platform: Linux and macOS use Unix domain sockets for the local backend;
Windows falls back to loopback TCP (see ``_use_tcp_rpc`` in ``_execute_local``).
Remote execution additionally requires Python 3 in the terminal backend.
"""

import base64
import functools
import json
import logging
import os
import platform
import shlex
import signal
import socket
import subprocess
import sys
import tempfile
import threading
import time
import uuid
from typing import Any, Dict, List, Optional

_IS_WINDOWS = platform.system() == "Windows"

logger = logging.getLogger(__name__)

# Availability gate. On Windows we fall back to loopback TCP for the
# sandbox RPC transport (AF_UNIX is unreliable on Windows Python); see
# ``_use_tcp_rpc`` in ``_execute_local`` below. That makes execute_code
# available on every platform Hermes itself runs on.
SANDBOX_AVAILABLE = True

# The 7 tools allowed inside the sandbox. The intersection of this list
# and the session's enabled tools determines which stubs are generated.
SANDBOX_ALLOWED_TOOLS = frozenset([
    "web_search",
    "web_extract",
    "read_file",
    "write_file",
    "search_files",
    "patch",
    "terminal",
])

# Resource limit defaults (overridable via config.yaml → code_execution.*)
DEFAULT_TIMEOUT = 300        # 5 minutes
DEFAULT_MAX_TOOL_CALLS = 50
MAX_STDOUT_BYTES = 50_000    # 50 KB
MAX_STDERR_BYTES = 10_000    # 10 KB

# Environment variable scrubbing rules (shared between the local + remote
# backends). Secret-substring block is applied first; anything left must
# match either a safe prefix or, on Windows, an OS-essential name.
_SAFE_ENV_PREFIXES = ("PATH", "HOME", "USER", "LANG", "LC_", "TERM",
                      "TMPDIR", "TMP", "TEMP", "SHELL", "LOGNAME",
                      "XDG_", "PYTHONPATH", "VIRTUAL_ENV", "CONDA",
                      "HERMES_")
_SECRET_SUBSTRINGS = ("KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL",
                      "PASSWD", "AUTH")

# Windows-only: a handful of variables are required by the OS/CRT itself.
# Without them, even stdlib calls like ``socket.socket()`` fail with
# WinError 10106 (Winsock can't locate mswsock.dll) and ``subprocess``
# can't resolve cmd.exe. These are well-known OS paths, not secrets, so
# we allow them through by exact name. The _SECRET_SUBSTRINGS block
# still runs as a safety net (none of these names match those substrings).
_WINDOWS_ESSENTIAL_ENV_VARS = frozenset({
    "SYSTEMROOT",            # %SYSTEMROOT%\System32; Winsock needs this
    "SYSTEMDRIVE",           # C: (or wherever Windows lives)
    "WINDIR",                # usually same as SYSTEMROOT
    "COMSPEC",               # cmd.exe path; subprocess shell=True needs it
    "PATHEXT",               # .COM;.EXE;.BAT;... (shell lookup)
    "OS",                    # "Windows_NT"; some tools gate on this
    "PROCESSOR_ARCHITECTURE",
    "NUMBER_OF_PROCESSORS",
    "PUBLIC",                # C:\Users\Public
    "ALLUSERSPROFILE",       # C:\ProgramData; some stdlib paths use it
    "PROGRAMDATA",           # C:\ProgramData
    "PROGRAMFILES",
    "PROGRAMFILES(X86)",
    "PROGRAMW6432",
    "APPDATA",               # %USERPROFILE%\AppData\Roaming; Python uses it
    "LOCALAPPDATA",          # %USERPROFILE%\AppData\Local
    "USERPROFILE",           # C:\Users\<name>; Python's expanduser uses it
    "USERDOMAIN",
    "USERNAME",
    "HOMEDRIVE",             # C:
    "HOMEPATH",              # \Users\<name>
    "COMPUTERNAME",
})


def _scrub_child_env(source_env, is_passthrough=None, is_windows=None):
    """Produce the scrubbed child-process env for execute_code.

    Rules (order matters):
      1. Passthrough vars (skill- or config-declared) always pass.
      2. Secret-substring names (KEY/TOKEN/etc.) are blocked.
      3. Names matching a safe prefix pass.
      4. On Windows, a small OS-essential allowlist passes by exact name;
         without these the child can't even create a socket or spawn a
         subprocess.

    Extracted into a helper so tests can exercise the logic without
    spawning a subprocess.
    """
    if is_passthrough is None:
        try:
            from tools.env_passthrough import is_env_passthrough as _ep
        except Exception:
            _ep = lambda _: False  # noqa: E731
        is_passthrough = _ep
    if is_windows is None:
        is_windows = _IS_WINDOWS

    scrubbed = {}
    for k, v in source_env.items():
        if is_passthrough(k):
            scrubbed[k] = v
            continue
        if any(s in k.upper() for s in _SECRET_SUBSTRINGS):
            continue
        if any(k.startswith(p) for p in _SAFE_ENV_PREFIXES):
            scrubbed[k] = v
            continue
        if is_windows and k.upper() in _WINDOWS_ESSENTIAL_ENV_VARS:
            scrubbed[k] = v
    return scrubbed
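

# Illustrative sketch only, not part of this module's API: a reduced,
# self-contained model of the rule order documented in _scrub_child_env.
# The names _demo_scrub, _DEMO_SAFE and _DEMO_SECRET are hypothetical, the
# prefix and substring lists are shortened, and the Windows allowlist rule
# is left out. It is meant to show why ordering matters, under those
# assumptions, rather than to mirror the real helper exactly.
_DEMO_SAFE = ("PATH", "HOME", "HERMES_")
_DEMO_SECRET = ("KEY", "TOKEN")


def _demo_scrub(env, passthrough=frozenset()):
    kept = {}
    for name, value in env.items():
        if name in passthrough:
            kept[name] = value  # rule 1: declared passthrough always wins
        elif any(s in name.upper() for s in _DEMO_SECRET):
            continue  # rule 2: secret-looking names are dropped
        elif any(name.startswith(p) for p in _DEMO_SAFE):
            kept[name] = value  # rule 3: safe prefixes pass
    return kept


# The secret check runs before the prefix check, so HERMES_API_KEY is
# dropped despite its safe prefix unless it is declared as passthrough.
# EDITOR matches no rule and is dropped as well.
_demo_env = {"PATH": "/usr/bin", "HERMES_API_KEY": "x", "EDITOR": "vi"}
_demo_default = _demo_scrub(_demo_env)
_demo_with_passthrough = _demo_scrub(_demo_env, passthrough={"HERMES_API_KEY"})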


def check_sandbox_requirements() -> bool:
    """Return True if execute_code is usable with the configured terminal backend.

    The Vercel sandbox backend has additional requirements, checked separately.
    """
    if not SANDBOX_AVAILABLE:
        return False
    try:
        from tools.terminal_tool import (
            _check_vercel_sandbox_requirements,
            _get_env_config,
        )

        config = _get_env_config()
    except Exception:
        logger.debug("Could not resolve terminal config for execute_code availability", exc_info=True)
        return False
    if config.get("env_type") == "vercel_sandbox":
        return _check_vercel_sandbox_requirements(config)
    return True


# ---------------------------------------------------------------------------
# hermes_tools.py code generator
# ---------------------------------------------------------------------------

# Per-tool stub templates: (function_name, signature, docstring, args_dict_expr)
# The args_dict_expr builds the JSON payload sent over the RPC socket.
_TOOL_STUBS = {
    "web_search": (
        "web_search",
        "query: str, limit: int = 5",
        '"""Search the web. Returns dict with data.web list of {url, title, description}."""',
        '{"query": query, "limit": limit}',
    ),
    "web_extract": (
        "web_extract",
        "urls: list",
        '"""Extract content from URLs. Returns dict with results list of {url, title, content, error}."""',
        '{"urls": urls}',
    ),
    "read_file": (
        "read_file",
        "path: str, offset: int = 1, limit: int = 500",
        '"""Read a file (1-indexed lines). Returns dict with "content" and "total_lines"."""',
        '{"path": path, "offset": offset, "limit": limit}',
    ),
    "write_file": (
        "write_file",
        "path: str, content: str, cross_profile: bool = False",
        '"""Write content to a file (always overwrites). Returns dict with status. cross_profile=True opts out of the cross-Hermes-profile soft guard."""',
        '{"path": path, "content": content, "cross_profile": cross_profile}',
    ),
    "search_files": (
        "search_files",
        'pattern: str, target: str = "content", path: str = ".", file_glob: str = None, limit: int = 50, offset: int = 0, output_mode: str = "content", context: int = 0',
        '"""Search file contents (target="content") or find files by name (target="files"). Returns dict with "matches"."""',
        '{"pattern": pattern, "target": target, "path": path, "file_glob": file_glob, "limit": limit, "offset": offset, "output_mode": output_mode, "context": context}',
    ),
    "patch": (
        "patch",
        'path: str = None, old_string: str = None, new_string: str = None, replace_all: bool = False, mode: str = "replace", patch: str = None, cross_profile: bool = False',
        '"""Targeted find-and-replace (mode="replace") or V4A multi-file patches (mode="patch"). Returns dict with status. cross_profile=True opts out of the cross-Hermes-profile soft guard."""',
        '{"path": path, "old_string": old_string, "new_string": new_string, "replace_all": replace_all, "mode": mode, "patch": patch, "cross_profile": cross_profile}',
    ),
    "terminal": (
        "terminal",
        "command: str, timeout: int = None, workdir: str = None",
        '"""Run a shell command (foreground only). Returns dict with "output" and "exit_code"."""',
        '{"command": command, "timeout": timeout, "workdir": workdir}',
    ),
}


def generate_hermes_tools_module(enabled_tools: List[str],
                                 transport: str = "uds") -> str:
    """
    Build the source code for the hermes_tools.py stub module.

    Only tools in both SANDBOX_ALLOWED_TOOLS and enabled_tools get stubs.

    Args:
        enabled_tools: Tool names enabled in the current session.
        transport: ``"uds"`` for Unix domain socket (local backend) or
                   ``"file"`` for file-based RPC (remote backends).
    """
    tools_to_generate = sorted(SANDBOX_ALLOWED_TOOLS & set(enabled_tools))

    stub_functions = []
    export_names = []
    for tool_name in tools_to_generate:
        if tool_name not in _TOOL_STUBS:
            continue
        func_name, sig, doc, args_expr = _TOOL_STUBS[tool_name]
        stub_functions.append(
            f"def {func_name}({sig}):\n"
            f"    {doc}\n"
            f"    return _call({func_name!r}, {args_expr})\n"
        )
        export_names.append(func_name)

    if transport == "file":
        header = _FILE_TRANSPORT_HEADER
    else:
        header = _UDS_TRANSPORT_HEADER

    return header + "\n".join(stub_functions)
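

# Illustrative sketch only (every _demo_* name is hypothetical): what one
# rendered stub looks like and how it forwards its arguments. _demo_call
# stands in for the transport-specific _call helper that the real header
# provides; here it simply echoes the tool name and payload so the shape of
# the RPC request is visible without a socket or request files.
def _demo_render_stub(func_name, sig, doc, args_expr):
    return (
        f"def {func_name}({sig}):\n"
        f"    {doc}\n"
        f"    return _call({func_name!r}, {args_expr})\n"
    )


def _demo_call(tool_name, args):
    return {"tool": tool_name, "args": args}


_demo_source = _demo_render_stub(
    "read_file",
    "path: str, offset: int = 1, limit: int = 500",
    '"""Read a file (1-indexed lines)."""',
    '{"path": path, "offset": offset, "limit": limit}',
)
# Execute the generated source in a namespace where _call is the fake
# transport, then invoke the stub the way an LLM-written script would.
_demo_namespace = {"_call": _demo_call}
exec(_demo_source, _demo_namespace)
_demo_result = _demo_namespace["read_file"]("notes.txt", limit=10)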


# ---- Shared helpers section (embedded in both transport headers) ----------

_COMMON_HELPERS = '''\

# ---------------------------------------------------------------------------
# Convenience helpers (avoid common scripting pitfalls)
# ---------------------------------------------------------------------------

def json_parse(text: str):
    """Parse JSON tolerant of control characters (strict=False).
    Use this instead of json.loads() when parsing output from terminal()
    or web_extract() that may contain raw tabs/newlines in strings."""
    return json.loads(text, strict=False)


def shell_quote(s: str) -> str:
    """Shell-escape a string for safe interpolation into commands.
    Use this when inserting dynamic content into terminal() commands:
        terminal(f"echo {shell_quote(user_input)}")
    """
    return shlex.quote(s)


def retry(fn, max_attempts=3, delay=2):
    """Retry a function up to max_attempts times with exponential backoff.
    Use for transient failures (network errors, API rate limits):
        result = retry(lambda: terminal("gh issue list ..."))
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            last_err = e
            if attempt < max_attempts - 1:
                time.sleep(delay * (2 ** attempt))
    raise last_err

'''

# ---- UDS transport (local backend) ---------------------------------------

_UDS_TRANSPORT_HEADER = '''\
"""Auto-generated Hermes tools RPC stubs."""
import json, os, socket, shlex, threading, time

_sock = None
# The RPC server handles a single client connection serially and has no
# request-id in the protocol, so concurrent _call() invocations from multiple
# threads (e.g. ThreadPoolExecutor) would race on the shared socket and get
# each other's responses. Serialize the entire send+recv round-trip.
_call_lock = threading.Lock()
''' + _COMMON_HELPERS + '''\

def _connect():
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
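The relaxed assertion described for test_browser_homebrew_paths might look something like the sketch below. `build_npx_argv` is a hypothetical stand-in for the code under test; only the shape check (two items, second is `agent-browser`) comes from the description above.

```python
import shutil


def build_npx_argv(package: str) -> list:
    """Hypothetical spawn-site helper: resolve npx to an absolute path if present."""
    # shutil.which returns None when npx is not on PATH; fall back to the bare name.
    return [shutil.which("npx") or "npx", package]


def check_argv_shape(argv: list) -> bool:
    """Shape-only check: do not pin argv[0], which is now platform-dependent."""
    return len(argv) == 2 and argv[1] == "agent-browser"
```

Checking shape rather than the exact first item keeps the test valid whether `argv[0]` is `npx`, `/usr/local/bin/npx`, or a Windows `.cmd` shim path.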
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
2026-05-07 17:29:31 -07:00
    """Connect to the parent's RPC server via the transport it picked.

    HERMES_RPC_SOCKET can be either:
    - a filesystem path (POSIX Unix domain socket, the default on
      Linux and macOS)
    - a string of the form ``tcp://127.0.0.1:<port>`` (Windows, where
      AF_UNIX is unreliable, so the parent falls back to loopback TCP)
    """
2026-02-19 23:23:43 -08:00
    global _sock
    if _sock is None:
2026-05-07 17:29:31 -07:00
        endpoint = os.environ["HERMES_RPC_SOCKET"]
        if endpoint.startswith("tcp://"):
            # tcp://host:port (host is always 127.0.0.1 in practice; we
            # only bind loopback server-side)
            _host_port = endpoint[len("tcp://"):]
            _host, _, _port = _host_port.rpartition(":")
            _sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            _sock.connect((_host or "127.0.0.1", int(_port)))
        else:
            _sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
            _sock.connect(endpoint)
2026-02-19 23:23:43 -08:00
        _sock.settimeout(300)
    return _sock


def _call(tool_name, args):
    """Send a tool call to the parent process and return the parsed result."""
    request = json.dumps({"tool": tool_name, "args": args}) + "\\n"
2026-04-30 13:20:05 +08:00
    with _call_lock:
        conn = _connect()
        conn.sendall(request.encode())
        buf = b""
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                raise RuntimeError("Agent process disconnected")
            buf += chunk
            if buf.endswith(b"\\n"):
                break
2026-02-19 23:23:43 -08:00
    raw = buf.decode().strip()
    result = json.loads(raw)
    if isinstance(result, str):
        try:
            return json.loads(result)
        except (json.JSONDecodeError, TypeError):
            return result
    return result
'''
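The endpoint handling in the generated `_connect()` stub above can be isolated into a small pure function for illustration. This is a sketch only: `parse_rpc_endpoint` is a hypothetical name, and the `getattr` fallback for `AF_UNIX` is an assumption for platforms whose `socket` module lacks that constant.

```python
import socket


def parse_rpc_endpoint(endpoint: str):
    """Return (address_family, address) for a HERMES_RPC_SOCKET-style value.

    Hypothetical helper mirroring the branch in the generated stub.
    """
    prefix = "tcp://"
    if endpoint.startswith(prefix):
        # rpartition splits on the LAST colon, so the port is always the tail.
        host, _, port = endpoint[len(prefix):].rpartition(":")
        return socket.AF_INET, (host or "127.0.0.1", int(port))
    # Anything else is treated as a Unix domain socket path.
    return getattr(socket, "AF_UNIX", None), endpoint
```

Keeping the parse separate from the `connect()` call makes the two accepted forms easy to check without opening a socket.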
2026-04-04 12:57:49 -07:00
# ---- File-based transport (remote backends) -------------------------------
_FILE_TRANSPORT_HEADER = '''\
"""Auto-generated Hermes tools RPC stubs (file-based transport)."""
2026-04-30 13:20:05 +08:00
import json, os, shlex, tempfile, threading, time
2026-04-04 12:57:49 -07:00
2026-04-09 13:46:08 +02:00
_RPC_DIR = os.environ.get("HERMES_RPC_DIR") or os.path.join(tempfile.gettempdir(), "hermes_rpc")
2026-04-04 12:57:49 -07:00
_seq = 0
2026-04-30 13:20:05 +08:00
# `_seq += 1` is not atomic (read-modify-write), so concurrent _call()
# invocations from multiple threads could allocate the same sequence number
# and clobber each other's request files. Guard seq allocation with a lock.
_seq_lock = threading.Lock()
2026-04-04 12:57:49 -07:00
''' + _COMMON_HELPERS + '''\
def _call(tool_name, args):
    """Send a tool call request via file-based RPC and wait for response."""
    global _seq
2026-04-30 13:20:05 +08:00
    with _seq_lock:
        _seq += 1
        seq = _seq
    seq_str = f"{seq:06d}"
2026-04-04 12:57:49 -07:00
    req_file = os.path.join(_RPC_DIR, f"req_{seq_str}")
    res_file = os.path.join(_RPC_DIR, f"res_{seq_str}")
2026-05-07 18:52:59 -07:00
    # Write request atomically (write to .tmp, then rename).
    # encoding="utf-8" is critical: on Windows-hosted remote backends
    # (or any non-UTF-8 locale) open()'s default locale encoding would
    # mangle non-ASCII chars in tool args when writing them as JSON.
2026-04-04 12:57:49 -07:00
    tmp = req_file + ".tmp"
2026-05-07 18:52:59 -07:00
    with open(tmp, "w", encoding="utf-8") as f:
2026-04-30 13:20:05 +08:00
        json.dump({"tool": tool_name, "args": args, "seq": seq}, f)
2026-04-04 12:57:49 -07:00
    os.rename(tmp, req_file)

    # Wait for response with adaptive polling
    deadline = time.monotonic() + 300  # 5-minute timeout per tool call
    poll_interval = 0.05  # Start at 50ms
    while not os.path.exists(res_file):
        if time.monotonic() > deadline:
            raise RuntimeError(f"RPC timeout: no response for {tool_name} after 300s")
        time.sleep(poll_interval)
        poll_interval = min(poll_interval * 1.2, 0.25)  # Back off to 250ms
2026-05-07 18:52:59 -07:00
    with open(res_file, encoding="utf-8") as f:
2026-04-04 12:57:49 -07:00
        raw = f.read()

    # Clean up response file
    try:
        os.unlink(res_file)
    except OSError:
        pass

    result = json.loads(raw)
    if isinstance(result, str):
        try:
            return json.loads(result)
        except (json.JSONDecodeError, TypeError):
            return result
    return result
'''
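To make the file-based request/response cycle concrete, here is a self-contained sketch that runs both sides inside one temporary directory. Everything in it is illustrative: `fake_parent` is a hypothetical responder that just echoes the request, the helper names are invented, and `os.replace` is swapped in for the stub's `os.rename` because it also overwrites an existing target on Windows.

```python
import json
import os
import tempfile
import threading
import time


def _path(rpc_dir, kind, seq):
    # Same req_/res_ naming scheme with a zero-padded sequence number.
    return os.path.join(rpc_dir, f"{kind}_{seq:06d}")


def write_atomic(path, payload):
    """Write JSON via a .tmp sibling so a poller never sees a partial file."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    os.replace(tmp, path)


def wait_for(path, timeout=5.0):
    """Poll for path with 50 ms to 250 ms backoff, then return its parsed JSON."""
    deadline = time.monotonic() + timeout
    interval = 0.05
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise RuntimeError(f"RPC timeout waiting for {path}")
        time.sleep(interval)
        interval = min(interval * 1.2, 0.25)
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def fake_parent(rpc_dir, seq):
    """Hypothetical responder: echoes the request back as the result."""
    request = wait_for(_path(rpc_dir, "req", seq))
    write_atomic(_path(rpc_dir, "res", seq), {"echo": request})


def demo():
    with tempfile.TemporaryDirectory() as rpc_dir:
        responder = threading.Thread(target=fake_parent, args=(rpc_dir, 1))
        responder.start()
        write_atomic(
            _path(rpc_dir, "req", 1),
            {"tool": "terminal", "args": {"command": "echo héllo"}, "seq": 1},
        )
        result = wait_for(_path(rpc_dir, "res", 1))
        responder.join()
        return result
```

The write-then-rename step is what lets the polling side treat "file exists" as "file is complete"; without it a reader could open a half-written request.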
2026-02-19 23:23:43 -08:00
# ---------------------------------------------------------------------------
# RPC server (runs in a thread inside the parent process)
# ---------------------------------------------------------------------------
# Terminal parameters that must not be used from ephemeral sandbox scripts
2026-04-11 17:16:11 -07:00
_TERMINAL_BLOCKED_PARAMS = {"background", "pty", "notify_on_complete", "watch_patterns"}
2026-02-19 23:23:43 -08:00
def _rpc_server_loop(
    server_sock: socket.socket,
    task_id: str,
    tool_call_log: list,
    tool_call_counter: list,  # mutable [int] so the thread can increment
    max_tool_calls: int,
    allowed_tools: frozenset,
):
    """
    Accept one client connection and dispatch tool-call requests until
    the client disconnects or the connection times out. Once the call
    limit is reached, further requests receive an error response.
    """
    from model_tools import handle_function_call

    conn = None
    try:
        server_sock.settimeout(5)
        conn, _ = server_sock.accept()
        conn.settimeout(300)
        buf = b""
        while True:
            try:
                chunk = conn.recv(65536)
            except socket.timeout:
                break
            if not chunk:
                break
            buf += chunk
            # Process all complete newline-delimited messages in the buffer
            while b"\n" in buf:
                line, buf = buf.split(b"\n", 1)
                line = line.strip()
                if not line:
                    continue
                call_start = time.monotonic()
                try:
                    request = json.loads(line.decode())
                except (json.JSONDecodeError, UnicodeDecodeError) as exc:
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
                    resp = tool_error(f"Invalid RPC request: {exc}")
2026-02-19 23:23:43 -08:00
                    conn.sendall((resp + "\n").encode())
                    continue

                tool_name = request.get("tool", "")
                tool_args = request.get("args", {})

                # Enforce the allow-list
                if tool_name not in allowed_tools:
                    available = ", ".join(sorted(allowed_tools))
                    resp = json.dumps({
                        "error": (
                            f"Tool '{tool_name}' is not available in execute_code. "
                            f"Available: {available}"
                        )
                    })
                    conn.sendall((resp + "\n").encode())
                    continue

                # Enforce tool call limit
                if tool_call_counter[0] >= max_tool_calls:
                    resp = json.dumps({
                        "error": (
                            f"Tool call limit reached ({max_tool_calls}). "
                            "No more tool calls allowed in this execution."
                        )
                    })
                    conn.sendall((resp + "\n").encode())
                    continue

                # Strip forbidden terminal parameters
                if tool_name == "terminal" and isinstance(tool_args, dict):
                    for param in _TERMINAL_BLOCKED_PARAMS:
                        tool_args.pop(param, None)
2026-02-20 01:29:53 -08:00
                # Dispatch through the standard tool handler.
                # Suppress stdout/stderr from internal tool handlers so
                # their status prints don't leak into the CLI spinner.
2026-02-19 23:23:43 -08:00
                try:
2026-02-20 01:29:53 -08:00
                    _real_stdout, _real_stderr = sys.stdout, sys.stderr
codebase: add encoding='utf-8' to all bare open() calls (PLW1514)
Closes the last Python-on-Windows UTF-8 exposure by making every
text-mode open() call explicit about its encoding.
Before: on Windows, bare open(path, 'r') defaults to the system
locale encoding (cp1252 on US-locale installs). That means reading
any config/yaml/markdown/json file with non-ASCII content either
crashes with UnicodeDecodeError or silently mis-decodes bytes.
After: all 89 affected call sites in production code now pass
encoding='utf-8' explicitly. Works identically on every platform
and every locale, no surprise behavior.
Mechanical sweep via:
ruff check --preview --extend-select PLW1514 --unsafe-fixes --fix --exclude 'tests,venv,.venv,node_modules,website,optional-skills, skills,tinker-atropos,plugins' .
All 89 fixes have the same shape: open(x) or open(x, mode) became
open(x, encoding='utf-8') or open(x, mode, encoding='utf-8'). Nothing
else changed. Every modified file still parses and the Windows/sandbox
test suite is still green (85 passed, 14 skipped, 0 failed across
tests/tools/test_code_execution_windows_env.py +
tests/tools/test_code_execution_modes.py + tests/tools/test_env_passthrough.py +
tests/test_hermes_bootstrap.py).
Scope notes:
- tests/ excluded: test fixtures can use locale encoding intentionally
(exercising edge cases). If we want to tighten tests later that's
a separate PR.
- plugins/ excluded: plugin-specific conventions may differ; plugin
authors own their code.
- optional-skills/ and skills/ excluded: skill scripts are user-authored
and we don't want to mass-edit them.
- website/ and tinker-atropos/ excluded: vendored / generated content.
46 files touched, 89 +/- lines (symmetric replacement). No behavior
change on POSIX or on Windows when the file is ASCII; bug fix on
Windows when the file contains non-ASCII.
2026-05-07 19:24:45 -07:00
                    devnull = open(os.devnull, "w", encoding="utf-8")
2026-02-20 01:29:53 -08:00
                    try:
2026-03-21 15:55:25 -07:00
                        sys.stdout = devnull
                        sys.stderr = devnull
2026-02-20 01:29:53 -08:00
                        result = handle_function_call(
                            tool_name, tool_args, task_id=task_id
                        )
                    finally:
                        sys.stdout, sys.stderr = _real_stdout, _real_stderr
2026-03-21 15:55:25 -07:00
                        devnull.close()
2026-02-19 23:23:43 -08:00
                except Exception as exc:
2026-03-08 14:50:23 +03:00
                    logger.error("Tool call failed in sandbox: %s", exc, exc_info=True)
2026-04-07 13:36:20 -07:00
                    result = tool_error(str(exc))
2026-02-19 23:23:43 -08:00
                tool_call_counter[0] += 1
                call_duration = time.monotonic() - call_start

                # Log for observability
                args_preview = str(tool_args)[:80]
                tool_call_log.append({
                    "tool": tool_name,
                    "args_preview": args_preview,
                    "duration": round(call_duration, 2),
                })
                conn.sendall((result + "\n").encode())
    except socket.timeout:
2026-03-08 14:50:23 +03:00
        logger.debug("RPC listener socket timeout")
    except OSError as e:
        logger.debug("RPC listener socket error: %s", e, exc_info=True)
2026-02-19 23:23:43 -08:00
    finally:
        if conn:
            try:
                conn.close()
2026-03-10 06:59:20 -07:00
            except OSError as e:
                logger.debug("RPC conn close error: %s", e)
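The newline-delimited framing used by the server loop above can be shown in isolation. The sketch below is an illustration only; `drain_messages` is a hypothetical name, and unlike the real loop it raises on malformed JSON instead of replying with an error.

```python
import json


def drain_messages(buf: bytes):
    """Split buf into decoded JSON messages plus the unconsumed remainder.

    Mirrors the inner `while b"\\n" in buf` loop: complete lines are parsed,
    blank lines are skipped, and a trailing partial line is kept for the
    next recv() to extend.
    """
    messages = []
    while b"\n" in buf:
        line, buf = buf.split(b"\n", 1)
        line = line.strip()
        if not line:
            continue
        messages.append(json.loads(line.decode()))
    return messages, buf
```

Carrying the remainder forward is what makes the protocol robust to `recv()` returning a chunk that ends in the middle of a request.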
2026-02-19 23:23:43 -08:00
2026-04-04 12:57:49 -07:00
# ---------------------------------------------------------------------------
# Remote execution support (file-based RPC via terminal backend)
# ---------------------------------------------------------------------------
def _get_or_create_env(task_id: str):
    """Get or create the terminal environment for *task_id*.

    Reuses the same environment (container/sandbox/SSH session) that the
    terminal and file tools use, creating one if it doesn't exist yet.
    Returns an ``(env, env_type)`` tuple.
    """
    from tools.terminal_tool import (
        _active_environments, _env_lock, _create_environment,
        _get_env_config, _last_activity, _start_cleanup_thread,
        _creation_locks, _creation_locks_lock, _task_env_overrides,
feat(terminal): collapse subagent task_ids to shared container (#16177)
Before: delegate_task children each allocated their own terminal
sandbox keyed by child task_id. Starting extra containers (or Modal
sandboxes / Daytona workspaces) is expensive, and the subagent's work
is invisible to the parent — files written by the child in its
container don't exist in the parent's when the subagent returns.
After: a single `_resolve_container_task_id` helper maps any
tool-call task_id to "default" UNLESS an env override is registered
for it. The parent agent and all delegate_task children therefore
share one long-lived sandbox — installed packages, cwd, /workspace
files, and /tmp scratch carry over freely between them.
RL and benchmark environments (TerminalBench2, HermesSweEnv, ...)
opt in to isolation via `register_task_env_overrides(task_id, {...})`;
those task_ids survive the collapse and get their own sandbox,
preserving the per-task Docker image behavior these benchmarks rely on.
file_state / active-subagents registry / TUI events still key off the
original child task_id, so the 'subagent wrote a file the parent read'
warning and UI per-subagent panels keep working.
Tradeoff: parallel delegate_task children (tasks=[...]) now share one
bash/container. Concurrent cd, env-var mutations, and writes to the
same path will collide. If that bites a specific workflow, the
subagent can opt back into isolation via register_task_env_overrides.
Applied at four lookup sites:
- tools/terminal_tool.py terminal_tool() and get_active_env()
- tools/file_tools.py _get_file_ops() and _get_live_tracking_cwd()
- tools/code_execution_tool.py _get_or_create_environment()
Docs: website/docs/user-guide/configuration.md updated to reflect the
shared-container reality and document the RL/benchmark carve-out.
Tests: tests/tools/test_shared_container_task_id.py (9 cases).
2026-04-26 11:55:02 -07:00
        _resolve_container_task_id,
2026-04-04 12:57:49 -07:00
    )
2026-04-26 11:55:02 -07:00
    effective_task_id = _resolve_container_task_id(task_id)
2026-04-04 12:57:49 -07:00
    # Fast path: environment already exists
    with _env_lock:
        if effective_task_id in _active_environments:
            _last_activity[effective_task_id] = time.time()
            return _active_environments[effective_task_id], _get_env_config()["env_type"]

    # Slow path: create environment (same pattern as file_tools._get_file_ops)
    with _creation_locks_lock:
        if effective_task_id not in _creation_locks:
            _creation_locks[effective_task_id] = threading.Lock()
        task_lock = _creation_locks[effective_task_id]

    with task_lock:
        with _env_lock:
            if effective_task_id in _active_environments:
                _last_activity[effective_task_id] = time.time()
                return _active_environments[effective_task_id], _get_env_config()["env_type"]

        config = _get_env_config()
        env_type = config["env_type"]
        overrides = _task_env_overrides.get(effective_task_id, {})
        if env_type == "docker":
            image = overrides.get("docker_image") or config["docker_image"]
        elif env_type == "singularity":
            image = overrides.get("singularity_image") or config["singularity_image"]
        elif env_type == "modal":
            image = overrides.get("modal_image") or config["modal_image"]
        elif env_type == "daytona":
            image = overrides.get("daytona_image") or config["daytona_image"]
        else:
            image = ""
        cwd = overrides.get("cwd") or config["cwd"]
        container_config = None
2026-05-11 11:13:25 -07:00
        if env_type in {"docker", "singularity", "modal", "daytona", "vercel_sandbox"}:
2026-04-04 12:57:49 -07:00
            container_config = {
                "container_cpu": config.get("container_cpu", 1),
                "container_memory": config.get("container_memory", 5120),
                "container_disk": config.get("container_disk", 51200),
                "container_persistent": config.get("container_persistent", True),
2026-04-29 18:20:53 +05:30
                "vercel_runtime": config.get("vercel_runtime", ""),
2026-04-04 12:57:49 -07:00
                "docker_volumes": config.get("docker_volumes", []),
2026-04-29 16:16:43 +10:00
                "docker_run_as_host_user": config.get("docker_run_as_host_user", False),
2026-04-04 12:57:49 -07:00
            }

        ssh_config = None
        if env_type == "ssh":
            ssh_config = {
                "host": config.get("ssh_host", ""),
                "user": config.get("ssh_user", ""),
                "port": config.get("ssh_port", 22),
                "key": config.get("ssh_key", ""),
                "persistent": config.get("ssh_persistent", False),
            }

        local_config = None
        if env_type == "local":
            local_config = {
                "persistent": config.get("local_persistent", False),
            }

        logger.info("Creating new %s environment for execute_code task %s...",
                    env_type, effective_task_id[:8])
        env = _create_environment(
            env_type=env_type,
            image=image,
            cwd=cwd,
            timeout=config["timeout"],
            ssh_config=ssh_config,
            container_config=container_config,
            local_config=local_config,
            task_id=effective_task_id,
            host_cwd=config.get("host_cwd"),
        )
        with _env_lock:
            _active_environments[effective_task_id] = env
            _last_activity[effective_task_id] = time.time()
        _start_cleanup_thread()
        logger.info("%s environment ready for execute_code task %s",
                    env_type, effective_task_id[:8])
        return env, env_type
def _ship_file_to_remote(env, remote_path: str, content: str) -> None:
    """Write *content* to *remote_path* on the remote environment.

    Uses ``echo … | base64 -d`` rather than stdin piping because some
    backends (Modal) don't reliably deliver stdin_data to chained
    commands. Base64 output is shell-safe ([A-Za-z0-9+/=]) so single
    quotes are fine.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
2026-04-09 13:46:08 +02:00
    quoted_remote_path = shlex.quote(remote_path)
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
    env.execute(
2026-04-09 13:46:08 +02:00
        f"echo '{encoded}' | base64 -d > {quoted_remote_path}",
2026-04-04 12:57:49 -07:00
        cwd="/",
        timeout=30,
    )
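A small sketch of how the shipping command is assembled, separated from `env.execute()` so it can be inspected without a remote backend. `build_ship_command` is a hypothetical name used only for this illustration.

```python
import base64
import shlex


def build_ship_command(remote_path: str, content: str) -> str:
    """Build the shell line that recreates content at remote_path.

    The payload is base64, whose alphabet contains no quote characters,
    so wrapping it in single quotes is safe; the path goes through
    shlex.quote because it may contain spaces or quotes.
    """
    encoded = base64.b64encode(content.encode("utf-8")).decode("ascii")
    return f"echo '{encoded}' | base64 -d > {shlex.quote(remote_path)}"
```

Because the whole file travels on the command line, this approach suits small payloads such as the generated stub and the user script; shells and kernels impose limits on command-line length.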
2026-04-09 13:46:08 +02:00
def _env_temp_dir(env: Any) -> str:
    """Return a writable temp dir for env-backed execute_code sandboxes."""
    get_temp_dir = getattr(env, "get_temp_dir", None)
    if callable(get_temp_dir):
        try:
            temp_dir = get_temp_dir()
            if isinstance(temp_dir, str) and temp_dir.startswith("/"):
                return temp_dir.rstrip("/") or "/"
        except Exception as exc:
            logger.debug("Could not resolve execute_code env temp dir: %s", exc)
    candidate = tempfile.gettempdir()
    if isinstance(candidate, str) and candidate.startswith("/"):
        return candidate.rstrip("/") or "/"
    return "/tmp"
2026-04-04 12:57:49 -07:00
def _rpc_poll_loop(
    env,
    rpc_dir: str,
    task_id: str,
    tool_call_log: list,
    tool_call_counter: list,
    max_tool_calls: int,
    allowed_tools: frozenset,
    stop_event: threading.Event,
):
    """Poll the remote filesystem for tool call requests and dispatch them.
2026-04-08 13:38:04 -07:00
Runs in a background thread . Each ` ` env . execute ( ) ` ` spawns an
independent process , so these calls run safely concurrent with the
script - execution thread .
2026-04-04 12:57:49 -07:00
"""
from model_tools import handle_function_call
poll_interval = 0.1 # 100 ms
2026-04-09 13:46:08 +02:00
quoted_rpc_dir = shlex . quote ( rpc_dir )
2026-04-04 12:57:49 -07:00
    while not stop_event.is_set():
        try:
            # List pending request files (skip .tmp partials)
            ls_result = env.execute(
                f"ls -1 {quoted_rpc_dir}/req_* 2>/dev/null || true",
                cwd="/",
                timeout=10,
            )
            output = ls_result.get("output", "").strip()
            if not output:
                stop_event.wait(poll_interval)
                continue

            req_files = sorted([
                f.strip() for f in output.split("\n")
                if f.strip()
                and not f.strip().endswith(".tmp")
                and "/req_" in f.strip()
            ])

            for req_file in req_files:
                if stop_event.is_set():
                    break

                call_start = time.monotonic()
                quoted_req_file = shlex.quote(req_file)

                # Read request
                read_result = env.execute(
                    f"cat {quoted_req_file}",
                    cwd="/",
                    timeout=10,
                )
                try:
                    request = json.loads(read_result.get("output", ""))
                except (json.JSONDecodeError, ValueError):
                    logger.debug("Malformed RPC request in %s", req_file)
                    # Remove bad request to avoid infinite retry
                    env.execute(f"rm -f {quoted_req_file}", cwd="/", timeout=5)
                    continue

                tool_name = request.get("tool", "")
                tool_args = request.get("args", {})
                seq = request.get("seq", 0)
                seq_str = f"{seq:06d}"
                res_file = f"{rpc_dir}/res_{seq_str}"
                quoted_res_file = shlex.quote(res_file)

                # Enforce allow-list
                if tool_name not in allowed_tools:
                    available = ", ".join(sorted(allowed_tools))
                    tool_result = json.dumps({
                        "error": (
                            f"Tool '{tool_name}' is not available in execute_code. "
                            f"Available: {available}"
                        )
                    })
                # Enforce tool call limit
                elif tool_call_counter[0] >= max_tool_calls:
                    tool_result = json.dumps({
                        "error": (
                            f"Tool call limit reached ({max_tool_calls}). "
                            "No more tool calls allowed in this execution."
                        )
                    })
                else:
                    # Strip forbidden terminal parameters
                    if tool_name == "terminal" and isinstance(tool_args, dict):
                        for param in _TERMINAL_BLOCKED_PARAMS:
                            tool_args.pop(param, None)

                    # Dispatch through the standard tool handler
                    try:
                        _real_stdout, _real_stderr = sys.stdout, sys.stderr
                        devnull = open(os.devnull, "w", encoding="utf-8")
                        try:
                            sys.stdout = devnull
                            sys.stderr = devnull
                            tool_result = handle_function_call(
                                tool_name, tool_args, task_id=task_id
                            )
                        finally:
                            sys.stdout, sys.stderr = _real_stdout, _real_stderr
                            devnull.close()
                    except Exception as exc:
                        logger.error("Tool call failed in remote sandbox: %s",
                                     exc, exc_info=True)
                        tool_result = tool_error(str(exc))

                    tool_call_counter[0] += 1
                    call_duration = time.monotonic() - call_start
                    tool_call_log.append({
                        "tool": tool_name,
                        "args_preview": str(tool_args)[:80],
                        "duration": round(call_duration, 2),
                    })

                # Write response atomically (tmp + rename).
                # Use echo piping (not stdin_data) because Modal doesn't
                # reliably deliver stdin to chained commands.
                encoded_result = base64.b64encode(
                    tool_result.encode("utf-8")
                ).decode("ascii")
                env.execute(
                    f"echo '{encoded_result}' | base64 -d > {quoted_res_file}.tmp"
                    f" && mv {quoted_res_file}.tmp {quoted_res_file}",
                    cwd="/",
                    timeout=60,
                )

                # Remove the request file
                env.execute(f"rm -f {quoted_req_file}", cwd="/", timeout=5)

        except Exception as e:
            if not stop_event.is_set():
                logger.debug("RPC poll error: %s", e, exc_info=True)

        if not stop_event.is_set():
            stop_event.wait(poll_interval)


def _execute_remote(
    code: str,
    task_id: Optional[str],
    enabled_tools: Optional[List[str]],
) -> str:
    """Run a script on the remote terminal backend via file-based RPC.

    The script and the generated hermes_tools.py module are shipped to
    the remote environment, and tool calls are proxied through a polling
    thread that communicates via request/response files.
    """
    _cfg = _load_config()
    timeout = _cfg.get("timeout", DEFAULT_TIMEOUT)
    max_tool_calls = _cfg.get("max_tool_calls", DEFAULT_MAX_TOOL_CALLS)

    session_tools = set(enabled_tools) if enabled_tools else set()
    sandbox_tools = frozenset(SANDBOX_ALLOWED_TOOLS & session_tools)
    if not sandbox_tools:
        sandbox_tools = SANDBOX_ALLOWED_TOOLS

    effective_task_id = task_id or "default"
    env, env_type = _get_or_create_env(effective_task_id)

    sandbox_id = uuid.uuid4().hex[:12]
    temp_dir = _env_temp_dir(env)
    sandbox_dir = f"{temp_dir}/hermes_exec_{sandbox_id}"
    quoted_sandbox_dir = shlex.quote(sandbox_dir)
    quoted_rpc_dir = shlex.quote(f"{sandbox_dir}/rpc")

    tool_call_log: list = []
    tool_call_counter = [0]
    exec_start = time.monotonic()
    stop_event = threading.Event()
    rpc_thread = None

    try:
        # Verify Python is available on the remote
        py_check = env.execute(
            "command -v python3 >/dev/null 2>&1 && echo OK",
            cwd="/", timeout=15,
        )
        if "OK" not in py_check.get("output", ""):
            return json.dumps({
                "status": "error",
                "error": (
                    f"Python 3 is not available in the {env_type} terminal "
                    "environment. Install Python to use execute_code with "
                    "remote backends."
                ),
                "tool_calls_made": 0,
                "duration_seconds": 0,
            })

        # Create sandbox directory on remote
        env.execute(
            f"mkdir -p {quoted_rpc_dir}", cwd="/", timeout=10,
        )

        # Generate and ship files
        tools_src = generate_hermes_tools_module(
            list(sandbox_tools), transport="file",
        )
        _ship_file_to_remote(env, f"{sandbox_dir}/hermes_tools.py", tools_src)
        _ship_file_to_remote(env, f"{sandbox_dir}/script.py", code)

        # Start RPC polling thread
        rpc_thread = threading.Thread(
            target=_rpc_poll_loop,
            args=(
                env, f"{sandbox_dir}/rpc", effective_task_id,
                tool_call_log, tool_call_counter, max_tool_calls,
                sandbox_tools, stop_event,
            ),
            daemon=True,
        )
        rpc_thread.start()

        # Build environment variable prefix for the script
        env_prefix = (
            f"HERMES_RPC_DIR={shlex.quote(f'{sandbox_dir}/rpc')} "
            f"PYTHONDONTWRITEBYTECODE=1"
        )
        tz = os.getenv("HERMES_TIMEZONE", "").strip()
        if tz:
            # Quote the value: it is interpolated into a shell command line.
            env_prefix += f" TZ={shlex.quote(tz)}"

        # Execute the script on the remote backend
        logger.info("Executing code on %s backend (task %s)...",
                    env_type, effective_task_id[:8])
        script_result = env.execute(
            f"cd {quoted_sandbox_dir} && {env_prefix} python3 script.py",
            timeout=timeout,
        )

        stdout_text = script_result.get("output", "")
        exit_code = script_result.get("returncode", -1)
        status = "success"

        # Check for timeout/interrupt from the backend
        if exit_code == 124:
            status = "timeout"
        elif exit_code == 130:
            status = "interrupted"

    except Exception as exc:
        duration = round(time.monotonic() - exec_start, 2)
        logger.error(
            "execute_code remote failed after %ss with %d tool calls: %s: %s",
            duration, tool_call_counter[0], type(exc).__name__, exc,
            exc_info=True,
        )
        return json.dumps({
            "status": "error",
            "error": str(exc),
            "tool_calls_made": tool_call_counter[0],
            "duration_seconds": duration,
        }, ensure_ascii=False)

    finally:
        # Stop the polling thread
        stop_event.set()
        if rpc_thread is not None:
            rpc_thread.join(timeout=5)

        # Clean up remote sandbox dir
        try:
            env.execute(
                f"rm -rf {quoted_sandbox_dir}", cwd="/", timeout=15,
            )
        except Exception:
            logger.debug("Failed to clean up remote sandbox %s", sandbox_dir)

    duration = round(time.monotonic() - exec_start, 2)

    # --- Post-process output (same as local path) ---

    # Truncate stdout to cap
    if len(stdout_text) > MAX_STDOUT_BYTES:
        head_bytes = int(MAX_STDOUT_BYTES * 0.4)
        tail_bytes = MAX_STDOUT_BYTES - head_bytes
        head = stdout_text[:head_bytes]
        tail = stdout_text[-tail_bytes:]
        omitted = len(stdout_text) - len(head) - len(tail)
        stdout_text = (
            head
            + f"\n\n... [OUTPUT TRUNCATED - {omitted:,} chars omitted "
            f"out of {len(stdout_text):,} total] ...\n\n"
            + tail
        )

    # Strip ANSI escape sequences
    from tools.ansi_strip import strip_ansi
    stdout_text = strip_ansi(stdout_text)

    # Redact secrets
    from agent.redact import redact_sensitive_text
    stdout_text = redact_sensitive_text(stdout_text)

    # Build response
    result: Dict[str, Any] = {
        "status": status,
        "output": stdout_text,
        "tool_calls_made": tool_call_counter[0],
        "duration_seconds": duration,
    }

    if status == "timeout":
        timeout_msg = f"Script timed out after {timeout}s and was killed."
        result["error"] = timeout_msg
        # Include timeout message in output so the LLM always surfaces it
        # to the user (see local path comment; same reasoning, #10807).
        if stdout_text:
            result["output"] = stdout_text + f"\n\n⏰ {timeout_msg}"
        else:
            result["output"] = f"⏰ {timeout_msg}"
        logger.warning(
            "execute_code (remote) timed out after %ss (limit %ss) with %d tool calls",
            duration, timeout, tool_call_counter[0],
        )
    elif status == "interrupted":
        result["output"] = (
            stdout_text + "\n[execution interrupted — user sent a new message]"
        )
    elif exit_code != 0:
        result["status"] = "error"
        result["error"] = f"Script exited with code {exit_code}"

    return json.dumps(result, ensure_ascii=False)


# ---------------------------------------------------------------------------
# Main entry point
# ---------------------------------------------------------------------------

def execute_code(
    code: str,
    task_id: Optional[str] = None,
    enabled_tools: Optional[List[str]] = None,
) -> str:
    """
    Run a Python script in a sandboxed child process with RPC access
    to a subset of Hermes tools.

    Dispatches to the local (UDS) or remote (file-based RPC) path
    depending on the configured terminal backend.

    Args:
        code: Python source code to execute.
        task_id: Session task ID for tool isolation (terminal env, etc.).
        enabled_tools: Tool names enabled in the current session. The sandbox
                       gets the intersection with SANDBOX_ALLOWED_TOOLS.

    Returns:
        JSON string with execution results.
    """
    if not SANDBOX_AVAILABLE:
        return json.dumps({
            "error": "execute_code sandbox is unavailable in this environment. "
                     "Use normal tool calls (terminal, read_file, write_file, ...) instead."
        })

    if not code or not code.strip():
        return tool_error("No code provided.")

    # Dispatch: remote backends use file-based RPC, local uses UDS
    from tools.terminal_tool import _get_env_config
    env_type = _get_env_config()["env_type"]
    if env_type != "local":
        return _execute_remote(code, task_id, enabled_tools)

    # --- Local execution path (UDS) --- below this line is unchanged ---

    # Import per-thread interrupt check (cooperative cancellation)
    from tools.interrupt import is_interrupted as _is_interrupted

    # Resolve config
    _cfg = _load_config()
    timeout = _cfg.get("timeout", DEFAULT_TIMEOUT)
    max_tool_calls = _cfg.get("max_tool_calls", DEFAULT_MAX_TOOL_CALLS)

    # Determine which tools the sandbox can call
    session_tools = set(enabled_tools) if enabled_tools else set()
    sandbox_tools = frozenset(SANDBOX_ALLOWED_TOOLS & session_tools)
    if not sandbox_tools:
        sandbox_tools = SANDBOX_ALLOWED_TOOLS

    # --- Set up temp directory with hermes_tools.py and script.py ---
    tmpdir = tempfile.mkdtemp(prefix="hermes_sandbox_")
    # Use /tmp on macOS to avoid the long /var/folders/... path that pushes
    # Unix domain socket paths past the 104-byte macOS AF_UNIX limit.
    # On Linux, tempfile.gettempdir() already returns /tmp.
#
# Windows: Python 3.9+ added partial AF_UNIX support but the file-backed
# variant is flaky across Windows builds (requires Windows 10 1803+,
# still fails under some configurations, and the socket file can't live
# on the same temp drive as the script). Fall back to loopback TCP —
# same ephemeral port, same 1-connection listen queue, same serialized
# request/response framing. The generated client reads the transport
# selector from HERMES_RPC_SOCKET (path vs. ``tcp://host:port``).
2026-03-08 19:31:23 -07:00
_sock_tmpdir = "/tmp" if sys.platform == "darwin" else tempfile.gettempdir()
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
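
A minimal sketch of how a client could pick the transport from a HERMES_RPC_SOCKET-style value, as described in the code_execution_tool.py item above. The function name `parse_rpc_endpoint` is hypothetical and is not taken from the generated stub; only the two value shapes (filesystem path vs. `tcp://host:port`) come from this commit.

```python
import socket


def parse_rpc_endpoint(value):
    """Return (address_family, address) for a path or tcp://host:port value.

    Hypothetical helper for illustration only.
    """
    prefix = "tcp://"
    if value.startswith(prefix):
        # Split on the last colon so the host part is kept intact.
        host, _, port = value[len(prefix):].rpartition(":")
        return socket.AF_INET, (host, int(port))
    # AF_UNIX may be missing from the socket module on some platforms,
    # so look it up defensively instead of referencing it directly.
    return getattr(socket, "AF_UNIX", None), value


family, address = parse_rpc_endpoint("tcp://127.0.0.1:50123")
```

The caller would then create `socket.socket(family, socket.SOCK_STREAM)` and connect to `address`; error handling for malformed values is omitted here.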
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
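
For illustration, a helper along the lines of windows_detach_popen_kwargs() might look like the sketch below. The name `detach_popen_kwargs` and the `platform` parameter are assumptions made so the example can run anywhere; the flag combination is the one listed in the gateway.py item above.

```python
import subprocess
import sys


def detach_popen_kwargs(platform=sys.platform):
    """Return Popen keyword arguments that detach a child from its parent.

    Hypothetical sketch; the real helper lives in
    hermes_cli/_subprocess_compat.py and may differ.
    """
    if platform == "win32":
        flags = 0
        for name in ("CREATE_NEW_PROCESS_GROUP", "DETACHED_PROCESS", "CREATE_NO_WINDOW"):
            # These constants only exist in the subprocess module on Windows.
            flags |= getattr(subprocess, name, 0)
        return {"creationflags": flags}
    # POSIX: a new session detaches the child from the controlling terminal.
    return {"start_new_session": True}
```

A call site would then pass the result through, e.g. `subprocess.Popen(argv, **detach_popen_kwargs())`, so both platforms share one code path.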
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
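
The path translation in the cli.py item above can be sketched as follows. This is an assumption-laden illustration, not the actual _normalize_git_bash_path(): the function name, the `is_windows` parameter, and the regular expression are all hypothetical.

```python
import re

# Matches /c/..., /cygdrive/c/..., and /mnt/c/... style prefixes.
_DRIVE_PREFIX = re.compile(r"^/(?:cygdrive/|mnt/)?([A-Za-z])(?:/(.*))?$")


def normalize_git_bash_path(path, is_windows):
    """Translate a Git Bash style path to native Windows form."""
    if not is_windows:
        return path  # POSIX no-op
    match = _DRIVE_PREFIX.match(path)
    if match is None:
        return path
    drive = match.group(1).upper()
    rest = (match.group(2) or "").replace("/", "\\")
    return f"{drive}:\\{rest}"
```

One caveat of this sketch: any path whose first component is a single letter (for example `/a/b`) is treated as a drive, so it should only be applied to output known to come from Git on Windows.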
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
2026-05-07 17:29:31 -07:00
_use_tcp_rpc = _IS_WINDOWS
if _use_tcp_rpc:
    sock_path = None  # not used on Windows; TCP endpoint stored below
    rpc_endpoint = None  # set after bind()
else:
    sock_path = os.path.join(_sock_tmpdir, f"hermes_rpc_{uuid.uuid4().hex}.sock")
    rpc_endpoint = sock_path
2026-02-19 23:23:43 -08:00
tool_call_log: list = []
tool_call_counter = [0]  # mutable so the RPC thread can increment
exec_start = time.monotonic()
2026-03-16 23:13:26 -07:00
server_sock = None
2026-02-19 23:23:43 -08:00
try:
2026-05-07 18:52:59 -07:00
# Write the auto-generated hermes_tools module.
# encoding="utf-8" is required on Windows — the stub and user code
# both contain non-ASCII characters (em-dashes in docstrings, plus
# whatever the user script carries). Python's default open() uses
# the system locale on Windows (cp1252 typically), which corrupts
# those bytes; the child then fails to import with a SyntaxError
# ("'utf-8' codec can't decode byte 0x97 in position ...") because
# Python source files are decoded as UTF-8 by default (PEP 3120).
2026-03-10 06:35:28 -07:00
# sandbox_tools is already the correct set (intersection with session
# tools, or SANDBOX_ALLOWED_TOOLS as fallback — see lines above).
tools_src = generate_hermes_tools_module(list(sandbox_tools))
2026-05-07 18:52:59 -07:00
with open(os.path.join(tmpdir, "hermes_tools.py"), "w", encoding="utf-8") as f:
2026-02-19 23:23:43 -08:00
    f.write(tools_src)
# Write the user's script
2026-05-07 18:52:59 -07:00
with open(os.path.join(tmpdir, "script.py"), "w", encoding="utf-8") as f:
2026-02-19 23:23:43 -08:00
    f.write(code)
2026-05-07 17:29:31 -07:00
# --- Start RPC server ---
# Two transports:
# POSIX: AF_UNIX stream socket on sock_path, chmod 0600 for
# owner-only access. Filesystem permissions gate the socket.
# Windows: AF_INET stream socket on 127.0.0.1 with an ephemeral
# port. No filesystem permission story, but the loopback-only bind
# means only processes on this machine (not remote hosts) can
# connect. Note that any local user can reach the port, not only
# the current one. HERMES_RPC_SOCKET is set to
# ``tcp://127.0.0.1:<port>``, which the generated client parses to
# pick AF_INET.
if _use_tcp_rpc:
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # ephemeral port
    _host, _port = server_sock.getsockname()[:2]
    rpc_endpoint = f"tcp://{_host}:{_port}"
else:
    server_sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server_sock.bind(sock_path)
    os.chmod(sock_path, 0o600)
2026-02-19 23:23:43 -08:00
server_sock.listen(1)
rpc_thread = threading.Thread(
    target=_rpc_server_loop,
    args=(
        server_sock, task_id, tool_call_log,
        tool_call_counter, max_tool_calls, sandbox_tools,
    ),
    daemon=True,
)
rpc_thread.start()
# --- Spawn child process ---
2026-02-25 21:16:15 -08:00
# Build a minimal environment for the child. We intentionally exclude
# API keys and tokens to prevent credential exfiltration from LLM-
# generated scripts. The child accesses tools via RPC, not direct API.
feat: env var passthrough for skills and user config (#2807)
* feat: env var passthrough for skills and user config
Skills that declare required_environment_variables now have those vars
passed through to sandboxed execution environments (execute_code and
terminal). Previously, execute_code stripped all vars containing KEY,
TOKEN, SECRET, etc. and the terminal blocklist removed Hermes
infrastructure vars — both blocked skill-declared env vars.
Two passthrough sources:
1. Skill-scoped (automatic): when a skill is loaded via skill_view and
declares required_environment_variables, vars that are present in
the environment are registered in a session-scoped passthrough set.
2. Config-based (manual): terminal.env_passthrough in config.yaml lets
users explicitly allowlist vars for non-skill use cases.
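
The two sources above could be combined in a small registry, sketched below. The class and method names are hypothetical and are not the API of tools/env_passthrough.py; the sketch only illustrates "register what a skill declares and what is actually present, then consult the config allowlist too".

```python
class PassthroughRegistry:
    """Hypothetical session-scoped registry of env vars allowed through."""

    def __init__(self, config_allowlist=()):
        # Manual source: names listed under terminal.env_passthrough.
        self._config = set(config_allowlist)
        # Automatic source: names registered when a skill is loaded.
        self._session = set()

    def register_skill(self, required_vars, env):
        """Register only the declared vars that are present in env."""
        present = {name for name in required_vars if name in env}
        self._session |= present
        return present

    def allows(self, name):
        return name in self._session or name in self._config


registry = PassthroughRegistry(config_allowlist=["MY_TOOL_TOKEN"])
registry.register_skill(["SKILL_API_KEY", "MISSING_VAR"], {"SKILL_API_KEY": "x"})
```

In this sketch a declared variable that is not set in the environment is simply not registered, which mirrors the "vars that are present" wording above.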
Changes:
- New module: tools/env_passthrough.py — shared passthrough registry
- hermes_cli/config.py: add terminal.env_passthrough to DEFAULT_CONFIG
- tools/skills_tool.py: register available skill env vars on load
- tools/code_execution_tool.py: check passthrough before filtering
- tools/environments/local.py: check passthrough in _sanitize_subprocess_env
and _make_run_env
- 19 new tests covering all layers
* docs: add environment variable passthrough documentation
Document the env var passthrough feature across four docs pages:
- security.md: new 'Environment Variable Passthrough' section with
full explanation, comparison table, and security considerations
- code-execution.md: update security section, add passthrough subsection,
fix comparison table
- creating-skills.md: add tip about automatic sandbox passthrough
- skills.md: add note about passthrough after secure setup docs
Live-tested: launched interactive CLI, loaded a skill with
required_environment_variables, verified TEST_SKILL_SECRET_KEY was
accessible inside execute_code sandbox (value: passthrough-test-value-42).
2026-03-24 08:19:34 -07:00
# Exception: env vars declared by loaded skills (via env_passthrough
# registry) or explicitly allowed by the user in config.yaml
2026-05-07 18:39:38 -07:00
# (terminal.env_passthrough) are passed through. On Windows, a small
# OS-essential allowlist (SYSTEMROOT, WINDIR, COMSPEC, ...) is also
# passed through — without those, the child can't create a socket
# or spawn a subprocess. See ``_scrub_child_env`` for the rules.
child_env = _scrub_child_env(os.environ)
2026-05-07 17:29:31 -07:00
child_env["HERMES_RPC_SOCKET"] = rpc_endpoint
2026-02-19 23:23:43 -08:00
child_env["PYTHONDONTWRITEBYTECODE"] = "1"
2026-05-07 18:59:35 -07:00
# Force UTF-8 for the child's stdio and default file encoding.
#
# Without this, on Windows sys.stdout is bound to the console code
# page (cp1252 on US-locale installs), and any script that does
# ``print("café")`` or ``print("→")`` crashes with:
#
# UnicodeEncodeError: 'charmap' codec can't encode character
# '\u2192' in position N: character maps to <undefined>
#
# PYTHONIOENCODING fixes sys.stdin/stdout/stderr.
# PYTHONUTF8=1 enables "UTF-8 mode" (PEP 540) which additionally
# makes ``open()``'s default encoding UTF-8, so user scripts that
# write files without specifying encoding= also work correctly.
#
# On POSIX both values usually match the locale default already,
# so setting them is harmless belt-and-suspenders for environments
# with a C/POSIX locale (containers, minimal base images).
child_env["PYTHONIOENCODING"] = "utf-8"
child_env["PYTHONUTF8"] = "1"
2026-03-14 15:23:09 +01:00
# Ensure the hermes-agent root is importable in the sandbox so
feat(execute_code): add project/strict execution modes, default to project (#11971)
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.
Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:
project (new default):
- cwd = session's TERMINAL_CWD (falls back to os.getcwd())
- python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
with a Python 3.8+ version check; falls back cleanly to
sys.executable if no venv or the candidate fails
- result : 'import pandas' works, '.env' resolves, matches terminal()
strict (opt-in):
- cwd = staging tmpdir (today's behavior)
- python = sys.executable (today's behavior)
- result : maximum reproducibility and isolation; project deps
won't resolve
Security-critical invariants are identical across both modes and covered by
explicit regression tests:
- env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
*_CREDENTIAL, *_PASSWD, *_AUTH substrings)
- SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
delegate_task, no MCP from inside scripts)
- resource caps (5-min timeout, 50KB stdout, 50 tool calls)
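
The env scrubbing invariant can be illustrated with a short filter. This is a sketch under the assumption that matching is a case-insensitive substring test on the variable name; the name `scrub_env` is hypothetical and the real rules live in `_scrub_child_env`.

```python
# Substrings taken from the list above; matching them anywhere in the name
# is an assumption of this sketch.
_SECRET_SUBSTRINGS = (
    "API_KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD", "AUTH",
)


def scrub_env(env):
    """Return a copy of env without variables that look like credentials."""
    return {
        name: value
        for name, value in env.items()
        if not any(marker in name.upper() for marker in _SECRET_SUBSTRINGS)
    }


clean = scrub_env({"PATH": "/usr/bin", "OPENAI_API_KEY": "sk", "GH_TOKEN": "t"})
```

Substring matching errs on the side of dropping too much (a variable such as `AUTHOR_NAME` would also be removed), which is the safer failure mode for a filter of this kind.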
Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).
Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
2026-04-18 01:46:25 -07:00
# repo-root modules are available to child scripts. We also prepend
# the staging tmpdir so ``from hermes_tools import ...`` resolves even
# when the subprocess CWD is not tmpdir (project mode).
2026-03-14 15:23:09 +01:00
_hermes_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
_existing_pp = child_env.get("PYTHONPATH", "")
2026-04-18 01:46:25 -07:00
_pp_parts = [tmpdir, _hermes_root]
if _existing_pp:
    _pp_parts.append(_existing_pp)
child_env["PYTHONPATH"] = os.pathsep.join(_pp_parts)
2026-03-03 11:57:18 +05:30
# Inject user's configured timezone so datetime.now() in sandboxed
2026-04-16 02:26:14 -07:00
# code reflects the correct wall-clock time. Only TZ is set —
# HERMES_TIMEZONE is an internal Hermes setting and must not leak
# into child processes.
2026-03-03 11:57:18 +05:30
_tz_name = os.getenv("HERMES_TIMEZONE", "").strip()
if _tz_name:
    child_env["TZ"] = _tz_name
2026-04-16 02:26:14 -07:00
child_env.pop("HERMES_TIMEZONE", None)
2026-02-19 23:23:43 -08:00
2026-04-10 13:37:45 -07:00
# Per-profile HOME isolation: redirect system tool configs into
# {HERMES_HOME}/home/ when that directory exists.
from hermes_constants import get_subprocess_home
_profile_home = get_subprocess_home()
if _profile_home:
    child_env["HOME"] = _profile_home
feat(execute_code): add project/strict execution modes, default to project (#11971)
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.
Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:
project (new default):
- cwd = session's TERMINAL_CWD (falls back to os.getcwd())
- python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
with a Python 3.8+ version check; falls back cleanly to
sys.executable if no venv or the candidate fails
- result : 'import pandas' works, '.env' resolves, matches terminal()
strict (opt-in):
- cwd = staging tmpdir (today's behavior)
- python = sys.executable (today's behavior)
- result : maximum reproducibility and isolation; project deps
won't resolve
Security-critical invariants are identical across both modes and covered by
explicit regression tests:
- env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
*_CREDENTIAL, *_PASSWD, *_AUTH substrings)
- SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
delegate_task, no MCP from inside scripts)
- resource caps (5-min timeout, 50KB stdout, 50 tool calls)
Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).
Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
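
The interpreter selection described for project mode could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from this commit: `pick_child_python_sketch` is a hypothetical name, the `bin/python` layout is POSIX-only, and probing the version with `subprocess.run` is just one plausible way to do the 3.8+ check.

```python
import os
import subprocess
import sys


def pick_child_python_sketch(mode: str, env=None) -> str:
    """Hypothetical helper: choose the interpreter for the child script."""
    env = os.environ if env is None else env
    if mode != "project":
        return sys.executable  # strict mode keeps the host interpreter
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        prefix = env.get(var, "").strip()
        if not prefix:
            continue
        # Assumption: POSIX layout; Windows venvs use Scripts\python.exe instead.
        candidate = os.path.join(prefix, "bin", "python")
        if not os.path.isfile(candidate):
            continue
        try:
            probe = subprocess.run(
                [candidate, "-c",
                 "import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)"],
                capture_output=True,
                timeout=10,
            )
        except (OSError, subprocess.SubprocessError):
            continue  # candidate is broken; try the next one
        if probe.returncode == 0:
            return candidate
    return sys.executable  # clean fallback when no usable venv is found
```

Passing the environment as a parameter keeps the sketch easy to exercise without touching the real process environment.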
2026-04-18 01:46:25 -07:00
        # Resolve interpreter + CWD based on execute_code mode.
        # - strict : today's behavior (sys.executable + tmpdir CWD).
        # - project: user's venv python + session's working directory, so
        #            project deps like pandas and user files resolve.
        # Env scrubbing and tool whitelist apply identically in both modes.
        _mode = _get_execution_mode()
        _child_python = _resolve_child_python(_mode)
        _child_cwd = _resolve_child_cwd(_mode, tmpdir)
        _script_path = os.path.join(tmpdir, "script.py")
2026-02-19 23:23:43 -08:00
        proc = subprocess.Popen(
2026-04-18 01:46:25 -07:00
            [_child_python, _script_path],
            cwd=_child_cwd,
2026-02-19 23:23:43 -08:00
            env=child_env,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            stdin=subprocess.DEVNULL,
2026-03-01 01:54:27 +03:00
            preexec_fn=None if _IS_WINDOWS else os.setsid,
2026-05-16 12:06:09 -04:00
            creationflags=subprocess.CREATE_NO_WINDOW if _IS_WINDOWS else 0,
2026-02-19 23:23:43 -08:00
        )
        # --- Poll loop: watch for exit, timeout, and interrupt ---
        deadline = time.monotonic() + timeout
        stderr_chunks: list = []
2026-03-11 00:26:13 -07:00
        # Background readers to avoid pipe buffer deadlocks.
        # For stdout we use a head+tail strategy: keep the first HEAD_BYTES
        # and a rolling window of the last TAIL_BYTES so the final print()
        # output is never lost. Stderr keeps head-only (errors appear early).
        _STDOUT_HEAD_BYTES = int(MAX_STDOUT_BYTES * 0.4)  # 40% head
        _STDOUT_TAIL_BYTES = MAX_STDOUT_BYTES - _STDOUT_HEAD_BYTES  # 60% tail
2026-02-19 23:23:43 -08:00
        def _drain(pipe, chunks, max_bytes):
2026-03-11 00:26:13 -07:00
            """Simple head-only drain (used for stderr)."""
2026-02-19 23:23:43 -08:00
            total = 0
            try:
                while True:
                    data = pipe.read(4096)
                    if not data:
                        break
                    if total < max_bytes:
                        keep = max_bytes - total
                        chunks.append(data[:keep])
                    total += len(data)
2026-03-08 14:50:23 +03:00
            except (ValueError, OSError) as e:
                logger.debug("Error reading process output: %s", e, exc_info=True)
2026-02-19 23:23:43 -08:00
2026-03-11 00:26:13 -07:00
        stdout_total_bytes = [0]  # mutable ref for total bytes seen
        def _drain_head_tail(pipe, head_chunks, tail_chunks, head_bytes, tail_bytes, total_ref):
            """Drain stdout keeping both head and tail data."""
            head_collected = 0
            from collections import deque
            tail_buf = deque()
            tail_collected = 0
            try:
                while True:
                    data = pipe.read(4096)
                    if not data:
                        break
                    total_ref[0] += len(data)
                    # Fill head buffer first
                    if head_collected < head_bytes:
                        keep = min(len(data), head_bytes - head_collected)
                        head_chunks.append(data[:keep])
                        head_collected += keep
                        data = data[keep:]  # remaining goes to tail
                        if not data:
                            continue
                    # Everything past head goes into rolling tail buffer
                    tail_buf.append(data)
                    tail_collected += len(data)
                    # Evict old tail data to stay within tail_bytes budget
                    while tail_collected > tail_bytes and tail_buf:
                        oldest = tail_buf.popleft()
                        tail_collected -= len(oldest)
            except (ValueError, OSError):
                pass
            # Transfer final tail to output list
            tail_chunks.extend(tail_buf)
        stdout_head_chunks: list = []
        stdout_tail_chunks: list = []
2026-02-19 23:23:43 -08:00
        stdout_reader = threading.Thread(
2026-03-11 00:26:13 -07:00
            target=_drain_head_tail,
            args=(proc.stdout, stdout_head_chunks, stdout_tail_chunks,
                  _STDOUT_HEAD_BYTES, _STDOUT_TAIL_BYTES, stdout_total_bytes),
            daemon=True
2026-02-19 23:23:43 -08:00
        )
        stderr_reader = threading.Thread(
            target=_drain, args=(proc.stderr, stderr_chunks, MAX_STDERR_BYTES), daemon=True
        )
        stdout_reader.start()
        stderr_reader.start()
        status = "success"
2026-04-16 19:01:56 +05:30
        _activity_state = {
            "last_touch": time.monotonic(),
            "start": exec_start,
        }
2026-02-19 23:23:43 -08:00
        while proc.poll() is None:
2026-04-11 14:02:58 -07:00
            if _is_interrupted():
2026-02-19 23:23:43 -08:00
                _kill_process_group(proc)
                status = "interrupted"
                break
            if time.monotonic() > deadline:
                _kill_process_group(proc, escalate=True)
                status = "timeout"
                break
2026-04-16 09:17:24 +02:00
            # Periodic activity touch so the gateway's inactivity timeout
            # doesn't kill the agent during long code execution (#10807).
2026-04-16 19:01:56 +05:30
            try:
                from tools.environments.base import touch_activity_if_due
                touch_activity_if_due(_activity_state, "execute_code running")
            except Exception:
                pass
2026-02-19 23:23:43 -08:00
            time.sleep(0.2)
        # Wait for readers to finish draining
        stdout_reader.join(timeout=3)
        stderr_reader.join(timeout=3)
2026-03-11 00:26:13 -07:00
        stdout_head = b"".join(stdout_head_chunks).decode("utf-8", errors="replace")
        stdout_tail = b"".join(stdout_tail_chunks).decode("utf-8", errors="replace")
2026-02-19 23:23:43 -08:00
        stderr_text = b"".join(stderr_chunks).decode("utf-8", errors="replace")
2026-03-11 00:26:13 -07:00
        # Assemble stdout with head+tail truncation
        total_stdout = stdout_total_bytes[0]
        if total_stdout > MAX_STDOUT_BYTES and stdout_tail:
            omitted = total_stdout - len(stdout_head) - len(stdout_tail)
            truncated_notice = (
                f"\n\n... [OUTPUT TRUNCATED - {omitted:,} chars omitted "
                f"out of {total_stdout:,} total] ...\n\n"
            )
            stdout_text = stdout_head + truncated_notice + stdout_tail
        else:
            stdout_text = stdout_head + stdout_tail
2026-02-19 23:23:43 -08:00
        exit_code = proc.returncode if proc.returncode is not None else -1
        duration = round(time.monotonic() - exec_start, 2)
        # Wait for RPC thread to finish
2026-03-09 23:40:20 -07:00
        server_sock.close()  # break accept() so thread exits promptly
2026-03-21 15:55:25 -07:00
        server_sock = None  # prevent double close in finally
2026-02-19 23:23:43 -08:00
        rpc_thread.join(timeout=3)
fix: strip ANSI at the source — clean terminal output before it reaches the model
Root cause: terminal_tool, execute_code, and process_registry returned raw
subprocess output with ANSI escape sequences intact. The model saw these
in tool results and copied them into file writes.
Previous fix (PR #2532) stripped ANSI at the write point in file_tools.py,
but this was a band-aid — regex on file content risks corrupting legitimate
content, and doesn't prevent ANSI from wasting tokens in the model context.
Source-level fix:
- New tools/ansi_strip.py with comprehensive ECMA-48 regex covering CSI
(incl. private-mode, colon-separated, intermediate bytes), OSC (both
terminators), DCS/SOS/PM/APC strings, Fp/Fe/Fs/nF escapes, 8-bit C1
- terminal_tool.py: strip output before returning to model
- code_execution_tool.py: strip stdout/stderr before returning
- process_registry.py: strip output in poll/read_log/wait
- file_tools.py: remove _strip_ansi band-aid (no longer needed)
Verified: `ls --color=always` output returned as clean text to model,
file written from that output contains zero ESC bytes.
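
To make the source-level stripping concrete, here is a deliberately simplified sketch. It is not the contents of `tools/ansi_strip.py`: `strip_ansi_sketch` is a hypothetical name, and the pattern covers only CSI sequences and OSC strings, not the DCS/SOS/PM/APC, Fp/Fe/Fs/nF, or 8-bit C1 cases listed above.

```python
import re

# Simplified pattern (assumption): CSI sequences and OSC strings only.
_ANSI_SKETCH_RE = re.compile(
    r"\x1b\[[0-?]*[ -/]*[@-~]"             # CSI: parameter bytes, intermediates, final byte
    r"|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)"  # OSC: terminated by BEL or ST (ESC \)
)


def strip_ansi_sketch(text: str) -> str:
    """Remove the escape sequences matched by the simplified pattern."""
    return _ANSI_SKETCH_RE.sub("", text)
```

Applying a function like this to stdout and stderr before they are returned keeps escape bytes out of the model context, so nothing downstream has to guess whether file content is legitimate.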
2026-03-23 06:50:39 -07:00
        # Strip ANSI escape sequences so the model never sees terminal
        # formatting; this prevents it from copying escapes into file writes.
        from tools.ansi_strip import strip_ansi
        stdout_text = strip_ansi(stdout_text)
        stderr_text = strip_ansi(stderr_text)
2026-03-31 18:52:11 -07:00
        # Redact secrets (API keys, tokens, etc.) from sandbox output.
        # The sandbox env-var filter (lines 434-454) blocks os.environ access,
        # but scripts can still read secrets from disk (e.g. open('~/.hermes/.env')).
        # This ensures leaked secrets never enter the model context.
        from agent.redact import redact_sensitive_text
        stdout_text = redact_sensitive_text(stdout_text)
        stderr_text = redact_sensitive_text(stderr_text)
2026-02-19 23:23:43 -08:00
        # Build response
        result: Dict[str, Any] = {
            "status": status,
            "output": stdout_text,
            "tool_calls_made": tool_call_counter[0],
            "duration_seconds": duration,
        }
        if status == "timeout":
2026-04-16 09:17:24 +02:00
            timeout_msg = f"Script timed out after {timeout}s and was killed."
            result["error"] = timeout_msg
            # Include timeout message in output so the LLM always surfaces it
            # to the user. When output is empty, models often treat the result
            # as "nothing happened" and produce an empty response, which the
            # gateway stream consumer silently drops (#10807).
            if stdout_text:
                result["output"] = stdout_text + f"\n\n⏰ {timeout_msg}"
            else:
                result["output"] = f"⏰ {timeout_msg}"
            logger.warning(
                "execute_code timed out after %ss (limit %ss) with %d tool calls",
                duration, timeout, tool_call_counter[0],
            )
2026-02-19 23:23:43 -08:00
        elif status == "interrupted":
            result["output"] = stdout_text + "\n[execution interrupted — user sent a new message]"
        elif exit_code != 0:
            result["status"] = "error"
            result["error"] = stderr_text or f"Script exited with code {exit_code}"
            # Include stderr in output so the LLM sees the traceback
            if stderr_text:
                result["output"] = stdout_text + "\n--- stderr ---\n" + stderr_text
        return json.dumps(result, ensure_ascii=False)
    except Exception as exc:
        duration = round(time.monotonic() - exec_start, 2)
2026-03-16 23:13:26 -07:00
        logger.error(
            "execute_code failed after %ss with %d tool calls: %s: %s",
            duration,
            tool_call_counter[0],
            type(exc).__name__,
            exc,
            exc_info=True,
        )
2026-02-19 23:23:43 -08:00
        return json.dumps({
            "status": "error",
            "error": str(exc),
            "tool_calls_made": tool_call_counter[0],
            "duration_seconds": duration,
        }, ensure_ascii=False)
    finally:
        # Cleanup temp dir and socket
2026-03-16 23:13:26 -07:00
        if server_sock is not None:
            try:
                server_sock.close()
            except OSError as e:
                logger.debug("Server socket close error: %s", e)
        import shutil
        shutil.rmtree(tmpdir, ignore_errors=True)
2026-02-19 23:23:43 -08:00
        try:
feat(windows): close remaining POSIX-only landmines — TUI crash, kanban waitpid, AF_UNIX sandbox, /bin/bash, npm .cmd shims, cwd tracking, detach flags
Second pass on native Windows support, driven by a systematic audit across
five areas: POSIX-only primitives (signal.SIGKILL/SIGHUP/SIGPIPE, os.WNOHANG,
os.setsid), path translation bugs (/c/Users → C:\Users), subprocess patterns
(npm.cmd batch shims, start_new_session no-op on Windows), subsystem health
(cron, gateway daemon, update flow), and module-level import guards.
Every change is platform-gated — POSIX (Linux/macOS) behaviour is preserved
bit-identical. Explicit "do no harm" test: test_posix_path_preserved_on_linux,
test_posix_noop, test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix.
## New module
- hermes_cli/_subprocess_compat.py — shared helpers (resolve_node_command,
windows_detach_flags, windows_hide_flags, windows_detach_popen_kwargs).
All no-ops on non-Windows.
## CRITICAL fixes (would crash or silently break on Windows)
- tui_gateway/entry.py: SIGPIPE/SIGHUP referenced at module top level would
AttributeError on import on Windows, breaking `hermes --tui` entirely (it
spawns this module as a subprocess). Guard each signal.signal() call with
hasattr() and add SIGBREAK as Windows' SIGHUP equivalent.
- hermes_cli/kanban_db.py: os.waitpid(-1, os.WNOHANG) in dispatcher tick was
unguarded. os.WNOHANG doesn't exist on Windows. Gate the whole reap loop
behind `os.name != "nt"` — Windows has no zombies anyway.
- tools/code_execution_tool.py: AF_UNIX socket for execute_code RPC fails on
most Windows builds. Fall back to loopback TCP (AF_INET on 127.0.0.1:0
ephemeral port) when _IS_WINDOWS. HERMES_RPC_SOCKET env var now accepts
either a filesystem path (POSIX) or `tcp://127.0.0.1:<port>` (Windows).
Generated sandbox client parses both.
- cron/scheduler.py: `argv = ["/bin/bash", str(path)]` hardcoded. Use
shutil.which("bash") so Windows (Git Bash via MinGit) works, with a
readable error when bash is genuinely absent.
- 6 bare npm/npx spawn sites: tools_config.py x2, doctor.py, whatsapp.py
(npm install + node version probe), browser_tool.py x2. On Windows npm
is npm.cmd / npx is npx.cmd (batch shims); subprocess.Popen(["npm", ...])
fails with WinError 193. shutil.which(...) returns the absolute .cmd
path which CreateProcessW accepts because the extension routes through
cmd.exe /c. POSIX behaviour unchanged (shutil.which still returns the
same path subprocess would resolve itself).
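
The dual-form `HERMES_RPC_SOCKET` value from the AF_UNIX fix above can be illustrated with a small client-side sketch. The function names are hypothetical and the generated sandbox client is not shown in this commit message, so treat this as one plausible shape rather than the actual parser.

```python
import socket
from typing import Tuple, Union

Address = Union[str, Tuple[str, int]]


def parse_rpc_endpoint_sketch(endpoint: str) -> Tuple[str, Address]:
    """Return ("tcp", (host, port)) or ("unix", path) for an endpoint string."""
    prefix = "tcp://"
    if endpoint.startswith(prefix):
        host, _, port = endpoint[len(prefix):].rpartition(":")
        return "tcp", (host, int(port))
    return "unix", endpoint  # anything else is treated as a filesystem path


def connect_rpc_sketch(endpoint: str) -> socket.socket:
    """Open a stream socket matching whichever transport the endpoint names."""
    kind, address = parse_rpc_endpoint_sketch(endpoint)
    if kind == "tcp":
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    else:
        # Only reached on POSIX in this sketch; AF_UNIX is the case that
        # fails on most Windows builds.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.connect(address)
    return sock
```

Keeping the transport choice inside the endpoint string means the child needs only one environment variable regardless of platform.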
## HIGH fixes (silent misbehaviour on Windows)
- tools/environments/local.py get_temp_dir: hardcoded /tmp returned on
Windows meant `_cwd_file = "/tmp/hermes-cwd-*.txt"`, which bash wrote
via MSYS2's virtual /tmp but native Python couldn't open. Result: cwd
tracking silently broken — `cd` in terminal tool did nothing. Windows
branch now returns `%HERMES_HOME%/cache/terminal` with forward slashes
(works in both bash and Python, guaranteed no spaces).
- tools/environments/local.py _make_run_env PATH injection: `/usr/bin not
in split(":")` heuristic mangles Windows PATH (";" separator). Gate
the injection behind `not _IS_WINDOWS`.
- hermes_cli/gateway.py launch_detached_profile_gateway_restart: outer
Popen + watcher-script Popen both used start_new_session=True, which
Windows silently ignores. Watcher stayed attached to CLI's console,
died when user closed terminal after `hermes update`, left gateway
stale. Now branches through windows_detach_popen_kwargs() helper
(CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW on
Windows, start_new_session=True on POSIX — identical to main).
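
A helper in the spirit of `windows_detach_popen_kwargs()` might be sketched as below. The body is an assumption based only on the flags named above; the real helper in `hermes_cli/_subprocess_compat.py` is not reproduced here, and `detach_popen_kwargs_sketch` is a hypothetical name.

```python
import subprocess
import sys


def detach_popen_kwargs_sketch() -> dict:
    """Return Popen keyword arguments that detach a child from the caller."""
    if sys.platform == "win32":
        # These constants exist only in the Windows build of the subprocess module.
        flags = (
            subprocess.CREATE_NEW_PROCESS_GROUP
            | subprocess.DETACHED_PROCESS
            | subprocess.CREATE_NO_WINDOW
        )
        return {"creationflags": flags}
    # POSIX: a new session detaches the child from the controlling terminal.
    return {"start_new_session": True}
```

A caller would then write something like `subprocess.Popen(cmd, **detach_popen_kwargs_sketch())`, which keeps the platform branch in one place.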
## MEDIUM fixes
- gateway/run.py /restart and /update handlers: hardcoded bash/setsid
chain crashes on Windows when user triggers /update in-gateway. Now
has sys.platform=="win32" branch using sys.executable + a tiny
Python watcher with proper detach flags. POSIX path is unchanged.
- cli.py _git_repo_root: Git on Windows sometimes returns /c/Users/...
style paths that break subprocess.Popen(cwd=...) and Path().resolve().
Added _normalize_git_bash_path() helper that translates /c/Users,
/cygdrive/c, /mnt/c variants to native C:\Users form. POSIX no-op.
_git_repo_root() now routes every result through it.
- cli.py worktree .worktreeinclude: os.symlink on directories failed
hard on Windows (requires admin or Developer Mode). Falls back to
shutil.copytree with a warning log.
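
The path translation described for `_normalize_git_bash_path()` could be approximated as follows. This is a hedged sketch: the regex, the `is_windows` parameter, and the name `normalize_git_bash_path_sketch` are assumptions for illustration, and the heuristic treats any single-letter first component as a drive.

```python
import re

# Matches /c/..., /cygdrive/c/..., and /mnt/c/... (single-letter drive component).
_DRIVE_PREFIX_RE = re.compile(r"^/(?:cygdrive/|mnt/)?([A-Za-z])(?:/(.*))?$")


def normalize_git_bash_path_sketch(path: str, is_windows: bool) -> str:
    """Translate Git Bash style paths to native Windows form; no-op on POSIX."""
    if not is_windows:
        return path
    match = _DRIVE_PREFIX_RE.match(path)
    if not match:
        return path
    drive = match.group(1).upper()
    rest = match.group(2) or ""
    return f"{drive}:\\" + rest.replace("/", "\\")
```

Taking the platform as an argument, rather than reading it inside the function, makes the POSIX no-op behaviour easy to test on any host.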
## Tests
- 29 new tests in tests/tools/test_windows_native_support.py covering:
subprocess_compat helpers, TUI entry signal guards, kanban waitpid
guard, code_execution TCP fallback source-level invariants, cron bash
resolution, npm/npx bare-spawn lint per-file, local env Windows temp
dir, PATH injection gating, git bash path normalization, symlink
fallback, gateway detached watcher flags.
- One existing test assertion adjusted in test_browser_homebrew_paths:
it compared captured Popen argv to the BARE `"npx"` literal; after the
shutil.which() change argv[0] is the absolute path. New assertion
checks the shape (two items, second is `agent-browser`) rather than
the exact first-item string. Behaviour unchanged; test was too strict.
All 56 tests pass on Linux (30 from previous commits + 26 new).
267 tests from the affected files/dirs (browser, code_exec, local_env,
process_registry, kanban_db, windows_compat) all pass — zero regressions.
tests/hermes_cli/ (3909 pass) and tests/gateway/ (5021 pass) unchanged;
all pre-existing test failures confirmed unrelated via `git stash` re-run.
## What's still deferred (LOW priority)
- Visible cmd-window flashes on short-lived console apps (~14 sites) —
cosmetic, needs a follow-up pass once we have user reports.
- agent/file_safety.py POSIX-only security deny patterns — separate
hardening task.
- tools/process_registry.py returning "/tmp" as fallback — theoretical;
reachable only when all env-var candidates fail.
2026-05-07 17:29:31 -07:00
            # Only UDS has a filesystem socket to unlink; TCP sockets are
            # freed by server_sock.close() above.
            if sock_path:
                os.unlink(sock_path)
2026-03-16 23:13:26 -07:00
        except OSError:
            pass  # already cleaned up or never created
2026-02-19 23:23:43 -08:00
def _kill_process_group(proc, escalate: bool = False):
feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why
Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:
- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.
This commit does three things:
1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.
## What changed
### 1. psutil as a core dependency (pyproject.toml)
Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.
### 2. `gateway/status.py::_pid_exists`
Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.
### 3. `os.killpg` migration to psutil (7 callsites, 5 files)
- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
`# windows-footgun: ok` — the pgid semantics psutil can't replicate,
and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`
### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)
Grep-based checker with 12 rules covering every Windows cross-platform
footgun we've hit so far:
1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode
Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
`shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds
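
The core of a grep-based checker with inline suppression can be sketched in a few lines. This is not `scripts/check-windows-footguns.py`: the two rules, their regexes, and the name `scan_source_sketch` are illustrative assumptions, and docstring and guard-hint awareness are left out.

```python
import re
from typing import List, Tuple

# Two illustrative rules only (assumed patterns, not the script's real ones).
RULES_SKETCH = [
    ("os.killpg", re.compile(r"\bos\.killpg\b")),
    ("os.kill(pid, 0)", re.compile(r"\bos\.kill\([^,()]+,\s*0\s*\)")),
]
SUPPRESS_MARK = "# windows-footgun: ok"


def scan_source_sketch(source: str) -> List[Tuple[int, str]]:
    """Return (line number, rule name) pairs for unsuppressed matches."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SUPPRESS_MARK in line:
            continue  # inline suppression wins over every rule
        # Naive trailing-comment strip; ignores '#' inside string literals.
        code = line.split("#", 1)[0]
        for name, pattern in RULES_SKETCH:
            if pattern.search(code):
                findings.append((lineno, name))
    return findings
```

Working line by line keeps a scan like this cheap, which fits the goal of checking a few hundred files within seconds.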
### 5. CI integration
A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:
```yaml
name: Windows footgun check
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: {python-version: "3.11"}
- run: python scripts/check-windows-footguns.py --all
```
### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion
Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.
### 7. Baseline cleanup (91 → 0 findings)
- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
`encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
tool subprocess management → annotated with
`# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)
## Verification
```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).
$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```
Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
    """Kill the child and its entire process tree (cross-platform via psutil)."""
    import psutil
2026-02-19 23:23:43 -08:00
    try:
2026-05-08 12:57:33 -07:00
        parent = psutil.Process(proc.pid)
        children = parent.children(recursive=True)
        for child in children:
            try:
                child.terminate()
            except psutil.NoSuchProcess:
                pass
        try:
            parent.terminate()
        except psutil.NoSuchProcess:
            pass
    except psutil.NoSuchProcess:
        pass
    except (PermissionError, OSError) as e:
        logger.debug("Could not terminate process tree: %s", e, exc_info=True)
2026-02-19 23:23:43 -08:00
        try:
            proc.kill()
2026-03-08 14:50:23 +03:00
        except Exception as e2:
            logger.debug("Could not kill process: %s", e2, exc_info=True)
2026-02-19 23:23:43 -08:00
    if escalate:
        # Give the process 5s to exit after SIGTERM, then SIGKILL
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            try:
feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why
Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:
- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.
This commit does three things:
1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.
## What changed
### 1. psutil as a core dependency (pyproject.toml)
Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.
### 2. `gateway/status.py::_pid_exists`
Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.
### 3. `os.killpg` migration to psutil (7 callsites, 5 files)
- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
`# windows-footgun: ok` — the pgid semantics psutil can't replicate,
and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`
### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)
Grep-based checker with 12 rules covering every Windows cross-platform
footgun we've hit so far:
1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode
Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
`shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds
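
To illustrate how a grep-style rule with inline suppression might work, here is a small sketch. It is not the real `scripts/check-windows-footguns.py`: the rule table, the `scan` function, and the guard-hint list are hypothetical, and the comment stripping is naive (a `#` inside a string literal would truncate the line).

```python
import re

# Hypothetical subset of rules: name -> pattern matched against code only.
RULES = {
    "os.killpg": re.compile(r"\bos\.killpg\s*\("),
    "os.kill(pid, 0)": re.compile(r"\bos\.kill\s*\(\s*[^,()]+,\s*0\s*\)"),
}
SUPPRESS_MARKER = "# windows-footgun: ok"
GUARD_HINTS = ("hasattr(os,", "shutil.which(")


def scan(source: str):
    """Return (line_number, rule_name) pairs for unsuppressed, unguarded hits."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if SUPPRESS_MARKER in line:
            continue  # explicit inline suppression
        if any(hint in line for hint in GUARD_HINTS):
            continue  # line already carries a platform guard
        code = line.split("#", 1)[0]  # ignore trailing comments (naive)
        for name, pattern in RULES.items():
            if pattern.search(code):
                findings.append((lineno, name))
    return findings


sample = "\n".join([
    "import os",
    "os.killpg(pgid, 9)",
    "os.killpg(pgid, 9)  # windows-footgun: ok (POSIX-only branch)",
    "# os.kill(pid, 0) is unsafe on Windows",
    "os.kill(pid, 0)",
    "if hasattr(os, 'killpg'): os.killpg(pgid, 9)",
])
findings = scan(sample)
```

Only the bare `os.killpg` call and the bare `os.kill(pid, 0)` call are reported; the suppressed, commented, and guarded lines are skipped.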
### 5. CI integration
A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:
```yaml
name: Windows footgun check
on:
push:
branches: [main]
pull_request:
branches: [main]
jobs:
check:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with: {python-version: "3.11"}
- run: python scripts/check-windows-footguns.py --all
```
### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion
Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.
### 7. Baseline cleanup (91 → 0 findings)
- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
`encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
tool subprocess management → annotated with
`# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)
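
The difference between the two encodings chosen for the `open()` sites can be seen in a short sketch. The helper name `read_user_text` and the demo file are illustrative assumptions, not code from this commit.

```python
import os
import tempfile


def read_user_text(path: str) -> str:
    """Read a user-editable text file, dropping a leading UTF-8 BOM if present."""
    with open(path, "r", encoding="utf-8-sig") as fh:
        return fh.read()


# Simulate a file saved by an editor that prepends a BOM.
fd, demo_path = tempfile.mkstemp(suffix=".yaml")
with os.fdopen(fd, "wb") as fh:
    fh.write(b"\xef\xbb\xbfmode: strict\n")

with_sig = read_user_text(demo_path)

# Plain utf-8 keeps the BOM as a U+FEFF character at the start of the text.
with open(demo_path, "r", encoding="utf-8") as fh:
    plain = fh.read()

os.remove(demo_path)
```

With `utf-8-sig` the BOM is consumed on read, while plain `utf-8` leaves it in the first key, which is why user-editable files get the former.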
## Verification
```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).
$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```
For proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix, see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 12:57:33 -07:00
        parent = psutil.Process(proc.pid)
        for child in parent.children(recursive=True):
            try:
                child.kill()
            except psutil.NoSuchProcess:
                pass
        try:
            parent.kill()
        except psutil.NoSuchProcess:
            pass
    except psutil.NoSuchProcess:
        pass
    except (PermissionError, OSError) as e:
        logger.debug("Could not kill process tree: %s", e, exc_info=True)
2026-02-19 23:23:43 -08:00
        try:
            proc.kill()
2026-03-08 14:50:23 +03:00
        except Exception as e2:
            logger.debug("Could not kill process: %s", e2, exc_info=True)
2026-02-19 23:23:43 -08:00
def _load_config() -> dict:
2026-04-28 22:42:17 -05:00
    """Load code_execution config without importing the interactive CLI.

    This helper is called while building the module-level execute_code schema
    during tool discovery. Importing ``cli`` here pulls prompt_toolkit/Rich and
    a large chunk of the classic REPL onto every agent startup path, including
    ``hermes --tui`` where it is never used. Read the lightweight raw config
    instead; the config layer already caches by (mtime, size), and an absent
    key cleanly falls back to DEFAULT_EXECUTION_MODE.
    """
2026-02-19 23:23:43 -08:00
    try:
2026-04-28 22:42:17 -05:00
        from hermes_cli.config import read_raw_config
        cfg = read_raw_config().get("code_execution", {})
        return cfg if isinstance(cfg, dict) else {}
2026-02-19 23:23:43 -08:00
    except Exception:
        return {}
feat(execute_code): add project/strict execution modes, default to project (#11971)
Weaker models (Gemma-class) repeatedly rediscover and forget that
execute_code uses a different CWD and Python interpreter than terminal(),
causing them to flip-flop on whether user files exist and to hit import
errors on project dependencies like pandas.
Adds a new 'code_execution.mode' config key (default 'project') that
brings execute_code into line with terminal()'s filesystem/interpreter:
project (new default):
- cwd = session's TERMINAL_CWD (falls back to os.getcwd())
- python = active VIRTUAL_ENV/bin/python or CONDA_PREFIX/bin/python
with a Python 3.8+ version check; falls back cleanly to
sys.executable if no venv or the candidate fails
- result : 'import pandas' works, '.env' resolves, matches terminal()
strict (opt-in):
- cwd = staging tmpdir (today's behavior)
- python = sys.executable (today's behavior)
- result : maximum reproducibility and isolation; project deps
won't resolve
Security-critical invariants are identical across both modes and covered by
explicit regression tests:
- env scrubbing (strips *_API_KEY, *_TOKEN, *_SECRET, *_PASSWORD,
*_CREDENTIAL, *_PASSWD, *_AUTH substrings)
- SANDBOX_ALLOWED_TOOLS whitelist (no execute_code recursion, no
delegate_task, no MCP from inside scripts)
- resource caps (5-min timeout, 50KB stdout, 50 tool calls)
Deliberately avoids 'sandbox'/'isolated'/'cloud' language in tool
descriptions (regression from commit 39b83f34 where agents on local
backends falsely believed they were sandboxed and refused networking).
Override via env var: HERMES_EXECUTE_CODE_MODE=strict|project
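
The env-scrubbing invariant listed above can be pictured with a short sketch. This is an assumption about the shape of the filter, not the code under test: `SECRET_SUBSTRINGS` and `scrub_env` are hypothetical names, and matching is done on plain substrings of the variable name, so a name such as `AUTHOR` would also be dropped.

```python
# Substrings taken from the commit message; the matching strategy is assumed.
SECRET_SUBSTRINGS = (
    "API_KEY", "TOKEN", "SECRET", "PASSWORD", "CREDENTIAL", "PASSWD", "AUTH",
)


def scrub_env(env: dict) -> dict:
    """Return a copy of env without variables whose names look secret-bearing."""
    return {
        key: value
        for key, value in env.items()
        if not any(marker in key.upper() for marker in SECRET_SUBSTRINGS)
    }


child_env = scrub_env({
    "PATH": "/usr/bin",
    "HOME": "/home/user",
    "OPENAI_API_KEY": "sk-placeholder",
    "GH_TOKEN": "placeholder",
})
```

The child process would then be started with `child_env` rather than the parent's full environment, in both `project` and `strict` modes.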
2026-04-18 01:46:25 -07:00
# ---------------------------------------------------------------------------
# Execution mode resolution (strict vs project)
# ---------------------------------------------------------------------------

# Valid values for code_execution.mode. Kept as a module constant so tests
# and the config layer can reference the canonical set.
EXECUTION_MODES = ("project", "strict")
DEFAULT_EXECUTION_MODE = "project"


def _get_execution_mode() -> str:
    """Return the active execute_code mode — 'project' or 'strict'.

    Reads ``code_execution.mode`` from config.yaml; invalid values fall back
    to ``DEFAULT_EXECUTION_MODE`` ('project') with a log warning.

    Mode semantics:
      - ``project`` (default): scripts run in the session's working directory
        with the active virtual environment's python, so project dependencies
        (pandas, torch, project packages) and files resolve naturally.
      - ``strict``: scripts run in an isolated temp directory with
        ``sys.executable`` (hermes-agent's python). Reproducible and the
        interpreter is guaranteed to work, but project deps and relative paths
        won't resolve.

    Env scrubbing and tool whitelist apply identically in both modes.
    """
    cfg_value = str(_load_config().get("mode", DEFAULT_EXECUTION_MODE)).strip().lower()
    if cfg_value in EXECUTION_MODES:
        return cfg_value
    logger.warning(
        "Ignoring code_execution.mode=%r (expected one of %s), falling back to %r",
        cfg_value, EXECUTION_MODES, DEFAULT_EXECUTION_MODE,
    )
    return DEFAULT_EXECUTION_MODE


@functools.lru_cache(maxsize=32)
def _is_usable_python(python_path: str) -> bool:
    """Check whether a candidate Python interpreter is usable for execute_code.

    Requires Python 3.8+ (f-strings and stdlib modules the RPC stubs need).
    Cached so we don't fork a subprocess on every execute_code call.
    """
    try:
        result = subprocess.run(
            [python_path, "-c",
             "import sys; sys.exit(0 if sys.version_info >= (3, 8) else 1)"],
            timeout=5,
            capture_output=True,
2026-05-16 12:06:09 -04:00
            creationflags=subprocess.CREATE_NO_WINDOW if _IS_WINDOWS else 0,
2026-04-18 01:46:25 -07:00
        )
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired, subprocess.SubprocessError):
        return False


def _resolve_child_python(mode: str) -> str:
    """Pick the Python interpreter for the execute_code subprocess.

    In ``strict`` mode, always ``sys.executable`` — guaranteed to work and
    keeps behavior fully reproducible across sessions.

    In ``project`` mode, prefer the user's active virtualenv/conda env's
    python so ``import pandas`` etc. work. Falls back to ``sys.executable``
    if no venv is detected, the candidate binary is missing/not executable,
    or it fails a Python 3.8+ version check.
    """
    if mode != "project":
        return sys.executable
    if _IS_WINDOWS:
        exe_names = ("python.exe", "python3.exe")
        subdirs = ("Scripts",)
    else:
        exe_names = ("python", "python3")
        subdirs = ("bin",)
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        root = os.environ.get(var, "").strip()
        if not root:
            continue
        for subdir in subdirs:
            for exe in exe_names:
                candidate = os.path.join(root, subdir, exe)
                if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
                    continue
                if _is_usable_python(candidate):
                    return candidate
                # Found the interpreter but it failed the version check —
                # log once and fall through to sys.executable.
                logger.info(
                    "execute_code: skipping %s=%s (Python version < 3.8 or broken). "
                    "Using sys.executable instead.", var, candidate,
                )
                return sys.executable
    return sys.executable


def _resolve_child_cwd(mode: str, staging_dir: str) -> str:
    """Resolve the working directory for the execute_code subprocess.

    - ``strict``: the staging tmpdir (today's behavior).
    - ``project``: the session's TERMINAL_CWD (same as the terminal tool), or
      ``os.getcwd()`` if TERMINAL_CWD is unset or doesn't point at a real dir.
      Falls back to the staging tmpdir as a last resort so we never invoke
      Popen with a nonexistent cwd.
    """
    if mode != "project":
        return staging_dir
    raw = os.environ.get("TERMINAL_CWD", "").strip()
    if raw:
        expanded = os.path.expanduser(raw)
        if os.path.isdir(expanded):
            return expanded
    here = os.getcwd()
    if os.path.isdir(here):
        return here
    return staging_dir
2026-02-19 23:23:43 -08:00
# ---------------------------------------------------------------------------
# OpenAI Function-Calling Schema
# ---------------------------------------------------------------------------
2026-03-06 17:36:06 -08:00
# Per-tool documentation lines for the execute_code description.
# Ordered to match the canonical display order.
_TOOL_DOC_LINES = [
    ("web_search",
     "web_search(query: str, limit: int = 5) -> dict\n"
     "Returns {\"data\": {\"web\": [{\"url\", \"title\", \"description\"}, ...]}}"),
    ("web_extract",
     "web_extract(urls: list[str]) -> dict\n"
fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:
1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
warn/block on truly consecutive identical calls. Any other tool call
in between (write, patch, terminal, etc.) resets the counter via
notify_other_tool_call(), called from handle_function_call() in
model_tools.py. This prevents false blocks in read→edit→verify flows.
2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
4th+ consecutive (was 3rd+). Gives the model more room before
intervening.
3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
separate read_history set that only tracks file reads.
4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
web_extract return docs in code_execution_tool.py — the field IS
returned by web_tools.py.
5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
consecutive-only behavior, notify_other_tool_call, interleaved
read/search, and summary-unaffected-by-searches.
2026-03-10 16:25:41 -07:00
     "Returns {\"results\": [{\"url\", \"title\", \"content\", \"error\"}, ...]} where content is markdown"),
2026-03-06 17:36:06 -08:00
    ("read_file",
     "read_file(path: str, offset: int = 1, limit: int = 500) -> dict\n"
     "Lines are 1-indexed. Returns {\"content\": \"...\", \"total_lines\": N}"),
    ("write_file",
     "write_file(path: str, content: str) -> dict\n"
     "Always overwrites the entire file."),
    ("search_files",
     "search_files(pattern: str, target=\"content\", path=\".\", file_glob=None, limit=50) -> dict\n"
     "target: \"content\" (search inside files) or \"files\" (find files by name). Returns {\"matches\": [...]}"),
    ("patch",
     "patch(path: str, old_string: str, new_string: str, replace_all: bool = False) -> dict\n"
     "Replaces old_string with new_string in the file."),
    ("terminal",
     "terminal(command: str, timeout=None, workdir=None) -> dict\n"
     "Foreground only (no background/pty). Returns {\"output\": \"...\", \"exit_code\": N}"),
]
2026-04-18 01:46:25 -07:00
def build_execute_code_schema(enabled_sandbox_tools: set = None,
                              mode: str = None) -> dict:
2026-03-06 17:36:06 -08:00
    """Build the execute_code schema with description listing only enabled tools.

    When tools are disabled via ``hermes tools`` (e.g. web is turned off),
    the schema description should NOT mention web_search / web_extract —
    otherwise the model thinks they are available and keeps trying to use them.
2026-04-18 01:46:25 -07:00
    ``mode`` controls the working-directory sentence in the description:
      - ``'strict'``: scripts run in a temp dir (not the session's CWD)
      - ``'project'`` (default): scripts run in the session's CWD with the
        active venv's python
    If ``mode`` is None, the current ``code_execution.mode`` config is read.
2026-03-06 17:36:06 -08:00
    """
    if enabled_sandbox_tools is None:
        enabled_sandbox_tools = SANDBOX_ALLOWED_TOOLS
2026-04-18 01:46:25 -07:00
    if mode is None:
        mode = _get_execution_mode()
2026-03-06 17:36:06 -08:00
    # Build tool documentation lines for only the enabled tools
    tool_lines = "\n".join(
        doc for name, doc in _TOOL_DOC_LINES if name in enabled_sandbox_tools
    )
    # Build example import list from enabled tools
    import_examples = [n for n in ("web_search", "terminal") if n in enabled_sandbox_tools]
    if not import_examples:
        import_examples = sorted(enabled_sandbox_tools)[:2]
2026-03-08 13:15:17 +03:00
    if import_examples:
        import_str = ", ".join(import_examples) + ", ..."
    else:
        import_str = "..."
2026-03-06 17:36:06 -08:00
2026-04-18 01:46:25 -07:00
    # Mode-specific CWD guidance. Project mode is the default and matches
    # terminal()'s filesystem/interpreter; strict mode retains the isolated
    # temp-dir staging and hermes-agent's own python.
    if mode == "strict":
        cwd_note = (
            "Scripts run in their own temp dir, not the session's CWD — use absolute paths "
            "(os.path.expanduser('~/.hermes/.env')) or terminal()/read_file() for user files."
        )
    else:
        cwd_note = (
            "Scripts run in the session's working directory with the active venv's python, "
            "so project deps (pandas, etc.) and relative paths work like in terminal()."
        )
2026-03-06 17:36:06 -08:00
    description = (
2026-02-19 23:23:43 -08:00
        "Run a Python script that can call Hermes tools programmatically. "
        "Use this when you need 3+ tool calls with processing logic between them, "
        "need to filter/reduce large tool outputs before they enter your context, "
        "need conditional branching (if X then Y else Z), or need to loop "
        "(fetch N pages, process N files, retry on failure).\n\n"
        "Use normal tool calls instead when: single tool call with no processing, "
        "you need to see the full result and apply complex reasoning, "
        "or the task requires interactive user input.\n\n"
2026-03-06 17:36:06 -08:00
        "Available via `from hermes_tools import ...`:\n\n"
        f"{tool_lines}\n\n"
2026-02-21 02:41:30 -08:00
        "Limits: 5-minute timeout, 50KB stdout cap, max 50 tool calls per script. "
2026-04-13 04:23:18 -07:00
        "terminal() is foreground-only (no background or pty).\n\n"
2026-04-18 01:46:25 -07:00
        f"{cwd_note}\n\n"
2026-02-19 23:23:43 -08:00
        "Print your final result to stdout. Use Python stdlib (json, re, math, csv, "
2026-03-06 01:52:46 -08:00
        "datetime, collections, etc.) for processing between tool calls.\n\n"
        "Also available (no import needed — built into hermes_tools):\n"
        "json_parse(text: str) — json.loads with strict=False; use for terminal() output with control chars\n"
        "shell_quote(s: str) — shlex.quote(); use when interpolating dynamic strings into shell commands\n"
        "retry(fn, max_attempts=3, delay=2) — retry with exponential backoff for transient failures"
2026-03-06 17:36:06 -08:00
    )
    return {
        "name": "execute_code",
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                "code": {
                    "type": "string",
                    "description": (
                        "Python code to execute. Import tools with "
                        f"`from hermes_tools import {import_str}` "
                        "and print your final result to stdout."
                    ),
                },
2026-02-19 23:23:43 -08:00
            },
2026-03-06 17:36:06 -08:00
            "required": ["code"],
2026-02-19 23:23:43 -08:00
        },
2026-03-06 17:36:06 -08:00
    }
2026-04-18 01:46:25 -07:00
# Default schema used at registration time (all sandbox tools listed,
# current configured mode). model_tools.py rebuilds per-session anyway.
2026-03-06 17:36:06 -08:00
EXECUTE_CODE_SCHEMA = build_execute_code_schema ( )
2026-02-21 20:22:33 -08:00
# --- Registry ---
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
from tools.registry import registry, tool_error
2026-02-21 20:22:33 -08:00
registry.register(
    name="execute_code",
    toolset="code_execution",
    schema=EXECUTE_CODE_SCHEMA,
    handler=lambda args, **kw: execute_code(
        code=args.get("code", ""),
        task_id=kw.get("task_id"),
        enabled_tools=kw.get("enabled_tools")),
    check_fn=check_sandbox_requirements,
2026-03-15 20:21:21 -07:00
    emoji="🐍",
2026-04-08 01:45:51 -07:00
    max_result_size_chars=100_000,
2026-02-21 20:22:33 -08:00
)