hermes-bsd/tools
Teknium cc38282b04 feat(cross-platform): psutil for PID/process management + Windows footgun checker
## Why

Hermes supports Linux, macOS, and native Windows, but the codebase grew up
POSIX-first and has accumulated patterns that silently break (or worse,
silently kill!) on Windows:

- `os.kill(pid, 0)` as a liveness probe — on Windows this maps to
  CTRL_C_EVENT and broadcasts Ctrl+C to the target's entire console
  process group (bpo-14484, open since 2012).
- `os.killpg` — doesn't exist on Windows at all (AttributeError).
- `os.setsid` / `os.getuid` / `os.geteuid` — same.
- `signal.SIGKILL` / `signal.SIGHUP` / `signal.SIGUSR1` — module-attr
  errors at runtime on Windows.
- `open(path)` / `open(path, "r")` without explicit encoding= — inherits
  the platform default, which is cp1252/mbcs on Windows (UTF-8 on POSIX),
  causing mojibake round-tripping between hosts.
- `wmic` — removed from Windows 10 21H1+.

This commit does three things:

1. Makes `psutil` a core dependency and migrates critical callsites to it.
2. Adds a grep-based CI gate (`scripts/check-windows-footguns.py`) that
   blocks new instances of any of the above patterns.
3. Fixes every existing instance in the codebase so the baseline is clean.

## What changed

### 1. psutil as a core dependency (pyproject.toml)

Added `psutil>=5.9.0,<8` to core deps. psutil is the canonical
cross-platform answer for "is this PID alive" and "kill this process
tree" — its `pid_exists()` uses `OpenProcess + GetExitCodeProcess` on
Windows (NOT a signal call), and its `Process.children(recursive=True)`
+ `.kill()` combo replaces `os.killpg()` portably.

### 2. `gateway/status.py::_pid_exists`

Rewrote to call `psutil.pid_exists()` first, falling back to the
hand-rolled ctypes `OpenProcess + WaitForSingleObject` dance on Windows
(and `os.kill(pid, 0)` on POSIX) only if psutil is somehow missing —
e.g. during the scaffold phase of a fresh install before pip finishes.

### 3. `os.killpg` migration to psutil (7 callsites, 5 files)

- `tools/code_execution_tool.py`
- `tools/process_registry.py`
- `tools/tts_tool.py`
- `tools/environments/local.py` (3 sites kept as-is, suppressed with
  `# windows-footgun: ok` — the pgid semantics psutil can't replicate,
  and the calls are already Windows-guarded at the outer branch)
- `gateway/platforms/whatsapp.py`

### 4. `scripts/check-windows-footguns.py` (NEW, 500 lines)

Grep-based checker with 11 rules covering every Windows cross-platform
footgun we've hit so far:

1. `os.kill(pid, 0)` — the silent killer
2. `os.setsid` without guard
3. `os.killpg` (recommends psutil)
4. `os.getuid` / `os.geteuid` / `os.getgid`
5. `os.fork`
6. `signal.SIGKILL`
7. `signal.SIGHUP/SIGUSR1/SIGUSR2/SIGALRM/SIGCHLD/SIGPIPE/SIGQUIT`
8. `subprocess` shebang script invocation
9. `wmic` without `shutil.which` guard
10. Hardcoded `~/Desktop` (OneDrive trap)
11. `asyncio.add_signal_handler` without try/except
12. `open()` without `encoding=` on text mode

Features:
- Triple-quoted-docstring aware (won't flag prose inside docstrings)
- Trailing-comment aware (won't flag mentions in `# os.kill(pid, 0)` comments)
- Guard-hint aware (skips lines with `hasattr(os, ...)`,
  `shutil.which(...)`, `if platform.system() != 'Windows'`, etc.)
- Inline suppression with `# windows-footgun: ok — <reason>`
- `--list` to print all rules with fixes
- `--all` / `--diff <ref>` / staged-files (default) modes
- Scans 380 files in under 2 seconds

### 5. CI integration

A GitHub Actions workflow that runs the checker on every PR and push is
staged at `/tmp/hermes-stash/windows-footguns.yml` — not included in this
commit because the GH token on the push machine lacks `workflow` scope.
A maintainer with `workflow` permissions should add it as
`.github/workflows/windows-footguns.yml` in a follow-up. Content:

```yaml
name: Windows footgun check
on:
  push:
    branches: [main]
  pull_request:
    branches: [main]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with: {python-version: "3.11"}
      - run: python scripts/check-windows-footguns.py --all
```

### 6. CONTRIBUTING.md — "Cross-Platform Compatibility" expansion

Expanded from 5 to 16 rules, each with message, example, and fix.
Recommends psutil as the preferred API for PID / process-tree operations.

### 7. Baseline cleanup (91 → 0 findings)

- 14 `open()` sites → added `encoding='utf-8'` (internal logs/caches) or
  `encoding='utf-8-sig'` (user-editable files that Notepad may BOM)
- 23 POSIX-only callsites in systemd helpers, pty_bridge, and plugin
  tool subprocess management → annotated with
  `# windows-footgun: ok — <reason>`
- 7 `os.killpg` sites → migrated to psutil (see §3 above)

## Verification

```
$ python scripts/check-windows-footguns.py --all
✓ No Windows footguns found (380 file(s) scanned).

$ python -c "from gateway.status import _pid_exists; import os
> print('self:', _pid_exists(os.getpid())); print('bogus:', _pid_exists(999999))"
self: True
bogus: False
```

Proof-of-repro that `os.kill(pid, 0)` was actually killing processes
before this fix — see commit `1cbe39914` and bpo-14484. This commit
removes the last hand-rolled ctypes path from the hot liveness-check
path and defers to the best-maintained cross-platform answer.
2026-05-08 14:27:40 -07:00
..
browser_providers
computer_use feat(computer-use): background focus-safe backend — set_value, structured windows, MIME detection 2026-05-08 11:07:38 -07:00
environments feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
neutts_samples
web_providers docs(web): fix SearXNG env configuration 2026-05-07 17:54:47 -07:00
__init__.py
ansi_strip.py
approval.py fix(approval): cron jobs must not be treated as gateway context 2026-05-08 07:30:14 -07:00
binary_extensions.py
browser_camofox.py refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304) 2026-04-28 23:17:39 -07:00
browser_camofox_state.py
browser_cdp_tool.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00
browser_dialog_tool.py
browser_supervisor.py fix(browser_supervisor): verify thread and loop health before returning cached supervisor 2026-04-30 20:33:33 -07:00
browser_tool.py fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper 2026-05-08 14:27:40 -07:00
budget_config.py
checkpoint_manager.py feat(checkpoints): v2 single-store rewrite with real pruning + disk guardrails (#20709) 2026-05-06 05:44:35 -07:00
clarify_tool.py
code_execution_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
computer_use_tool.py feat(computer-use): cua-driver backend, universal any-model schema 2026-05-08 11:07:38 -07:00
credential_files.py fix(gateway): translate inbound document host paths to container paths for Docker backend 2026-05-07 05:02:26 -07:00
cronjob_tools.py feat(cron): routing intent — deliver=all fans out to every connected channel (#21495) 2026-05-08 04:17:21 -07:00
debug_helpers.py
delegate_tool.py fix(delegate): expand composite toolsets before intersection in delegate_task 2026-05-07 06:41:42 -07:00
discord_tool.py feat: add Discord message deletion action 2026-05-07 05:11:09 -07:00
env_passthrough.py refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304) 2026-04-28 23:17:39 -07:00
feishu_doc_tool.py
feishu_drive_tool.py
file_operations.py fix(windows): %1 install error, patch CRLF false-negative, SOUL.md BOM 2026-05-08 14:27:40 -07:00
file_state.py
file_tools.py feat(file_tools): post-write delta lint on write_file + patch, add JSON/YAML/TOML/Python in-process linters (#20191) 2026-05-05 04:54:17 -07:00
fuzzy_match.py
homeassistant_tool.py
image_generation_tool.py feat(image-gen): honor image_gen.model from config.yaml in plugin dispatch 2026-05-07 06:24:24 -07:00
interrupt.py
kanban_tools.py fix(kanban): heartbeat tool extends claim TTL, not just last_heartbeat_at 2026-05-07 05:05:20 -07:00
managed_tool_gateway.py
mcp_oauth.py fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226) 2026-05-07 05:35:33 -07:00
mcp_oauth_manager.py fix(mcp-oauth): persist OAuth server metadata across process restarts (#21226) 2026-05-07 05:35:33 -07:00
mcp_tool.py fix(windows): os.kill(pid, 0) is NOT a no-op on Windows — route through new _pid_exists helper 2026-05-08 14:27:40 -07:00
memory_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
microsoft_graph_auth.py feat(msgraph): add auth and client foundation 2026-05-08 09:27:26 -07:00
microsoft_graph_client.py fix(msgraph): stream download_to_file body instead of buffering 2026-05-08 09:27:26 -07:00
mixture_of_agents_tool.py
neutts_synth.py
openrouter_client.py
osv_check.py
patch_parser.py
path_security.py
process_registry.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
registry.py perf(tools): memoize get_tool_definitions + TTL-cache check_fn results (#17098) 2026-04-28 18:20:17 -07:00
rl_training_tool.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
schema_sanitizer.py fix: strip Codex-hostile top-level schema combinators 2026-05-07 07:03:21 -07:00
send_message_tool.py feat(gateway): support [[as_document]] directive for skill media routing 2026-05-07 05:20:10 -07:00
session_search_tool.py fix(session-search): report source from resolved parent, not FTS5 child session (#15909) 2026-05-04 05:07:40 -07:00
skill_manager_tool.py fix: exclude hidden and archive dirs from _find_skill rglob 2026-05-07 05:15:28 -07:00
skill_provenance.py fix(curator): only mark agent-created for background-review sediment (#19621) 2026-05-04 02:42:16 -07:00
skill_usage.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
skills_guard.py
skills_hub.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
skills_sync.py refactor: consolidate symlink-safe atomic replace into shared helper 2026-04-28 04:58:22 -07:00
skills_tool.py fix(skills): support category-qualified local skill names 2026-05-05 10:15:31 -07:00
slash_confirm.py feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation 2026-04-29 21:56:47 -07:00
terminal_tool.py fix(terminal): skip sudo prompt when local NOPASSWD sudo works 2026-04-30 20:38:09 -07:00
tirith_security.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
todo_tool.py
tool_backend_helpers.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
tool_output_limits.py
tool_result_storage.py
transcription_tools.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
tts_tool.py feat(cross-platform): psutil for PID/process management + Windows footgun checker 2026-05-08 14:27:40 -07:00
url_safety.py fix(browser): enforce cloud-metadata SSRF floor in hybrid routing (#16234) (#21228) 2026-05-07 05:38:05 -07:00
vision_tools.py fix(vision): guard user_prompt type in video_analyze_tool before debug_call_data construction 2026-05-03 15:28:04 -07:00
voice_mode.py codebase: add encoding='utf-8' to all bare open() calls (PLW1514) 2026-05-08 14:27:40 -07:00
web_tools.py feat(web): add Brave Search (free tier) and DDGS search providers 2026-05-07 09:59:17 -07:00
website_policy.py
xai_http.py
yuanbao_tools.py chore: remove unused imports and dead locals (ruff F401, F841) (#17010) 2026-04-28 06:46:45 -07:00