hermes-bsd/hermes_cli
Teknium fe0b3f2338
fix(windows): retry watcher Popen without breakaway when parent job denies it, plus regression tests for the breakaway bit (#40956)
#40909 added `CREATE_BREAKAWAY_FROM_JOB` to `windows_detach_flags()`,
which fixed the headline bug (gateway dies after Desktop GUI update
and never comes back). The flag's own docstring acknowledges that
restrictive parent job objects can still refuse breakaway with
`ERROR_ACCESS_DENIED`, surfacing as `OSError` on the `subprocess.Popen`
call:

  "Callers in this codebase already wrap detached spawns in
  try/except OSError and fall back to a cmd.exe wrapper, so the
  breakaway-denied case degrades gracefully rather than crashing."

That's true for `_spawn_detached` in `gateway_windows.py` (the
`hermes gateway start` path), which has both the breakaway bit AND a
retry-without-breakaway fallback. It's NOT true for the post-update
watcher path in `launch_detached_profile_gateway_restart`
(`hermes_cli/gateway.py`), which only has `except OSError: return
False` and gives up entirely. If a user's shell/terminal/container
wraps Hermes in a breakaway-denying job, the gateway-respawn watcher
silently fails to launch instead of trying again without breakaway.

This PR closes that gap and adds the regression tests that were
missing from the original fix.

## Changes

### `hermes_cli/_subprocess_compat.py`

Adds a sibling helper `windows_detach_flags_without_breakaway()` so
callers can express the fallback symbolically (via the helper) rather
than coding the magic `& ~0x01000000` mask at every site. Documented
on `windows_detach_flags` and `windows_detach_flags_without_breakaway`
with the recommended try/except pattern.

### `hermes_cli/gateway.py::launch_detached_profile_gateway_restart`

Two changes, both aligned with the canonical pattern in
`gateway_windows._spawn_detached`:

1. The outer watcher Popen now wraps in `try/except OSError`, and on
   failure retries with `windows_detach_flags_without_breakaway()`
   (POSIX never reaches this branch — `start_new_session=True` can't
   raise OSError).
2. The inlined respawn payload (the `python -c` watcher) also
   wraps its CreateProcess in try/except OSError and retries with
   `_flags & ~_CREATE_BREAKAWAY_FROM_JOB` on failure. This matters
   because the watcher's job-object inheritance is independent of the
   outer process's — even if the outer Popen succeeds with breakaway,
   the respawned gateway might inherit a job that doesn't.

### Regression tests in `tests/tools/test_windows_native_support.py`

#40909 shipped the fix without any test that the breakaway bit is
present (the existing `test_windows_detach_flags_has_expected_win32_bits`
asserted only the three legacy bits). Four new tests close that:

- `test_windows_detach_flags_includes_breakaway_from_job` — explicit
  assertion that the breakaway bit is in the default bundle, with the
  rationale spelled out in the docstring so a future maintainer
  staring at this test understands why removing it would resurrect
  the gateway-dies-after-GUI-update bug.
- `test_windows_detach_flags_without_breakaway_drops_only_that_bit`
  — fallback payload keeps the other three detach bits intact.
- `test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway`
  — static-text check on the stringified watcher payload. The inlined
  Python program isn't reachable via normal import-time inspection
  because it lives in a `textwrap.dedent("""...""")` literal that
  gets passed to a separate `python -c` interpreter. Asserting that
  both `_CREATE_BREAKAWAY_FROM_JOB` (symbolic) and `0x01000000` (hex
  literal) appear inside the dedent block is a sufficient regression
  guard against accidental refactors.
- `test_launch_detached_profile_gateway_restart_outer_popen_has_access_denied_fallback`
  — static check that this PR's fallback retry is wired up
  symbolically. Without standing up a real Windows job object that
  refuses breakaway, we can't trigger the OSError in a unit test;
  the text guard catches the case where a future refactor removes
  the helper import or the `& ~_CREATE_BREAKAWAY_FROM_JOB` retry.

Also extends `test_windows_detach_flags_has_expected_win32_bits` to
include the breakaway bit assertion and updates
`test_windows_flags_zero_on_posix` to cover the new helper.

## Tests

Locally on Windows: 8/8 in the `-k "detach or breakaway or
popen_kwargs or launch_detached or gateway_run_update or
hermes_cli_gateway"` slice pass.

Broader `tests/hermes_cli/test_gateway*.py + test_windows_native_support.py`:
172 passed, 10 failed. All 10 failures are pre-existing POSIX-only
tests running on a Windows host (os.geteuid, SIGKILL fallback,
is_linux fixture mismatches). Stashing this PR and re-running on bare
post-#40909 main reproduces all 10 identically — none are regressions.

POSIX paths unchanged: `windows_detach_flags()` and
`windows_detach_flags_without_breakaway()` both return 0 off Windows,
`windows_detach_popen_kwargs()` still yields `{"start_new_session": True}`.

## Out of scope

- The other detached-spawn site in `hermes_cli/gateway.py` (around
  line 3068) also uses `windows_detach_popen_kwargs()` + `except
  OSError`. It deserves the same fallback treatment but the codepath
  is different enough (not the update-flow watcher) that it warrants
  a separate PR with its own scrutiny.
- `gateway/run.py` has Windows branches with `windows_detach_popen_kwargs`
  too — same reasoning.

## Context

Follow-up to #40909 (merged). I had a parallel PR (#40934, closed)
that duplicated the core breakaway fix; the bits unique to that PR
that #40909 didn't cover are the contents of this one. Closing #40934
and opening this slimmed-down version as the focused follow-up.
2026-06-07 01:21:58 -07:00
..
dashboard_auth fix(desktop): gate OAuth remote connect on AT-or-RT, not access token alone 2026-06-04 22:18:46 -07:00
proxy
__init__.py chore: release v0.16.0 (2026.6.5) (#40206) 2026-06-05 17:55:43 -07:00
_parser.py feat(cli): configurable default interface (cli vs tui) 2026-06-02 20:49:44 -05:00
_subprocess_compat.py fix(windows): retry watcher Popen without breakaway when parent job denies it, plus regression tests for the breakaway bit (#40956) 2026-06-07 01:21:58 -07:00
auth.py revert: keep Google Chat OAuth secret + active_provider profile-scoped (#39398) 2026-06-04 16:54:40 -07:00
auth_commands.py fix(auth): set active_provider after hermes auth add qwen-oauth 2026-06-04 05:58:33 -07:00
azure_detect.py
backup.py
banner.py fix(update-check): stop reporting phantom "N commits behind" inside Docker (#39559) 2026-06-05 15:37:19 +10:00
browser_connect.py
build_info.py
bundles.py
callbacks.py
checkpoints.py
claw.py fix: batch of small robustness/correctness fixes from @kyssta-exe 2026-06-01 19:51:03 -07:00
cli_output.py
clipboard.py
codex_models.py
codex_runtime_plugin_migration.py
codex_runtime_switch.py
colors.py
commands.py Add /version slash command across CLI, gateway, TUI, and desktop. 2026-06-05 18:05:05 -07:00
completion.py fix: batch of small robustness/correctness fixes from @kyssta-exe 2026-06-01 19:51:03 -07:00
config.py docs(kanban): clarify decomposer profile roles 2026-06-06 19:29:00 -07:00
container_boot.py fix(docker): seed s6 gateway state for legacy run cmd (#34829) 2026-06-01 11:28:56 +10:00
copilot_auth.py
cron.py fix(cron): don't crash on cron list when a job's repeat is null 2026-06-05 00:19:45 -07:00
curator.py
curses_ui.py feat(cli): ranked fuzzy search in the curses model picker 2026-06-01 16:58:58 -07:00
dashboard_register.py fix(dashboard): honor --portal-url / HERMES_DASHBOARD_PORTAL_URL override in register 2026-06-04 00:17:57 -07:00
debug.py feat(dashboard): add Debug Share to the System page (#38600) 2026-06-03 19:37:04 -07:00
default_soul.py
dep_ensure.py
dingtalk_auth.py
doctor.py fix(install): correct check_dir tautology and add --workspace web test 2026-06-06 18:22:20 -07:00
dump.py
env_loader.py
fallback_cmd.py
fallback_config.py
gateway.py fix(windows): retry watcher Popen without breakaway when parent job denies it, plus regression tests for the breakaway bit (#40956) 2026-06-07 01:21:58 -07:00
gateway_windows.py fix(gateway,windows): reliability — JOB breakaway + status --deep probes + test-leak fix (#40909) 2026-06-06 19:53:58 -07:00
goals.py
gui_uninstall.py feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI) (#40355) 2026-06-06 18:22:38 -07:00
hooks.py
inventory.py fix(inventory): avoid fresh Nous tier checks in picker payloads 2026-06-07 00:41:13 -07:00
kanban.py fix(kanban): isolate board override per concurrent call 2026-06-04 07:39:53 -07:00
kanban_db.py fix(kanban): isolate board override per concurrent call 2026-06-04 07:39:53 -07:00
kanban_decompose.py docs(kanban): clarify decomposer profile roles 2026-06-06 19:29:00 -07:00
kanban_diagnostics.py
kanban_specify.py
kanban_swarm.py
logs.py feat(debug): include desktop.log in hermes debug share / /debug / hermes logs (#38203) 2026-06-03 05:41:35 -07:00
main.py feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI) (#40355) 2026-06-06 18:22:38 -07:00
managed_uv.py fix(update/windows): don't return _UvResult on Windows (subprocess argv crash) (#39820) 2026-06-05 07:54:08 -05:00
mcp_catalog.py
mcp_config.py fix(mcp): ensure server.shutdown() on probe iteration failure 2026-06-04 17:11:17 -07:00
mcp_picker.py
mcp_startup.py
memory_setup.py fix(memory): fall back to pip when uv is unavailable (salvage #5954) (#38668) 2026-06-04 14:03:02 +10:00
middleware.py fix(middleware): single-use next_call guard + deepcopy-safe request copies 2026-06-06 23:07:25 +05:30
migrate.py
model_catalog.py
model_normalize.py
model_switch.py refactor(inventory): make force_fresh_nous_tier keyword-only + pin contract 2026-06-07 00:41:13 -07:00
models.py fix(models): use deepseek-v4-flash as Nous silent default 2026-06-05 02:54:34 -07:00
nous_account.py feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011) 2026-06-06 13:18:18 +05:30
nous_subscription.py fix(cli): require Chromium for local browser readiness in setup/status surfaces 2026-06-05 04:06:17 -07:00
oneshot.py
pairing.py
partial_compress.py
platforms.py
plugins.py feat(middleware): add adaptive execution intercepts 2026-06-03 11:22:06 -07:00
plugins_cmd.py
portal_cli.py feat(cli): make hermes portal run the full quick-setup Nous flow (model picker) 2026-06-04 02:20:31 +05:30
profile_describer.py
profile_distribution.py
profiles.py feat(cli): display custom profile alias names in profile list/show (#40371) 2026-06-06 08:08:07 +00:00
prompt_size.py
providers.py fix(model-picker): OpenAI shows curated models; OpenRouter no longer phantom-shows (#37404) 2026-06-02 06:31:37 -07:00
psutil_android.py
pt_input_extras.py
pty_bridge.py fix(dashboard): clamp PTY resize dimensions for WSL2 winsize garbage (#38200) 2026-06-03 09:00:16 -07:00
relaunch.py
runtime_provider.py fix(gateway): honor per-provider max_output_tokens in max_tokens chain 2026-06-05 09:10:26 -07:00
secret_prompt.py
secrets_cli.py fix(secrets): fail early with clear error when bitwarden setup runs without TTY (#40571) 2026-06-06 18:36:40 -07:00
security_advisories.py
security_audit.py
send_cmd.py
service_manager.py Remove prviliges drop when you never ran as root (#34837) 2026-06-01 13:54:18 +10:00
session_recap.py
setup.py fix(cli): require Chromium for local browser readiness in setup/status surfaces 2026-06-05 04:06:17 -07:00
skills_config.py
skills_hub.py feat(skills): fix browse cap, add source links + copy buttons + category cleanup (#37143) 2026-06-01 19:52:28 -07:00
skin_engine.py
slack_cli.py
status.py feat(cli): make hermes portal the human-readable Portal onboarding alias 2026-06-04 01:19:28 +05:30
stdio.py
telegram_managed_bot.py Add CLI Telegram QR onboarding 2026-06-05 03:20:10 -07:00
timeouts.py
tips.py docs: align runtime footer field docs 2026-06-06 11:20:40 -06:00
tools_config.py fix(install): scope npm installs/audits to avoid pulling in apps/desktop 2026-06-06 18:22:20 -07:00
uninstall.py feat: uninstall the Chat GUI without removing the agent (CLI + desktop UI) (#40355) 2026-06-06 18:22:38 -07:00
voice.py
web_server.py feat(desktop): make cron jobs the first-class sidebar entity 2026-06-06 14:04:11 -05:00
webhook.py
xai_retirement.py