hermes-bsd/tests/test_web_server.py

"""Test that start_server configures ws-ping keepalive.
The server now uses uvicorn.Server directly (not uvicorn.run) so we stub
Config + Server + asyncio.run to capture kwargs without starting an event loop.
"""
import asyncio
import contextlib
fix(tui-gateway): reap leaked slash_worker sessions on disconnect + active_list liveness (re-scoped onto current main)

Salvaged from #35626 (banditburai) and re-scoped after maintainers landed the parent-death watchdog (slash_worker.py) and PTY process-group teardown (pty_bridge.py) directly on main. Those pieces are intentionally NOT included here; this carries only what is still missing:

- C1 disconnect reap: ws.py's `finally` only re-pointed the dead transport at stdio. `_close_sessions_for_transport` now reaps `close_on_disconnect` sessions and schedules the grace-reap for the rest, offloaded via `asyncio.to_thread` so the blocking worker.close() + DB write never stalls the uvicorn loop.
- C2 create/close orphan race: `_attach_worker` stores the worker iff `_sessions.get(sid) is session` under the lock (else closes it), applied at every spawn site incl. the post-turn `_restart_slash_worker`.
- Single idempotent teardown funnel: session.close, WS disconnect, the generous-TTL idle reaper, shutdown, and the WS grace-reap all reach `_close_session_by_id` → `_teardown_session`; `_finalized`/`_closed` flags make concurrent/double teardown a no-op. `_sessions_lock` upgraded to RLock.
- uvicorn `ws_ping_interval/timeout=20s` so a half-open socket (reverse-proxy 524) becomes a `WebSocketDisconnect` and the C1 path runs.

Plus two review-driven hardening fixes (mine):

- `session.active_list` now skips `_finalized` sessions so the footer "N sessions" count reflects attachable sessions instead of only ever growing until restart (#38950). Keys on `_finalized` only, NOT the stdio sentinel, so a standalone `hermes --tui` session stays visible.
- `_schedule_ws_orphan_reap._reap` pops via `_close_session_by_id` (under `_sessions_lock`) instead of `_sessions.pop` under the unrelated `_session_resume_lock` (#39591); the resume_lock now only guards the orphan re-check against `session.resume`.
- Float env knobs (`HERMES_SLASH_WATCHDOG_*`, `HERMES_TUI_SESSION_TTL_S`) parse with a fallback helper so a malformed value can't crash the worker at import.

Fixes #32377
Fixes #38950
Addresses #22855

Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>

2026-06-08 19:50:59 +05:30
import uvicorn
from hermes_cli import web_server


def _stub_uvicorn(monkeypatch):
    """Replace uvicorn.Config/Server with fakes so start_server returns
    immediately. Returns a dict with captured Config kwargs."""
    captured: dict = {}

    class _FakeConfig:
        loaded = True
        host = "127.0.0.1"
        port = 8000

        def __init__(self, *args, **kwargs):
            captured.update(kwargs)

        def load(self):
            pass

        class lifespan_class:
            should_exit = False
            state: dict = {}

            def __init__(self, *a, **kw):
                pass

            async def startup(self):
                pass

            async def shutdown(self):
                pass

    class _FakeServer:
        should_exit = False
        started = True
        servers: list = []
        lifespan = None

        @staticmethod
        def capture_signals():
            return contextlib.nullcontext()

        async def startup(self, sockets=None):
            pass

        async def main_loop(self):
            pass

        async def shutdown(self, sockets=None):
            pass

    monkeypatch.setattr(uvicorn, "Config", _FakeConfig)
    monkeypatch.setattr(uvicorn, "Server", lambda config: _FakeServer())
    return captured


def test_start_server_enables_ws_ping_for_half_open_detection(monkeypatch):
    """WS ping must be configured so half-open connections (reverse-proxy 524,
    dropped tunnels) raise WebSocketDisconnect into the reaping path (#32377)."""
    captured = _stub_uvicorn(monkeypatch)

    # Loopback bind => no auth gate, so this reaches the Config constructor.
    web_server.start_server(host="127.0.0.1", port=0, open_browser=False)

    assert captured["ws_ping_interval"] == 20.0
    assert captured["ws_ping_timeout"] == 20.0
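
The capture-by-stubbing pattern this test relies on can be shown without uvicorn or pytest. The sketch below is illustrative only: `fake_lib`, `RealConfig`, `start`, and `capture_config_kwargs` are hypothetical names standing in for the third-party module, its config class, the code under test, and the stub helper. It assumes the code under test looks up `Config` on the module at call time, which is what makes the swap effective.

```python
import types

# Hypothetical stand-in for a third-party module (the role uvicorn plays above).
fake_lib = types.SimpleNamespace()


class RealConfig:
    """Pretend 'real' config whose constructor we never want to run in a test."""

    def __init__(self, app, **kwargs):
        raise RuntimeError("real constructor should not run during the test")


fake_lib.Config = RealConfig


def start(lib):
    """Hypothetical code under test: builds a Config with keepalive settings."""
    return lib.Config("app", ws_ping_interval=20.0, ws_ping_timeout=20.0)


def capture_config_kwargs(lib):
    """Swap lib.Config for a fake that records keyword arguments, call the
    code under test, then restore the original attribute."""
    captured = {}

    class FakeConfig:
        def __init__(self, *args, **kwargs):
            # Record only keyword arguments, mirroring the test helper above.
            captured.update(kwargs)

    original = lib.Config
    lib.Config = FakeConfig
    try:
        start(lib)
    finally:
        # Always restore, even if the code under test raises.
        lib.Config = original
    return captured


captured = capture_config_kwargs(fake_lib)
```

In the real test, pytest's `monkeypatch.setattr` does the swap and the restore, so the manual `try`/`finally` here only stands in for that fixture.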