hermes-bsd/tests/hermes_cli
Teknium f67063ba81
feat(kanban): generic diagnostics engine for task distress signals (#20332)
* feat(kanban): generic diagnostics engine for task distress signals

Replaces the hallucination-specific ``warnings`` / ``RecoverySection``
surface (shipped in PR #20232) with a reusable diagnostic-rule engine
that covers five distress kinds in v1 and can be extended without
touching UI code. The "something's wrong with this task" signal is
no longer limited to phantom card ids.

Closes the follow-up from #20232 discussion.

New module
----------
``hermes_cli/kanban_diagnostics.py`` — stateless, no-side-effect rule
engine. Each rule is a pure function of
``(task, events, runs, now, config) -> list[Diagnostic]``. Registry
is a simple list; adding a new distress kind is one function + one
import, no UI or API changes required.

v1 rule set
-----------
* ``hallucinated_cards`` (error) — folds the existing
  ``completion_blocked_hallucination`` event into the new surface.
* ``prose_phantom_refs`` (warning) — folds
  ``suspected_hallucinated_references``.
* ``repeated_spawn_failures`` (error → critical at 2x threshold) —
  fires when ``tasks.spawn_failures >= 3``; suggests
  ``hermes -p <profile> doctor`` / ``auth``.
* ``repeated_crashes`` (error → critical) — fires after N consecutive
  ``crashed`` run outcomes with no successful completion between;
  suggests ``hermes kanban log <id>``.
* ``stuck_in_blocked`` (warning) — fires after 24h in ``blocked``
  state with no comments / unblock attempts; suggests commenting.

Every diagnostic carries structured ``actions`` (reclaim, reassign,
unblock, cli_hint, comment, open_docs) that render consistently in
both CLI and dashboard. Suggested actions are highlighted; generic
recovery actions (reclaim / reassign) are available on every kind as
fallbacks.

Diagnostics auto-clear when the underlying failure resolves — a
clean ``completed``/``edited`` event drops hallucination diagnostics,
a successful run drops crash diagnostics, a comment drops
stuck-blocked diagnostics. Audit events persist; the badge goes away.

API
---
``plugin_api.py``:
* ``/board`` now attaches ``diagnostics`` (full list) and
  ``warnings`` (compact summary with ``highest_severity``) per task.
* ``/tasks/{id}`` attaches diagnostics so the drawer's Diagnostics
  section auto-opens on flagged tasks.
* NEW ``/diagnostics`` endpoint — fleet-wide listing, filterable by
  severity, sorted critical-first.

CLI
---
* NEW ``hermes kanban diagnostics [--severity X] [--task id]
  [--json]`` — fleet view or single-task view, matches dashboard rule
  output so CLI users see the same picture.
* ``hermes kanban show <id>`` now renders a Diagnostics section near
  the top with severity markers + suggested actions.

Dashboard
---------
* Card badge is severity-coloured (⚠ amber warning, !! orange error,
  !!! red critical) using ``warnings.highest_severity``.
* Attention strip above the toolbar counts EVERY task with active
  diagnostics (not just hallucinations), severity-coloured, lists
  affected tasks with Open buttons when expanded.
* Drawer's old ``RecoverySection`` replaced with generic
  ``DiagnosticsSection`` rendering a card per active diagnostic:
  title + detail + structured data (task-id chips when payload keys
  look like id lists) + action buttons. Reassign profile picker is
  inline per-diagnostic. Clipboard fallback uses ``.catch()`` for
  environments where writeText rejects.
* Three-rung severity palette; amber for warning, orange for error,
  red for critical. Uses CSS variables so theming is straightforward.

Tests
-----
* NEW ``tests/hermes_cli/test_kanban_diagnostics.py`` — 14 unit tests
  covering each rule's positive/negative/threshold paths, severity
  sorting, broken-rule isolation, and sqlite3.Row integration.
* Dashboard plugin tests extended: ``/diagnostics`` endpoint (empty,
  populated, severity-filtered), ``/board`` exposes both diagnostic
  list and compact summary with ``highest_severity``.
* Existing hallucination-specific test (``test_board_surfaces_
  warnings_field_for_hallucinated_completions``) updated to reflect
  the new contract: warning summary keys by diagnostic kind
  (``hallucinated_cards``) not event kind.

379 kanban-suite tests pass (+16 net from this PR).

Live verification
-----------------
Seeded all 5 diagnostic kinds + one clean + one plain-running task
(7 total) into an isolated HERMES_HOME, spun up the dashboard, and
verified:

* Attention strip: shows ``!! 5 tasks need attention`` in the
  error-severity orange; Show expands to a list of 5 rows ordered
  critical > error > warning.
* Card badges: error tasks render ``!!`` orange, warning tasks
  render ``⚠`` amber, clean and plain-running tasks render no badge.
* Each of the 5 rules opens a correctly-coloured, correctly-styled
  diagnostic card in the drawer with its specific suggested action.
* Live reassign from a diagnostic card flipped
  ``broken-ml-worker → alice`` and the drawer refreshed with the
  new assignee + the same diagnostic still firing (correct:
  spawn_failures counter hasn't reset yet).
* CLI ``hermes kanban diagnostics`` prints all 5 in severity order;
  ``--severity error`` narrows to 3; ``kanban show <id>`` includes
  the Diagnostics block at the top with suggested action hint.

Migration note
--------------
The old ``warnings`` shape (``{count, kinds, latest_at}``) is
preserved on the API but ``kinds`` now keys by diagnostic kind
(``hallucinated_cards``) instead of event kind
(``completion_blocked_hallucination``). ``highest_severity`` is a
new required field. The dashboard was the only consumer and has
been updated in the same commit; external API consumers of the
``warnings`` field will need to update their kind-match logic.

* feat(kanban/diagnostics): lead titles with the actual error text

The generic 'Worker crashed N runs in a row' / 'Worker failed to spawn
N times' titles buried the actual cause in the data section. Operators
had to open logs or expand the diagnostic to see WHY the worker is
stuck — rate-limit vs insufficient quota vs bad auth vs context
overflow vs network blip all looked identical at a glance.

New titles:

  Agent crashed 3x: openai: 429 Too Many Requests - rate limit reached
  Agent crashed 3x: anthropic: 402 insufficient_quota - credit balance
  Agent crashed 3x: provider auth error: 401 Unauthorized
  Agent spawn failed 4x: insufficient_quota: You exceeded your current

Detail keeps the full error snippet (capped at 500 chars + ellipsis
for tracebacks). Title takes the first line capped at 160 chars.
Fallback title if no error recorded stays honest ('no error recorded').

Tests: 4 new cases covering 429/billing/spawn/truncation. 383 total
pass (+4).

Live-verified on dashboard with 6 seeded scenarios
(rate-limit, billing, auth, context, network, spawn-billing) —
each card title leads with the actionable error text.
2026-05-05 13:32:42 -07:00
..
__init__.py
conftest.py fix(kanban): suppress dispatcher stuck-warn when ready queue holds only non-spawnable assignees 2026-05-05 04:13:12 -07:00
test_ai_gateway_models.py
test_anthropic_model_flow_stale_oauth.py
test_anthropic_oauth_flow.py
test_anthropic_provider_persistence.py
test_api_key_providers.py chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664 2026-04-29 21:56:51 -07:00
test_apply_model_switch_result_context.py
test_arcee_provider.py feat(providers): add tencent-tokenhub provider support 2026-04-28 03:45:52 -07:00
test_argparse_flag_propagation.py
test_at_context_completion_filter.py
test_atomic_json_write.py
test_atomic_yaml_write.py
test_auth_codex_provider.py
test_auth_commands.py fix(auth): make provider config writes atomic 2026-04-30 20:39:41 -07:00
test_auth_nous_provider.py feat(nous): persist Nous OAuth across profiles via shared token store (#19712) 2026-05-04 04:54:55 -07:00
test_auth_provider_gate.py
test_auth_qwen_provider.py
test_auth_ssl_macos.py
test_aux_config.py
test_azure_detect.py
test_backup.py fix(backup): floor pre-update backup_keep to 1 so the new backup survives 2026-05-04 05:07:13 -07:00
test_banner.py
test_banner_git_state.py
test_banner_skills.py
test_bedrock_model_picker.py fix(model): avoid bedrock credential probe in provider picker 2026-05-03 00:32:55 -07:00
test_chat_skills_flag.py
test_claw.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_clear_stale_base_url.py
test_cmd_update.py fix(update): sync bundled skills to all profiles, including active (#16176) 2026-05-04 12:34:53 -07:00
test_coalesce_session_args.py
test_codex_cli_model_picker.py
test_codex_models.py
test_commands.py feat: add Telegram DM topic-mode sessions 2026-05-04 12:07:17 -07:00
test_completion.py
test_config.py fix(cli): prevent .env sanitizer from splitting GLM_API_KEY by LM_API_KEY suffix 2026-04-28 22:22:45 -07:00
test_config_drift.py
test_config_env_expansion.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_config_env_refs.py
test_config_validation.py fix(config): accept fallback_model list (chain) in validator + save 2026-04-28 01:40:25 -07:00
test_container_aware_cli.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_copilot_auth.py
test_copilot_catalog_oauth_fallback.py fix(copilot): require successful exchange when walking credential_pool catalog tokens 2026-04-28 01:18:09 -07:00
test_copilot_context.py
test_copilot_in_model_list.py
test_copilot_token_exchange.py
test_cron.py
test_curator_archive_prune.py feat(curator): add archive and prune subcommands (#20200) 2026-05-05 05:15:54 -07:00
test_curator_status.py fix(curator): only mark agent-created for background-review sediment (#19621) 2026-05-04 02:42:16 -07:00
test_custom_provider_context_length.py
test_custom_provider_model_switch.py fix(cli): omit empty api_mode when probing custom models 2026-05-04 02:46:41 -07:00
test_dashboard_browser_safe_imports.py Merge upstream/main and address Copilot review feedback 2026-04-30 06:43:22 -04:00
test_dashboard_lifecycle_flags.py feat(dashboard): add --stop and --status flags (#17840) 2026-04-30 02:30:20 -07:00
test_dashboard_profiles_nav_label.py fix(dashboard): keep profiles list resilient 2026-04-29 01:39:52 -04:00
test_debug.py fix(debug): redact log content at upload time in hermes debug share 2026-05-03 11:42:20 -07:00
test_deprecated_cwd_warning.py
test_detect_api_mode_for_url.py
test_determine_api_mode_hostname.py
test_dingtalk_auth.py
test_discord_skill_clamp_warning.py test: add tests for cmd_key preservation through name clamping 2026-05-03 03:25:45 -07:00
test_doctor.py fix(doctor): check gh auth status when GITHUB_TOKEN absent 2026-05-04 12:34:31 -07:00
test_doctor_command_install.py
test_env_loader.py clarify placeholder telegram credential in tests 2026-05-04 15:31:15 -04:00
test_env_sanitize_on_load.py clarify placeholder telegram credential in tests 2026-05-04 15:31:15 -04:00
test_fallback_cmd.py
test_gateway.py fix(tests): tolerate ps ancestor-walk in find_gateway_pids fallback test (#19590) 2026-05-04 01:40:39 -07:00
test_gateway_linger.py
test_gateway_runtime_health.py
test_gateway_service.py fix(gateway): handle planned service stops 2026-05-04 16:00:49 -07:00
test_gateway_wsl.py
test_gemini_free_tier_setup_block.py
test_gemini_provider.py
test_gmi_provider.py fix(providers/gmi): post-salvage review fixes 2026-04-27 11:17:59 -07:00
test_goals.py feat: /goal — persistent cross-turn goals (Ralph loop) (#18262) 2026-04-30 23:10:20 -07:00
test_hooks_cli.py
test_ignore_user_config_flags.py refactor(cli): derive relaunch flag table from argparse introspection 2026-04-29 20:33:29 -07:00
test_image_gen_picker.py
test_kanban_boards.py fix(kanban): ignore stale current board pointers 2026-05-05 04:34:45 -07:00
test_kanban_cli.py feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232) 2026-05-05 08:06:55 -07:00
test_kanban_core_functionality.py feat(kanban): hallucination gate + recovery UX for worker-created-card claims (#20232) 2026-05-05 08:06:55 -07:00
test_kanban_db.py fix(kanban): suppress dispatcher stuck-warn when ready queue holds only non-spawnable assignees 2026-05-05 04:13:12 -07:00
test_kanban_diagnostics.py feat(kanban): generic diagnostics engine for task distress signals (#20332) 2026-05-05 13:32:42 -07:00
test_launcher.py
test_list_picker_providers.py feat(cli): add list_picker_providers for credential-filtered picker 2026-05-05 10:18:58 -07:00
test_logs.py
test_managed_installs.py
test_mcp_config.py
test_mcp_reload_confirm_gate.py feat(gateway,cli): confirm /reload-mcp to warn about prompt cache invalidation 2026-04-29 21:56:47 -07:00
test_mcp_tools_config.py
test_memory_reset.py
test_model_catalog.py
test_model_normalize.py
test_model_picker_viewport.py
test_model_provider_persistence.py fix(auth): make provider config writes atomic 2026-04-30 20:39:41 -07:00
test_model_switch_context_display.py
test_model_switch_copilot_api_mode.py
test_model_switch_custom_providers.py test(model_switch): update regression to reflect bare-custom guard 2026-04-30 04:32:11 -07:00
test_model_switch_opencode_anthropic.py
test_model_switch_variant_tags.py
test_model_validation.py chore(salvage): strip duplicated/merge-corrupted blocks from PR #17664 2026-04-29 21:56:51 -07:00
test_models.py
test_models_dev_preferred_merge.py
test_non_ascii_credential.py
test_nous_hermes_non_agentic.py
test_nous_subscription.py fix(cli): coerce use_gateway config flags in tool routing 2026-04-26 19:02:55 -07:00
test_ollama_cloud_auth.py
test_ollama_cloud_provider.py fix(models): strip :cloud/-cloud suffix from models.dev Ollama Cloud IDs 2026-05-04 12:38:15 -07:00
test_openai_codex_model_validation_fallback.py fix(model-switch): soft-accept unlisted openai-codex models 2026-05-04 05:06:53 -07:00
test_opencode_go_in_model_list.py
test_opencode_go_validation_fallback.py
test_overlay_slug_resolution.py
test_path_completion.py
test_pin_kanban_board_env.py test(kanban): isolate HERMES_KANBAN_BOARD writes in pin-env tests 2026-05-05 04:37:47 -07:00
test_placeholder_usage.py
test_plugin_cli_registration.py
test_plugin_scanner_recursion.py
test_plugins.py fix(plugins): bound async plugin command await with 30s timeout 2026-04-30 19:56:18 -07:00
test_plugins_cmd.py fix: treat ctrl-c as curses cancel 2026-05-04 01:36:44 -07:00
test_profile_export_credentials.py
test_profiles.py fix(profiles): keep validate_profile_name strict; callers normalize first 2026-05-04 04:44:37 -07:00
test_prompt_api_key.py fix(setup): offer Keep/Replace/Clear when API key already exists 2026-05-05 04:08:11 -07:00
test_provider_config_validation.py fix(config): add request_timeout_seconds and stale_timeout_seconds to provider _KNOWN_KEYS 2026-04-28 01:28:25 -07:00
test_pty_bridge.py fix(ci): stabilize main test suite regressions (#17660) 2026-04-29 23:18:55 -07:00
test_reasoning_effort_menu.py
test_redact_config_bridge.py feat(security): make secret redaction off by default (#16794) 2026-04-27 21:24:08 -07:00
test_regression_16767.py test(cli): regression coverage for user-provider routing fix (#16767) 2026-04-28 01:47:20 -07:00
test_relaunch.py remove relaunch_chat 2026-04-29 20:33:29 -07:00
test_resolve_last_session.py fix(cli): tighten MRU lookup and session DB cleanup 2026-04-27 08:52:12 -07:00
test_runtime_provider_resolution.py fix(fallback): let custom_providers shadow built-in aliases 2026-04-30 20:18:44 -07:00
test_session_browse.py fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296) 2026-04-26 18:49:48 -07:00
test_sessions_delete.py
test_set_config_value.py fix(config): preserve YAML lists in hermes config set (#17876) 2026-04-30 04:32:17 -07:00
test_setup.py fix(setup): add missing SLACK_HOME_CHANNEL prompt to _setup_slack() 2026-05-04 01:37:18 -07:00
test_setup_agent_settings.py fix(gateway): shutdown + restart hygiene (drain timeout, false-fatal, success log) (#18761) 2026-05-02 02:08:06 -07:00
test_setup_hermes_script.py
test_setup_irc.py feat(plugins): bundled platform plugins auto-load by default 2026-04-29 21:56:51 -07:00
test_setup_matrix_e2ee.py
test_setup_model_provider.py
test_setup_noninteractive.py
test_setup_ollama_cloud_force_refresh.py fix(sessions): /save lands under $HERMES_HOME, widen browse+TUI picker, force-refresh ollama-cloud on setup (#16296) 2026-04-26 18:49:48 -07:00
test_setup_openclaw_migration.py feat(plugins): bundled platform plugins auto-load by default 2026-04-29 21:56:51 -07:00
test_setup_prompt_menus.py fix(cli): sanitize bracketed paste markers during setup 2026-05-05 06:12:42 -07:00
test_setup_reconfigure.py
test_skills_config.py
test_skills_hub.py feat(skills): install skills from a direct HTTP(S) URL (#16323) 2026-04-26 20:57:10 -07:00
test_skills_install_flags.py
test_skills_skip_confirm.py
test_skills_subparser.py
test_skin_engine.py fix(tui): restore macOS copy behavior and theme polish (#17131) 2026-04-28 18:47:14 -05:00
test_spotify_auth.py
test_status.py feat: add Vercel Sandbox backend 2026-04-29 07:22:33 -07:00
test_status_model_provider.py feat(agent): add lmstudio integration 2026-04-28 12:27:36 -07:00
test_subparser_routing_fallback.py
test_subprocess_timeouts.py
test_suppress_eio_on_interrupt.py
test_tencent_tokenhub_provider.py feat(providers): add tencent-tokenhub provider support 2026-04-28 03:45:52 -07:00
test_terminal_menu_fallbacks.py
test_timeouts.py
test_tips.py
test_tool_token_estimation.py
test_tools_config.py fix(cli): sync use_gateway in _reconfigure_provider for tts, browser, and web 2026-05-04 02:33:55 -07:00
test_tools_disable_enable.py
test_tui_npm_install.py fix(tui): tolerate npm's peer-flag drop in lockfile comparison 2026-05-04 14:13:38 +10:00
test_tui_resume_flow.py fix(tui): honor launch toolsets (#17623) 2026-04-29 16:55:27 -07:00
test_update_autostash.py fix(ci): recover 38 failing tests on main (#17642) 2026-04-29 20:05:32 -07:00
test_update_check.py
test_update_config_clears_custom_fields.py
test_update_gateway_restart.py fix(gateway): drain manual profile gateways via SIGUSR1 before respawn 2026-04-30 20:00:31 -07:00
test_update_hangup_protection.py
test_update_stale_dashboard.py fix(tests): make test_update_stale_dashboard immune to hermes_cli.main reload (#17881) 2026-04-30 02:46:56 -07:00
test_update_yes_flag.py feat(update): add --yes/-y flag to skip interactive prompts (#18261) 2026-04-30 23:06:32 -07:00
test_user_providers_model_switch.py test(model_switch): cover private user_providers override 2026-04-30 19:44:26 -07:00
test_voice_wrapper.py fix(tui): respect voice.record_key config (supersedes #19028, #19339) (#19835) 2026-05-04 15:49:28 -07:00
test_web_server.py Merge upstream/main and address Copilot review feedback 2026-04-30 06:43:22 -04:00
test_web_server_host_header.py
test_web_ui_build.py
test_webhook_cli.py
test_xiaomi_provider.py feat(providers): add tencent-tokenhub provider support 2026-04-28 03:45:52 -07:00