hermes-bsd

Teknium e5af1dd633 fix(review): tell background reviewer not to capture transient env failures as skills (#23004)

Closes #6051.

Reported failure mode: agent migrated to WSL2, browser launch failed because Playwright wasn't installed yet. Background reviewer captured the failure as a durable skill (`browser-tool-launch-issue`) and the agent kept refusing the browser tool for weeks after Playwright was installed and verified working. Negative claims also propagated into unrelated skills ("browser tools do not work", "cannot use Y from execute_code").

Root cause: `_SKILL_REVIEW_PROMPT` and `_COMBINED_REVIEW_PROMPT` both lean hard on "be active, save things, a pass that does nothing is a missed learning opportunity." Neither distinguished durable knowledge from transient environment state. The reviewer was doing what it was told.

Fix at the write site: both prompts now carry a "Do NOT capture" section calling out:

• Environment-dependent failures (missing binaries, fresh-install errors, post-migration path mismatches, 'command not found', unconfigured credentials, uninstalled packages)
• Negative claims about tools or features ("X does not work") that harden into self-cited refusals
• Session-specific transient errors that resolved before the conversation ended
• One-off task narratives ("summarize today's market", "analyze this PR"); this also addresses the #12812 / #4538 family

Plus a positive-reframing line: when a tool fails because of setup state, capture the FIX (install command, config step, env var) under an existing setup/troubleshooting skill, never "this tool doesn't work" as a standalone constraint.

Targeted tests: 24/24 passing in tests/run_agent/test_review_prompt_class_first.py (2 new + all existing review-prompt assertions). Substring-based checks so future prompt edits don't false-fail.

2026-05-09 22:51:25 -07:00
__init__.py
conftest.py
test_413_compression.py	fix(agent): surface preflight compression status	2026-05-04 01:41:51 -07:00
test_860_dedup.py	fix: lazy session creation - defer DB row until first message (#18370)	2026-05-01 18:39:12 +05:30
test_1630_context_overflow_loop.py
test_agent_guardrails.py	fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)	2026-05-04 05:06:33 -07:00
test_agent_loop.py
test_agent_loop_tool_calling.py
test_agent_loop_vllm.py
test_anthropic_error_handling.py
test_anthropic_prompt_cache_policy.py	fix(minimax): enable Anthropic prompt caching for MiniMax's own models (#17425)	2026-04-29 04:56:55 -07:00
test_anthropic_third_party_oauth_guard.py
test_anthropic_truncation_continuation.py
test_api_max_retries_config.py
test_async_httpx_del_neuter.py
test_background_review.py	fix(cli): surface self-improvement review summaries from bg thread	2026-04-30 14:07:22 -07:00
test_background_review_summary.py
test_background_review_toolset_restriction.py	fix(ci): stabilize main test suite regressions (#17660)	2026-04-29 23:18:55 -07:00
test_codex_multimodal_tool_result.py	feat(vision): vision_analyze returns pixels to vision-capable models, not aux text (#22955)	2026-05-09 21:06:19 -07:00
test_commit_memory_session_context_engine.py	fix(agent): notify context engine on commit_memory_session (#22764)	2026-05-09 12:28:42 -07:00
test_compress_focus_plugin_fallback.py
test_compression_boundary.py
test_compression_boundary_hook.py
test_compression_feasibility.py
test_compression_persistence.py
test_compression_trigger_excludes_reasoning.py
test_compressor_fallback_update.py
test_concurrent_interrupt.py	test: remove 50 stale/broken tests to unblock CI (#22098)	2026-05-08 14:55:40 -07:00
test_context_token_tracking.py
test_copilot_native_vision_headers.py
test_create_openai_client_kwargs_isolation.py
test_create_openai_client_proxy_env.py
test_create_openai_client_reuse.py
test_deepseek_reasoning_content_echo.py	fix(deepseek): use non-empty reasoning_content placeholder for V4 Pro thinking mode	2026-04-30 23:04:23 -07:00
test_deepseek_v4_thinking_live.py	fix(deepseek): preserve v4 reasoning_content on replay	2026-04-30 11:18:39 -07:00
test_dict_tool_call_args.py
test_empty_response_recovery_persistence.py	fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)	2026-05-07 08:35:10 -07:00
test_exit_cleanup_interrupt.py
test_fallback_model.py	fix(fallback): resolve api_key_env in fallback chain entries (carve-out of #22665)	2026-05-09 17:53:56 -07:00
test_image_rejection_fallback.py	fix(computer-use): harden image-rejection fallback + AUTHOR_MAP	2026-05-08 11:07:38 -07:00
test_image_shrink_recovery.py
test_init_fallback_on_exhausted_pool.py	fix(agent): try fallback providers at init when primary credential pool is exhausted (#17929)	2026-05-02 02:09:46 -07:00
test_interactive_interrupt.py
test_interrupt_propagation.py
test_invalid_context_length_warning.py
test_iteration_budget_race.py	fix(run_agent): acquire lock in IterationBudget.used property	2026-05-04 12:37:28 -07:00
test_jsondecodeerror_retryable.py
test_last_reasoning_per_turn.py	test: pin per-turn reasoning extraction semantics	2026-05-05 05:00:05 -07:00
test_long_context_tier_429.py
test_memory_nudge_counter_hydration.py	fix(agent): hydrate memory-nudge counters from conversation_history (#22774)	2026-05-09 12:48:03 -07:00
test_memory_provider_init.py
test_memory_sync_interrupted.py	feat(memory): notify providers on mid-process session_id rotation (#17409)	2026-04-29 04:57:22 -07:00
test_message_sequence_repair.py	fix(run_agent): break permanent empty-response loop from orphan tool-tail (#21385)	2026-05-07 08:35:10 -07:00
test_openai_client_lifecycle.py
test_percentage_clamp.py
test_plugin_context_engine_init.py
test_primary_runtime_restore.py
test_provider_attribution_headers.py	refactor(gmi): move User-Agent to profile.default_headers	2026-05-08 03:22:11 -07:00
test_provider_fallback.py	fix(fallback): skip chain entries matching current provider/model/base_url (#22780)	2026-05-09 12:48:19 -07:00
test_provider_parity.py	fix(aux): remove hardcoded Codex fallback model, drop Codex from auto chain (#17765)	2026-04-29 23:23:50 -07:00
test_real_interrupt_subagent.py
test_redirect_stdout_issue.py
test_repair_tool_call_arguments.py
test_repair_tool_call_name.py
test_review_prompt_class_first.py	fix(review): tell background reviewer not to capture transient env failures as skills (#23004)	2026-05-09 22:51:25 -07:00
test_run_agent.py	fix(agent): extract thinking from content-list blocks for DeepSeek V4 Pro	2026-05-09 13:36:12 -07:00
test_run_agent_codex_responses.py
test_run_agent_multimodal_prologue.py
test_sequential_chats_live.py
test_session_meta_filtering.py
test_session_reset_fix.py
test_steer.py
test_stream_drop_logging.py	feat(stream-retry): add upstream + timing diagnostics to drop log (#23005)	2026-05-09 22:49:35 -07:00
test_stream_interrupt_retry.py
test_streaming.py	fix(copilot-acp): disable streaming path for CopilotACPClient	2026-04-28 11:33:07 -07:00
test_streaming_tool_call_repair.py
test_strict_api_validation.py
test_strip_reasoning_tags_cli.py
test_switch_model_context.py
test_switch_model_fallback_prune.py
test_thinking_only_sanitizer.py	fix(agent): drop thinking-only assistant turns before provider call (#16959)	2026-04-28 03:50:51 -07:00
test_token_persistence_non_cli.py	fix: make session search initialize session db	2026-05-09 14:36:58 -07:00
test_tool_arg_coercion.py	fix(tools): wrap bare scalars in single-element list for array-typed args	2026-05-04 05:00:37 -07:00
test_tool_call_args_sanitizer.py	fix(agent): include name field on every role:tool message for Gemini compatibility (#16478)	2026-05-04 05:06:33 -07:00
test_tool_call_guardrail_runtime.py	fix(agent): make tool loop guardrails warning-first	2026-04-30 20:43:15 -07:00
test_tool_executor_contextvar_propagation.py	fix(agent): propagate ContextVars to concurrent tool worker threads (#18123)	2026-04-30 16:26:26 -07:00
test_unicode_ascii_codec.py
test_vision_aware_preprocessing.py