hermes-bsd/tests/run_agent
Teknium d6c11a4575
test(run_agent): fix racy ordering in test_concurrent_handles_tool_error (#42356)
The test keyed the 'which call raises' decision on a shared invocation
counter (first call → raise, second → success), then asserted the error
landed in messages[0] (c1) and success in messages[1] (c2). But
_execute_tool_calls_concurrent runs the two web_search calls on a thread
pool with no ordering guarantee — c2's handler can be invoked first, take
the 'first call raises' branch, and the error ends up in messages[1].
Results are ordered by tool_call_id, so messages[0] (c1) was then 'success'
and the assertion failed.

It passed in isolation but reliably failed under CI's full parallel slice
(8 xdist workers) where the scheduler actually interleaves the two handlers.

Fix: tie the raise to a specific tool call via its arguments (q=boom raises,
q=ok succeeds) instead of invocation order, and assert the tool_call_id ↔ content
pairing explicitly. The result is deterministic regardless of thread scheduling:
verified 10/10 in isolation, and the full TestConcurrentToolExecution class (32) is green.
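
The argument-keyed approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual test from test_run_agent.py: `handle_web_search`, `run_concurrently`, and the message dictionaries are hypothetical stand-ins, and results come back in submission order here as a simple substitute for the per-tool-call ordering the commit message describes.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def handle_web_search(args):
    """Hypothetical mocked tool handler (not the project's real handler)."""
    # The outcome depends only on this call's own arguments, never on how
    # many calls ran before it, so thread scheduling cannot change it.
    if args["q"] == "boom":
        raise RuntimeError("search failed")
    return "success"


def run_concurrently(tool_calls):
    """Run each call on a thread pool and return one message per call."""

    def run_one(call):
        try:
            content = handle_web_search(json.loads(call["arguments"]))
        except Exception as exc:  # report the failure as the tool result
            content = f"Error: {exc}"
        return {"tool_call_id": call["id"], "content": content}

    with ThreadPoolExecutor(max_workers=len(tool_calls)) as pool:
        # Executor.map yields results in input order, even when the
        # workers finish out of order.
        return list(pool.map(run_one, tool_calls))


tool_calls = [
    {"id": "c1", "arguments": json.dumps({"q": "boom"})},
    {"id": "c2", "arguments": json.dumps({"q": "ok"})},
]
messages = run_concurrently(tool_calls)

# Assert the id-to-content pairing explicitly instead of relying on
# which handler happened to be invoked first.
by_id = {m["tool_call_id"]: m["content"] for m in messages}
assert "Error" in by_id["c1"]
assert by_id["c2"] == "success"
```

A handler keyed on a shared invocation counter would pass or fail depending on which worker thread ran first; keying on `q` removes that dependency entirely.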
2026-06-08 14:40:39 -07:00
__init__.py
conftest.py
test_413_compression.py refactor(agent): extract run_conversation prologue into agent/turn_context.py 2026-06-07 22:17:35 -07:00
test_860_dedup.py fix: harden gateway startup and turn persistence 2026-06-07 02:15:23 -07:00
test_1630_context_overflow_loop.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_18028_content_policy_blocked.py fix(agent): fallback immediately on provider content-policy blocks (#33883) 2026-05-28 07:28:24 -07:00
test_31273_402_not_retried.py
test_agent_guardrails.py
test_anthropic_prompt_cache_policy.py
test_anthropic_third_party_oauth_guard.py
test_anthropic_truncation_continuation.py
test_api_max_retries_config.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_async_httpx_del_neuter.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_background_review.py test: cover ci-unblocker production regressions 2026-05-27 22:14:53 -07:00
test_background_review_cache_parity.py
test_background_review_summary.py
test_background_review_toolset_restriction.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_callable_api_key.py refactor(cli): extract agent-construction cluster into CLIAgentSetupMixin (god-file Phase 4) 2026-06-08 09:41:34 -07:00
test_codex_app_server_integration.py
test_codex_multimodal_tool_result.py
test_codex_no_tools_nonetype.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_codex_silent_hang_hint.py
test_codex_xai_oauth_recovery.py Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00
test_commit_memory_session_context_engine.py
test_compress_focus_plugin_fallback.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_boundary.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_boundary_hook.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_feasibility.py
test_compression_persistence.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compression_trigger_excludes_reasoning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_compressor_fallback_update.py
test_concurrent_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_context_token_tracking.py
test_copilot_native_vision_headers.py
test_create_openai_client_kwargs_isolation.py
test_create_openai_client_proxy_env.py
test_create_openai_client_reuse.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_credential_pool_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_deepseek_reasoning_content_echo.py fix(agent): re-pad reasoning_content on cross-provider fallback to require-side providers 2026-05-28 03:21:00 -07:00
test_deepseek_v4_thinking_live.py
test_dict_tool_call_args.py test(run_agent): align test_dict_tool_call_args with explainer suffix 2026-05-29 19:23:05 -07:00
test_empty_response_recovery_persistence.py
test_exit_cleanup_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_fallback_credential_isolation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_file_mutation_verifier.py fix(security): neutralize file paths in mutation-verifier footer (#35584) (#35684) 2026-05-30 23:05:23 -07:00
test_image_rejection_fallback.py
test_image_shrink_recovery.py fix(vision): guard image pixel dimensions, not just bytes (#37677) 2026-06-04 06:16:45 -07:00
test_infinite_compaction_loop.py fix(compaction): prevent infinite loop when transcript fits in tail budget 2026-06-07 21:50:57 -07:00
test_init_fallback_on_exhausted_pool.py
test_interactive_interrupt.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_interrupt_propagation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_invalid_context_length_warning.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_iteration_budget_race.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_jsondecodeerror_retryable.py fix(agent): classify TypeError('NoneType ... not iterable') as retryable provider shape error 2026-05-27 11:30:55 -07:00
test_last_reasoning_per_turn.py
test_long_context_tier_429.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_materialize_data_url_cleanup.py
test_memory_nudge_counter_hydration.py refactor(agent): extract run_conversation prologue into agent/turn_context.py 2026-06-07 22:17:35 -07:00
test_memory_provider_init.py fix(memory): reject memory tools that shadow core tool names (#40902) 2026-06-06 18:44:09 -07:00
test_memory_sync_interrupted.py feat: expose completed-turn message context to memory providers 2026-05-29 02:16:43 +05:30
test_message_sequence_repair.py
test_multimodal_tool_content_recovery.py fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072) 2026-06-07 21:50:57 -07:00
test_notice_spine.py feat(credits): usage-aware credits — in-session notices, /usage view, dev readout (#40011) 2026-06-06 13:18:18 +05:30
test_openai_client_lifecycle.py
test_partial_stream_finish_reason.py fix(stream): don't report dropped mid-tool-call streams as output truncation (#42314) 2026-06-08 11:56:10 -07:00
test_percentage_clamp.py test: repoint percentage-clamp source guard to gateway/slash_commands.py 2026-06-08 01:25:35 -07:00
test_plugin_context_engine_init.py fix: expose context engine tools with saved toolsets 2026-05-28 00:28:42 -07:00
test_primary_runtime_restore.py Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00
test_provider_attribution_headers.py fix(agent): honor model.default_headers for custom OpenAI-compatible providers (#40033) 2026-06-07 02:02:40 -07:00
test_provider_fallback.py
test_provider_parity.py fix: strip extra_content from tool_calls for strict APIs (Fireworks, Mistral) 2026-06-03 16:42:52 -07:00
test_real_interrupt_subagent.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_redirect_stdout_issue.py
test_repair_tool_call_arguments.py revert: drop cumulative-resend tool-arg heuristic from shared streaming path (#35718) (#35860) 2026-05-31 06:14:32 -07:00
test_repair_tool_call_name.py fix(volcengine): strip XML attribute fragments from tool_use.name (#33007) 2026-06-07 22:22:01 -07:00
test_retry_status_buffer.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_review_prompt_class_first.py
test_run_agent.py test(run_agent): fix racy ordering in test_concurrent_handles_tool_error (#42356) 2026-06-08 14:40:39 -07:00
test_run_agent_codex_responses.py fix(xai-sanitize): deepcopy tools_for_api before in-place mutation (#27907) 2026-05-28 23:29:59 -07:00
test_run_agent_multimodal_prologue.py
test_sequential_chats_live.py
test_session_id_env.py
test_session_meta_filtering.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_session_reset_fix.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_steer.py fix(agent): make mid-turn /steer trusted, not read as injection 2026-06-05 20:59:36 -05:00
test_stream_drop_logging.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_stream_interrupt_retry.py
test_streaming.py Add Hermes desktop app (#20059) 2026-05-31 17:46:56 -05:00
test_streaming_tool_call_repair.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_strict_api_validation.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_strip_reasoning_tags_cli.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_switch_model_context.py
test_switch_model_fallback_prune.py
test_switch_model_rollback.py fix(agent): roll back switch_model() state when client rebuild fails (#33228) 2026-05-27 05:43:20 -07:00
test_thinking_only_sanitizer.py
test_tls_fd_recycle_corruption.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_token_persistence_non_cli.py
test_tool_arg_coercion.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_tool_call_args_sanitizer.py
test_tool_call_guardrail_runtime.py
test_tool_executor_contextvar_propagation.py fix(code-exec): propagate agent-turn context into tool worker threads 2026-05-29 03:44:49 -07:00
test_tool_name_db_persistence.py
test_turn_completion_explainer.py fix(agent): register explainer config key + shorten footer prefix 2026-05-29 19:23:05 -07:00
test_unicode_ascii_codec.py
test_vision_aware_preprocessing.py chore: prune unused imports and duplicate import redefinitions 2026-05-28 22:26:25 -07:00
test_vision_tool_messages.py fix(vision): proactive downgrade for providers rejecting list-type tool content (#41072) 2026-06-07 21:50:57 -07:00