hermes-bsd

History

kshitij db84a78e61 fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342 , #22763 ) (#26320 ) * fix(langfuse): reject placeholder credentials with one-shot warning When operators leave HERMES_LANGFUSE_PUBLIC_KEY / HERMES_LANGFUSE_SECRET_KEY at a template value like 'placeholder', 'test-key', or 'your-langfuse-key', the Langfuse SDK silently accepts the credentials at construction time and drops every trace at flush time. No warning, no error — just an empty Langfuse dashboard the operator only notices hours later. Add prefix-based validation in _get_langfuse() against the documented 'pk-lf-' / 'sk-lf-' prefixes that Langfuse always issues server-side. Anything else fires a single warning naming the offending env var(s) with a log-safe value preview (full string for short placeholders so the operator knows which template they left in place; truncated for long values so a real secret pasted into the wrong field never hits the log), then short-circuits via the existing _INIT_FAILED cache so the warning fires once per process, not once per hook invocation. The check sits after the 'Langfuse is None' SDK-installed guard so hosts without the optional langfuse SDK don't see misleading 'set real keys' hints when the actionable fix is 'pip install langfuse'. Missing credentials remains the documented opt-out path and stays silent — no log noise for unconfigured installs. Fixes #22763 Fixes #23823 * fix(langfuse): use actual API request messages for generation input on_pre_llm_request previously used the messages kwarg alone, which could be None when Hermes passes the payload via request_messages, conversation_history, or user_message instead. Add _coerce_request_messages to pick the first available list across all variants, falling back to a synthetic user message. Generations now show the real outbound payload rather than an empty input. * fix(langfuse): record tool call outputs in traces Tool observations showed input (arguments) but output was always undefined. Root cause: when tool_call_id is empty, pre_tool_call stored observations under a unique time-based key that post_tool_call could never reconstruct, so every tool span was closed without output by the _finish_trace sweep. Fix pre/post matching by routing empty-tool_call_id tools through a per-name FIFO queue (pending_tools_by_name) instead of the time-based key. Tools with a tool_call_id continue to use the id-keyed dict. Also: - Preserve OpenAI-style nested function shape in serialized tool calls so Langfuse renders name/arguments correctly - Keep name + tool_call_id on role:tool messages for proper pairing - Backfill tool results onto the matching turn_tool_calls entry so the generation's tool-call record carries the result alongside arguments - Coerce request messages from whichever field the runtime provides (request_messages, messages, conversation_history, user_message) * fix(langfuse): salvage-review polish — drop dead is_first_turn, shallow-copy request_messages, real threaded FIFO test Self-review of the combined #22345 + #23831 salvage surfaced three issues worth fixing in the same PR rather than as follow-ups: 1. Drop is_first_turn from the pre_api_request hook. The boolean expression `not bool(conversation_history)` was wrong: conversation_history is reassigned to None mid-run after compression (5 sites in run_agent.py), so the value flips False -> True mid-conversation on every post-compression API call. The langfuse plugin never consumed it, so the kwarg was both misleading AND dead. 2. Replace copy.deepcopy(request_messages) with shallow list() copy. The pre_api_request hook contract discards return values (invoke_hook never writes back to api_kwargs), and the langfuse plugin's _serialize_messages already builds its own snapshot dicts via _safe_value. A deepcopy on every API call would walk every tool result and base64 image — significant overhead for no real isolation benefit. Shallow copy of the outer list protects against later mutations of api_messages without paying for the inner-dict walk. 3. Rename test_empty_tool_call_id_concurrent_fifo_order -> test_empty_tool_call_id_observations_are_fifo_within_tool_name and add a real test_threaded_post_calls_preserve_fifo_under_lock that spawns 8 threads behind a barrier to actually exercise _STATE_LOCK on the pending_tools_by_name queue. The original test was sequential and only validated Python list semantics; this one validates the lock discipline. 4. Fix stale 'Cleared by reset_cache_for_tests()' comment on _INIT_FAILED — that function does not exist. Tests reload the module via sys.modules.pop + importlib.import_module instead. Tests: 37 langfuse plugin tests pass, 658 plugin tests overall pass. --------- Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com> Co-authored-by: Brian Conklin <brian@dralth.com>		2026-05-15 05:04:02 -07:00
..
acp	feat(acp): hermes acp --setup-browser bootstraps browser tools for registry installs	2026-05-15 01:38:24 -07:00
acp_adapter
agent	Merge pull request #25957 from stephenschoettler/fix/main-ci-unblocker-after-21012	2026-05-14 21:26:52 -04:00
cli	revert(cli): drop scrollback box width clamp (#25975 ), restore full-width borders (#26163 )	2026-05-14 23:30:16 -07:00
cron	feat(cron): support name-based lookup for job operations	2026-05-15 01:36:03 -07:00
e2e	fix(dashboard): UI polish — modals, layout, consistency, test fixes	2026-05-12 13:59:22 -04:00
fakes
gateway	feat(gateway): add SimpleX Chat platform plugin	2026-05-15 01:41:30 -07:00
hermes_cli	fix(codex-runtime): de-dup [plugins.X] tables and stop leaking HERMES_HOME into config.toml	2026-05-15 02:31:30 -07:00
hermes_state
honcho_plugin	fix(tests): exercise profile-mode HERMES_HOME for honcho fallback	2026-05-13 22:53:01 -07:00
integration
openviking_plugin
plugins	fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342 , #22763 ) (#26320 )	2026-05-15 05:04:02 -07:00
providers	fix(ci): stabilize shared test state after 21012	2026-05-14 14:28:14 -07:00
run_agent	fix(langfuse): complete observability fix — trace I/O, tool outputs, placeholder credentials (closes #22342 , #22763 ) (#26320 )	2026-05-15 05:04:02 -07:00
scripts	feat(acp-registry): switch to uvx distribution, drop npm launcher	2026-05-14 22:27:09 -07:00
skills	Add unit tests for hyperliquid skill functionality	2026-05-10 22:15:04 -07:00
stress
tools	fix(url-safety): allow only http and https schemes	2026-05-15 01:52:48 -07:00
tui_gateway
website
__init__.py
conftest.py	chore: remove Atropos RL environments and tinker-atropos integration (#26106 )	2026-05-15 10:36:38 +05:30
run_interrupt_test.py
test_account_usage.py
test_atomic_replace_symlinks.py
test_base_url_hostname.py
test_batch_runner_checkpoint.py
test_cli_file_drop.py
test_cli_manual_compress.py
test_cli_skin_integration.py
test_ctx_halving_fix.py	fix(cache): kill long-lived prefix layout — system prompt is now byte-static within a session (#24778 )	2026-05-12 20:46:04 -07:00
test_empty_model_fallback.py
test_evidence_store.py
test_gateway_streaming_nested_config.py	fix(gateway): load streaming config from nested gateway.streaming key	2026-05-14 14:51:07 -07:00
test_get_tool_definitions_cache_isolation.py
test_hermes_bootstrap.py
test_hermes_constants.py
test_hermes_home_profile_warning.py
test_hermes_logging.py
test_hermes_state.py
test_hermes_state_wal_fallback.py
test_honcho_client_config.py
test_install_sh_browser_install.py	fix(install): support non-sudo service-user installs on apt distros (#25814 )	2026-05-14 09:05:31 -07:00
test_install_sh_pythonpath_sanitization.py
test_install_sh_setup_wizard_tty_probe.py
test_install_sh_symlink_stomp.py	fix(install): preserve pip entry point when re-running on symlinked install	2026-05-14 07:08:45 -07:00
test_install_sh_termux_network_prereqs.py
test_ipv4_preference.py
test_lazy_session_regressions.py
test_lint_config.py
test_live_system_guard_self_test.py	test(conftest): plug every gateway-kill leak path (#23486 )	2026-05-10 18:55:28 -07:00
test_mcp_serve.py
test_mini_swe_runner.py
test_minimax_model_validation.py
test_minimax_oauth.py	fix(minimax): harden OAuth dashboard and runtime	2026-05-11 22:15:16 -07:00
test_minisweagent_path.py
test_model_picker_scroll.py
test_model_tools.py	chore: remove Atropos RL environments and tinker-atropos integration (#26106 )	2026-05-15 10:36:38 +05:30
test_model_tools_async_bridge.py
test_ollama_num_ctx.py
test_packaging_metadata.py
test_plugin_skills.py
test_process_loop_event_loop_warning.py
test_project_metadata.py	fix(install): use `--extra all` not `--all-extras`; drop lazy-covered extras from [all] (#24515 )	2026-05-12 15:06:25 -07:00
test_retry_utils.py
test_sql_injection.py
test_subprocess_home_isolation.py
test_termux_all_extra_compat.py
test_timezone.py
test_toolset_distributions.py
test_toolsets.py	test(toolsets): lock web search into default platform coverage	2026-05-14 08:03:33 -07:00
test_trajectory_compressor.py
test_trajectory_compressor_async.py
test_transform_llm_output_hook.py
test_transform_tool_result_hook.py
test_tui_gateway_server.py
test_utils_truthy_values.py
test_yuanbao_integration.py
test_yuanbao_markdown.py
test_yuanbao_pipeline.py
test_yuanbao_proto.py