hermes-bsd/tools
Teknium fb9f3a4ef9
fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748)
* fix(skills): pull full ClawHub catalog into the skills index

The website was showing 200 ClawHub skills out of 20k+ because
`ClawHubSource.search("")` for empty queries went straight to a single
unpaginated request. ClawHub's API caps any single page at 200 items and
returns a `nextCursor`; we grabbed page 1 and stopped, so the cached
index served from hermes-agent.nousresearch.com had a silent 99%
truncation.

End users never hit clawhub.ai directly (the index is rebuilt twice
daily by .github/workflows/skills-index.yml and served as a static JSON
on the docs site), so the cap-and-cache architecture is correct — it
just wasn't being filled.

Changes:
- `ClawHubSource.search(query="")` now routes through the existing
  `_load_catalog_index()` paginating walker instead of the unpaginated
  listing fallback (non-empty queries still hit the fast catalog search).
- `_load_catalog_index()` max_pages 50 → 250 (50k-skill ceiling; live
  catalog is ~20k as of May 2026, with headroom for growth).
- `build_skills_index.py`: per-source crawl limits split out — ClawHub
  and LobeHub get 100k, others keep their effective caps.
- `EXPECTED_FLOORS["clawhub"]` 50 → 5000 so the next pagination
  regression hard-fails the CI build instead of silently shipping a
  degenerate index.

Test plan:
- New unit test `test_search_empty_query_paginates_full_catalog`
  exercises the cursor-following path with three mocked pages (450
  total items) and asserts all pages are walked.
- Existing 9 ClawHub tests + 127 broader skills_hub tests all pass.
- E2E against live ClawHub API: walker reached 9700+ skills across 49
  pages before this commit landed, paginating well past the previous
  50-page cap.

* fix(skills): raise ClawHub ceilings — live catalog is 50k, not 20k

E2E walk against live ClawHub API hit my initial 250-page cap at 49,698
skills with cursor=yes still pending. The catalog is roughly 2.5x larger
than the docstring estimate.

- max_pages 250 → 750 (150k ceiling, walks terminate on cursor=None
  well before this in practice)
- SOURCE_LIMITS['clawhub'] 100k → 200k
- EXPECTED_FLOORS['clawhub'] 5000 → 20000
2026-05-28 01:42:19 -07:00
..
computer_use fix(computer-use): skip capture_after when action failed (ok=False) 2026-05-22 01:19:01 -07:00
environments remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
neutts_samples
__init__.py
ansi_strip.py
approval.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
binary_extensions.py
browser_camofox.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_camofox_state.py
browser_cdp_tool.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_dialog_tool.py feat: auto-launch Chromium-family browser for CDP 2026-05-19 22:34:05 -07:00
browser_supervisor.py
browser_tool.py fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452) 2026-05-24 15:01:28 -07:00
budget_config.py
checkpoint_manager.py
clarify_gateway.py
clarify_tool.py
code_execution_tool.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
computer_use_tool.py
credential_files.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
cronjob_tools.py fix(cron): clarify schedule is required for create in tool schema 2026-05-26 14:09:37 -07:00
debug_helpers.py
delegate_tool.py
discord_tool.py
env_passthrough.py harden(env_passthrough): apply GHSA-rhgp-j443-p4rf filter to config.yaml path (#27794) 2026-05-25 03:35:23 -07:00
fal_common.py refactor(image_gen): port FAL backend to plugins/image_gen/fal 2026-05-22 04:10:45 -07:00
feishu_doc_tool.py
feishu_drive_tool.py
file_operations.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
file_state.py
file_tools.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
fuzzy_match.py feat(patch): indentation preservation, CRLF preservation, per-file failure escalation (#507) (#32273) 2026-05-25 15:18:45 -07:00
homeassistant_tool.py
image_generation_tool.py feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. 2026-05-28 00:19:31 -07:00
interrupt.py
kanban_tools.py feat(kanban): stamp originating ACP session_id on tasks 2026-05-18 21:15:21 -07:00
lazy_deps.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
managed_tool_gateway.py
mcp_oauth.py feat(mcp-oauth): accept 'skip' at paste prompt to bypass auth without disabling server (#32069) 2026-05-25 05:37:30 -07:00
mcp_oauth_manager.py
mcp_tool.py feat(mcp): support TLS client certificates (mTLS) for HTTP and SSE servers (#33721) 2026-05-28 00:55:55 -07:00
memory_tool.py feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269) 2026-05-25 14:52:24 -07:00
microsoft_graph_auth.py
microsoft_graph_client.py
mixture_of_agents_tool.py
neutts_synth.py
openrouter_client.py
osv_check.py
patch_parser.py fix(lint): skip per-file shell linter when LSP will handle the file (#29054) 2026-05-20 01:46:40 -05:00
path_security.py
process_registry.py feat(cli): show live background terminal-process count in status bar (#32061) 2026-05-25 05:35:02 -07:00
registry.py
schema_sanitizer.py fix(xai-responses): strip enum values containing '/' from tool schemas 2026-05-18 10:37:35 -07:00
send_message_tool.py refactor(gateway): migrate Mattermost adapter to bundled plugin 2026-05-24 18:05:33 -07:00
session_search_tool.py feat(session_search): single-shape tool with discovery, scroll, browse — no LLM (#27590) 2026-05-17 23:28:45 -07:00
skill_manager_tool.py fix(profiles): cross-profile soft guard on file-write tools + system-prompt hint (#31290) 2026-05-24 00:38:17 -07:00
skill_provenance.py
skill_usage.py fix(skills): prune dependency/venv dirs from all skill scanners (#30042) 2026-05-21 14:18:02 -07:00
skills_ast_audit.py refactor(skills): slim AST diagnostic to single entry point 2026-05-23 17:47:26 -07:00
skills_guard.py Harden Skills Guard multi-word prompt patterns (#26852) 2026-05-25 01:51:27 -07:00
skills_hub.py fix(skills): pull full ClawHub catalog into the skills index (200 → 20k+) (#33748) 2026-05-28 01:42:19 -07:00
skills_sync.py fix(skills): atomic lock write + drop dead _validate_category_name 2026-05-27 13:39:58 -07:00
skills_tool.py remove Vercel AI Gateway and Vercel Sandbox (#33067) 2026-05-27 00:43:32 -07:00
slash_confirm.py
terminal_tool.py feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. 2026-05-28 00:19:31 -07:00
threat_patterns.py feat(security): promptware defense — shared threat patterns + memory load-time scan + tool-result delimiters (#32269) 2026-05-25 14:52:24 -07:00
tirith_security.py fix(tirith): suppress .app lookalike_tld false positives in warn verdicts 2026-05-18 10:20:07 -07:00
todo_tool.py
tool_backend_helpers.py fix(auth): refresh Nous entitlement in tool menus 2026-05-28 00:19:31 -07:00
tool_output_limits.py
tool_result_storage.py
transcription_tools.py feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. 2026-05-28 00:19:31 -07:00
tts_tool.py feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. 2026-05-28 00:19:31 -07:00
url_safety.py fix(url_safety): block IPv4-mapped IPv6 addresses to prevent SSRF bypass 2026-05-18 10:51:15 -07:00
video_generation_tool.py
vision_tools.py fix(vision): route auxiliary.vision.provider=openai to api.openai.com, skip text-only main (#31452) 2026-05-24 15:01:28 -07:00
voice_mode.py docs(voice): use uv pip install faster-whisper in STT install hints (#29800) 2026-05-28 16:23:14 +10:00
web_tools.py feat(auth) normalise the way in which we check whether a user has free/paid access to nous portal so we can expose behaviour and error messages accordingly. 2026-05-28 00:19:31 -07:00
website_policy.py
x_search_tool.py fix(x_search): surface degraded results + validate dates 2026-05-21 02:38:45 +05:30
xai_http.py feat(web): add xAI Web Search provider plugin 2026-05-19 19:27:34 -07:00
yuanbao_tools.py Fix unsafe gateway media path delivery 2026-05-23 01:40:35 -07:00