zot/packages
patriceckhart d2fa18270d fix(provider): clamp max_tokens to fit context window with proportional reserve
OpenRouter enforces input + max_output <= served context_length and
rejects requests where max_tokens equals the whole window, which happens
for models whose catalog MaxOutput is set equal to ContextWindow (e.g.
nemotron-3-super-120B). Two parts:

- discover.go (from #24): prefer top_provider.context_length when it is
  smaller than the inflated model-level context_length, so ContextWindow
  reflects the limit OpenRouter actually serves.
- openai.go: clamp max_tokens to ContextWindow minus a reserve. The
  reserve is derived from the window (window/8, capped at 4096), never
  from MaxOutput, so models whose output already fits the window are
  untouched and small-window models (gpt-4) are not over-penalized.
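
The clamp described for openai.go can be sketched roughly as below. This is
an illustration under stated assumptions, not the code from the commit: the
names reserveFor and clampMaxTokens are hypothetical, and the floor and
explicit-request passthrough mentioned in the test list are left out because
their exact values are not given here.

```go
package main

import "fmt"

// reserveFor returns the headroom left for input tokens.
// Assumption taken from the commit text: window/8, capped at 4096.
func reserveFor(window int) int {
	r := window / 8
	if r > 4096 {
		r = 4096
	}
	return r
}

// clampMaxTokens (hypothetical name) lowers maxOutput so that it never
// exceeds window minus the reserve. Values that already fit are returned
// unchanged. An unknown window (<= 0) disables the clamp.
func clampMaxTokens(maxOutput, window int) int {
	if window <= 0 {
		return maxOutput
	}
	limit := window - reserveFor(window)
	if maxOutput > limit {
		return limit
	}
	return maxOutput
}

func main() {
	// Catalog MaxOutput equal to the whole window: clamped by the 4096 cap.
	fmt.Println(clampMaxTokens(262144, 262144)) // 258048
	// Small window, output already fits: untouched.
	fmt.Println(clampMaxTokens(4096, 8192)) // 4096
	// Small window, output equals window: proportional reserve of 1024.
	fmt.Println(clampMaxTokens(8192, 8192)) // 7168
}
```

Deriving the reserve from the window rather than from MaxOutput keeps the
penalty at one eighth of the window on small models while bounding it at
4096 tokens on large ones.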

Adds buildRequest clamp tests (fits-window no-op, large-window cap,
small-window proportional reserve, floor, explicit-request passthrough)
and an httptest-based DiscoverOpenRouter test for the served-context
preference.
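
The served-context preference that the DiscoverOpenRouter test exercises
might look something like the sketch below. The type orModel and the
function servedContext are hypothetical names, and the JSON payload is a
made-up sample that only mirrors the context_length and
top_provider.context_length fields named above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// orModel is a hypothetical, trimmed-down view of one model entry.
type orModel struct {
	ID            string `json:"id"`
	ContextLength int    `json:"context_length"`
	TopProvider   struct {
		ContextLength int `json:"context_length"`
	} `json:"top_provider"`
}

// servedContext prefers top_provider.context_length when it is set and
// smaller than the model-level context_length; otherwise it falls back
// to the model-level value.
func servedContext(m orModel) int {
	tp := m.TopProvider.ContextLength
	if tp > 0 && tp < m.ContextLength {
		return tp
	}
	return m.ContextLength
}

func main() {
	// Made-up sample entry: the model-level figure is larger than what
	// the top provider serves.
	sample := `{"id":"example/model","context_length":1000000,
	            "top_provider":{"context_length":262144}}`

	var m orModel
	if err := json.Unmarshal([]byte(sample), &m); err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(servedContext(m)) // 262144
}
```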

Co-authored-by: Neil-urk12 <neil-urk12@users.noreply.github.com>
2026-06-09 19:29:48 +02:00
agent style: drop em-dashes from output-token-budget strings/comments 2026-06-09 18:38:09 +02:00
core style: drop em-dashes from output-token-budget strings/comments 2026-06-09 18:38:09 +02:00
provider fix(provider): clamp max_tokens to fit context window with proportional reserve 2026-06-09 19:29:48 +02:00
tui Word-wrap provider error rows instead of truncating 2026-06-04 19:25:16 +02:00