mirror of
https://github.com/patriceckhart/zot.git
synced 2026-06-26 21:36:31 +02:00
OpenRouter enforces input + max_output <= served context_length and rejects requests where max_tokens equals the whole window, which happens for models whose catalog MaxOutput is set equal to ContextWindow (e.g. nemotron-3-super-120B).

Two parts:

- discover.go (from #24): prefer top_provider.context_length when it is smaller than the inflated model-level context_length, so ContextWindow reflects the limit OpenRouter actually serves.
- openai.go: clamp max_tokens to ContextWindow minus a reserve. The reserve is derived from the window (window/8, capped at 4096), never from MaxOutput, so models whose output already fits the window are untouched and small-window models (gpt-4) are not over-penalized.

Adds buildRequest clamp tests (fits-window no-op, large-window cap, small-window proportional reserve, floor, explicit-request passthrough) and an httptest-based DiscoverOpenRouter test for the served-context preference.

Co-authored-by: Neil-urk12 <neil-urk12@users.noreply.github.com>

| | | |
|---|---|---|
| agent | ||
| core | ||
| provider | ||
| tui | ||
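
The two rules in the commit message above can be sketched in Go as follows. This is a minimal illustration, not the code in discover.go or openai.go: the function names `servedContext`, `reserveFor`, and `clampMaxTokens` are hypothetical, the treatment of an unknown (non-positive) window is an assumption, and the floor and explicit-request passthrough cases named in the tests are not reproduced because the message does not give their details.

```go
package main

import "fmt"

// maxReserve is the cap on the reserve stated in the commit message (4096 tokens).
const maxReserve = 4096

// servedContext (hypothetical name) prefers the provider-level context length
// when it is present and smaller than the model-level value.
func servedContext(modelLevel, topProvider int) int {
	if topProvider > 0 && topProvider < modelLevel {
		return topProvider
	}
	return modelLevel
}

// reserveFor (hypothetical name) derives the reserve from the window alone:
// window/8, capped at maxReserve.
func reserveFor(window int) int {
	r := window / 8
	if r > maxReserve {
		r = maxReserve
	}
	return r
}

// clampMaxTokens (hypothetical name) limits maxTokens to window minus the reserve.
// Assumption: a non-positive window means "unknown", so the value passes through.
func clampMaxTokens(maxTokens, window int) int {
	if window <= 0 {
		return maxTokens
	}
	limit := window - reserveFor(window)
	if maxTokens > limit {
		return limit
	}
	return maxTokens
}

func main() {
	// Model-level length is inflated; the provider actually serves less.
	window := servedContext(262144, 131072)
	fmt.Println(window) // 131072

	// MaxOutput equal to the whole window is clamped by the capped reserve.
	fmt.Println(clampMaxTokens(131072, window)) // 126976

	// Small window: the reserve is proportional (8192/8 = 1024).
	fmt.Println(clampMaxTokens(8192, 8192)) // 7168

	// Output that already fits below the limit is left untouched.
	fmt.Println(clampMaxTokens(4096, 8192)) // 4096
}
```

Deriving the reserve from the window rather than from MaxOutput means a large window loses at most 4096 tokens of output budget, while an 8192-token window gives up only 1024.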