The custom/Ollama provider profile had no default_max_tokens, so no max_tokens was sent on requests and Ollama fell back to its internal num_predict=128 — truncating responses after a few tokens with finish_reason='length' (#39281, e.g. gemma4). max_tokens resolution is ephemeral > user model.max_tokens > profile default, so this is only a floor used when the user hasn't set their own cap. Set it to 65536 (matching the qwen-oauth tier) rather than a conservative value, since users can always override per-model. Fixes #39281 |
||
|---|---|---|
| .. | ||
| browser | ||
| context_engine | ||
| dashboard_auth | ||
| disk-cleanup | ||
| google_meet | ||
| hermes-achievements | ||
| image_gen | ||
| kanban | ||
| memory | ||
| model-providers | ||
| observability | ||
| platforms | ||
| security-guidance | ||
| spotify | ||
| teams_pipeline | ||
| video_gen | ||
| web | ||
| __init__.py | ||