chore(release): map islam666 for salvaged PR #39624

fix(ollama): set default_max_tokens=4096 for custom/Ollama provider
Ollama's default num_predict is very small (128 tokens), which causes responses to be truncated with finish_reason='length' — especially noticeable with Gemma4 and other models that need more output headroom. The custom provider profile (which covers Ollama, vLLM, llamacpp, etc.) previously had no default_max_tokens, so no max_tokens was sent in API requests, leaving Ollama to use its very low default. Set default_max_tokens=4096 so Ollama models produce complete responses out of the box. Users can still override via model.max_tokens in their config.yaml if they need a different value. Fixes #39281
2026-06-12 13:18:54 +08:00 · 2026-06-07 06:34:08 -07:00 · 2026-06-07 06:34:07 -07:00
2 changed files with 2 additions and 0 deletions
--- a/plugins/model-providers/custom/init.py
+++ b/plugins/model-providers/custom/init.py
@@ -63,6 +63,7 @@ custom = CustomProfile(
    ),
    env_vars=(),  # No fixed key — custom endpoint
    base_url="",  # User-configured
+    default_max_tokens=4096,
 )

 register_provider(custom)
--- a/scripts/release.py
+++ b/scripts/release.py
@@ -58,6 +58,7 @@ AUTHOR_MAP = {
    "129007007+HeLLGURD@users.noreply.github.com": "HeLLGURD",
    "290859878+synapsesx@users.noreply.github.com": "synapsesx",
    "dirtyren@users.noreply.github.com": "dirtyren",
+    "islam666@users.noreply.github.com": "islam666",
    "zhaolei.vc@bytedance.com": "zhaoleibd",
    "jeffrobodie@gmail.com": "jeffrobodie-glitch",
    "kyssta-exe@users.noreply.github.com": "kyssta-exe",