Compare commits

...

2 Commits

Author SHA1 Message Date
teknium1
4cb0cf7b84 chore(release): map islam666 for salvaged PR #39624 2026-06-07 06:34:08 -07:00
islam666
092ed24caa fix(ollama): set default_max_tokens=4096 for custom/Ollama provider
Ollama's default num_predict is very small (128 tokens), which causes
responses to be truncated with finish_reason='length' — especially
noticeable with Gemma4 and other models that need more output headroom.

The custom provider profile (which covers Ollama, vLLM, llamacpp, etc.)
previously had no default_max_tokens, so no max_tokens was sent in API
requests, leaving Ollama to use its very low default.

Set default_max_tokens=4096 so Ollama models produce complete responses
out of the box. Users can still override via model.max_tokens in their
config.yaml if they need a different value.

Fixes #39281
2026-06-07 06:34:07 -07:00
2 changed files with 2 additions and 0 deletions

View File

@@ -63,6 +63,7 @@ custom = CustomProfile(
),
env_vars=(), # No fixed key — custom endpoint
base_url="", # User-configured
default_max_tokens=4096,
)
register_provider(custom)

View File

@@ -58,6 +58,7 @@ AUTHOR_MAP = {
"129007007+HeLLGURD@users.noreply.github.com": "HeLLGURD",
"290859878+synapsesx@users.noreply.github.com": "synapsesx",
"dirtyren@users.noreply.github.com": "dirtyren",
"islam666@users.noreply.github.com": "islam666",
"zhaolei.vc@bytedance.com": "zhaoleibd",
"jeffrobodie@gmail.com": "jeffrobodie-glitch",
"kyssta-exe@users.noreply.github.com": "kyssta-exe",