mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 06:51:16 +08:00
fix: wire _ephemeral_max_output_tokens into chat_completions and add NVIDIA NIM default
Based on #12152 by @LVT382009. Three fixes to run_agent.py:

1. Consume _ephemeral_max_output_tokens in the chat_completions path: the error-recovery ephemeral override was only consumed in the anthropic_messages branch of _build_api_kwargs, so all chat_completions providers (OpenRouter, NVIDIA NIM, Qwen, Alibaba, custom, etc.) silently ignored it. It is now consumed at the highest priority, matching the anthropic pattern.

2. NVIDIA NIM max_tokens default (16384): NVIDIA NIM falls back to a very low internal default when max_tokens is omitted, causing models like GLM-4.7 to truncate immediately (thinking tokens exhaust the budget before the response starts).

3. Progressive length-continuation boost: when finish_reason='length' triggers a continuation retry, the output budget now grows progressively (2x the base on retry 1, 3x on retry 2, capped at 32768) via _ephemeral_max_output_tokens. Previously the retry loop re-sent the same token limit on all three attempts.
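The fixes above can be sketched roughly as follows. This is a hypothetical illustration, not the actual run_agent.py code: only _ephemeral_max_output_tokens and the 16384/32768 constants come from the commit message; the class, method, and provider names are assumptions.

```python
# Sketch of the described behavior: the ephemeral override is consumed at
# highest priority in the chat_completions kwargs, NVIDIA NIM gets an explicit
# default, and a length-continuation retry boosts the budget progressively.
# All names except _ephemeral_max_output_tokens are illustrative assumptions.

NVIDIA_NIM_DEFAULT_MAX_TOKENS = 16384  # avoid NIM's very low internal default
LENGTH_BOOST_CAP = 32768               # ceiling for continuation retries


class AgentSketch:
    def __init__(self, provider, max_tokens=None):
        self.provider = provider
        self.max_tokens = max_tokens
        self._ephemeral_max_output_tokens = None

    def _build_chat_completions_kwargs(self):
        """Resolve max_tokens with the ephemeral override at highest priority."""
        if self._ephemeral_max_output_tokens is not None:
            # Error-recovery override: consume it once, then clear it.
            max_tokens = self._ephemeral_max_output_tokens
            self._ephemeral_max_output_tokens = None
        elif self.max_tokens is not None:
            max_tokens = self.max_tokens
        elif self.provider == "nvidia-nim":
            max_tokens = NVIDIA_NIM_DEFAULT_MAX_TOKENS
        else:
            return {}  # let other providers use their own defaults
        return {"max_tokens": max_tokens}

    def on_length_finish(self, base_max_tokens, retry):
        """Boost the budget: 2x base on retry 1, 3x on retry 2, capped."""
        self._ephemeral_max_output_tokens = min(
            base_max_tokens * (retry + 1), LENGTH_BOOST_CAP
        )
```

Routing the boost through _ephemeral_max_output_tokens (rather than mutating max_tokens directly) keeps the override one-shot: it applies to the single retry request and the next call falls back to the configured or provider-default budget.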
@@ -267,6 +267,7 @@ AUTHOR_MAP = {
     "aviralarora002@gmail.com": "AviArora02-commits",
     "junminliu@gmail.com": "JimLiu",
     "jarvischer@gmail.com": "maxchernin",
+    "levantam.98.2324@gmail.com": "LVT382009",
 }