mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-29 07:21:37 +08:00
Cherry-picked from PR #12481 by @Sanjays2402.

Reasoning models (GLM-5.1, QwQ, DeepSeek R1) inflate completion_tokens with internal thinking tokens. The compression trigger summed prompt_tokens + completion_tokens, causing premature compression at ~42% of actual context usage instead of the configured 50% threshold. The trigger now uses only prompt_tokens, since completion tokens don't consume context-window space for the next API call.

- 3 new regression tests
- Added AUTHOR_MAP entry for @Sanjays2402

Closes #12026
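A minimal sketch of the fixed trigger logic, assuming hypothetical names (`Usage`, `should_compress`, `context_limit`) that are illustrative only, not the actual hermes-agent API:

```python
# Hypothetical sketch of the compression-trigger fix described above.
# Names here are illustrative; they are not the real hermes-agent code.

from dataclasses import dataclass


@dataclass
class Usage:
    prompt_tokens: int      # tokens in the context sent to the model
    completion_tokens: int  # generated tokens (inflated by hidden thinking)


def should_compress(usage: Usage, context_limit: int, threshold: float = 0.5) -> bool:
    # Only prompt_tokens occupy the context window on the NEXT call, so
    # completion_tokens is deliberately excluded from the ratio. Including
    # it (the old behavior) fired compression well below the real threshold.
    return usage.prompt_tokens / context_limit >= threshold


# A reasoning model reporting large completion_tokens no longer
# pushes the trigger over the threshold prematurely:
print(should_compress(Usage(42_000, 25_000), context_limit=128_000))  # False
print(should_compress(Usage(70_000, 0), context_limit=128_000))       # True
```

With the old `prompt_tokens + completion_tokens` sum, the first call would have compared 67k/128k (~52%) against the 50% threshold and compressed early, even though only 42k tokens actually occupy the next call's context.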