mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-28 06:51:16 +08:00
fix(compressor): use text char sum for multimodal token estimation in _find_tail_cut_by_tokens
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
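The root cause in miniature: Python's `len()` on a list of content blocks counts blocks, not characters. A minimal sketch, assuming OpenAI-style multimodal block shapes as implied by the commit message:

```python
# Multimodal message content: a list of blocks instead of a plain string.
# Block shapes here are assumptions based on the commit description.
multimodal_content = [
    {"type": "text", "text": "x" * 500},                      # 500 chars of text
    {"type": "image_url", "image_url": {"url": "data:..."}},  # non-text block
]

print(len(multimodal_content))  # 2 -- block count, not character count
print(sum(len(p.get("text", "")) for p in multimodal_content))  # 500 -- text chars
```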
The fix mirrors the existing pattern in _prune_old_tool_results (line 487):

    sum(len(p.get("text", "")) for p in raw_content)
    if isinstance(raw_content, list) else len(raw_content)
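A self-contained sketch of the corrected estimate. The function name `estimate_msg_tokens` and the `_CHARS_PER_TOKEN` value of 4 are assumptions inferred from the numbers quoted above (500 chars ≈ 135 tokens), not the actual module:

```python
_CHARS_PER_TOKEN = 4  # assumed heuristic divisor, inferred from 500 // 4 + 10 = 135

def estimate_msg_tokens(msg: dict) -> int:
    """Estimate tokens from character count, handling multimodal block lists.

    Hypothetical helper illustrating the fixed pattern; the real code lives
    inline in _find_tail_cut_by_tokens.
    """
    raw_content = msg.get("content") or ""
    content_len = (
        # For block lists (text + image_url), sum only the text characters;
        # non-text blocks contribute 0 via the "" default.
        sum(len(p.get("text", "")) for p in raw_content)
        if isinstance(raw_content, list)
        else len(raw_content)
    )
    return content_len // _CHARS_PER_TOKEN + 10  # +10 for role/metadata

# A 500-char text block alongside an image block now counts as ~135 tokens,
# where the buggy len(content) saw 2 blocks and produced ~10.
multimodal = {"content": [
    {"type": "text", "text": "x" * 500},
    {"type": "image_url", "image_url": {"url": "data:..."}},
]}
print(estimate_msg_tokens(multimodal))  # 500 // 4 + 10 = 135
```

The same helper covers the other two cases the tests exercise: a plain string still estimates by its full length, and an image-only block list falls through to the +10 metadata floor.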
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes #16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@@ -1082,8 +1082,13 @@ The user has requested that this compaction PRIORITISE preserving all informatio
     for i in range(n - 1, head_end - 1, -1):
         msg = messages[i]
-        content = msg.get("content") or ""
-        msg_tokens = len(content) // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
+        raw_content = msg.get("content") or ""
+        content_len = (
+            sum(len(p.get("text", "")) for p in raw_content)
+            if isinstance(raw_content, list)
+            else len(raw_content)
+        )
+        msg_tokens = content_len // _CHARS_PER_TOKEN + 10  # +10 for role/metadata
         # Include tool call arguments in estimate
         for tc in msg.get("tool_calls") or []:
             if isinstance(tc, dict):
||||