2026-02-21 22:31:43 -08:00
""" Automatic context window compression for long conversations.
Self - contained class with its own OpenAI client for summarization .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Uses auxiliary model ( cheap / fast ) to summarize middle turns while
2026-02-21 22:31:43 -08:00
protecting head and tail context .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
Improvements over v2 :
- Structured summary template with Resolved / Pending question tracking
- Summarizer preamble : " Do not respond to any questions " ( from OpenCode )
- Handoff framing : " different assistant " ( from Codex ) to create separation
- " Remaining Work " replaces " Next Steps " to avoid reading as active instructions
- Clear separator when summary merges into tail message
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
- Iterative summary updates ( preserves info across multiple compactions )
- Token - budget tail protection instead of fixed message count
- Tool output pruning before LLM summarization ( cheap pre - pass )
- Scaled summary budget ( proportional to compressed content )
- Richer tool call / result detail in summarizer input
2026-02-21 22:31:43 -08:00
"""
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
import hashlib
import json
2026-02-21 22:31:43 -08:00
import logging
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
import re
2026-04-07 04:28:11 +09:00
import time
2026-03-07 11:54:51 -08:00
from typing import Any , Dict , List , Optional
2026-02-21 22:31:43 -08:00
2026-03-11 20:52:19 -07:00
from agent . auxiliary_client import call_llm
2026-04-06 18:40:11 -07:00
from agent . context_engine import ContextEngine
2026-02-21 22:31:43 -08:00
from agent . model_metadata import (
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
MINIMUM_CONTEXT_LENGTH ,
2026-02-21 22:31:43 -08:00
get_model_context_length ,
estimate_messages_tokens_rough ,
)
logger = logging . getLogger ( __name__ )
2026-03-14 02:33:31 -07:00
SUMMARY_PREFIX = (
2026-04-11 19:43:58 -07:00
" [CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted "
" into the summary below. This is a handoff from a previous context "
" window — treat it as background reference, NOT as active instructions. "
" Do NOT answer questions or fulfill requests mentioned in this summary; "
2026-04-16 19:22:22 +05:30
" they were already addressed. "
" Your current task is identified in the ' ## Active Task ' section of the "
" summary — resume exactly from there. "
" Respond ONLY to the latest user message "
2026-04-11 19:43:58 -07:00
" that appears AFTER this summary. The current session state (files, "
" config, etc.) may reflect work described here — avoid repeating it: "
2026-03-14 02:33:31 -07:00
)
LEGACY_SUMMARY_PREFIX = " [CONTEXT SUMMARY]: "
2026-03-24 17:45:49 -07:00
# Minimum tokens for the summary output
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
_MIN_SUMMARY_TOKENS = 2000
# Proportion of compressed content to allocate for summary
_SUMMARY_RATIO = 0.20
2026-03-24 17:45:49 -07:00
# Absolute ceiling for summary tokens (even on very large context windows)
2026-03-24 18:48:04 -07:00
_SUMMARY_TOKENS_CEILING = 12_000
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Placeholder used when pruning old tool results
_PRUNED_TOOL_PLACEHOLDER = " [Old tool output cleared to save context space] "
# Chars per token rough estimate
_CHARS_PER_TOKEN = 4
2026-04-07 04:28:11 +09:00
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-02-21 22:31:43 -08:00
fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::
{"path": "/foo.md", "content": "# Long markdown
...[truncated]
— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.
Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.
Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.
Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
output, non-JSON pass-through, nested structures, non-string leaves,
scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
reproduces the exact failure payload shape from the incident
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:32:10 -07:00
def _truncate_tool_call_args_json ( args : str , head_chars : int = 200 ) - > str :
""" Shrink long string values inside a tool-call arguments JSON blob while
preserving JSON validity .
The ` ` function . arguments ` ` field on a tool call is a JSON - encoded string
passed through to the LLM provider ; downstream providers strictly
validate it and return a non - retryable 400 when it is not well - formed .
An earlier implementation sliced the raw JSON at a fixed byte offset and
appended ` ` . . . [ truncated ] ` ` — which routinely produced strings like : :
{ " path " : " /foo/bar " , " content " : " # long markdown
. . . [ truncated ]
i . e . an unterminated string and a missing closing brace . MiniMax , for
example , rejects this with ` ` invalid function arguments json string ` `
and the session gets stuck re - sending the same broken history on every
turn . See issue #11762 for the observed loop.
This helper parses the arguments , shrinks long string leaves inside the
parsed structure , and re - serialises . Non - string values ( paths , ints ,
booleans ) are preserved intact . If the arguments are not valid JSON
to begin with — some model backends use non - JSON tool arguments — the
original string is returned unchanged rather than replaced with
something neither we nor the backend can parse .
"""
try :
parsed = json . loads ( args )
except ( ValueError , TypeError ) :
return args
def _shrink ( obj : Any ) - > Any :
if isinstance ( obj , str ) :
if len ( obj ) > head_chars :
return obj [ : head_chars ] + " ...[truncated] "
return obj
if isinstance ( obj , dict ) :
return { k : _shrink ( v ) for k , v in obj . items ( ) }
if isinstance ( obj , list ) :
return [ _shrink ( v ) for v in obj ]
return obj
shrunken = _shrink ( parsed )
# ensure_ascii=False preserves CJK/emoji instead of bloating with \uXXXX
return json . dumps ( shrunken , ensure_ascii = False )
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
def _summarize_tool_result ( tool_name : str , tool_args : str , tool_content : str ) - > str :
""" Create an informative 1-line summary of a tool call + result.
Used during the pre - compression pruning pass to replace large tool
outputs with a short but useful description of what the tool did ,
rather than a generic placeholder that carries zero information .
Returns strings like : :
[ terminal ] ran ` npm test ` - > exit 0 , 47 lines output
[ read_file ] read config . py from line 1 ( 1 , 200 chars )
[ search_files ] content search for ' compress ' in agent / - > 12 matches
"""
try :
args = json . loads ( tool_args ) if tool_args else { }
except ( json . JSONDecodeError , TypeError ) :
args = { }
content = tool_content or " "
content_len = len ( content )
line_count = content . count ( " \n " ) + 1 if content . strip ( ) else 0
if tool_name == " terminal " :
cmd = args . get ( " command " , " " )
if len ( cmd ) > 80 :
cmd = cmd [ : 77 ] + " ... "
exit_match = re . search ( r ' " exit_code " \ s*: \ s*(-? \ d+) ' , content )
exit_code = exit_match . group ( 1 ) if exit_match else " ? "
return f " [terminal] ran ` { cmd } ` -> exit { exit_code } , { line_count } lines output "
if tool_name == " read_file " :
path = args . get ( " path " , " ? " )
offset = args . get ( " offset " , 1 )
return f " [read_file] read { path } from line { offset } ( { content_len : , } chars) "
if tool_name == " write_file " :
path = args . get ( " path " , " ? " )
written_lines = args . get ( " content " , " " ) . count ( " \n " ) + 1 if args . get ( " content " ) else " ? "
return f " [write_file] wrote to { path } ( { written_lines } lines) "
if tool_name == " search_files " :
pattern = args . get ( " pattern " , " ? " )
path = args . get ( " path " , " . " )
target = args . get ( " target " , " content " )
match_count = re . search ( r ' " total_count " \ s*: \ s*( \ d+) ' , content )
count = match_count . group ( 1 ) if match_count else " ? "
return f " [search_files] { target } search for ' { pattern } ' in { path } -> { count } matches "
if tool_name == " patch " :
path = args . get ( " path " , " ? " )
mode = args . get ( " mode " , " replace " )
return f " [patch] { mode } in { path } ( { content_len : , } chars result) "
if tool_name in ( " browser_navigate " , " browser_click " , " browser_snapshot " ,
" browser_type " , " browser_scroll " , " browser_vision " ) :
url = args . get ( " url " , " " )
ref = args . get ( " ref " , " " )
detail = f " { url } " if url else ( f " ref= { ref } " if ref else " " )
return f " [ { tool_name } ] { detail } ( { content_len : , } chars) "
if tool_name == " web_search " :
query = args . get ( " query " , " ? " )
return f " [web_search] query= ' { query } ' ( { content_len : , } chars result) "
if tool_name == " web_extract " :
urls = args . get ( " urls " , [ ] )
url_desc = urls [ 0 ] if isinstance ( urls , list ) and urls else " ? "
if isinstance ( urls , list ) and len ( urls ) > 1 :
url_desc + = f " (+ { len ( urls ) - 1 } more) "
return f " [web_extract] { url_desc } ( { content_len : , } chars) "
if tool_name == " delegate_task " :
goal = args . get ( " goal " , " " )
if len ( goal ) > 60 :
goal = goal [ : 57 ] + " ... "
return f " [delegate_task] ' { goal } ' ( { content_len : , } chars result) "
if tool_name == " execute_code " :
code_preview = ( args . get ( " code " ) or " " ) [ : 60 ] . replace ( " \n " , " " )
if len ( args . get ( " code " , " " ) ) > 60 :
code_preview + = " ... "
return f " [execute_code] ` { code_preview } ` ( { line_count } lines output) "
if tool_name in ( " skill_view " , " skills_list " , " skill_manage " ) :
name = args . get ( " name " , " ? " )
return f " [ { tool_name } ] name= { name } ( { content_len : , } chars) "
if tool_name == " vision_analyze " :
question = args . get ( " question " , " " ) [ : 50 ]
return f " [vision_analyze] ' { question } ' ( { content_len : , } chars) "
if tool_name == " memory " :
action = args . get ( " action " , " ? " )
target = args . get ( " target " , " ? " )
return f " [memory] { action } on { target } "
if tool_name == " todo " :
return " [todo] updated task list "
if tool_name == " clarify " :
return " [clarify] asked user a question "
if tool_name == " text_to_speech " :
return f " [text_to_speech] generated audio ( { content_len : , } chars) "
if tool_name == " cronjob " :
action = args . get ( " action " , " ? " )
return f " [cronjob] { action } "
if tool_name == " process " :
action = args . get ( " action " , " ? " )
sid = args . get ( " session_id " , " ? " )
return f " [process] { action } session= { sid } "
# Generic fallback
first_arg = " "
for k , v in list ( args . items ( ) ) [ : 2 ] :
sv = str ( v ) [ : 40 ]
first_arg + = f " { k } = { sv } "
return f " [ { tool_name } ] { first_arg } ( { content_len : , } chars result) "
2026-04-06 18:40:11 -07:00
class ContextCompressor ( ContextEngine ) :
""" Default context engine — compresses conversation context via lossy summarization.
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Algorithm :
1. Prune old tool results ( cheap , no LLM call )
2. Protect head messages ( system prompt + first exchange )
3. Protect tail messages by token budget ( most recent ~ 20 K tokens )
4. Summarize middle turns with structured LLM prompt
5. On subsequent compactions , iteratively update the previous summary
2026-02-21 22:31:43 -08:00
"""
2026-04-06 18:40:11 -07:00
@property
def name ( self ) - > str :
return " compressor "
2026-04-06 18:44:12 -07:00
def on_session_reset ( self ) - > None :
""" Reset all per-session state for /new or /reset. """
super ( ) . on_session_reset ( )
self . _context_probed = False
self . _context_probe_persistable = False
self . _previous_summary = None
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
self . _last_compression_savings_pct = 100.0
self . _ineffective_compression_count = 0
2026-04-06 18:44:12 -07:00
fix: robust context engine interface — config selection, plugin discovery, ABC completeness
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
2026-04-08 04:16:58 -07:00
def update_model (
self ,
model : str ,
context_length : int ,
base_url : str = " " ,
api_key : str = " " ,
provider : str = " " ,
2026-04-12 00:10:19 -04:00
api_mode : str = " " ,
fix: robust context engine interface — config selection, plugin discovery, ABC completeness
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
2026-04-08 04:16:58 -07:00
) - > None :
""" Update model info after a model switch or fallback activation. """
self . model = model
self . base_url = base_url
self . api_key = api_key
self . provider = provider
2026-04-12 00:10:19 -04:00
self . api_mode = api_mode
fix: robust context engine interface — config selection, plugin discovery, ABC completeness
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
2026-04-08 04:16:58 -07:00
self . context_length = context_length
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
self . threshold_tokens = max (
int ( context_length * self . threshold_percent ) ,
MINIMUM_CONTEXT_LENGTH ,
)
fix: robust context engine interface — config selection, plugin discovery, ABC completeness
Follow-up fixes for the context engine plugin slot (PR #5700):
- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
protect_last_n as class attributes; complete update_model() default
with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
ABC interface: switch_model(), fallback restore, context probing
all use update_model() now; _context_probed guarded with getattr/
hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
(mirrors plugins/memory/ pattern) — discover_context_engines(),
load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
then plugins/context_engine/<name>/, then general plugin system,
falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
boundaries (CLI exit, /reset, gateway expiry)
2026-04-08 04:16:58 -07:00
2026-02-21 22:31:43 -08:00
def __init__ (
self ,
model : str ,
2026-03-24 18:48:04 -07:00
threshold_percent : float = 0.50 ,
2026-02-21 22:31:43 -08:00
protect_first_n : int = 3 ,
2026-03-24 17:45:49 -07:00
protect_last_n : int = 20 ,
2026-03-24 18:48:04 -07:00
summary_target_ratio : float = 0.20 ,
2026-02-21 22:31:43 -08:00
quiet_mode : bool = False ,
2026-02-28 04:46:35 -08:00
summary_model_override : str = None ,
2026-03-05 16:09:57 -08:00
base_url : str = " " ,
2026-03-18 03:04:07 -07:00
api_key : str = " " ,
2026-03-19 06:01:16 -07:00
config_context_length : int | None = None ,
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
provider : str = " " ,
2026-04-12 00:10:19 -04:00
api_mode : str = " " ,
2026-02-21 22:31:43 -08:00
) :
self . model = model
2026-03-05 16:09:57 -08:00
self . base_url = base_url
2026-03-18 03:04:07 -07:00
self . api_key = api_key
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
self . provider = provider
2026-04-12 00:10:19 -04:00
self . api_mode = api_mode
2026-02-21 22:31:43 -08:00
self . threshold_percent = threshold_percent
self . protect_first_n = protect_first_n
self . protect_last_n = protect_last_n
2026-03-24 17:45:49 -07:00
self . summary_target_ratio = max ( 0.10 , min ( summary_target_ratio , 0.80 ) )
2026-02-21 22:31:43 -08:00
self . quiet_mode = quiet_mode
2026-03-19 06:01:16 -07:00
self . context_length = get_model_context_length (
model , base_url = base_url , api_key = api_key ,
config_context_length = config_context_length ,
feat: overhaul context length detection with models.dev and provider-aware resolution (#2158)
Replace the fragile hardcoded context length system with a multi-source
resolution chain that correctly identifies context windows per provider.
Key changes:
- New agent/models_dev.py: Fetches and caches the models.dev registry
(3800+ models across 100+ providers with per-provider context windows).
In-memory cache (1hr TTL) + disk cache for cold starts.
- Rewritten get_model_context_length() resolution chain:
0. Config override (model.context_length)
1. Custom providers per-model context_length
2. Persistent disk cache
3. Endpoint /models (local servers)
4. Anthropic /v1/models API (max_input_tokens, API-key only)
5. OpenRouter live API (existing, unchanged)
6. Nous suffix-match via OpenRouter (dot/dash normalization)
7. models.dev registry lookup (provider-aware)
8. Thin hardcoded defaults (broad family patterns)
9. 128K fallback (was 2M)
- Provider-aware context: same model now correctly resolves to different
context windows per provider (e.g. claude-opus-4.6: 1M on Anthropic,
128K on GitHub Copilot). Provider name flows through ContextCompressor.
- DEFAULT_CONTEXT_LENGTHS shrunk from 80+ entries to ~16 broad patterns.
models.dev replaces the per-model hardcoding.
- CONTEXT_PROBE_TIERS changed from [2M, 1M, 512K, 200K, 128K, 64K, 32K]
to [128K, 64K, 32K, 16K, 8K]. Unknown models no longer start at 2M.
- hermes model: prompts for context_length when configuring custom
endpoints. Supports shorthand (32k, 128K). Saved to custom_providers
per-model config.
- custom_providers schema extended with optional models dict for
per-model context_length (backward compatible).
- Nous Portal: suffix-matches bare IDs (claude-opus-4-6) against
OpenRouter's prefixed IDs (anthropic/claude-opus-4.6) with dot/dash
normalization. Handles all 15 current Nous models.
- Anthropic direct: queries /v1/models for max_input_tokens. Only works
with regular API keys (sk-ant-api*), not OAuth tokens. Falls through
to models.dev for OAuth users.
Tests: 5574 passed (18 new tests for models_dev + updated probe tiers)
Docs: Updated configuration.md context length section, AGENTS.md
Co-authored-by: Test <test@test.com>
2026-03-20 06:04:33 -07:00
provider = provider ,
2026-03-19 06:01:16 -07:00
)
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
# Floor: never compress below MINIMUM_CONTEXT_LENGTH tokens even if
# the percentage would suggest a lower value. This prevents premature
# compression on large-context models at 50% while keeping the % sane
# for models right at the minimum.
self . threshold_tokens = max (
int ( self . context_length * threshold_percent ) ,
MINIMUM_CONTEXT_LENGTH ,
)
2026-02-21 22:31:43 -08:00
self . compression_count = 0
2026-03-21 10:47:44 -07:00
2026-03-24 18:48:04 -07:00
# Derive token budgets: ratio is relative to the threshold, not total context
target_tokens = int ( self . threshold_tokens * self . summary_target_ratio )
2026-03-24 17:45:49 -07:00
self . tail_token_budget = target_tokens
self . max_summary_tokens = min (
int ( self . context_length * 0.05 ) , _SUMMARY_TOKENS_CEILING ,
)
2026-03-21 10:47:44 -07:00
if not quiet_mode :
logger . info (
" Context compressor initialized: model= %s context_length= %d "
2026-03-24 17:45:49 -07:00
" threshold= %d ( %.0f %% ) target_ratio= %.0f %% tail_budget= %d "
" provider= %s base_url= %s " ,
2026-03-21 10:47:44 -07:00
model , self . context_length , self . threshold_tokens ,
2026-03-24 17:45:49 -07:00
threshold_percent * 100 , self . summary_target_ratio * 100 ,
self . tail_token_budget ,
provider or " none " , base_url or " none " ,
2026-03-21 10:47:44 -07:00
)
2026-03-05 16:09:57 -08:00
self . _context_probed = False # True after a step-down from context error
2026-02-21 22:31:43 -08:00
self . last_prompt_tokens = 0
self . last_completion_tokens = 0
2026-03-11 20:52:19 -07:00
self . summary_model = summary_model_override or " "
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Stores the previous compaction summary for iterative updates
self . _previous_summary : Optional [ str ] = None
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Anti-thrashing: track whether last compression was effective
self . _last_compression_savings_pct : float = 100.0
self . _ineffective_compression_count : int = 0
2026-04-07 04:28:11 +09:00
self . _summary_failure_cooldown_until : float = 0.0
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-02-21 22:31:43 -08:00
def update_from_response ( self , usage : Dict [ str , Any ] ) :
""" Update tracked token usage from API response. """
self . last_prompt_tokens = usage . get ( " prompt_tokens " , 0 )
self . last_completion_tokens = usage . get ( " completion_tokens " , 0 )
def should_compress ( self , prompt_tokens : int = None ) - > bool :
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
""" Check if context exceeds the compression threshold.
Includes anti - thrashing protection : if the last two compressions
each saved less than 10 % , skip compression to avoid infinite loops
where each pass removes only 1 - 2 messages .
"""
2026-02-21 22:31:43 -08:00
tokens = prompt_tokens if prompt_tokens is not None else self . last_prompt_tokens
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
if tokens < self . threshold_tokens :
return False
# Anti-thrashing: back off if recent compressions were ineffective
if self . _ineffective_compression_count > = 2 :
if not self . quiet_mode :
logger . warning (
" Compression skipped — last %d compressions saved <10 %% each. "
" Consider /new to start a fresh session, or /compress <topic> "
" for focused compression. " ,
self . _ineffective_compression_count ,
)
return False
return True
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# ------------------------------------------------------------------
# Tool output pruning (cheap pre-pass, no LLM call)
# ------------------------------------------------------------------
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
def _prune_old_tool_results (
self , messages : List [ Dict [ str , Any ] ] , protect_tail_count : int ,
2026-04-08 17:36:32 +00:00
protect_tail_tokens : int | None = None ,
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
) - > tuple [ List [ Dict [ str , Any ] ] , int ] :
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
""" Replace old tool result contents with informative 1-line summaries.
Instead of a generic placeholder , generates a summary like : :
[ terminal ] ran ` npm test ` - > exit 0 , 47 lines output
[ read_file ] read config . py from line 1 ( 3 , 400 chars )
Also deduplicates identical tool results ( e . g . reading the same file
5 x keeps only the newest full copy ) and truncates large tool_call
arguments in assistant messages outside the protected tail .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-08 17:36:32 +00:00
Walks backward from the end , protecting the most recent messages that
fall within ` ` protect_tail_tokens ` ` ( when provided ) OR the last
` ` protect_tail_count ` ` messages ( backward - compatible default ) .
When both are given , the token budget takes priority and the message
count acts as a hard minimum floor .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Returns ( pruned_messages , pruned_count ) .
"""
if not messages :
return messages , 0
result = [ m . copy ( ) for m in messages ]
pruned = 0
2026-04-08 17:36:32 +00:00
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Build index: tool_call_id -> (tool_name, arguments_json)
call_id_to_tool : Dict [ str , tuple ] = { }
for msg in result :
if msg . get ( " role " ) == " assistant " :
for tc in msg . get ( " tool_calls " ) or [ ] :
if isinstance ( tc , dict ) :
cid = tc . get ( " id " , " " )
fn = tc . get ( " function " , { } )
call_id_to_tool [ cid ] = ( fn . get ( " name " , " unknown " ) , fn . get ( " arguments " , " " ) )
else :
cid = getattr ( tc , " id " , " " ) or " "
fn = getattr ( tc , " function " , None )
name = getattr ( fn , " name " , " unknown " ) if fn else " unknown "
args_str = getattr ( fn , " arguments " , " " ) if fn else " "
call_id_to_tool [ cid ] = ( name , args_str )
2026-04-08 17:36:32 +00:00
# Determine the prune boundary
if protect_tail_tokens is not None and protect_tail_tokens > 0 :
# Token-budget approach: walk backward accumulating tokens
accumulated = 0
boundary = len ( result )
min_protect = min ( protect_tail_count , len ( result ) - 1 )
for i in range ( len ( result ) - 1 , - 1 , - 1 ) :
msg = result [ i ]
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
raw_content = msg . get ( " content " ) or " "
content_len = sum ( len ( p . get ( " text " , " " ) ) for p in raw_content ) if isinstance ( raw_content , list ) else len ( raw_content )
2026-04-08 17:36:32 +00:00
msg_tokens = content_len / / _CHARS_PER_TOKEN + 10
for tc in msg . get ( " tool_calls " ) or [ ] :
if isinstance ( tc , dict ) :
args = tc . get ( " function " , { } ) . get ( " arguments " , " " )
msg_tokens + = len ( args ) / / _CHARS_PER_TOKEN
if accumulated + msg_tokens > protect_tail_tokens and ( len ( result ) - i ) > = min_protect :
boundary = i
break
accumulated + = msg_tokens
boundary = i
prune_boundary = max ( boundary , len ( result ) - min_protect )
else :
prune_boundary = len ( result ) - protect_tail_count
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Pass 1: Deduplicate identical tool results.
# When the same file is read multiple times, keep only the most recent
# full copy and replace older duplicates with a back-reference.
content_hashes : dict = { } # hash -> (index, tool_call_id)
for i in range ( len ( result ) - 1 , - 1 , - 1 ) :
msg = result [ i ]
if msg . get ( " role " ) != " tool " :
continue
content = msg . get ( " content " ) or " "
# Skip multimodal content (list of content blocks)
if isinstance ( content , list ) :
continue
if len ( content ) < 200 :
continue
h = hashlib . md5 ( content . encode ( " utf-8 " , errors = " replace " ) ) . hexdigest ( ) [ : 12 ]
if h in content_hashes :
# This is an older duplicate — replace with back-reference
result [ i ] = { * * msg , " content " : " [Duplicate tool output — same content as a more recent call] " }
pruned + = 1
else :
content_hashes [ h ] = ( i , msg . get ( " tool_call_id " , " ? " ) )
# Pass 2: Replace old tool results with informative summaries
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
for i in range ( prune_boundary ) :
msg = result [ i ]
if msg . get ( " role " ) != " tool " :
continue
content = msg . get ( " content " , " " )
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Skip multimodal content (list of content blocks)
if isinstance ( content , list ) :
continue
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
if not content or content == _PRUNED_TOOL_PLACEHOLDER :
continue
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Skip already-deduplicated or previously-summarized results
if content . startswith ( " [Duplicate tool output " ) :
continue
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Only prune if the content is substantial (>200 chars)
if len ( content ) > 200 :
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
call_id = msg . get ( " tool_call_id " , " " )
tool_name , tool_args = call_id_to_tool . get ( call_id , ( " unknown " , " " ) )
summary = _summarize_tool_result ( tool_name , tool_args , content )
result [ i ] = { * * msg , " content " : summary }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
pruned + = 1
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Pass 3: Truncate large tool_call arguments in assistant messages
# outside the protected tail. write_file with 50KB content, for
# example, survives pruning entirely without this.
fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::
{"path": "/foo.md", "content": "# Long markdown
...[truncated]
— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.
Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.
Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.
Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
output, non-JSON pass-through, nested structures, non-string leaves,
scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
reproduces the exact failure payload shape from the incident
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:32:10 -07:00
#
# The shrinking is done inside the parsed JSON structure so the
# result remains valid JSON — otherwise downstream providers 400
# on every subsequent turn until the broken call falls out of
# the window. See ``_truncate_tool_call_args_json`` docstring.
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
for i in range ( prune_boundary ) :
msg = result [ i ]
if msg . get ( " role " ) != " assistant " or not msg . get ( " tool_calls " ) :
continue
new_tcs = [ ]
modified = False
for tc in msg [ " tool_calls " ] :
if isinstance ( tc , dict ) :
args = tc . get ( " function " , { } ) . get ( " arguments " , " " )
if len ( args ) > 500 :
fix(context_compressor): keep tool-call arguments JSON valid when shrinking
Pass 3 of `_prune_old_tool_results` previously shrunk long `function.arguments`
blobs by slicing the raw JSON string at byte 200 and appending the literal
text `...[truncated]`. That routinely produced payloads like::
{"path": "/foo.md", "content": "# Long markdown
...[truncated]
— an unterminated string with no closing brace. Strict providers (observed
on MiniMax) reject this as `invalid function arguments json string` with a
non-retryable 400. Because the broken call survives in the session history,
every subsequent turn re-sends the same malformed payload and gets the same
400, locking the session into a re-send loop until the call falls out of
the window.
Fix: parse the arguments first, shrink long string leaves inside the parsed
structure, and re-serialise. Non-string values (paths, ints, booleans, lists)
pass through intact. Arguments that are not valid JSON to begin with (rare,
some backends use non-JSON tool args) are returned unchanged rather than
replaced with something neither we nor the provider can parse.
Observed in the wild: a `write_file` with ~800 chars of markdown `content`
triggered this on a real session against MiniMax-M2.7; every turn after
compression got rejected until the session was manually reset.
Tests:
- 7 direct tests of `_truncate_tool_call_args_json` covering valid-JSON
output, non-JSON pass-through, nested structures, non-string leaves,
scalar JSON, and Unicode preservation
- 1 end-to-end test through `_prune_old_tool_results` Pass 3 that
reproduces the exact failure payload shape from the incident
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 13:32:10 -07:00
new_args = _truncate_tool_call_args_json ( args )
if new_args != args :
tc = { * * tc , " function " : { * * tc [ " function " ] , " arguments " : new_args } }
modified = True
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
new_tcs . append ( tc )
if modified :
result [ i ] = { * * msg , " tool_calls " : new_tcs }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
return result , pruned
# ------------------------------------------------------------------
# Summarization
# ------------------------------------------------------------------
def _compute_summary_budget ( self , turns_to_summarize : List [ Dict [ str , Any ] ] ) - > int :
2026-03-24 17:45:49 -07:00
""" Scale summary token budget with the amount of content being compressed.
The maximum scales with the model ' s context window (5 % o f context,
capped at ` ` _SUMMARY_TOKENS_CEILING ` ` ) so large - context models get
richer summaries instead of being hard - capped at 8 K tokens .
"""
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
content_tokens = estimate_messages_tokens_rough ( turns_to_summarize )
budget = int ( content_tokens * _SUMMARY_RATIO )
2026-03-24 17:45:49 -07:00
return max ( _MIN_SUMMARY_TOKENS , min ( budget , self . max_summary_tokens ) )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-03-30 23:34:58 -04:00
# Truncation limits for the summarizer input. These bound how much of
# each message the summary model sees — the budget is the *summary*
# model's context window, not the main model's.
_CONTENT_MAX = 6000 # total chars per message body
_CONTENT_HEAD = 4000 # chars kept from the start
_CONTENT_TAIL = 1500 # chars kept from the end
_TOOL_ARGS_MAX = 1500 # tool call argument chars
_TOOL_ARGS_HEAD = 1200 # kept from the start of tool args
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
def _serialize_for_summary ( self , turns : List [ Dict [ str , Any ] ] ) - > str :
""" Serialize conversation turns into labeled text for the summarizer.
2026-03-30 23:34:58 -04:00
Includes tool call arguments and result content ( up to
` ` _CONTENT_MAX ` ` chars per message ) so the summarizer can preserve
specific details like file paths , commands , and outputs .
2026-03-07 11:54:51 -08:00
"""
2026-02-21 22:31:43 -08:00
parts = [ ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
for msg in turns :
2026-02-21 22:31:43 -08:00
role = msg . get ( " role " , " unknown " )
2026-03-02 01:35:52 -08:00
content = msg . get ( " content " ) or " "
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-03-30 23:34:58 -04:00
# Tool results: keep enough content for the summarizer
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
if role == " tool " :
tool_id = msg . get ( " tool_call_id " , " " )
2026-03-30 23:34:58 -04:00
if len ( content ) > self . _CONTENT_MAX :
content = content [ : self . _CONTENT_HEAD ] + " \n ...[truncated]... \n " + content [ - self . _CONTENT_TAIL : ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
parts . append ( f " [TOOL RESULT { tool_id } ]: { content } " )
continue
# Assistant messages: include tool call names AND arguments
if role == " assistant " :
2026-03-30 23:34:58 -04:00
if len ( content ) > self . _CONTENT_MAX :
content = content [ : self . _CONTENT_HEAD ] + " \n ...[truncated]... \n " + content [ - self . _CONTENT_TAIL : ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
tool_calls = msg . get ( " tool_calls " , [ ] )
if tool_calls :
tc_parts = [ ]
for tc in tool_calls :
if isinstance ( tc , dict ) :
fn = tc . get ( " function " , { } )
name = fn . get ( " name " , " ? " )
args = fn . get ( " arguments " , " " )
# Truncate long arguments but keep enough for context
2026-03-30 23:34:58 -04:00
if len ( args ) > self . _TOOL_ARGS_MAX :
args = args [ : self . _TOOL_ARGS_HEAD ] + " ... "
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
tc_parts . append ( f " { name } ( { args } ) " )
else :
fn = getattr ( tc , " function " , None )
name = getattr ( fn , " name " , " ? " ) if fn else " ? "
tc_parts . append ( f " { name } (...) " )
content + = " \n [Tool calls: \n " + " \n " . join ( tc_parts ) + " \n ] "
parts . append ( f " [ASSISTANT]: { content } " )
continue
# User and other roles
2026-03-30 23:34:58 -04:00
if len ( content ) > self . _CONTENT_MAX :
content = content [ : self . _CONTENT_HEAD ] + " \n ...[truncated]... \n " + content [ - self . _CONTENT_TAIL : ]
2026-02-21 22:31:43 -08:00
parts . append ( f " [ { role . upper ( ) } ]: { content } " )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
return " \n \n " . join ( parts )
2026-04-11 19:23:29 -07:00
def _generate_summary ( self , turns_to_summarize : List [ Dict [ str , Any ] ] , focus_topic : str = None ) - > Optional [ str ] :
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
""" Generate a structured summary of conversation turns.
2026-04-11 19:43:58 -07:00
Uses a structured template ( Goal , Progress , Decisions , Resolved / Pending
Questions , Files , Remaining Work ) with explicit preamble telling the
summarizer not to answer questions . When a previous summary exists ,
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
generates an iterative update instead of summarizing from scratch .
2026-04-11 19:23:29 -07:00
Args :
focus_topic : Optional focus string for guided compression . When
provided , the summariser prioritises preserving information
related to this topic and is more aggressive about compressing
everything else . Inspired by Claude Code ' s ``/compact``.
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Returns None if all attempts fail — the caller should drop
the middle turns without a summary rather than inject a useless
placeholder .
"""
2026-04-07 04:28:11 +09:00
now = time . monotonic ( )
if now < self . _summary_failure_cooldown_until :
logger . debug (
" Skipping context summary during cooldown ( %.0f s remaining) " ,
self . _summary_failure_cooldown_until - now ,
)
return None
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
summary_budget = self . _compute_summary_budget ( turns_to_summarize )
content_to_summarize = self . _serialize_for_summary ( turns_to_summarize )
2026-04-11 19:43:58 -07:00
# Preamble shared by both first-compaction and iterative-update prompts.
# Inspired by OpenCode's "do not respond to any questions" instruction
# and Codex's "another language model" framing.
_summarizer_preamble = (
" You are a summarization agent creating a context checkpoint. "
" Your output will be injected as reference material for a DIFFERENT "
" assistant that continues the conversation. "
" Do NOT respond to any questions or requests in the conversation — "
" only output the structured summary. "
2026-04-19 05:44:29 -07:00
" Do NOT include any preamble, greeting, or prefix. "
" Write the summary in the same language the user was using in the "
" conversation — do not translate or switch to English. "
2026-04-11 19:43:58 -07:00
)
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
# Shared structured template (used by both paths).
2026-04-16 19:22:22 +05:30
_template_sections = f """ ## Active Task
[ THE SINGLE MOST IMPORTANT FIELD . Copy the user ' s most recent request or
task assignment verbatim — the exact words they used . If multiple tasks
were requested and only some are done , list only the ones NOT yet completed .
The next assistant must pick up exactly here . Example :
" User asked: ' Now refactor the auth module to use JWT instead of sessions ' "
If no outstanding task exists , write " None. " ]
## Goal
[ What the user is trying to accomplish overall ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
## Constraints & Preferences
2026-04-11 19:43:58 -07:00
[ User preferences , coding style , constraints , important decisions ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
## Completed Actions
[ Numbered list of concrete actions taken — include tool used , target , and outcome .
Format each as : N . ACTION target — outcome [ tool : name ]
Example :
1. READ config . py : 45 — found ` == ` should be ` != ` [ tool : read_file ]
2. PATCH config . py : 45 — changed ` == ` to ` != ` [ tool : patch ]
3. TEST ` pytest tests / ` — 3 / 50 failed : test_parse , test_validate , test_edge [ tool : terminal ]
Be specific with file paths , commands , line numbers , and results . ]
## Active State
[ Current working state — include :
- Working directory and branch ( if applicable )
- Modified / created files with brief note on each
- Test status ( X / Y passing )
- Any running processes or servers
- Environment details that matter ]
## In Progress
[ Work currently underway — what was being done when compaction fired ]
## Blocked
[ Any blockers , errors , or issues not yet resolved . Include exact error messages . ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
## Key Decisions
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
[ Important technical decisions and WHY they were made ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
## Resolved Questions
[ Questions the user asked that were ALREADY answered — include the answer so the next assistant does not re - answer them ]
## Pending User Asks
[ Questions or requests from the user that have NOT yet been answered or fulfilled . If none , write " None. " ]
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
## Relevant Files
2026-04-11 19:43:58 -07:00
[ Files read , modified , or created — with brief note on each ]
2026-02-21 22:31:43 -08:00
2026-04-11 19:43:58 -07:00
## Remaining Work
[ What remains to be done — framed as context , not instructions ]
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
## Critical Context
[ Any specific values , error messages , configuration details , or data that would be lost without explicit preservation ]
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
Target ~ { summary_budget } tokens . Be CONCRETE — include file paths , command outputs , error messages , line numbers , and specific values . Avoid vague descriptions like " made some changes " — say exactly what changed .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Write only the summary body . Do not include any preamble or prefix . """
2026-02-21 22:31:43 -08:00
2026-04-11 19:43:58 -07:00
if self . _previous_summary :
# Iterative update: preserve existing info, add new progress
prompt = f """ { _summarizer_preamble }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
You are updating a context compaction summary . A previous compaction produced the summary below . New conversation turns have occurred since then and need to be incorporated .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
PREVIOUS SUMMARY :
{ self . _previous_summary }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
NEW TURNS TO INCORPORATE :
{ content_to_summarize }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-16 19:22:22 +05:30
Update the summary using this exact structure . PRESERVE all existing information that is still relevant . ADD new completed actions to the numbered list ( continue numbering ) . Move items from " In Progress " to " Completed Actions " when done . Move answered questions to " Resolved Questions " . Update " Active State " to reflect current state . Remove information only if it is clearly obsolete . CRITICAL : Update " ## Active Task " to reflect the user ' s most recent unfulfilled request — this is the most important field for task continuity.
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
{ _template_sections } """
else :
# First compaction: summarize from scratch
prompt = f """ { _summarizer_preamble }
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
Create a structured handoff summary for a different assistant that will continue this conversation after earlier turns are compacted . The next assistant should be able to understand what happened without re - reading the original turns .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
TURNS TO SUMMARIZE :
{ content_to_summarize }
2026-03-30 23:34:58 -04:00
2026-04-11 19:43:58 -07:00
Use this exact structure :
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
2026-04-11 19:43:58 -07:00
{ _template_sections } """
2026-02-21 22:31:43 -08:00
2026-04-11 19:23:29 -07:00
# Inject focus topic guidance when the user provides one via /compress <focus>.
# This goes at the end of the prompt so it takes precedence.
if focus_topic :
prompt + = f """
FOCUS TOPIC : " {focus_topic} "
The user has requested that this compaction PRIORITISE preserving all information related to the focus topic above . For content related to " {focus_topic} " , include full detail — exact values , file paths , command outputs , error messages , and decisions . For content NOT related to the focus topic , summarise more aggressively ( brief one - liners or omit if truly irrelevant ) . The focus topic sections should receive roughly 60 - 70 % of the summary token budget . """
2026-03-04 17:58:09 -08:00
try :
2026-03-11 20:52:19 -07:00
call_kwargs = {
" task " : " compression " ,
2026-04-12 00:10:19 -04:00
" main_runtime " : {
" model " : self . model ,
" provider " : self . provider ,
" base_url " : self . base_url ,
" api_key " : self . api_key ,
" api_mode " : self . api_mode ,
} ,
2026-03-11 20:52:19 -07:00
" messages " : [ { " role " : " user " , " content " : prompt } ] ,
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
" max_tokens " : int ( summary_budget * 1.3 ) ,
2026-03-28 14:35:28 -07:00
# timeout resolved from auxiliary.compression.timeout config by call_llm
2026-03-11 20:52:19 -07:00
}
if self . summary_model :
call_kwargs [ " model " ] = self . summary_model
response = call_llm ( * * call_kwargs )
feat(stt): add free local whisper transcription via faster-whisper (#1185)
* fix: Home Assistant event filtering now closed by default
Previously, when no watch_domains or watch_entities were configured,
ALL state_changed events passed through to the agent, causing users
to be flooded with notifications for every HA entity change.
Now events are dropped by default unless the user explicitly configures:
- watch_domains: list of domains to monitor (e.g. climate, light)
- watch_entities: list of specific entity IDs to monitor
- watch_all: true (new option — opt-in to receive all events)
A warning is logged at connect time if no filters are configured,
guiding users to set up their HA platform config.
All 49 gateway HA tests + 52 HA tool tests pass.
* docs: update Home Assistant integration documentation
- homeassistant.md: Fix event filtering docs to reflect closed-by-default
behavior. Add watch_all option. Replace Python dict config example with
YAML. Fix defaults table (was incorrectly showing 'all'). Add required
configuration warning admonition.
- environment-variables.md: Add HASS_TOKEN and HASS_URL to Messaging section.
- messaging/index.md: Add Home Assistant to description, architecture
diagram, platform toolsets table, and Next Steps links.
* fix(terminal): strip provider env vars from background and PTY subprocesses
Extends the env var blocklist from #1157 to also cover the two remaining
leaky paths in process_registry.py:
- spawn_local() PTY path (line 156)
- spawn_local() background Popen path (line 197)
Both were still using raw os.environ, leaking provider vars to background
processes and interactive PTY sessions. Now uses the same dynamic
_HERMES_PROVIDER_ENV_BLOCKLIST from local.py.
Explicit env_vars passed to spawn_local() still override the blocklist,
matching the existing behavior for callers that intentionally need these.
Gap identified by PR #1004 (@PeterFile).
* feat(delegate): add observability metadata to subagent results
Enrich delegate_task results with metadata from the child AIAgent:
- model: which model the child used
- exit_reason: completed | interrupted | max_iterations
- tokens.input / tokens.output: token counts
- tool_trace: per-tool-call trace with byte sizes and ok/error status
Tool trace uses tool_call_id matching to correctly pair parallel tool
calls with their results, with a fallback for messages without IDs.
Cherry-picked from PR #872 by @omerkaz, with fixes:
- Fixed parallel tool call trace pairing (was always updating last entry)
- Removed redundant 'iterations' field (identical to existing 'api_calls')
- Added test for parallel tool call trace correctness
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
* feat(stt): add free local whisper transcription via faster-whisper
Replace OpenAI-only STT with a dual-provider system mirroring the TTS
architecture (Edge TTS free / ElevenLabs paid):
STT: faster-whisper local (free, default) / OpenAI Whisper API (paid)
Changes:
- tools/transcription_tools.py: Full rewrite with provider dispatch,
config loading, local faster-whisper backend, and OpenAI API backend.
Auto-downloads model (~150MB for 'base') on first voice message.
Singleton model instance reused across calls.
- pyproject.toml: Add faster-whisper>=1.0.0 as core dependency
- hermes_cli/config.py: Expand stt config to match TTS pattern with
provider selection and per-provider model settings
- agent/context_compressor.py: Fix .strip() crash when LLM returns
non-string content (dict from llama.cpp, None). Fixes #1100 partially.
- tests/: 23 new tests for STT providers + 2 for compressor fix
- docs/: Updated Voice & TTS page with STT provider table, model sizes,
config examples, and fallback behavior
Fallback behavior:
- Local not installed → OpenAI API (if key set)
- OpenAI key not set → local whisper (if installed)
- Neither → graceful error message to user
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
---------
Co-authored-by: omerkaz <omerkaz@users.noreply.github.com>
Co-authored-by: Jah-yee <Jah-yee@users.noreply.github.com>
2026-03-13 11:11:05 -07:00
content = response . choices [ 0 ] . message . content
# Handle cases where content is not a string (e.g., dict from llama.cpp)
if not isinstance ( content , str ) :
content = str ( content ) if content else " "
summary = content . strip ( )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Store for iterative updates on next compaction
self . _previous_summary = summary
2026-04-07 04:28:11 +09:00
self . _summary_failure_cooldown_until = 0.0
fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix. The generated ExecStart pointed
to a Python binary without site-packages.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models sometimes return empty after tool calls. The agent
silently abandoned the remaining work.
Fix: append assistant('(empty)') + user nudge message and retry
once. Resets after each successful tool round.
4. Compression model fallback on permanent errors (#8620 sub-issue 4)
When the default summary model (gemini-3-flash) returns 503
'model_not_found' on custom proxies, the compressor entered a
600s cooldown, leaving context growing unbounded.
Fix: detect permanent model-not-found errors (503, 404,
'model_not_found', 'no available channel') and fall back to
using the main model for compression instead of entering
cooldown. One-time fallback with immediate retry.
Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
self . _summary_model_fallen_back = False
2026-03-14 02:33:31 -07:00
return self . _with_summary_prefix ( summary )
2026-03-11 20:52:19 -07:00
except RuntimeError :
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# No provider configured — long cooldown, unlikely to self-resolve
2026-04-07 04:28:11 +09:00
self . _summary_failure_cooldown_until = time . monotonic ( ) + _SUMMARY_FAILURE_COOLDOWN_SECONDS
2026-03-11 20:52:19 -07:00
logging . warning ( " Context compression: no provider available for "
2026-04-07 04:28:11 +09:00
" summary. Middle turns will be dropped without summary "
" for %d seconds. " ,
_SUMMARY_FAILURE_COOLDOWN_SECONDS )
2026-03-11 20:52:19 -07:00
return None
except Exception as e :
fix: stale agent timeout, uv venv detection, empty response after tools, compression model fallback (#9051, #8620, #9400) (#10093)
Four independent fixes:
1. Reset activity timestamp on cached agent reuse (#9051)
When the gateway reuses a cached AIAgent for a new turn, the
_last_activity_ts from the previous turn (possibly hours ago)
carried over. The inactivity timeout handler immediately saw
the agent as idle for hours and killed it.
Fix: reset _last_activity_ts, _last_activity_desc, and
_api_call_count when retrieving an agent from the cache.
2. Detect uv-managed virtual environments (#8620 sub-issue 1)
The systemd unit generator fell back to sys.executable (uv's
standalone Python) when running under 'uv run', because
sys.prefix == sys.base_prefix. The generated ExecStart pointed
to a Python binary without site-packages.
Fix: check VIRTUAL_ENV env var before falling back to
sys.executable. uv sets VIRTUAL_ENV even when sys.prefix
doesn't reflect the venv.
3. Nudge model to continue after empty post-tool response (#9400)
Weaker models sometimes return empty after tool calls. The agent
silently abandoned the remaining work.
Fix: append assistant('(empty)') + user nudge message and retry
once. Resets after each successful tool round.
4. Compression model fallback on permanent errors (#8620 sub-issue 4)
When the default summary model (gemini-3-flash) returns 503
'model_not_found' on custom proxies, the compressor entered a
600s cooldown, leaving context growing unbounded.
Fix: detect permanent model-not-found errors (503, 404,
'model_not_found', 'no available channel') and fall back to
using the main model for compression instead of entering
cooldown. One-time fallback with immediate retry.
Test plan: 40 compressor tests + 97 gateway/CLI tests + 9 venv tests pass
2026-04-14 22:38:17 -07:00
# If the summary model is different from the main model and the
# error looks permanent (model not found, 503, 404), fall back to
# using the main model instead of entering cooldown that leaves
# context growing unbounded. (#8620 sub-issue 4)
_status = getattr ( e , " status_code " , None ) or getattr ( getattr ( e , " response " , None ) , " status_code " , None )
_err_str = str ( e ) . lower ( )
_is_model_not_found = (
_status in ( 404 , 503 )
or " model_not_found " in _err_str
or " does not exist " in _err_str
or " no available channel " in _err_str
)
if (
_is_model_not_found
and self . summary_model
and self . summary_model != self . model
and not getattr ( self , " _summary_model_fallen_back " , False )
) :
self . _summary_model_fallen_back = True
logging . warning (
" Summary model ' %s ' not available ( %s ). "
" Falling back to main model ' %s ' for compression. " ,
self . summary_model , e , self . model ,
)
self . summary_model = " " # empty = use main model
self . _summary_failure_cooldown_until = 0.0 # no cooldown
return self . _generate_summary ( messages , summary_budget ) # retry immediately
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
# Transient errors (timeout, rate limit, network) — shorter cooldown
_transient_cooldown = 60
self . _summary_failure_cooldown_until = time . monotonic ( ) + _transient_cooldown
2026-04-07 04:28:11 +09:00
logging . warning (
" Failed to generate context summary: %s . "
" Further summary attempts paused for %d seconds. " ,
e ,
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
_transient_cooldown ,
2026-04-07 04:28:11 +09:00
)
2026-03-11 20:52:19 -07:00
return None
2026-03-04 17:58:09 -08:00
2026-03-14 02:33:31 -07:00
@staticmethod
def _with_summary_prefix ( summary : str ) - > str :
""" Normalize summary text to the current compaction handoff format. """
text = ( summary or " " ) . strip ( )
for prefix in ( LEGACY_SUMMARY_PREFIX , SUMMARY_PREFIX ) :
if text . startswith ( prefix ) :
text = text [ len ( prefix ) : ] . lstrip ( )
break
return f " { SUMMARY_PREFIX } \n { text } " if text else SUMMARY_PREFIX
2026-03-07 08:08:00 -08:00
# ------------------------------------------------------------------
# Tool-call / tool-result pair integrity helpers
# ------------------------------------------------------------------
@staticmethod
def _get_tool_call_id ( tc ) - > str :
""" Extract the call ID from a tool_call entry (dict or SimpleNamespace). """
if isinstance ( tc , dict ) :
return tc . get ( " id " , " " )
return getattr ( tc , " id " , " " ) or " "
def _sanitize_tool_pairs ( self , messages : List [ Dict [ str , Any ] ] ) - > List [ Dict [ str , Any ] ] :
""" Fix orphaned tool_call / tool_result pairs after compression.
Two failure modes :
1. A tool * result * references a call_id whose assistant tool_call was
removed ( summarized / truncated ) . The API rejects this with
" No tool call found for function call output with call_id ... " .
2. An assistant message has tool_calls whose results were dropped .
The API rejects this because every tool_call must be followed by
a tool result with the matching call_id .
This method removes orphaned results and inserts stub results for
orphaned calls so the message list is always well - formed .
"""
surviving_call_ids : set = set ( )
for msg in messages :
if msg . get ( " role " ) == " assistant " :
for tc in msg . get ( " tool_calls " ) or [ ] :
cid = self . _get_tool_call_id ( tc )
if cid :
surviving_call_ids . add ( cid )
result_call_ids : set = set ( )
for msg in messages :
if msg . get ( " role " ) == " tool " :
cid = msg . get ( " tool_call_id " )
if cid :
result_call_ids . add ( cid )
# 1. Remove tool results whose call_id has no matching assistant tool_call
orphaned_results = result_call_ids - surviving_call_ids
if orphaned_results :
messages = [
m for m in messages
if not ( m . get ( " role " ) == " tool " and m . get ( " tool_call_id " ) in orphaned_results )
]
if not self . quiet_mode :
logger . info ( " Compression sanitizer: removed %d orphaned tool result(s) " , len ( orphaned_results ) )
# 2. Add stub results for assistant tool_calls whose results were dropped
missing_results = surviving_call_ids - result_call_ids
if missing_results :
patched : List [ Dict [ str , Any ] ] = [ ]
for msg in messages :
patched . append ( msg )
if msg . get ( " role " ) == " assistant " :
for tc in msg . get ( " tool_calls " ) or [ ] :
cid = self . _get_tool_call_id ( tc )
if cid in missing_results :
patched . append ( {
" role " : " tool " ,
" content " : " [Result from earlier conversation — see context summary above] " ,
" tool_call_id " : cid ,
} )
messages = patched
if not self . quiet_mode :
logger . info ( " Compression sanitizer: added %d stub tool result(s) " , len ( missing_results ) )
return messages
def _align_boundary_forward ( self , messages : List [ Dict [ str , Any ] ] , idx : int ) - > int :
""" Push a compress-start boundary forward past any orphan tool results.
If ` ` messages [ idx ] ` ` is a tool result , slide forward until we hit a
non - tool message so we don ' t start the summarised region mid-group.
"""
while idx < len ( messages ) and messages [ idx ] . get ( " role " ) == " tool " :
idx + = 1
return idx
def _align_boundary_backward ( self , messages : List [ Dict [ str , Any ] ] , idx : int ) - > int :
""" Pull a compress-end boundary backward to avoid splitting a
tool_call / result group .
2026-03-18 15:22:51 -07:00
If the boundary falls in the middle of a tool - result group ( i . e .
there are consecutive tool messages before ` ` idx ` ` ) , walk backward
past all of them to find the parent assistant message . If found ,
move the boundary before the assistant so the entire
assistant + tool_results group is included in the summarised region
rather than being split ( which causes silent data loss when
` ` _sanitize_tool_pairs ` ` removes the orphaned tail results ) .
2026-03-07 08:08:00 -08:00
"""
if idx < = 0 or idx > = len ( messages ) :
return idx
2026-03-18 15:22:51 -07:00
# Walk backward past consecutive tool results
check = idx - 1
while check > = 0 and messages [ check ] . get ( " role " ) == " tool " :
check - = 1
# If we landed on the parent assistant with tool_calls, pull the
# boundary before it so the whole group gets summarised together.
if check > = 0 and messages [ check ] . get ( " role " ) == " assistant " and messages [ check ] . get ( " tool_calls " ) :
idx = check
2026-03-07 08:08:00 -08:00
return idx
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# ------------------------------------------------------------------
# Tail protection by token budget
# ------------------------------------------------------------------
2026-04-16 19:22:22 +05:30
def _find_last_user_message_idx (
self , messages : List [ Dict [ str , Any ] ] , head_end : int
) - > int :
""" Return the index of the last user-role message at or after *head_end*, or -1. """
for i in range ( len ( messages ) - 1 , head_end - 1 , - 1 ) :
if messages [ i ] . get ( " role " ) == " user " :
return i
return - 1
def _ensure_last_user_message_in_tail (
self ,
messages : List [ Dict [ str , Any ] ] ,
cut_idx : int ,
head_end : int ,
) - > int :
""" Guarantee the most recent user message is in the protected tail.
Context compressor bug ( #10896): ``_align_boundary_backward`` can pull
` ` cut_idx ` ` past a user message when it tries to keep tool_call / result
groups together . If the last user message ends up in the * compressed *
middle region the LLM summariser writes it into " Pending User Asks " ,
but ` ` SUMMARY_PREFIX ` ` tells the next model to respond only to user
messages * after * the summary — so the task effectively disappears from
the active context , causing the agent to stall , repeat completed work ,
or silently drop the user ' s latest request.
Fix : if the last user - role message is not already in the tail
( ` ` messages [ cut_idx : ] ` ` ) , walk ` ` cut_idx ` ` back to include it . We
then re - align backward one more time to avoid splitting any
tool_call / result group that immediately precedes the user message .
"""
last_user_idx = self . _find_last_user_message_idx ( messages , head_end )
if last_user_idx < 0 :
# No user message found beyond head — nothing to anchor.
return cut_idx
if last_user_idx > = cut_idx :
# Already in the tail; nothing to do.
return cut_idx
# The last user message is in the middle (compressed) region.
# Pull cut_idx back to it directly — a user message is already a
# clean boundary (no tool_call/result splitting risk), so there is no
# need to call _align_boundary_backward here; doing so would
# unnecessarily pull the cut further back into the preceding
# assistant + tool_calls group.
if not self . quiet_mode :
logger . debug (
" Anchoring tail cut to last user message at index %d "
" (was %d ) to prevent active-task loss after compression " ,
last_user_idx ,
cut_idx ,
)
# Safety: never go back into the head region.
return max ( last_user_idx , head_end + 1 )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
def _find_tail_cut_by_tokens (
self , messages : List [ Dict [ str , Any ] ] , head_end : int ,
2026-03-24 17:45:49 -07:00
token_budget : int | None = None ,
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
) - > int :
""" Walk backward from the end of messages, accumulating tokens until
the budget is reached . Returns the index where the tail starts .
2026-03-24 17:45:49 -07:00
` ` token_budget ` ` defaults to ` ` self . tail_token_budget ` ` which is
derived from ` ` summary_target_ratio * context_length ` ` , so it
scales automatically with the model ' s context window.
2026-04-08 17:36:32 +00:00
Token budget is the primary criterion . A hard minimum of 3 messages
is always protected , but the budget is allowed to exceed by up to
1.5 x to avoid cutting inside an oversized message ( tool output , file
read , etc . ) . If even the minimum 3 messages exceed 1.5 x the budget
the cut is placed right after the head so compression still runs .
2026-04-16 19:22:22 +05:30
Never cuts inside a tool_call / result group . Always ensures the most
recent user message is in the tail ( see ` ` _ensure_last_user_message_in_tail ` ` ) .
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
"""
2026-03-24 17:45:49 -07:00
if token_budget is None :
token_budget = self . tail_token_budget
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
n = len ( messages )
2026-04-08 17:36:32 +00:00
# Hard minimum: always keep at least 3 messages in the tail
min_tail = min ( 3 , n - head_end - 1 ) if n - head_end > 1 else 0
soft_ceiling = int ( token_budget * 1.5 )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
accumulated = 0
cut_idx = n # start from beyond the end
for i in range ( n - 1 , head_end - 1 , - 1 ) :
msg = messages [ i ]
content = msg . get ( " content " ) or " "
msg_tokens = len ( content ) / / _CHARS_PER_TOKEN + 10 # +10 for role/metadata
# Include tool call arguments in estimate
for tc in msg . get ( " tool_calls " ) or [ ] :
if isinstance ( tc , dict ) :
args = tc . get ( " function " , { } ) . get ( " arguments " , " " )
msg_tokens + = len ( args ) / / _CHARS_PER_TOKEN
2026-04-08 17:36:32 +00:00
# Stop once we exceed the soft ceiling (unless we haven't hit min_tail yet)
if accumulated + msg_tokens > soft_ceiling and ( n - i ) > = min_tail :
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
break
accumulated + = msg_tokens
cut_idx = i
2026-04-08 17:36:32 +00:00
# Ensure we protect at least min_tail messages
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
fallback_cut = n - min_tail
if cut_idx > fallback_cut :
cut_idx = fallback_cut
# If the token budget would protect everything (small conversations),
2026-04-08 17:36:32 +00:00
# force a cut after the head so compression can still remove middle turns.
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
if cut_idx < = head_end :
2026-04-08 17:36:32 +00:00
cut_idx = max ( fallback_cut , head_end + 1 )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Align to avoid splitting tool groups
cut_idx = self . _align_boundary_backward ( messages , cut_idx )
2026-04-16 19:22:22 +05:30
# Ensure the most recent user message is always in the tail so the
# active task is never lost to compression (fixes #10896).
cut_idx = self . _ensure_last_user_message_in_tail ( messages , cut_idx , head_end )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
return max ( cut_idx , head_end + 1 )
# ------------------------------------------------------------------
# Main compression entry point
# ------------------------------------------------------------------
2026-04-11 19:23:29 -07:00
def compress ( self , messages : List [ Dict [ str , Any ] ] , current_tokens : int = None , focus_topic : str = None ) - > List [ Dict [ str , Any ] ] :
2026-02-21 22:31:43 -08:00
""" Compress conversation messages by summarizing middle turns.
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
Algorithm :
1. Prune old tool results ( cheap pre - pass , no LLM call )
2. Protect head messages ( system prompt + first exchange )
3. Find tail boundary by token budget ( ~ 20 K tokens of recent context )
4. Summarize middle turns with structured LLM prompt
5. On re - compression , iteratively update the previous summary
2026-03-07 08:08:00 -08:00
After compression , orphaned tool_call / tool_result pairs are cleaned
up so the API never receives mismatched IDs .
2026-04-11 19:23:29 -07:00
Args :
focus_topic : Optional focus string for guided compression . When
provided , the summariser will prioritise preserving information
related to this topic and be more aggressive about compressing
everything else . Inspired by Claude Code ' s ``/compact``.
2026-02-21 22:31:43 -08:00
"""
n_messages = len ( messages )
2026-04-08 17:36:32 +00:00
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self . protect_first_n + 3 + 1
if n_messages < = _min_for_compress :
2026-02-21 22:31:43 -08:00
if not self . quiet_mode :
2026-03-17 16:31:01 -07:00
logger . warning (
" Cannot compress: only %d messages (need > %d ) " ,
2026-04-08 17:36:32 +00:00
n_messages , _min_for_compress ,
2026-03-17 16:31:01 -07:00
)
2026-02-21 22:31:43 -08:00
return messages
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
display_tokens = current_tokens if current_tokens else self . last_prompt_tokens or estimate_messages_tokens_rough ( messages )
# Phase 1: Prune old tool results (cheap, no LLM call)
messages , pruned_count = self . _prune_old_tool_results (
2026-04-08 17:36:32 +00:00
messages , protect_tail_count = self . protect_last_n ,
protect_tail_tokens = self . tail_token_budget ,
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
)
if pruned_count and not self . quiet_mode :
logger . info ( " Pre-compression: pruned %d old tool result(s) " , pruned_count )
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Phase 2: Determine boundaries
compress_start = self . protect_first_n
2026-03-07 08:08:00 -08:00
compress_start = self . _align_boundary_forward ( messages , compress_start )
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Use token-budget tail protection instead of fixed message count
compress_end = self . _find_tail_cut_by_tokens ( messages , compress_start )
2026-03-07 08:08:00 -08:00
if compress_start > = compress_end :
return messages
2026-02-21 22:31:43 -08:00
turns_to_summarize = messages [ compress_start : compress_end ]
if not self . quiet_mode :
2026-03-17 16:31:01 -07:00
logger . info (
" Context compression triggered ( %d tokens >= %d threshold) " ,
display_tokens ,
self . threshold_tokens ,
)
logger . info (
" Model context limit: %d tokens ( %.0f %% = %d ) " ,
self . context_length ,
self . threshold_percent * 100 ,
self . threshold_tokens ,
)
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
tail_msgs = n_messages - compress_end
2026-03-17 16:31:01 -07:00
logger . info (
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
" Summarizing turns %d - %d ( %d turns), protecting %d head + %d tail messages " ,
2026-03-17 16:31:01 -07:00
compress_start + 1 ,
compress_end ,
len ( turns_to_summarize ) ,
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
compress_start ,
tail_msgs ,
2026-03-17 16:31:01 -07:00
)
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Phase 3: Generate structured summary
2026-04-11 19:23:29 -07:00
summary = self . _generate_summary ( turns_to_summarize , focus_topic = focus_topic )
2026-02-21 22:31:43 -08:00
feat(compressor): major context compaction improvements
Six improvements to reduce information loss during context compression,
informed by analysis of Cline, OpenCode, Pi-mono, Codex, and ClawdBot:
1. Structured summary template — sections for Goal, Progress (Done/
In Progress/Blocked), Key Decisions, Relevant Files, Next Steps,
and Critical Context. Forces the summarizer to preserve each
category instead of writing a vague paragraph.
2. Iterative summary updates — on re-compression, the prompt says
'PRESERVE existing info, ADD new progress, UPDATE done/in-progress
status.' Previous summary is stored and fed back to the summarizer
so accumulated context survives across multiple compactions.
3. Token-budget tail protection — instead of fixed protect_last_n=4,
walks backward keeping ~20K tokens of recent context. Adapts to
message density: sessions with big tool results protect fewer
messages, short exchanges protect more. Falls back to protect_last_n
for small conversations.
4. Tool output pruning (pre-pass) — before the expensive LLM summary,
replaces old tool result contents with a placeholder. This is free
(no LLM call) and can save 30%+ of context by itself.
5. Scaled summary budget — instead of fixed 2500 tokens, allocates 20%
of compressed content tokens (clamped to 2000-8000). A 50-turn
conversation gets more summary space than a 10-turn one.
6. Richer summarizer input — tool calls now include arguments (up to
500 chars) and tool results keep up to 3000 chars (was 1500).
The summarizer sees 'terminal(git status) → M src/config.py'
instead of just '[Tool calls: terminal]'.
2026-03-21 08:14:14 -07:00
# Phase 4: Assemble compressed message list
2026-02-21 22:31:43 -08:00
compressed = [ ]
for i in range ( compress_start ) :
msg = messages [ i ] . copy ( )
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
if i == 0 and msg . get ( " role " ) == " system " :
existing = msg . get ( " content " ) or " "
_compression_note = " [Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.] "
if _compression_note not in existing :
msg [ " content " ] = existing + " \n \n " + _compression_note
2026-02-21 22:31:43 -08:00
compressed . append ( msg )
2026-04-09 14:28:49 -07:00
# If LLM summary failed, insert a static fallback so the model
# knows context was lost rather than silently dropping everything.
if not summary :
if not self . quiet_mode :
logger . warning ( " Summary generation failed — inserting static fallback context marker " )
n_dropped = compress_end - compress_start
summary = (
f " { SUMMARY_PREFIX } \n "
f " Summary generation was unavailable. { n_dropped } conversation turns were "
f " removed to free context space but could not be summarized. The removed "
f " turns contained earlier work in this session. Continue based on the "
f " recent messages below and the current state of any files or resources. "
)
2026-03-17 05:18:52 -07:00
_merge_summary_into_tail = False
2026-04-09 14:28:49 -07:00
last_head_role = messages [ compress_start - 1 ] . get ( " role " , " user " ) if compress_start > 0 else " user "
first_tail_role = messages [ compress_end ] . get ( " role " , " user " ) if compress_end < n_messages else " user "
# Pick a role that avoids consecutive same-role with both neighbors.
# Priority: avoid colliding with head (already committed), then tail.
if last_head_role in ( " assistant " , " tool " ) :
summary_role = " user "
2026-03-07 11:54:51 -08:00
else :
2026-04-09 14:28:49 -07:00
summary_role = " assistant "
# If the chosen role collides with the tail AND flipping wouldn't
# collide with the head, flip it.
if summary_role == first_tail_role :
flipped = " assistant " if summary_role == " user " else " user "
if flipped != last_head_role :
summary_role = flipped
else :
# Both roles would create consecutive same-role messages
# (e.g. head=assistant, tail=user — neither role works).
# Merge the summary into the first tail message instead
# of inserting a standalone message that breaks alternation.
_merge_summary_into_tail = True
if not _merge_summary_into_tail :
compressed . append ( { " role " : summary_role , " content " : summary } )
2026-02-21 22:31:43 -08:00
for i in range ( compress_end , n_messages ) :
2026-03-17 05:18:52 -07:00
msg = messages [ i ] . copy ( )
if _merge_summary_into_tail and i == compress_end :
original = msg . get ( " content " ) or " "
2026-04-11 19:43:58 -07:00
msg [ " content " ] = (
summary
+ " \n \n --- END OF CONTEXT SUMMARY — "
" respond to the message below, not the summary above --- \n \n "
+ original
)
2026-03-17 05:18:52 -07:00
_merge_summary_into_tail = False
compressed . append ( msg )
2026-02-21 22:31:43 -08:00
self . compression_count + = 1
2026-03-07 08:08:00 -08:00
compressed = self . _sanitize_tool_pairs ( compressed )
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
new_estimate = estimate_messages_tokens_rough ( compressed )
saved_estimate = display_tokens - new_estimate
# Anti-thrashing: track compression effectiveness
savings_pct = ( saved_estimate / display_tokens * 100 ) if display_tokens > 0 else 0
self . _last_compression_savings_pct = savings_pct
if savings_pct < 10 :
self . _ineffective_compression_count + = 1
else :
self . _ineffective_compression_count = 0
2026-02-21 22:31:43 -08:00
if not self . quiet_mode :
2026-03-17 16:31:01 -07:00
logger . info (
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
" Compressed: %d -> %d messages (~ %d tokens saved, %.0f %% ) " ,
2026-03-17 16:31:01 -07:00
n_messages ,
len ( compressed ) ,
saved_estimate ,
feat(compressor): smart collapse, dedup, anti-thrashing, template upgrade, hardening
Combined salvage of PRs #9661, #9663, #9674, #9677, #9678 by kshitijk4poor.
- Smart tool output collapse: informative 1-line summaries replace generic placeholder
- Dedup identical tool results via MD5 hash, truncate large tool_call arguments
- Anti-thrashing: skip compression after 2 consecutive <10% savings passes
- Structured action-log summary template with numbered actions and Active State
- Hardening: max_tokens 1.3x cap, multimodal safety, note idempotency, adaptive cooldown
Follow-up fixes applied during salvage:
- web_extract: reads 'urls' (list) not 'url' (original PR bug)
- Multimodal list content guards in dedup and prune passes
- Kept 'Relevant Files' section in template (original PR removed it)
Skipped PRs #9665 (user msg preservation — duplication risk) and #9675 (dead code).
2026-04-14 22:21:04 -07:00
savings_pct ,
2026-03-17 16:31:01 -07:00
)
logger . info ( " Compression # %d complete " , self . compression_count )
2026-02-21 22:31:43 -08:00
return compressed