feat: add Anthropic Context Editing API support

Integrate Anthropic's server-side context management (beta) for Claude models.
When enabled, the API automatically clears old tool use/result pairs and
thinking blocks AFTER prompt cache lookup but BEFORE token counting — this
preserves prompt cache prefixes while freeing context space, something
impossible with client-side stripping.

Implementation:
- anthropic_adapter: add context-management-2025-06-27 to beta headers;
  build context_management edits in build_anthropic_kwargs() via extra_body;
  only include clear_thinking edit when reasoning is enabled (API requires it)
- run_agent: pipe context_editing config through AIAgent to the adapter
- cli/gateway: load context_editing config from config.yaml and pass to agent
- config: add context_editing section to DEFAULT_CONFIG with conservative
  defaults (disabled, auto-scale triggers to 60%/10% of context window,
  keep 5 tool uses and 2 thinking turns, exclude memory/skill_manage/todo)
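
The steps above can be sketched as follows. This is a minimal illustration, not the repo's actual build_anthropic_kwargs(): the edit type names and field shapes follow Anthropic's context-management beta docs as I understand them, and the function name and config dict layout are assumptions.

```python
# Hypothetical sketch of assembling the context_management payload that
# goes into extra_body. Field shapes follow Anthropic's context
# management beta; the real adapter code may differ.

def build_context_management(cfg: dict, context_window: int, reasoning: bool) -> dict:
    """Translate the context_editing config section into API edits."""
    # Auto-scale triggers when not set explicitly (60% / 10% defaults).
    trigger = cfg.get("trigger_tokens") or int(context_window * 0.60)
    clear_at_least = cfg.get("clear_at_least_tokens") or int(context_window * 0.10)

    edits = [{
        "type": "clear_tool_uses_20250919",
        "trigger": {"type": "input_tokens", "value": trigger},
        "keep": {"type": "tool_uses", "value": cfg.get("keep_tool_uses", 5)},
        "clear_at_least": {"type": "input_tokens", "value": clear_at_least},
        "exclude_tools": cfg.get("exclude_tools", []),
        "clear_tool_inputs": cfg.get("clear_tool_inputs", False),
    }]
    # The API rejects the clear_thinking edit unless extended thinking is
    # enabled, so it is only appended when reasoning is on.
    if reasoning:
        edits.append({
            "type": "clear_thinking_20251015",
            "keep": {"type": "thinking_turns",
                     "value": cfg.get("keep_thinking_turns", 2)},
        })
    return {"edits": edits}
```

The resulting dict travels under the `context_management` key in extra_body, alongside the `anthropic-beta: context-management-2025-06-27` header.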

Config (opt-in, add to config.yaml):
  context_editing:
    enabled: true
    trigger_tokens: null       # auto: 60% of context window
    keep_tool_uses: 5
    keep_thinking_turns: 2
    exclude_tools: [memory, skill_manage, todo]
    clear_tool_inputs: false
    clear_at_least_tokens: null  # auto: 10% of context window
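
With the null auto defaults, the two thresholds scale with the model's context window. A worked example, assuming a 200k-token window for illustration:

```python
# Worked example of the auto-scaled thresholds (60% / 10% of the
# context window, per the defaults above). 200_000 is an assumed
# context window size, not a value from the repo.
context_window = 200_000
trigger_tokens = int(context_window * 0.60)         # clearing starts here
clear_at_least_tokens = int(context_window * 0.10)  # minimum cleared per pass
print(trigger_tokens, clear_at_least_tokens)  # → 120000 20000
```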

Live tested with Anthropic API:
- Single turn with context_management: accepted, response normal
- Multi-turn with tool calls + thinking + context_management: works
- clear_thinking correctly omitted when thinking is disabled
- Config plumbing verified through AIAgent._build_api_kwargs()
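
The single-turn test above boils down to a request shaped roughly like this. The model id is an assumption and the kwargs are only constructed here, not sent; what matters is that `context_management` rides in extra_body next to the beta header.

```python
# Illustrative request shape only. Sending it requires the anthropic SDK
# and an API key; this sketch just shows where the payload sits.
kwargs = {
    "model": "claude-sonnet-4-5",  # assumed model id for illustration
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "hello"}],
    "extra_headers": {"anthropic-beta": "context-management-2025-06-27"},
    "extra_body": {
        "context_management": {
            "edits": [
                {"type": "clear_tool_uses_20250919",
                 "keep": {"type": "tool_uses", "value": 5}},
            ]
        }
    },
}
```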

Refs: #526, supersedes #528
Author: teknium1
Date: 2026-03-13 03:33:14 -07:00
Parent: 76a654f949
Commit: 02028a6a9e
7 changed files with 283 additions and 0 deletions

@@ -25,6 +25,29 @@ model:
# api_key: "your-key-here" # Uncomment to set here instead of .env
base_url: "https://openrouter.ai/api/v1"
# =============================================================================
# Anthropic Context Editing (Claude-only, optional)
# =============================================================================
# Server-side context management for Claude models. Automatically clears old
# tool call/result pairs and thinking blocks at the API level, AFTER prompt
# cache lookup but BEFORE token counting. This preserves prompt cache prefixes
# while freeing context space — something impossible with client-side stripping.
#
# Only works with direct Anthropic API (provider: anthropic). Disabled by default.
# Anthropic reports ~29% performance improvement with context editing enabled.
#
# context_editing:
# enabled: true # Enable server-side context editing
# trigger_tokens: null # Input token threshold to start clearing (null = auto: 60% of context window)
# keep_tool_uses: 5 # How many recent tool_use/result pairs to preserve
# keep_thinking_turns: 2 # How many recent thinking turns to preserve
# exclude_tools: # Tool calls that are NEVER cleared
# - memory
# - skill_manage
# - todo
# clear_tool_inputs: false # Also clear tool input params (default: false)
# clear_at_least_tokens: null # Minimum tokens to clear per activation (null = auto: 10% of context window)
# =============================================================================
# OpenRouter Provider Routing (only applies when using OpenRouter)
# =============================================================================