fix: robust context engine interface — config selection, plugin discovery, ABC completeness

Follow-up fixes for the context engine plugin slot (PR #5700):

- Enhance ContextEngine ABC: add threshold_percent, protect_first_n,
  protect_last_n as class attributes; complete update_model() default
  with threshold recalculation; clarify on_session_end() lifecycle docs
- Add ContextCompressor.update_model() override for model/provider/
  base_url/api_key updates
- Replace all direct compressor internal access in run_agent.py with
  ABC interface: switch_model(), fallback restore, context probing
  all use update_model() now; _context_probed guarded with getattr/
  hasattr for plugin engine compatibility
- Create plugins/context_engine/ directory with discovery module
  (mirrors plugins/memory/ pattern) — discover_context_engines(),
  load_context_engine()
- Add context.engine config key to DEFAULT_CONFIG (default: compressor)
- Config-driven engine selection in run_agent.__init__: checks config,
  then plugins/context_engine/<name>/, then general plugin system,
  falls back to built-in ContextCompressor
- Wire on_session_end() in shutdown_memory_provider() at real session
  boundaries (CLI exit, /reset, gateway expiry)
This commit is contained in:
Teknium
2026-04-08 04:16:58 -07:00
committed by Teknium
parent 5d8dd622bc
commit 3fe6938176
5 changed files with 388 additions and 64 deletions

View File

@@ -73,6 +73,22 @@ class ContextCompressor(ContextEngine):
self._context_probe_persistable = False
self._previous_summary = None
def update_model(
self,
model: str,
context_length: int,
base_url: str = "",
api_key: str = "",
provider: str = "",
) -> None:
"""Update model info after a model switch or fallback activation."""
self.model = model
self.base_url = base_url
self.api_key = api_key
self.provider = provider
self.context_length = context_length
self.threshold_tokens = int(context_length * self.threshold_percent)
def __init__(
self,
model: str,

View File

@@ -3,7 +3,11 @@
A context engine controls how conversation context is managed when
approaching the model's token limit. The built-in ContextCompressor
is the default implementation. Third-party engines (e.g. LCM) can
replace it by registering via the plugin system.
replace it via the plugin system or by being placed in the
``plugins/context_engine/<name>/`` directory.
Selection is config-driven: ``context.engine`` in config.yaml.
Default is ``"compressor"`` (the built-in). Only one engine is active.
The engine is responsible for:
- Deciding when compaction should fire
@@ -17,7 +21,8 @@ Lifecycle:
3. update_from_response() called after each API response with usage data
4. should_compress() checked after each turn
5. compress() called when should_compress() returns True
6. on_session_end() called when the conversation ends
6. on_session_end() called at real session boundaries (CLI exit, /reset,
gateway session expiry) — NOT per-turn
"""
from abc import ABC, abstractmethod
@@ -45,6 +50,16 @@ class ContextEngine(ABC):
context_length: int = 0
compression_count: int = 0
# -- Compaction parameters (read by run_agent.py for preflight) --------
#
# These control the preflight compression check. Subclasses may
# override via __init__ or property; defaults are sensible for most
# engines.
threshold_percent: float = 0.75
protect_first_n: int = 3
protect_last_n: int = 6
# -- Core interface ----------------------------------------------------
@abstractmethod
@@ -93,9 +108,10 @@ class ContextEngine(ABC):
"""
def on_session_end(self, session_id: str, messages: List[Dict[str, Any]]) -> None:
"""Called when the conversation ends.
"""Called at real session boundaries (CLI exit, /reset, gateway expiry).
Use this to flush state, close DB connections, etc.
NOT called per-turn — only when the session truly ends.
"""
def on_session_reset(self) -> None:
@@ -158,9 +174,11 @@ class ContextEngine(ABC):
api_key: str = "",
provider: str = "",
) -> None:
"""Called when the user switches models mid-session.
"""Called when the user switches models or on fallback activation.
Default updates context_length and threshold_tokens. Override if
your engine needs to do more (e.g. recalculate DAG budgets).
Default updates context_length and recalculates threshold_tokens
from threshold_percent. Override if your engine needs more
(e.g. recalculate DAG budgets, switch summary models).
"""
self.context_length = context_length
self.threshold_tokens = int(context_length * self.threshold_percent)