fix: move pre_llm_call plugin context to user message, preserve prompt cache (#5146)

Plugin context from pre_llm_call hooks was injected into the system
prompt, breaking the prompt cache prefix every turn when content
changed (typical for memory plugins). Now all plugin context goes
into the current turn's user message — the system prompt stays
identical across turns, preserving cached tokens.
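For illustration, a minimal sketch of the new injection path. The names
here (build_messages, plugins.emit) are hypothetical, not the actual
Hermes API; this only shows the shape of the change:

    def build_messages(system_prompt, history, user_input, plugins):
        # Collect context from pre_llm_call hooks; each result is a
        # dict {"context": ...} or a plain string (see diff below).
        parts = []
        for r in plugins.emit("pre_llm_call", messages=history):
            if isinstance(r, dict) and r.get("context"):
                parts.append(r["context"])
            elif isinstance(r, str):
                parts.append(r)
        # Prepend plugin context to the current turn's user message.
        # The system prompt is untouched, so the cached prefix stays
        # byte-identical across turns.
        content = "\n\n".join(parts + [user_input]) if parts else user_input
        return ([{"role": "system", "content": system_prompt}]
                + history
                + [{"role": "user", "content": content}])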

The system prompt is reserved for Hermes internals. Plugins
contribute context alongside the user's input.

Also adds comprehensive documentation for all 6 plugin hooks:
pre_tool_call, post_tool_call, pre_llm_call, post_llm_call,
on_session_start, on_session_end — each with full callback
signatures, parameter tables, firing conditions, and examples.

Supersedes #5138, which identified the same cache-busting bug
and proposed an uncached system suffix approach. This fix goes
further by removing system prompt injection entirely.

Co-identified-by: OutThisLife (PR #5138)
commit 5879b3ef82 (parent 96e96a79ad)
Author: Teknium
Committed by: GitHub
Date: 2026-04-04 16:55:44 -07:00

6 changed files with 653 additions and 57 deletions


@@ -441,8 +441,18 @@ class PluginManager:
        plugin cannot break the core agent loop.

        Returns a list of non-``None`` return values from callbacks.
        This allows hooks like ``pre_llm_call`` to contribute context
        that the agent core can collect and inject.

        For ``pre_llm_call``, callbacks may return a dict describing
        context to inject into the current turn's user message::

            {"context": "recalled text..."}
            "recalled text..."  # plain string, equivalent

        Context is ALWAYS injected into the user message, never the
        system prompt. This preserves the prompt cache prefix: the
        system prompt stays identical across turns so cached tokens
        are reused. All injected context is ephemeral and never
        persisted to session DB.
        """
        callbacks = self._hooks.get(hook_name, [])
        results: List[Any] = []
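The hunk ends just before the dispatch loop. A plausible continuation,
consistent with the docstring's isolation guarantee (the logger call
and the args/kwargs names are assumptions, not part of the diff):

        for cb in callbacks:
            try:
                value = cb(*args, **kwargs)
            except Exception:
                # Isolation guarantee: a failing plugin cannot break
                # the core agent loop.
                logger.exception("plugin hook %s failed", hook_name)
                continue
            if value is not None:
                results.append(value)
        return results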