Introduces MemoryProvider ABC, MemoryManager orchestrator, and
BuiltinMemoryProvider for the existing MEMORY.md/USER.md system.
Key design decisions:
- Built-in memory is ALWAYS active, never disabled by external providers
- Multiple providers can be active simultaneously
- Prefetch results from all providers are merged per-turn
- Sync fans out to all providers after each turn
- Each provider can expose its own tools
Three registration paths:
1. Built-in (BuiltinMemoryProvider) — always first, not removable
2. First-party (Honcho stays as-is for now, migration in follow-up)
3. Plugin — ctx.register_memory_provider() in plugin system
Files:
- agent/memory_provider.py — ABC with core + optional lifecycle hooks
- agent/memory_manager.py — orchestrator, single integration point
- agent/builtin_memory_provider.py — wraps existing MemoryStore
- hermes_cli/plugins.py — register_memory_provider() + accessor
- run_agent.py — MemoryManager wired alongside existing Honcho code
- tests/agent/test_memory_provider.py — 37 tests
This establishes the interface for all pending memory backend PRs
(#1811 Hindsight, #2732 RetainDB, #2933 Mem0, #3499 Byterover,
#3369 OpenViking, #2351 Holographic, #727 Cognitive) to implement
as plugins rather than one-off integrations.
The plugin system defined six lifecycle hooks but only pre_tool_call and
post_tool_call were invoked. This activates the remaining four so that
external plugins (e.g. memory systems) can hook into the conversation
loop without touching core code.
Hook semantics:
- on_session_start: fires once when a new session is created
- pre_llm_call: fires once per turn before the tool-calling loop;
plugins can return {"context": "..."} to inject into the ephemeral
system prompt (not cached, not persisted)
- post_llm_call: fires once per turn after the loop completes, with
user_message and assistant_response for sync/storage
- on_session_end: fires at the end of every run_conversation call
invoke_hook() now returns a list of non-None callback return values,
enabling pre_llm_call context injection while remaining backward
compatible (existing hooks that return None are unaffected).
Salvaged from PR #2823.
Co-authored-by: Nicolò Boschi <boschi1997@gmail.com>
Reverts the sanitizer addition from PR #2466 (originally #2129).
We already have _empty_content_retries handling for reasoning-only
responses. The trailing strip risks silently eating valid messages
and is redundant with existing empty-content handling.