Two data-loss / leak gaps in HindsightMemoryProvider.on_session_switch
introduced by #17409.
1. Buffered turns silently lost when retain_every_n_turns > 1.
on_session_switch unconditionally cleared _session_turns without
flushing. Users who batched every N>1 turns and switched mid-batch
(/reset, /new, /resume, /branch, or context compression) had those
buffered turns disappear. Same data-loss class as the shutdown race,
different lifecycle event.
Note commit_memory_session() -> on_session_end() runs *before*
on_session_switch on /reset, but Hindsight doesn't implement
on_session_end so the buffer survives that step and dies at clear
time. /resume, /branch, and compression skip commit_memory_session
entirely so an on_session_end impl wouldn't help them anyway.
Fix: snapshot the old _session_id, _document_id, _parent_session_id,
_turn_index, and _session_turns; spawn one final retain that lands
under the OLD document_id; then rotate state. Metadata is built
synchronously against the old self._* so session_id / lineage tags
on the flushed item all reference the prior session consistently.
2. Stale _prefetch_result leaks across switch.
If queue_prefetch ran in the old session and the result hadn't been
consumed by prefetch() yet, on_session_switch left the cached recall
text in place. The next session's first prefetch() call would return
text mined from the prior session's bank/query.
Fix: join any in-flight _prefetch_thread (3s bounded — matches
shutdown()), then clear _prefetch_result under _prefetch_lock before
rotating session_id.
Tests
-----
- tests/plugins/memory/test_hindsight_provider.py (TestSessionSwitchBufferFlush):
- buffered turns flushed under OLD document_id with OLD lineage tags
- empty buffer => no spurious retain
- _prefetch_result cleared on switch
- in-flight prefetch thread is awaited before clear (no race)
- tests/agent/test_memory_session_switch.py: factory extended to seed the
attrs the new flush path reads (_retain_source, _platform, _bank_id,
prefetch state, etc.) and stub _run_hindsight_operation so existing
switch-state assertions keep passing without network setup.
Fixes#6672
Memory providers now receive on_session_switch() whenever AIAgent.session_id
rotates mid-process — /resume, /branch, /reset, /new, and context
compression. Before this, providers that cached per-session state in
initialize() (Hindsight's _session_id, _document_id, accumulated
_session_turns, _turn_counter) kept writing into the old session's
record after the agent had moved on.
MemoryProvider ABC
------------------
- New optional hook on_session_switch(new_session_id, *,
parent_session_id='', reset=False, **kwargs) with no-op default for
backward compat. reset=True signals /reset or /new — providers should
flush accumulated per-session buffers. reset=False for /resume,
/branch, compression where the logical conversation continues.
MemoryManager
-------------
- on_session_switch() fans the hook out to every registered provider.
Isolated try/except per provider — one bad provider can't block others.
- Empty/None new_session_id is a no-op to avoid corrupting provider state
during shutdown paths.
run_agent.py
------------
- _sync_external_memory_for_turn now passes session_id=self.session_id
into sync_all() and queue_prefetch_all(). Providers with defensive
session_id updates in sync_turn (Hindsight already had this at
plugins/memory/hindsight/__init__.py:1199) now actually receive the
current id.
- Compression block at ~L8884 already notified the context engine of
the rollover; now also calls
_memory_manager.on_session_switch(reason='compression').
cli.py
------
- new_session() fires reset=True, reason='new_session' so providers
flush buffers.
- _handle_resume_command fires reset=False, reason='resume' with the
previous session as parent_session_id.
- _handle_branch_command fires reset=False, reason='branch' with the
parent session_id already captured for the DB parent link.
gateway/run.py
--------------
- _handle_resume_command now evicts the cached AIAgent, mirroring
/branch and /reset. The next message rebuilds a fresh agent whose
memory provider initialize() runs with the correct session_id —
matches the pattern the gateway already uses for provider state
cross-session transitions.
Hindsight reference implementation
----------------------------------
- plugins/memory/hindsight/__init__.py adds on_session_switch that:
updates _session_id, mints a fresh _document_id (prevents
vectorize-io/hindsight#1303 overwrite), and clears _session_turns /
_turn_counter / _turn_index so in-flight batches don't flush under
the new document id. parent_session_id only overwritten when provided
(avoids clobbering on a bare switch).
Tests
-----
- tests/agent/test_memory_session_switch.py: new dedicated file. ABC
default no-op, manager fan-out, failure isolation, empty-id no-op,
session_id propagation through sync_all/queue_prefetch_all, Hindsight
state transitions for every reset/non-reset case, parent preservation.
- tests/cli/test_branch_command.py: new test verifying /branch fires
the hook with correct parent_session_id + reset=False + reason.
- tests/gateway/test_resume_command.py: new test verifying /resume
evicts the cached agent.
- tests/run_agent/test_memory_sync_interrupted.py: updated existing
assertions to account for the session_id kwarg on sync_all and
queue_prefetch_all.
E2E verified (real imports, tmp HERMES_HOME):
- /resume: session_id updates, doc_id fresh, buffers cleared, parent set
- /branch: session_id forks, parent links to original
- /new: reset=True clears accumulated state
- compression: reason='compression' propagated, lineage preserved
- Empty id: no-op, state preserved
- Legacy provider without on_session_switch: no crash
Reported by @nicoloboschi (Hindsight maintainer); related scope-widening
comment by @kidonng extending coverage to compression.