Mirror of https://github.com/NousResearch/hermes-agent.git, synced 2026-05-01 00:11:39 +08:00
fix(hindsight): route flush-on-switch through writer queue, not raw thread
Follow-up to the cherry-picked PR #17447. The original flush spawned a bare threading.Thread for the buffer-flush path, overwriting self._sync_thread, which is aliased to the long-lived writer thread. This had two consequences:

1. No serialization with the writer queue. If old-session retains were still queued in _retain_queue, the flush ran concurrently with the writer, and both threads could call aretain_batch against the same document_id.
2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' tried to join the long-lived writer, which never exits. The join was therefore a no-op that just timed out and never actually serialized anything.

Fix: enqueue the flush closure on _retain_queue via _ensure_writer + put(). This gives natural FIFO ordering behind any pending retains, with no new thread and no broken join. The enqueue is shutdown-aware, so nothing is enqueued after teardown.

Tests updated to drain via _retain_queue.join() instead of the stale _sync_thread.join(). Added the regression guard test_flush_serializes_behind_pending_retains_via_writer_queue, which blocks the writer mid-retain to prove that the flush waits in FIFO order behind the old retain. Also seeds _retain_queue, _shutting_down and a stubbed _ensure_writer on the bare-object test helper in test_memory_session_switch.py, so that path doesn't blow up under the new queue-based enqueue.

tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing.
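The following minimal, standard-library-only sketch illustrates consequence 2 and the fix. Everything in it is hypothetical. `writer`, `retain_queue`, `old_retain` and `flush` are stand-ins for the provider's long-lived writer thread, `_retain_queue`, a still-pending old-session retain and the flush closure. The join timeout is also shortened from 5.0 s to 0.2 s. The sketch shows two things:

- `join(timeout=...)` on a writer that never exits returns with the thread still alive.
- A job put on the writer's own queue cannot run until the job ahead of it has finished.

```python
import queue
import threading

# Hypothetical stand-ins; not the provider's actual code.
retain_queue = queue.Queue()
order = []
release_retain = threading.Event()


def writer() -> None:
    # Long-lived writer: never exits, runs queued jobs one at a time (FIFO).
    while True:
        job = retain_queue.get()
        try:
            job()
        finally:
            retain_queue.task_done()


def old_retain() -> None:
    # A slow retain for the old session; blocks until released.
    release_retain.wait()
    order.append("old_retain")


def flush() -> None:
    order.append("flush")


writer_thread = threading.Thread(target=writer, daemon=True, name="writer")
writer_thread.start()
retain_queue.put(old_retain)

# Old approach: join() on a thread that never exits only waits out the timeout.
writer_thread.join(timeout=0.2)
join_was_noop = writer_thread.is_alive()  # True: nothing was serialized

# Fix: put the flush on the same queue, behind the pending retain.
retain_queue.put(flush)
flush_ran_early = "flush" in order  # False: the writer is still inside old_retain
release_retain.set()
retain_queue.join()  # drain, as the updated tests do with _retain_queue.join()
print(join_was_noop, flush_ran_early, order)  # True False ['old_retain', 'flush']
```

This ordering guarantee is what the regression guard relies on. The writer is blocked mid-retain, so the flush stays queued until that retain returns.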
@@ -1510,15 +1510,16 @@ class HindsightMemoryProvider(MemoryProvider):
             except Exception as e:
                 logger.warning("Hindsight flush-on-switch failed: %s", e, exc_info=True)
 
-        # Match sync_turn's serialization — wait for any prior retain
-        # thread to finish before spawning the flush, so writes
-        # against the old document arrive in order.
-        if self._sync_thread and self._sync_thread.is_alive():
-            self._sync_thread.join(timeout=5.0)
-        self._sync_thread = threading.Thread(
-            target=_flush, daemon=True, name="hindsight-flush-on-switch"
-        )
-        self._sync_thread.start()
+        # Route the flush through the same writer queue sync_turn
+        # uses. That serializes it behind any still-queued retains
+        # from the old session (FIFO by document_id), avoids racing
+        # two threads on aretain_batch against the same document, and
+        # keeps shutdown's drain semantics intact. Skip enqueue if
+        # shutdown has already fired — the writer is draining/gone.
+        if not self._shutting_down.is_set():
+            self._ensure_writer()
+            self._register_atexit()
+            self._retain_queue.put(_flush)
 
         # 2. Drain any in-flight prefetch from the old session and drop
         # its cached result so the new session doesn't see stale recall.
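The following is a hedged sketch of the shutdown-aware enqueue path in the added lines. Only the names `_retain_queue`, `_shutting_down`, `_ensure_writer` and `_register_atexit` come from the diff, and their real implementations may differ. `WriterQueueSketch`, its `_run` loop, the `_STOP` sentinel and `shutdown()` are assumptions made for illustration. The sketch assumes a single daemon writer started lazily on first use. Shutdown sets the event and queues a sentinel, so pending jobs drain in FIFO order before the writer exits.

```python
import atexit
import logging
import queue
import threading
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical sentinel telling the writer loop to exit.
_STOP = object()


class WriterQueueSketch:
    """Hypothetical stand-in for a provider's single-writer plumbing."""

    def __init__(self) -> None:
        self._retain_queue: "queue.Queue[object]" = queue.Queue()
        self._shutting_down = threading.Event()
        self._writer: Optional[threading.Thread] = None
        self._writer_lock = threading.Lock()
        self._atexit_registered = False

    def _ensure_writer(self) -> None:
        # Lazily start exactly one long-lived writer thread.
        with self._writer_lock:
            if self._writer is None or not self._writer.is_alive():
                self._writer = threading.Thread(
                    target=self._run, daemon=True, name="writer-sketch"
                )
                self._writer.start()

    def _register_atexit(self) -> None:
        # Register shutdown() once so queued jobs drain at interpreter exit.
        if not self._atexit_registered:
            atexit.register(self.shutdown)
            self._atexit_registered = True

    def _run(self) -> None:
        # Execute queued callables one at a time, in FIFO order.
        while True:
            job = self._retain_queue.get()
            try:
                if job is _STOP:
                    return
                job()
            except Exception:
                logger.warning("queued job failed", exc_info=True)
            finally:
                self._retain_queue.task_done()

    def enqueue(self, job: Callable[[], None]) -> bool:
        # Mirrors the added lines: never enqueue once shutdown has fired.
        if self._shutting_down.is_set():
            return False
        self._ensure_writer()
        self._register_atexit()
        self._retain_queue.put(job)
        return True

    def shutdown(self) -> None:
        # Idempotent: the atexit hook may call this a second time.
        if self._shutting_down.is_set():
            return
        self._shutting_down.set()
        if self._writer is not None and self._writer.is_alive():
            # The sentinel queues behind pending jobs, so they drain first.
            self._retain_queue.put(_STOP)
            self._writer.join(timeout=5.0)


ran: list = []
w = WriterQueueSketch()
w.enqueue(lambda: ran.append("retain"))
w.enqueue(lambda: ran.append("flush"))
w._retain_queue.join()
w.shutdown()
accepted_after_shutdown = w.enqueue(lambda: ran.append("late"))
print(ran, accepted_after_shutdown)  # ['retain', 'flush'] False
```

In this sketch, the check-then-put in `enqueue` is not atomic with respect to `shutdown()`. A job that slips in just after the sentinel would never run, so a real provider might guard both paths with a shared lock.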