fix(hindsight): route flush-on-switch through writer queue, not raw thread

Follow-up to the cherry-picked PR #17447. The original flush spawned a
bare threading.Thread for the buffer-flush path, overwriting
self._sync_thread — which is aliased to the long-lived writer thread.
Two consequences:

1. No serialization with the writer queue. If old-session retains were
   still queued in _retain_queue, the flush ran concurrently with the
   writer and both threads could call aretain_batch against the same
   document_id.
2. The pre-spawn 'self._sync_thread.join(timeout=5.0)' targeted the
   long-lived writer, which never exits, so the join blocked for the full
   5 seconds, timed out, and serialized nothing.
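
The second consequence can be reproduced in isolation. The following is a
minimal sketch, not the plugin's code: writer_loop and stop are hypothetical
stand-ins for a writer thread that only exits when explicitly told to.

```python
import threading
import time

# Hypothetical stand-in for a long-lived writer: it only exits when told to.
stop = threading.Event()

def writer_loop():
    stop.wait()

writer = threading.Thread(target=writer_loop, daemon=True)
writer.start()

start = time.monotonic()
writer.join(timeout=0.2)  # returns None whether or not the thread finished
elapsed = time.monotonic() - start
still_alive = writer.is_alive()  # the only way to tell the join timed out

# Clean up the stand-in writer.
stop.set()
writer.join()
```

Because join() gives no signal on timeout, code that follows it runs
regardless of whether the joined thread has finished its work.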

Fix: enqueue the flush closure on _retain_queue via _ensure_writer +
put(). Natural FIFO ordering behind any pending retains, no new thread,
no broken join. Shutdown-aware so it doesn't enqueue after teardown.
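
The shape of the fix can be sketched roughly as below. This is an
illustration under assumptions, not the provider's implementation: the class
WriterSketch, its enqueue/shutdown methods, and the None sentinel are
hypothetical; only the names _retain_queue, _shutting_down, and
_ensure_writer come from this commit.

```python
import queue
import threading

class WriterSketch:
    """Hypothetical single-writer queue; not the real provider class."""

    def __init__(self):
        self._retain_queue = queue.Queue()
        self._shutting_down = threading.Event()
        self._writer = None
        self._lock = threading.Lock()

    def _ensure_writer(self):
        # Start the writer lazily, at most once.
        with self._lock:
            if self._writer is None or not self._writer.is_alive():
                self._writer = threading.Thread(target=self._drain, daemon=True)
                self._writer.start()

    def _drain(self):
        # One consumer thread, so jobs run strictly in FIFO order.
        while True:
            job = self._retain_queue.get()
            try:
                if job is None:  # sentinel (an assumption of this sketch)
                    return
                job()
            finally:
                self._retain_queue.task_done()

    def enqueue(self, job):
        # Shutdown-aware: refuse new work after teardown has begun.
        if self._shutting_down.is_set():
            return False
        self._ensure_writer()
        self._retain_queue.put(job)
        return True

    def shutdown(self):
        self._shutting_down.set()
        if self._writer is not None:
            self._retain_queue.put(None)
            self._writer.join()

order = []
w = WriterSketch()
w.enqueue(lambda: order.append("retain-old-session"))
w.enqueue(lambda: order.append("flush-on-switch"))
w._retain_queue.join()  # blocks until every queued job has called task_done()
w.shutdown()
rejected = w.enqueue(lambda: order.append("late"))
```

With a single consumer, the flush closure cannot overlap a pending retain, so
no extra lock or join is needed to order them.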

Tests updated to drain via _retain_queue.join() instead of the stale
_sync_thread.join(). Added regression guard
test_flush_serializes_behind_pending_retains_via_writer_queue that
blocks the writer mid-retain to prove the flush waits in FIFO behind
the old retain.
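
The idea behind the regression guard can be sketched as follows. This is a
simplified stand-alone illustration, not the actual test: the function
run_blocked_writer_check and the slow_retain/flush closures are hypothetical,
and the real test exercises the provider with a mock client.

```python
import queue
import threading

def run_blocked_writer_check():
    q = queue.Queue()
    events = []
    retain_started = threading.Event()
    release_retain = threading.Event()

    def writer():
        # Single consumer draining jobs in FIFO order.
        while True:
            job = q.get()
            try:
                if job is None:
                    return
                job()
            finally:
                q.task_done()

    def slow_retain():
        # Park the writer mid-retain until the test releases it.
        retain_started.set()
        release_retain.wait(timeout=5.0)
        events.append("retain")

    def flush():
        events.append("flush")

    t = threading.Thread(target=writer, daemon=True)
    t.start()
    q.put(slow_retain)
    assert retain_started.wait(timeout=5.0)
    q.put(flush)
    # The writer is still inside slow_retain, so the flush cannot have run.
    flushed_early = "flush" in events
    release_retain.set()
    q.join()  # drain via the queue rather than joining a thread
    q.put(None)
    t.join(timeout=5.0)
    return flushed_early, events

flushed_early, events = run_blocked_writer_check()
```

Draining with Queue.join() waits for the work itself to complete, which is
why it replaces the thread join in the updated tests.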

Also seeds _retain_queue / _shutting_down / stubbed _ensure_writer on
the bare-object test helper in test_memory_session_switch.py so that
path doesn't blow up under the new queue-enqueue.

tests/plugins/memory/test_hindsight_provider.py + tests/agent/test_memory_session_switch.py: 103/103 passing.
teknium1
2026-04-29 08:08:02 -07:00
committed by Teknium
parent c38dac742b
commit 0a5ee01e48
3 changed files with 86 additions and 19 deletions


@@ -235,11 +235,21 @@ def _make_hindsight_provider():
     provider._prefetch_thread = None
     provider._prefetch_lock = threading.Lock()
     provider._prefetch_result = ""
-    # Sync thread tracking — flush spawn target.
+    # Sync thread tracking (legacy alias at the writer).
     provider._sync_thread = None
-    # Stub the network-touching helper so the spawned flush thread is a
-    # no-op in unit tests. Real plugin behavior is covered by the
-    # mock-client tests in tests/plugins/memory/test_hindsight_provider.py.
+    # Writer queue infra the flush-on-switch path enqueues onto. We stub
+    # _ensure_writer / _register_atexit so no real thread is spawned;
+    # tests exercising flush delivery live in
+    # tests/plugins/memory/test_hindsight_provider.py where the full
+    # writer-queue wiring is in place.
+    import queue as _queue
+    provider._retain_queue = _queue.Queue()
+    provider._shutting_down = threading.Event()
+    provider._atexit_registered = True
+    provider._ensure_writer = lambda: None
+    provider._register_atexit = lambda: None
+    # Stub the network-touching helper so any enqueued flush closure is
+    # a no-op if ever drained in a unit test.
     provider._run_hindsight_operation = lambda _op: None
     return provider