Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly. Pin each invariant so a
future refactor can't silently regress them:
1. Normal single-message path still releases _active_sessions[sk] and
_session_tasks[sk] through end-of-finally. The #17758 follow-up
moved _release_session_guard under
if current_task is self._session_tasks.get(session_key)
For the 99%-common case current_task IS the stored task, so the
guard must still fire. Test would fail if the conditional were
ever tightened in a way that dropped the normal path.
2. Drain-task cancellation releases the session. If the drain task
spawned by the in-band hand-off is cancelled mid-handler (e.g.
/stop fired while draining a follow-up), its own finally must
fire _release_session_guard. Without this a cancel would leave
the session permanently pinned busy.
3. Late-arrival drain still spawns when no in-band drain preceded
it. Pre-existing path, but the #17758 follow-up added a
re-queue branch that only fires when ownership was already
handed off. When no handoff happened the else branch must still
spawn a fresh drain task — otherwise a message arriving during
stop_typing gets silently dropped.
All three tests pass against current main. Zero production code
changes.
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block. The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.
A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.
Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key]. When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it. This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.
Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).
Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same. Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges. Post-fix, the entry is preserved and the drain
task reuses the original Event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`. Each
chained follow-up added a frame to the call stack instead of starting
fresh. Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.
Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind. The new task takes
ownership via `_session_tasks[session_key]`.
The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.
Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry. Pre-fix: depths grow linearly `[1,2,3,…,12]`. Post-fix: all
depths are `1`.
`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>