fix: gateway token double-counting — use absolute set instead of increment (#3317)

The gateway's update_session() used += for token counts, but the cached
agent's session_prompt_tokens / session_completion_tokens are cumulative
totals that grow across messages. Each update_session call re-added the
running total, inflating usage stats with every message (1.7x after 3
messages, worse over longer conversations).
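
A minimal sketch of the inflation described above, assuming made-up
cumulative totals for three messages (the numbers and variable names are
illustrative, not taken from the gateway code):

```python
# Hypothetical cumulative totals the cached agent reports after each message.
# Each value already includes all earlier messages.
cumulative_totals = [100, 250, 450]

incremented = 0  # old behaviour: stored_count += agent_total
assigned = 0     # fixed behaviour: stored_count = agent_total

for total in cumulative_totals:
    incremented += total  # re-adds the running total on every update
    assigned = total      # overwrites with the latest running total

# Ratio of the inflated figure to the true figure for this sample data.
inflation = incremented / assigned
print(incremented, assigned, round(inflation, 2))
```

With these sample values the incremented counter ends at 800 while the
true total is 450; the exact ratio depends on how message sizes vary.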

Fix: change += to = for in-memory entry fields, add set_token_counts()
to SessionDB that uses direct assignment instead of SQL increment, and
switch the gateway to call it.
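
The difference between the two database paths can be sketched as
follows. This is an illustrative stand-in, not the project's SessionDB:
the class name, table schema, and helper methods are assumptions, and
only the method names update_token_counts / set_token_counts and the
input_tokens / output_tokens keywords come from this commit.

```python
import sqlite3


class SketchSessionDB:
    """Hypothetical stand-in for SessionDB; schema is assumed."""

    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE sessions ("
            " id TEXT PRIMARY KEY,"
            " input_tokens INTEGER NOT NULL DEFAULT 0,"
            " output_tokens INTEGER NOT NULL DEFAULT 0)"
        )

    def create_session(self, session_id):
        self._conn.execute("INSERT INTO sessions (id) VALUES (?)", (session_id,))

    def update_token_counts(self, session_id, input_tokens=0, output_tokens=0):
        # Increment: suitable when the caller passes per-call deltas.
        self._conn.execute(
            "UPDATE sessions SET input_tokens = input_tokens + ?,"
            " output_tokens = output_tokens + ? WHERE id = ?",
            (input_tokens, output_tokens, session_id),
        )

    def set_token_counts(self, session_id, input_tokens=0, output_tokens=0):
        # Absolute set: suitable when the caller passes cumulative totals.
        self._conn.execute(
            "UPDATE sessions SET input_tokens = ?, output_tokens = ? WHERE id = ?",
            (input_tokens, output_tokens, session_id),
        )

    def get_token_counts(self, session_id):
        row = self._conn.execute(
            "SELECT input_tokens, output_tokens FROM sessions WHERE id = ?",
            (session_id,),
        ).fetchone()
        return tuple(row)


db = SketchSessionDB()
db.create_session("s1")
# Gateway-style caller: passes the running total after each message.
for total in (100, 250, 450):
    db.set_token_counts("s1", input_tokens=total, output_tokens=0)
after_set = db.get_token_counts("s1")

# CLI-style caller: passes only the delta for one API call.
db.update_token_counts("s1", input_tokens=50, output_tokens=20)
after_delta = db.get_token_counts("s1")
```

Keeping both methods lets each caller use the one that matches the kind
of number it holds, instead of converting totals to deltas or back.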

CLI mode continues using update_token_counts() (increment) since it
tracks per-API-call deltas — that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).

Co-authored-by: zaycruz <zay@users.noreply.github.com>
Teknium
2026-03-26 19:13:07 -07:00
committed by GitHub
parent 867eefdd9f
commit 22cfad157b
3 changed files with 68 additions and 2 deletions

@@ -846,7 +846,7 @@ class TestLastPromptTokens:
 store.update_session("k1", model="openai/gpt-5.4")
-store._db.update_token_counts.assert_called_once_with(
+store._db.set_token_counts.assert_called_once_with(
 "s1",
 input_tokens=0,
 output_tokens=0,