hermes-agent/tests/tools/test_mcp_reconnect_signal.py


fix(mcp): consolidate OAuth handling, pick up external token refreshes (#11383)

* feat(mcp-oauth): scaffold MCPOAuthManager

  Central manager for per-server MCP OAuth state. Provides get_or_build_provider (cached), remove (evicts cache + deletes disk), invalidate_if_disk_changed (mtime watch, the core fix for the external-refresh workflow), and handle_401 (dedup'd recovery). No behavior change yet — existing call sites still use build_oauth_auth directly. Task 1 of 8 in the MCP OAuth consolidation (fixes Cthulhu's BetterStack reliability issues).

* feat(mcp-oauth): add HermesMCPOAuthProvider with pre-flow disk watch

  Subclasses the MCP SDK's OAuthClientProvider to inject a disk mtime check before every async_auth_flow, via the central manager. When a subclass instance is used, external token refreshes (cron, another CLI instance) are picked up before the next API call. Still dead code: the manager's _build_provider still delegates to build_oauth_auth and returns the plain OAuthClientProvider. Task 4 wires this subclass in. Task 2 of 8.

* refactor(mcp-oauth): extract build_oauth_auth helpers

  Decomposes build_oauth_auth into _configure_callback_port, _build_client_metadata, _maybe_preregister_client, and _parse_base_url. Public API preserved. These helpers let MCPOAuthManager._build_provider reuse the same logic in Task 4 instead of duplicating the construction dance. Also updates the SDK version hint in the warning from 1.10.0 to 1.26.0 (which is what we actually require for the OAuth types used here). Task 3 of 8.

* feat(mcp-oauth): manager now builds HermesMCPOAuthProvider directly

  _build_provider constructs the disk-watching subclass using the helpers from Task 3, instead of delegating to the plain build_oauth_auth factory. Any consumer using the manager now gets pre-flow disk-freshness checks automatically. build_oauth_auth is preserved as the public API for backwards compatibility.
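The mtime watch at the heart of invalidate_if_disk_changed can be sketched as follows. This is a minimal illustration, not the Hermes code: the class name, field names, and storage layout here are hypothetical stand-ins for the manager's internals, which this excerpt does not show.

```python
import os


class MTimeWatch:
    """Sketch of a disk mtime watch: report True when the tokens file
    changed since the last observed baseline, so cached auth state can
    be invalidated and re-read from storage."""

    def __init__(self, tokens_path: str):
        self.tokens_path = tokens_path
        self.baseline = None  # last observed mtime, None until first check

    def invalidate_if_disk_changed(self) -> bool:
        try:
            mtime = os.path.getmtime(self.tokens_path)
        except FileNotFoundError:
            # Pre-auth state: no tokens file yet, nothing to invalidate.
            return False
        if self.baseline is None:
            self.baseline = mtime  # first observation becomes the baseline
            return False
        if mtime > self.baseline:
            self.baseline = mtime  # an external process wrote fresh tokens
            return True
        return False
```

Updating the baseline on every detected change is what lets the same watcher instance survive reconnects without re-firing for a write it already reported.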
  The code path is now:

      MCPOAuthManager.get_or_build_provider
        -> _build_provider
             -> _configure_callback_port
                _build_client_metadata
                _maybe_preregister_client
                _parse_base_url
             -> HermesMCPOAuthProvider(...)

  Task 4 of 8.

* feat(mcp): wire OAuth manager + add _reconnect_event

  MCPServerTask gains _reconnect_event alongside _shutdown_event. When set, _run_http / _run_stdio exit their async-with blocks cleanly (no exception), and the outer run() loop re-enters the transport to rebuild the MCP session with fresh credentials. This is the recovery path for OAuth failures that the SDK's in-place httpx.Auth cannot handle (e.g. cron externally consumed the refresh_token, or server-side session invalidation). _run_http now asks MCPOAuthManager for the OAuth provider instead of calling build_oauth_auth directly. Config-time, runtime, and reconnect paths all share one provider instance with pre-flow disk-watch active. shutdown() defensively sets both events so there is no race between reconnect and shutdown signalling. Task 5 of 8.

* feat(mcp): detect auth failures in tool handlers, trigger reconnect

  All 5 MCP tool handlers (tool call, list_resources, read_resource, list_prompts, get_prompt) now detect auth failures and route through MCPOAuthManager.handle_401:

  1. If the manager says recovery is viable (disk has fresh tokens, or the SDK can refresh in-place), signal MCPServerTask._reconnect_event to tear down and rebuild the MCP session with fresh credentials, then retry the tool call once.
  2. If no recovery path exists, return a structured needs_reauth JSON error so the model stops hallucinating manual refresh attempts (the 'let me curl the token endpoint' loop Cthulhu pasted from Discord).

  _is_auth_error catches OAuthFlowError, OAuthTokenError, OAuthNonInteractiveError, and httpx.HTTPStatusError(401). Non-auth exceptions still surface via the generic error path unchanged. Task 6 of 8.
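The handler-side recovery from Task 6 amounts to a retry-once pattern. A hedged sketch follows: call_with_auth_recovery, is_auth_error, and the manager/task objects are stand-ins, not the real Hermes API, and the sequencing between setting the reconnect event and the session actually being rebuilt is elided here.

```python
import json


def is_auth_error(exc) -> bool:
    # Stand-in for the real _is_auth_error, which matches the OAuth
    # exception types and httpx.HTTPStatusError with a 401 response.
    return getattr(exc, "status_code", None) == 401


async def call_with_auth_recovery(call, manager, task, server_name):
    """Sketch: run a tool call; on an auth failure ask the manager for a
    recovery verdict, signal a session rebuild, and retry exactly once."""
    try:
        return await call()
    except Exception as exc:
        if not is_auth_error(exc):
            raise  # non-auth errors surface via the generic path unchanged
        if await manager.handle_401(server_name):
            task._reconnect_event.set()  # tear down + rebuild the session
            return await call()          # single retry with fresh credentials
        # No recovery path: structured error instead of model guesswork.
        return json.dumps({"error": "needs_reauth",
                           "hint": f"run: hermes mcp login {server_name}"})
```

The single retry is deliberate: a second auth failure after a rebuild means disk state is also stale, so the structured error is the only useful answer.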
* feat(mcp-cli): route add/remove through manager, add 'hermes mcp login'

  cmd_mcp_add and cmd_mcp_remove now go through MCPOAuthManager instead of calling build_oauth_auth / remove_oauth_tokens directly. This means CLI config-time state and runtime MCP session state are backed by the same provider cache — removing a server evicts the live provider, adding a server populates the same cache the MCP session will read from.

  New 'hermes mcp login <name>' command:
  - Wipes both the on-disk tokens file and the in-memory MCPOAuthManager cache
  - Triggers a fresh OAuth browser flow via the existing probe path
  - Intended target for the needs_reauth error Task 6 returns to the model

  Task 7 of 8.

* test(mcp-oauth): end-to-end integration tests

  Five new tests exercising the full consolidation with real file I/O and real imports (no transport mocks):

  1. external_refresh_picked_up_without_restart — Cthulhu's cron workflow. An external process writes fresh tokens to disk; on the next auth flow the manager's mtime watch flips _initialized and the SDK re-reads from storage.
  2. handle_401_deduplicates_concurrent_callers — 10 concurrent handlers for the same failed token fire exactly ONE recovery attempt (thundering-herd protection).
  3. handle_401_returns_false_when_no_provider — defensive path for unknown servers.
  4. invalidate_if_disk_changed_handles_missing_file — pre-auth state returns False cleanly.
  5. provider_is_reused_across_reconnects — cache stickiness so reconnects preserve the disk-watch baseline mtime.

  Task 8 of 8 — consolidation complete.
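The thundering-herd protection that handle_401_deduplicates_concurrent_callers verifies hinges on concurrent callers sharing one in-flight recovery task. One way to get that shape (a sketch under assumed internals, not the actual manager code):

```python
import asyncio


class Dedup401:
    """Sketch: concurrent callers reporting the same failed token await a
    single shared recovery attempt instead of each firing their own."""

    def __init__(self, recover):
        self._recover = recover  # async callable performing the real recovery
        self._inflight = None    # the one shared in-flight recovery task

    async def handle_401(self) -> bool:
        if self._inflight is None or self._inflight.done():
            # First caller starts the recovery; concurrent callers join it.
            self._inflight = asyncio.ensure_future(self._recover())
        return await self._inflight
```

Because every later caller awaits the same task, ten simultaneous 401s produce one token refresh, and all ten get the same verdict; a fresh failure after the task completes starts a new attempt.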
2026-04-16 21:57:10 -07:00
"""Tests for the MCPServerTask reconnect signal.
When the OAuth layer cannot recover in-place (e.g., external refresh of a
single-use refresh_token made the SDK's in-memory refresh fail), the tool
handler signals MCPServerTask to tear down the current MCP session and
reconnect with fresh credentials. This file exercises the signal plumbing
in isolation from the full stdio/http transport machinery.
"""
import asyncio
import pytest
@pytest.mark.asyncio
async def test_reconnect_event_attribute_exists():
"""MCPServerTask has a _reconnect_event alongside _shutdown_event."""
from tools.mcp_tool import MCPServerTask
task = MCPServerTask("test")
assert hasattr(task, "_reconnect_event")
assert isinstance(task._reconnect_event, asyncio.Event)
assert not task._reconnect_event.is_set()
@pytest.mark.asyncio
async def test_wait_for_lifecycle_event_returns_reconnect():
"""When _reconnect_event fires, helper returns 'reconnect' and clears it."""
from tools.mcp_tool import MCPServerTask
task = MCPServerTask("test")
task._reconnect_event.set()
reason = await task._wait_for_lifecycle_event()
assert reason == "reconnect"
# Should have cleared so the next cycle starts fresh
assert not task._reconnect_event.is_set()
@pytest.mark.asyncio
async def test_wait_for_lifecycle_event_returns_shutdown():
"""When _shutdown_event fires, helper returns 'shutdown'."""
from tools.mcp_tool import MCPServerTask
task = MCPServerTask("test")
task._shutdown_event.set()
reason = await task._wait_for_lifecycle_event()
assert reason == "shutdown"
@pytest.mark.asyncio
async def test_wait_for_lifecycle_event_shutdown_wins_when_both_set():
"""If both events are set simultaneously, shutdown takes precedence."""
from tools.mcp_tool import MCPServerTask
task = MCPServerTask("test")
task._shutdown_event.set()
task._reconnect_event.set()
reason = await task._wait_for_lifecycle_event()
assert reason == "shutdown"
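These tests pin down the contract of _wait_for_lifecycle_event without showing its body. A helper satisfying all of them could look roughly like this; it is a sketch against the contract, not the Hermes implementation:

```python
import asyncio


class LifecycleEvents:
    """Sketch of the signal plumbing under test: shutdown wins over
    reconnect, and a consumed reconnect signal is cleared so the next
    transport cycle starts fresh."""

    def __init__(self):
        self._shutdown_event = asyncio.Event()
        self._reconnect_event = asyncio.Event()

    async def _wait_for_lifecycle_event(self) -> str:
        shutdown = asyncio.ensure_future(self._shutdown_event.wait())
        reconnect = asyncio.ensure_future(self._reconnect_event.wait())
        done, pending = await asyncio.wait(
            {shutdown, reconnect}, return_when=asyncio.FIRST_COMPLETED
        )
        for p in pending:
            p.cancel()  # don't leak the loser's wait() task
        # Shutdown takes precedence when both fired in the same cycle.
        if self._shutdown_event.is_set():
            return "shutdown"
        self._reconnect_event.clear()  # next cycle starts fresh
        return "reconnect"
```

Re-checking is_set() after asyncio.wait, rather than inspecting which waiter finished first, is what makes the shutdown-wins-when-both-set test deterministic.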