Make provider fallback turn-scoped in long-lived CLI sessions. Previously,
a single transient failure pinned the session to the fallback provider for
every subsequent turn. Now the primary model/provider is restored at the
start of each run_conversation() call.
Three pieces:
1. _primary_runtime dict snapshot at __init__ — captures model, provider,
base_url, api_mode, api_key, client_kwargs, prompt caching flag, and
context compressor state. Single dict is easy to extend; avoids the
brittleness of N individual _primary_* attributes.
2. _restore_primary_runtime() — called at the top of run_conversation().
Restores all primary state, rebuilds the client, resets the fallback
chain index (so all fallbacks are available again), and restores the
context compressor's model/context_length/threshold that
_try_activate_fallback() overwrites. No-op in the gateway (fresh
agent per message).
3. _try_recover_primary_transport() — after max_retries exhaust, rebuilds
the primary client (clearing stale connection pools) and gives one more
attempt before falling through to fallback. Only for transient transport
errors (ReadTimeout, ConnectTimeout, PoolTimeout, ConnectError,
RemoteProtocolError). Skipped for aggregator providers (OpenRouter,
Nous) which manage their own retry infrastructure.
Inspired by PR #4612 (betamod) which identified this gap.
Includes 25 tests covering snapshot creation, restore correctness,
fallback index reset, compressor state restoration, transport recovery
scoping, wait time behavior, and error resilience.