"""Shared auxiliary client router for side tasks.

Provides a single resolution chain so every consumer (context compression,
session search, web extraction, vision analysis, browser vision) picks up
the best available backend without duplicating fallback logic.

Resolution order for text tasks (auto mode):

1. OpenRouter (OPENROUTER_API_KEY)
2. Nous Portal (~/.hermes/auth.json active provider)
3. Custom endpoint (config.yaml model.base_url + OPENAI_API_KEY)
4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
   wrapped to look like a chat.completions client)
5. Native Anthropic
6. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
7. None

Resolution order for vision/multimodal tasks (auto mode):

1. Selected main provider, if it is one of the supported vision backends below
2. OpenRouter
3. Nous Portal
4. Codex OAuth (gpt-5.3-codex supports vision via Responses API)
5. Native Anthropic
6. Custom endpoint (for local vision models: Qwen-VL, LLaVA, Pixtral, etc.)
7. None

Per-task overrides are configured in config.yaml under the ``auxiliary:`` section
(e.g. ``auxiliary.vision.provider``, ``auxiliary.compression.model``).
Default "auto" follows the chains above.

Payment / credit exhaustion fallback:

When a resolved provider returns HTTP 402 or a credit-related error,
call_llm() automatically retries with the next available provider in the
auto-detection chain. This handles the common case where a user depletes
their OpenRouter balance but has Codex OAuth or another provider available.
"""
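The auto-mode behaviour described above (walk an ordered list of candidate backends, take the first that resolves, fall through to None) can be sketched standalone; the resolver names below are hypothetical stand-ins, not helpers from this module:

```python
from typing import Callable, List, Optional, Tuple

# Each resolver returns (client_id, model) when its backend is available, else None.
Resolver = Callable[[], Optional[Tuple[str, str]]]

def resolve_first(chain: List[Resolver]) -> Optional[Tuple[str, str]]:
    """Walk the ordered chain; the first resolver yielding a backend wins."""
    for resolver in chain:
        result = resolver()
        if result is not None:
            return result
    return None  # step 7: no backend available

# Stub chain mirroring the text-task order: OpenRouter and Nous Portal
# are unavailable, a custom endpoint is configured.
chain: List[Resolver] = [
    lambda: None,                             # 1. OpenRouter: no API key
    lambda: None,                             # 2. Nous Portal: no auth.json
    lambda: ("custom-endpoint", "my-model"),  # 3. custom endpoint configured
]
print(resolve_first(chain))  # -> ('custom-endpoint', 'my-model')
```

The 402 fallback is the same loop restarted from the failed provider's successor; this sketch omits that retry wrapper.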

import json
import logging
import os
import threading
import time
from pathlib import Path  # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple

from openai import OpenAI

from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_vars

logger = logging.getLogger(__name__)

# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
_stale_base_url_warned = False

_PROVIDER_ALIASES = {
    "google": "gemini",
    "google-gemini": "gemini",
    "google-ai-studio": "gemini",
    "x-ai": "xai",
    "x.ai": "xai",
    "grok": "xai",
    "glm": "zai",
    "z-ai": "zai",
    "z.ai": "zai",
    "zhipu": "zai",
    "kimi": "kimi-coding",
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
    "moonshot-cn": "kimi-coding-cn",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
    "claude-code": "anthropic",
}


def _normalize_aux_provider(provider: Optional[str]) -> str:
    normalized = (provider or "auto").strip().lower()
    if normalized.startswith("custom:"):
        suffix = normalized.split(":", 1)[1].strip()
        if not suffix:
            return "custom"
        normalized = suffix
    if normalized == "codex":
        return "openai-codex"
    if normalized == "main":
        # Resolve to the user's actual main provider so named custom providers
        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
        main_prov = _read_main_provider()
        if main_prov and main_prov not in ("auto", "main", ""):
            return main_prov
        return "custom"
    return _PROVIDER_ALIASES.get(normalized, normalized)
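For illustration, the core of the normalization above behaves like this self-contained sketch (alias table truncated to three entries; the real table is `_PROVIDER_ALIASES`, and the `codex` and `main` special cases are omitted here):

```python
_ALIASES = {"glm": "zai", "claude": "anthropic", "moonshot": "kimi-coding"}

def normalize(provider):
    # Case- and whitespace-insensitive; None means "auto".
    name = (provider or "auto").strip().lower()
    if name.startswith("custom:"):
        # "custom:Together" names a pooled custom endpoint; a bare
        # "custom:" collapses to the generic custom provider.
        name = name.split(":", 1)[1].strip() or "custom"
    return _ALIASES.get(name, name)

print(normalize("  GLM "))           # -> zai
print(normalize("custom:Together"))  # -> together
print(normalize(None))               # -> auto
```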

# Sentinel: when returned by _fixed_temperature_for_model(), callers must
# strip the ``temperature`` key from API kwargs entirely so the provider's
# server-side default applies. Kimi/Moonshot models manage temperature
# internally — sending *any* value (even the "correct" one) can conflict
# with gateway-side mode selection (thinking → 1.0, non-thinking → 0.6).
OMIT_TEMPERATURE: object = object()


def _is_kimi_model(model: Optional[str]) -> bool:
    """True for any Kimi / Moonshot model that manages temperature server-side."""
    bare = (model or "").strip().lower().rsplit("/", 1)[-1]
    return bare.startswith("kimi-") or bare == "kimi"
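The bare-name matching above strips an optional vendor prefix first, so aggregator-routed IDs like `moonshotai/kimi-k2.5` are covered. A standalone restatement of the rule, for illustration only:

```python
def is_kimi(model):
    # Drop any "vendor/" prefix, then match on the bare model name.
    bare = (model or "").strip().lower().rsplit("/", 1)[-1]
    return bare.startswith("kimi-") or bare == "kimi"

print(is_kimi("moonshotai/Kimi-K2.5"))  # vendor prefix stripped -> True
print(is_kimi("kimi"))                  # bare family name -> True
print(is_kimi("glm-4.5-flash"))         # -> False
print(is_kimi(None))                    # -> False
```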


def _fixed_temperature_for_model(
    model: Optional[str],
    base_url: Optional[str] = None,
) -> "Optional[float] | object":
    """Return a temperature directive for models with strict contracts.

    Returns:
        ``OMIT_TEMPERATURE`` — caller must remove the ``temperature`` key so the
            provider chooses its own default. Used for all Kimi / Moonshot
            models whose gateway selects temperature server-side.
        ``float`` — a specific value the caller must use (reserved for future
            models with fixed-temperature contracts).
        ``None`` — no override; caller should use its own default.
    """
    if _is_kimi_model(model):
        logger.debug("Omitting temperature for Kimi model %r (server-managed)", model)
        return OMIT_TEMPERATURE
    return None
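How a caller consumes the three-way return is worth spelling out; a minimal sketch, with a hypothetical caller-side helper name and a local stand-in for the sentinel:

```python
OMIT = object()  # stand-in for the module's OMIT_TEMPERATURE sentinel

def apply_temperature(kwargs: dict, directive) -> dict:
    if directive is OMIT:
        kwargs.pop("temperature", None)  # provider's server-side default wins
    elif directive is not None:
        kwargs["temperature"] = directive  # fixed-contract value
    return kwargs  # None directive: the caller's own temperature stands

print(apply_temperature({"temperature": 0.7}, OMIT))  # -> {}
print(apply_temperature({"temperature": 0.7}, 0.6))   # -> {'temperature': 0.6}
print(apply_temperature({"temperature": 0.7}, None))  # -> {'temperature': 0.7}
```

The identity check (`is OMIT`) rather than equality is what makes a bare `object()` safe as a sentinel: no other value can compare equal to it.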


# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "gemini": "gemini-3-flash-preview",
    "zai": "glm-4.5-flash",
    "kimi-coding": "kimi-k2-turbo-preview",
    "stepfun": "step-3.5-flash",
    "kimi-coding-cn": "kimi-k2-turbo-preview",
    "minimax": "MiniMax-M2.7",
    "minimax-cn": "MiniMax-M2.7",
    "anthropic": "claude-haiku-4-5-20251001",
    "ai-gateway": "google/gemini-3-flash",
    "opencode-zen": "gemini-3-flash",
    "opencode-go": "glm-5",
    "kilocode": "google/gemini-3-flash-preview",
    "ollama-cloud": "nemotron-3-nano:30b",
}
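At resolution time a table like this would typically be consulted with an aggregator fallback; a hedged sketch (helper name and fallback choice are hypothetical, not this module's actual API):

```python
AUX_MODELS = {
    "zai": "glm-4.5-flash",
    "minimax": "MiniMax-M2.7",
}

def aux_model_for(provider: str, fallback: str = "google/gemini-3-flash-preview") -> str:
    # Direct API-key providers get their cheap side-task model;
    # anything unrecognized falls back to the aggregator default.
    return AUX_MODELS.get(provider, fallback)

print(aux_model_for("zai"))       # -> glm-4.5-flash
print(aux_model_for("deepseek"))  # -> google/gemini-3-flash-preview
```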

# Vision-specific model overrides for direct providers.
# When the user's main provider has a dedicated vision/multimodal model that
# differs from their main chat model, map it here. The vision auto-detect
# "exotic provider" branch checks this before falling back to the main model.
_PROVIDER_VISION_MODELS: Dict[str, str] = {
    "xiaomi": "mimo-v2.5",
    "zai": "glm-5v-turbo",
}

# OpenRouter app attribution headers
_OR_HEADERS = {
    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
    "X-OpenRouter-Title": "Hermes Agent",
    "X-OpenRouter-Categories": "productivity,cli-agent",
}

# Vercel AI Gateway app attribution headers. HTTP-Referer maps to
# referrerUrl and X-Title maps to appName in the gateway's analytics.
from hermes_cli import __version__ as _HERMES_VERSION

_AI_GATEWAY_HEADERS = {
    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
    "X-Title": "Hermes Agent",
    "User-Agent": f"HermesAgent/{_HERMES_VERSION}",
}

# Nous Portal extra_body for product attribution.
# Callers should pass this as extra_body in chat.completions.create()
# when the auxiliary client is backed by Nous Portal.
NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent"]}

# Set at resolve time — True if the auxiliary client points to Nous Portal
auxiliary_is_nous: bool = False
|
|
|
|
|
|
|
2026-02-22 02:16:11 -08:00
|
|
|
|
# Default auxiliary models per provider
|
|
|
|
|
|
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
|
2026-03-26 13:49:43 -07:00
|
|
|
|
_NOUS_MODEL = "google/gemini-3-flash-preview"
|
2026-02-22 02:16:11 -08:00
|
|
|
|
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
|
2026-03-14 21:14:20 -07:00
|
|
|
|
_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
|
fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths
Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.
Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.
Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)
Tests updated to use HERMES_HOME env var instead of patching Path.home().
Closes #892
(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)
2026-03-11 07:31:41 +01:00
|
|
|
|
_AUTH_JSON_PATH = get_hermes_home() / "auth.json"
|
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-02-28 21:47:51 -08:00
|
|
|
|
# Codex fallback: uses the Responses API (the only endpoint the Codex
|
|
|
|
|
|
# OAuth token can access) with a fast model for auxiliary tasks.
|
2026-03-14 23:21:09 -07:00
|
|
|
|
# ChatGPT-backed Codex accounts currently reject gpt-5.3-codex for these
|
|
|
|
|
|
# auxiliary flows, while gpt-5.2-codex remains broadly available and supports
|
|
|
|
|
|
# vision via Responses.
|
|
|
|
|
|
_CODEX_AUX_MODEL = "gpt-5.2-codex"
|
2026-02-28 21:47:51 -08:00
|
|
|
|
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
|
|
|
|
|
|
|
|
|
|
|
|
|
fix(codex): pin correct Cloudflare headers and extend to auxiliary client
The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:
- originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
the list, so the header had no mitigating effect on the 403 (the
account-id header alone may have been carrying the fix).
- account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).
Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.
Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:
- run_agent.py: AIAgent.__init__ (initial construction)
- run_agent.py: _apply_client_headers_for_base_url (credential rotation)
- agent/auxiliary_client.py: _try_codex (aux client)
- agent/auxiliary_client.py: resolve_provider_client raw_codex branch
Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.
Tests in tests/agent/test_codex_cloudflare_headers.py cover:
- originator value, User-Agent shape, canonical header casing
- account-ID extraction from a real JWT fixture
- graceful handling of malformed / non-string / claim-missing tokens
- wiring at all four insertion points (primary init, rotation, both aux paths)
- non-chatgpt base URLs (openrouter) do NOT get codex headers
- switching away from chatgpt.com drops the headers
2026-04-19 11:58:15 -07:00
|
|
|
|
def _codex_cloudflare_headers(access_token: str) -> Dict[str, str]:
|
|
|
|
|
|
"""Headers required to avoid Cloudflare 403s on chatgpt.com/backend-api/codex.
|
|
|
|
|
|
|
|
|
|
|
|
The Cloudflare layer in front of the Codex endpoint whitelists a small set of
|
|
|
|
|
|
first-party originators (``codex_cli_rs``, ``codex_vscode``, ``codex_sdk_ts``,
|
|
|
|
|
|
anything starting with ``Codex``). Requests from non-residential IPs (VPS,
|
|
|
|
|
|
server-hosted agents) that don't advertise an allowed originator are served
|
|
|
|
|
|
a 403 with ``cf-mitigated: challenge`` regardless of auth correctness.
|
|
|
|
|
|
|
|
|
|
|
|
We pin ``originator: codex_cli_rs`` to match the upstream codex-rs CLI, set
|
|
|
|
|
|
``User-Agent`` to a codex_cli_rs-shaped string (overriding the OpenAI SDK's default fingerprint),
|
|
|
|
|
|
and extract ``ChatGPT-Account-ID`` (canonical casing, from codex-rs
|
|
|
|
|
|
``auth.rs``) out of the OAuth JWT's ``chatgpt_account_id`` claim.
|
|
|
|
|
|
|
|
|
|
|
|
Malformed tokens are tolerated — we drop the account-ID header rather than
|
|
|
|
|
|
raise, so a bad token still surfaces as an auth error (401) instead of a
|
|
|
|
|
|
crash at client construction.
|
|
|
|
|
|
"""
|
|
|
|
|
|
headers = {
|
|
|
|
|
|
"User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
|
|
|
|
|
|
"originator": "codex_cli_rs",
|
|
|
|
|
|
}
|
|
|
|
|
|
if not isinstance(access_token, str) or not access_token.strip():
|
|
|
|
|
|
return headers
|
|
|
|
|
|
try:
|
|
|
|
|
|
import base64
|
|
|
|
|
|
parts = access_token.split(".")
|
|
|
|
|
|
if len(parts) < 2:
|
|
|
|
|
|
return headers
|
|
|
|
|
|
payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
|
|
|
|
|
|
claims = json.loads(base64.urlsafe_b64decode(payload_b64))
|
|
|
|
|
|
acct_id = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
|
|
|
|
|
|
if isinstance(acct_id, str) and acct_id:
|
|
|
|
|
|
headers["ChatGPT-Account-ID"] = acct_id
|
|
|
|
|
|
except Exception:
|
|
|
|
|
|
pass
|
|
|
|
|
|
return headers
|
|
|
|
|
|
|
|
|
|
|
|
|
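The account-ID extraction can be exercised standalone with a fabricated JWT (the token below is a test fixture, not a real credential; the helper never verifies the signature, it only base64-decodes the middle segment):

```python
import base64
import json

# Build a fake three-part JWT whose payload carries the chatgpt_account_id
# claim under the https://api.openai.com/auth namespace, mirroring the
# claim path _codex_cloudflare_headers reads.
payload = {"https://api.openai.com/auth": {"chatgpt_account_id": "acct-123"}}
payload_b64 = (
    base64.urlsafe_b64encode(json.dumps(payload).encode()).decode().rstrip("=")
)
token = f"header.{payload_b64}.sig"

# Same decode path as the helper: split, re-pad to a multiple of 4, decode.
parts = token.split(".")
padded = parts[1] + "=" * (-len(parts[1]) % 4)
claims = json.loads(base64.urlsafe_b64decode(padded))
acct = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
print(acct)  # acct-123
```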
2026-04-07 22:23:28 -07:00
|
|
|
|
def _to_openai_base_url(base_url: str) -> str:
|
|
|
|
|
|
"""Normalize an Anthropic-style base URL to OpenAI-compatible format.
|
|
|
|
|
|
|
|
|
|
|
|
Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
|
|
|
|
|
|
the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
|
|
|
|
|
|
completions. The auxiliary client uses the OpenAI SDK, so it must hit the
|
|
|
|
|
|
``/v1`` surface. Passing the raw ``inference_base_url`` causes requests to
|
|
|
|
|
|
land on ``/anthropic/chat/completions`` — a 404.
|
|
|
|
|
|
"""
|
|
|
|
|
|
url = str(base_url or "").strip().rstrip("/")
|
|
|
|
|
|
if url.endswith("/anthropic"):
|
|
|
|
|
|
rewritten = url[: -len("/anthropic")] + "/v1"
|
|
|
|
|
|
logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
|
|
|
|
|
|
return rewritten
|
|
|
|
|
|
return url
|
|
|
|
|
|
|
|
|
|
|
|
|
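A minimal standalone re-sketch of the normalization (logging omitted; the MiniMax-style URL is illustrative, not an endorsement of a specific endpoint):

```python
def to_openai_base_url(base_url: str) -> str:
    # Mirror of _to_openai_base_url: strip whitespace and trailing slash,
    # then swap a trailing /anthropic segment for /v1.
    url = str(base_url or "").strip().rstrip("/")
    if url.endswith("/anthropic"):
        return url[: -len("/anthropic")] + "/v1"
    return url

print(to_openai_base_url("https://api.example-minimax.test/anthropic/"))
# https://api.example-minimax.test/v1
print(to_openai_base_url("https://api.example.test/v1"))
# https://api.example.test/v1
```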
feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)
* feat(auth): add same-provider credential pools and rotation UX
Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.
- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647
* fix(tests): prevent pool auto-seeding from host env in credential pool tests
Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.
- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test
* feat(auth): add thread safety, least_used strategy, and request counting
- Add threading.Lock to CredentialPool for gateway thread safety
(concurrent requests from multiple gateway sessions could race on
pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
thread safety (4 threads × 20 selects with no corruption)
* feat(auth): add interactive mode for bare 'hermes auth' command
When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:
1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
add flow explicitly asks 'API key or OAuth login?' — making it
clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection
The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.
* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)
Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.
* feat(auth): support custom endpoint credential pools keyed by provider name
Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).
- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing
* docs: add Excalidraw diagram of full credential pool flow
Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration
Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g
* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow
The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached
* docs: add comprehensive credential pool documentation
- New page: website/docs/user-guide/features/credential-pools.md
Full guide covering quick start, CLI commands, rotation strategies,
error recovery, custom endpoint pools, auto-discovery, thread safety,
architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide
* chore: remove excalidraw diagram from repo (external link only)
* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns
- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
(token_type, scope, client_id, portal_base_url, obtained_at,
expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
agent_key_obtained_at, tls) into a single extra dict with
__getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider
Net -17 lines. All 383 targeted tests pass.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-31 03:10:01 -07:00
|
|
|
|
def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
|
|
|
|
|
|
"""Return (pool_exists_for_provider, selected_entry)."""
|
|
|
|
|
|
try:
|
|
|
|
|
|
pool = load_pool(provider)
|
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
|
logger.debug("Auxiliary client: could not load pool for %s: %s", provider, exc)
|
|
|
|
|
|
return False, None
|
|
|
|
|
|
if not pool or not pool.has_credentials():
|
|
|
|
|
|
return False, None
|
|
|
|
|
|
try:
|
|
|
|
|
|
return True, pool.select()
|
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
|
logger.debug("Auxiliary client: could not select pool entry for %s: %s", provider, exc)
|
|
|
|
|
|
return True, None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
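The `(pool_exists, entry)` contract can be sketched with a dict-backed stand-in for the real pool object (`load_pool` and `PooledCredential` are not reimplemented here). The key point is that a pool which exists but cannot yield a usable credential still returns `True`, so callers can tell "no pool configured" apart from "pool present but temporarily unusable":

```python
# Stand-in pool: a dict with a "credentials" list; exhausted entries are
# skipped, mirroring the (pool_exists, selected_entry) tuple semantics.
def select_entry(pool):
    if pool is None or not pool.get("credentials"):
        return False, None
    usable = [c for c in pool["credentials"] if not c.get("exhausted")]
    return True, (usable[0] if usable else None)

print(select_entry(None))                                   # (False, None)
print(select_entry({"credentials": [{"key": "k1"}]}))       # (True, {'key': 'k1'})
print(select_entry({"credentials": [{"key": "k1", "exhausted": True}]}))  # (True, None)
```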
def _pool_runtime_api_key(entry: Any) -> str:
|
|
|
|
|
|
if entry is None:
|
|
|
|
|
|
return ""
|
|
|
|
|
|
# Use the PooledCredential.runtime_api_key property which handles
|
|
|
|
|
|
# provider-specific fallback (e.g. agent_key for nous).
|
|
|
|
|
|
key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
|
|
|
|
|
|
return str(key or "").strip()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
|
|
|
|
|
|
if entry is None:
|
|
|
|
|
|
return str(fallback or "").strip().rstrip("/")
|
|
|
|
|
|
# runtime_base_url handles provider-specific logic (e.g. nous prefers inference_base_url).
|
|
|
|
|
|
# Fall back through inference_base_url and base_url for non-PooledCredential entries.
|
|
|
|
|
|
url = (
|
|
|
|
|
|
getattr(entry, "runtime_base_url", None)
|
|
|
|
|
|
or getattr(entry, "inference_base_url", None)
|
|
|
|
|
|
or getattr(entry, "base_url", None)
|
|
|
|
|
|
or fallback
|
|
|
|
|
|
)
|
|
|
|
|
|
return str(url or "").strip().rstrip("/")
|
|
|
|
|
|
|
|
|
|
|
|
|
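The fallback chain reads naturally with `SimpleNamespace` stand-ins for pool entries (URLs here are illustrative):

```python
from types import SimpleNamespace

# Sketch of _pool_runtime_base_url's chain: runtime_base_url →
# inference_base_url → base_url → caller-supplied fallback, with the
# trailing slash stripped at the end.
def pool_base_url(entry, fallback=""):
    if entry is None:
        return str(fallback or "").strip().rstrip("/")
    url = (
        getattr(entry, "runtime_base_url", None)
        or getattr(entry, "inference_base_url", None)
        or getattr(entry, "base_url", None)
        or fallback
    )
    return str(url or "").strip().rstrip("/")

entry = SimpleNamespace(inference_base_url="https://inference-api.nousresearch.com/v1/")
print(pool_base_url(entry))                     # https://inference-api.nousresearch.com/v1
print(pool_base_url(None, "https://fb.test/"))  # https://fb.test
```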
2026-02-28 21:47:51 -08:00
|
|
|
|
# ── Codex Responses → chat.completions adapter ─────────────────────────────
|
|
|
|
|
|
# All auxiliary consumers call client.chat.completions.create(**kwargs) and
|
|
|
|
|
|
# read response.choices[0].message.content. This adapter translates those
|
|
|
|
|
|
# calls to the Codex Responses API so callers don't need any changes.
|
|
|
|
|
|
|
2026-03-08 18:44:25 -07:00
|
|
|
|
|
|
|
|
|
|
def _convert_content_for_responses(content: Any) -> Any:
|
|
|
|
|
|
"""Convert chat.completions content to Responses API format.
|
|
|
|
|
|
|
|
|
|
|
|
chat.completions uses:
|
|
|
|
|
|
{"type": "text", "text": "..."}
|
|
|
|
|
|
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
|
|
|
|
|
|
|
|
|
|
|
|
Responses API uses:
|
|
|
|
|
|
{"type": "input_text", "text": "..."}
|
|
|
|
|
|
{"type": "input_image", "image_url": "data:image/png;base64,..."}
|
|
|
|
|
|
|
|
|
|
|
|
If content is a plain string, it's returned as-is (the Responses API
|
|
|
|
|
|
accepts strings directly for text-only messages).
|
|
|
|
|
|
"""
|
|
|
|
|
|
if isinstance(content, str):
|
|
|
|
|
|
return content
|
|
|
|
|
|
if not isinstance(content, list):
|
|
|
|
|
|
return str(content) if content else ""
|
|
|
|
|
|
|
|
|
|
|
|
converted: List[Dict[str, Any]] = []
|
|
|
|
|
|
for part in content:
|
|
|
|
|
|
if not isinstance(part, dict):
|
|
|
|
|
|
continue
|
|
|
|
|
|
ptype = part.get("type", "")
|
|
|
|
|
|
if ptype == "text":
|
|
|
|
|
|
converted.append({"type": "input_text", "text": part.get("text", "")})
|
|
|
|
|
|
elif ptype == "image_url":
|
|
|
|
|
|
# chat.completions nests the URL: {"image_url": {"url": "..."}}
|
|
|
|
|
|
image_data = part.get("image_url", {})
|
|
|
|
|
|
url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
|
|
|
|
|
|
entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
|
|
|
|
|
|
# Preserve detail if specified
|
|
|
|
|
|
detail = image_data.get("detail") if isinstance(image_data, dict) else None
|
|
|
|
|
|
if detail:
|
|
|
|
|
|
entry["detail"] = detail
|
|
|
|
|
|
converted.append(entry)
|
|
|
|
|
|
elif ptype in ("input_text", "input_image"):
|
|
|
|
|
|
# Already in Responses format — pass through
|
|
|
|
|
|
converted.append(part)
|
|
|
|
|
|
else:
|
|
|
|
|
|
# Unknown content type — try to preserve as text
|
|
|
|
|
|
text = part.get("text", "")
|
|
|
|
|
|
if text:
|
|
|
|
|
|
converted.append({"type": "input_text", "text": text})
|
|
|
|
|
|
|
|
|
|
|
|
return converted or ""
|
|
|
|
|
|
|
|
|
|
|
|
|
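A condensed re-sketch of the mapping for list-shaped content (the string passthrough and unknown-type branches behave as documented in the docstring above):

```python
# Minimal version of the text/image_url → input_text/input_image mapping,
# including the optional detail passthrough.
def convert(content):
    if isinstance(content, str):
        return content
    out = []
    for part in content:
        if part.get("type") == "text":
            out.append({"type": "input_text", "text": part.get("text", "")})
        elif part.get("type") == "image_url":
            img = part.get("image_url", {})
            entry = {"type": "input_image", "image_url": img.get("url", "")}
            if img.get("detail"):
                entry["detail"] = img["detail"]
            out.append(entry)
    return out

parts = [
    {"type": "text", "text": "describe this"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA", "detail": "low"}},
]
converted = convert(parts)
print(converted[0])  # {'type': 'input_text', 'text': 'describe this'}
```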
2026-02-28 21:47:51 -08:00
|
|
|
|
class _CodexCompletionsAdapter:
|
|
|
|
|
|
"""Drop-in shim that accepts chat.completions.create() kwargs and
|
|
|
|
|
|
routes them through the Codex Responses streaming API."""
|
|
|
|
|
|
|
|
|
|
|
|
def __init__(self, real_client: OpenAI, model: str):
|
|
|
|
|
|
self._client = real_client
|
|
|
|
|
|
self._model = model
|
|
|
|
|
|
|
|
|
|
|
|
def create(self, **kwargs) -> Any:
|
|
|
|
|
|
messages = kwargs.get("messages", [])
|
|
|
|
|
|
model = kwargs.get("model", self._model)
|
|
|
|
|
|
|
2026-03-08 18:44:25 -07:00
|
|
|
|
# Separate system/instructions from conversation messages.
|
|
|
|
|
|
# Convert chat.completions multimodal content blocks to Responses
|
|
|
|
|
|
# API format (input_text / input_image instead of text / image_url).
|
2026-02-28 21:47:51 -08:00
|
|
|
|
instructions = "You are a helpful assistant."
|
|
|
|
|
|
input_msgs: List[Dict[str, Any]] = []
|
|
|
|
|
|
for msg in messages:
|
|
|
|
|
|
role = msg.get("role", "user")
|
2026-03-02 02:23:53 -08:00
|
|
|
|
content = msg.get("content") or ""
|
2026-02-28 21:47:51 -08:00
|
|
|
|
if role == "system":
|
2026-03-08 18:44:25 -07:00
|
|
|
|
instructions = content if isinstance(content, str) else str(content)
|
2026-02-28 21:47:51 -08:00
|
|
|
|
else:
|
2026-03-08 18:44:25 -07:00
|
|
|
|
input_msgs.append({
|
|
|
|
|
|
"role": role,
|
|
|
|
|
|
"content": _convert_content_for_responses(content),
|
|
|
|
|
|
})
|
2026-02-28 21:47:51 -08:00
|
|
|
|
|
|
|
|
|
|
resp_kwargs: Dict[str, Any] = {
|
|
|
|
|
|
"model": model,
|
|
|
|
|
|
"instructions": instructions,
|
|
|
|
|
|
"input": input_msgs or [{"role": "user", "content": ""}],
|
|
|
|
|
|
"store": False,
|
|
|
|
|
|
}
|
|
|
|
|
|
|
2026-03-08 18:44:25 -07:00
|
|
|
|
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
|
|
|
|
|
|
# support max_output_tokens or temperature — omit to avoid 400 errors.
|
2026-02-28 21:47:51 -08:00
|
|
|
|
|
|
|
|
|
|
# Tools support for flush_memories and similar callers
|
|
|
|
|
|
tools = kwargs.get("tools")
|
|
|
|
|
|
if tools:
|
|
|
|
|
|
converted = []
|
|
|
|
|
|
for t in tools:
|
|
|
|
|
|
fn = t.get("function", {}) if isinstance(t, dict) else {}
|
|
|
|
|
|
name = fn.get("name")
|
|
|
|
|
|
if not name:
|
|
|
|
|
|
continue
|
|
|
|
|
|
converted.append({
|
|
|
|
|
|
"type": "function",
|
|
|
|
|
|
"name": name,
|
|
|
|
|
|
"description": fn.get("description", ""),
|
|
|
|
|
|
"parameters": fn.get("parameters", {}),
|
|
|
|
|
|
})
|
|
|
|
|
|
if converted:
|
|
|
|
|
|
resp_kwargs["tools"] = converted
|
|
|
|
|
|
|
|
|
|
|
|
# Stream and collect the response
|
|
|
|
|
|
text_parts: List[str] = []
|
|
|
|
|
|
tool_calls_raw: List[Any] = []
|
|
|
|
|
|
usage = None
|
|
|
|
|
|
|
|
|
|
|
|
try:
|
2026-04-06 21:13:22 -07:00
|
|
|
|
# Collect output items and text deltas during streaming —
|
|
|
|
|
|
# the Codex backend can return empty response.output from
|
|
|
|
|
|
# get_final_response() even when items were streamed.
|
|
|
|
|
|
collected_output_items: List[Any] = []
|
|
|
|
|
|
collected_text_deltas: List[str] = []
|
2026-04-06 21:35:33 -07:00
|
|
|
|
has_function_calls = False
|
2026-02-28 21:47:51 -08:00
|
|
|
|
with self._client.responses.stream(**resp_kwargs) as stream:
|
|
|
|
|
|
for _event in stream:
|
2026-04-06 21:13:22 -07:00
|
|
|
|
_etype = getattr(_event, "type", "")
|
|
|
|
|
|
if _etype == "response.output_item.done":
|
|
|
|
|
|
_done = getattr(_event, "item", None)
|
|
|
|
|
|
if _done is not None:
|
|
|
|
|
|
collected_output_items.append(_done)
|
|
|
|
|
|
elif "output_text.delta" in _etype:
|
|
|
|
|
|
_delta = getattr(_event, "delta", "")
|
|
|
|
|
|
if _delta:
|
|
|
|
|
|
collected_text_deltas.append(_delta)
|
2026-04-06 21:35:33 -07:00
|
|
|
|
elif "function_call" in _etype:
|
|
|
|
|
|
has_function_calls = True
|
2026-02-28 21:47:51 -08:00
|
|
|
|
final = stream.get_final_response()
|
|
|
|
|
|
|
2026-04-06 21:13:22 -07:00
|
|
|
|
# Backfill empty output from collected stream events
|
|
|
|
|
|
_output = getattr(final, "output", None)
|
|
|
|
|
|
if isinstance(_output, list) and not _output:
|
|
|
|
|
|
if collected_output_items:
|
|
|
|
|
|
final.output = list(collected_output_items)
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Codex auxiliary: backfilled %d output items from stream events",
|
|
|
|
|
|
len(collected_output_items),
|
|
|
|
|
|
)
|
2026-04-06 21:35:33 -07:00
|
|
|
|
elif collected_text_deltas and not has_function_calls:
|
|
|
|
|
|
# Only synthesize text when no tool calls were streamed —
|
|
|
|
|
|
# a function_call response with incidental text should not
|
|
|
|
|
|
# be collapsed into a plain-text message.
|
2026-04-06 21:13:22 -07:00
|
|
|
|
assembled = "".join(collected_text_deltas)
|
|
|
|
|
|
final.output = [SimpleNamespace(
|
|
|
|
|
|
type="message", role="assistant", status="completed",
|
|
|
|
|
|
content=[SimpleNamespace(type="output_text", text=assembled)],
|
|
|
|
|
|
)]
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Codex auxiliary: synthesized from %d deltas (%d chars)",
|
|
|
|
|
|
len(collected_text_deltas), len(assembled),
|
|
|
|
|
|
)
|
|
|
|
|
|
|
2026-04-06 21:35:33 -07:00
|
|
|
|
# Extract text and tool calls from the Responses output.
|
|
|
|
|
|
# Items may be SDK objects (attrs) or dicts (raw/fallback paths),
|
|
|
|
|
|
# so use a helper that handles both shapes.
|
|
|
|
|
|
def _item_get(obj: Any, key: str, default: Any = None) -> Any:
|
|
|
|
|
|
val = getattr(obj, key, None)
|
|
|
|
|
|
if val is None and isinstance(obj, dict):
|
|
|
|
|
|
val = obj.get(key, default)
|
|
|
|
|
|
return val if val is not None else default
|
|
|
|
|
|
|
2026-02-28 21:47:51 -08:00
|
|
|
|
for item in getattr(final, "output", []):
|
2026-04-06 21:35:33 -07:00
|
|
|
|
item_type = _item_get(item, "type")
|
2026-02-28 21:47:51 -08:00
|
|
|
|
if item_type == "message":
|
2026-04-06 21:35:33 -07:00
|
|
|
|
for part in (_item_get(item, "content") or []):
|
|
|
|
|
|
ptype = _item_get(part, "type")
|
2026-02-28 21:47:51 -08:00
|
|
|
|
if ptype in ("output_text", "text"):
|
2026-04-06 21:35:33 -07:00
|
|
|
|
text_parts.append(_item_get(part, "text", ""))
|
2026-02-28 21:47:51 -08:00
|
|
|
|
elif item_type == "function_call":
|
|
|
|
|
|
tool_calls_raw.append(SimpleNamespace(
|
2026-04-06 21:35:33 -07:00
|
|
|
|
id=_item_get(item, "call_id", ""),
|
2026-02-28 21:47:51 -08:00
|
|
|
|
type="function",
|
|
|
|
|
|
function=SimpleNamespace(
|
2026-04-06 21:35:33 -07:00
|
|
|
|
name=_item_get(item, "name", ""),
|
|
|
|
|
|
arguments=_item_get(item, "arguments", "{}"),
|
2026-02-28 21:47:51 -08:00
|
|
|
|
),
|
|
|
|
|
|
))
|
|
|
|
|
|
|
|
|
|
|
|
resp_usage = getattr(final, "usage", None)
|
|
|
|
|
|
if resp_usage:
|
|
|
|
|
|
usage = SimpleNamespace(
|
|
|
|
|
|
prompt_tokens=getattr(resp_usage, "input_tokens", 0),
|
|
|
|
|
|
completion_tokens=getattr(resp_usage, "output_tokens", 0),
|
|
|
|
|
|
total_tokens=getattr(resp_usage, "total_tokens", 0),
|
|
|
|
|
|
)
|
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
|
logger.debug("Codex auxiliary Responses API call failed: %s", exc)
|
|
|
|
|
|
raise
|
|
|
|
|
|
|
|
|
|
|
|
content = "".join(text_parts).strip() or None
|
|
|
|
|
|
|
|
|
|
|
|
# Build a response that looks like chat.completions
|
|
|
|
|
|
message = SimpleNamespace(
|
|
|
|
|
|
role="assistant",
|
|
|
|
|
|
content=content,
|
|
|
|
|
|
tool_calls=tool_calls_raw or None,
|
|
|
|
|
|
)
|
|
|
|
|
|
choice = SimpleNamespace(
|
|
|
|
|
|
index=0,
|
|
|
|
|
|
message=message,
|
|
|
|
|
|
finish_reason="stop" if not tool_calls_raw else "tool_calls",
|
|
|
|
|
|
)
|
|
|
|
|
|
return SimpleNamespace(
|
|
|
|
|
|
choices=[choice],
|
|
|
|
|
|
model=model,
|
|
|
|
|
|
usage=usage,
|
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
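The chat.completions-shaped object the adapter hands back can be sketched directly with `SimpleNamespace`; this is the shape consumers read, not the real adapter logic:

```python
from types import SimpleNamespace

# Synthesized response mirroring what _CodexCompletionsAdapter.create()
# returns: choices[0].message.content plus a finish_reason that flips to
# "tool_calls" when any function_call items were extracted.
tool_calls = None  # would be a list of SimpleNamespace tool calls if present
message = SimpleNamespace(role="assistant", content="hello", tool_calls=tool_calls)
choice = SimpleNamespace(
    index=0,
    message=message,
    finish_reason="stop" if not tool_calls else "tool_calls",
)
response = SimpleNamespace(choices=[choice], model="gpt-5.2-codex", usage=None)

print(response.choices[0].message.content)  # hello
print(response.choices[0].finish_reason)    # stop
```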
class _CodexChatShim:
|
|
|
|
|
|
"""Wraps the adapter to provide client.chat.completions.create()."""
|
|
|
|
|
|
|
|
|
|
|
|
def __init__(self, adapter: _CodexCompletionsAdapter):
|
|
|
|
|
|
self.completions = adapter
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class CodexAuxiliaryClient:
|
|
|
|
|
|
"""OpenAI-client-compatible wrapper that routes through Codex Responses API.
|
|
|
|
|
|
|
|
|
|
|
|
Consumers can call client.chat.completions.create(**kwargs) as normal.
|
|
|
|
|
|
Also exposes .api_key and .base_url for introspection by async wrappers.
|
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
|
|
def __init__(self, real_client: OpenAI, model: str):
|
|
|
|
|
|
self._real_client = real_client
|
|
|
|
|
|
adapter = _CodexCompletionsAdapter(real_client, model)
|
|
|
|
|
|
self.chat = _CodexChatShim(adapter)
|
|
|
|
|
|
self.api_key = real_client.api_key
|
|
|
|
|
|
self.base_url = real_client.base_url
|
|
|
|
|
|
|
|
|
|
|
|
def close(self):
|
|
|
|
|
|
self._real_client.close()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class _AsyncCodexCompletionsAdapter:
|
|
|
|
|
|
"""Async version of the Codex Responses adapter.
|
|
|
|
|
|
|
|
|
|
|
|
Wraps the sync adapter via asyncio.to_thread() so async consumers
|
|
|
|
|
|
(web_tools, session_search) can await it as normal.
|
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
|
|
def __init__(self, sync_adapter: _CodexCompletionsAdapter):
|
|
|
|
|
|
self._sync = sync_adapter
|
|
|
|
|
|
|
|
|
|
|
|
async def create(self, **kwargs) -> Any:
|
|
|
|
|
|
import asyncio
|
|
|
|
|
|
return await asyncio.to_thread(self._sync.create, **kwargs)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class _AsyncCodexChatShim:
|
|
|
|
|
|
def __init__(self, adapter: _AsyncCodexCompletionsAdapter):
|
|
|
|
|
|
self.completions = adapter
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class AsyncCodexAuxiliaryClient:
|
|
|
|
|
|
"""Async-compatible wrapper matching AsyncOpenAI.chat.completions.create()."""
|
|
|
|
|
|
|
|
|
|
|
|
def __init__(self, sync_wrapper: "CodexAuxiliaryClient"):
|
|
|
|
|
|
sync_adapter = sync_wrapper.chat.completions
|
|
|
|
|
|
async_adapter = _AsyncCodexCompletionsAdapter(sync_adapter)
|
|
|
|
|
|
self.chat = _AsyncCodexChatShim(async_adapter)
|
|
|
|
|
|
self.api_key = sync_wrapper.api_key
|
|
|
|
|
|
self.base_url = sync_wrapper.base_url
|
|
|
|
|
|
|
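The `asyncio.to_thread` bridging pattern, reduced to a self-contained sketch (`SyncAdapter` here is a stand-in for `_CodexCompletionsAdapter`, not the real class):

```python
import asyncio

class SyncAdapter:
    # Stand-in for a blocking create(); the real one streams from the
    # Responses API.
    def create(self, **kwargs):
        return {"echo": kwargs.get("model")}

class AsyncAdapter:
    # Same shape as _AsyncCodexCompletionsAdapter: one sync adapter,
    # awaited via a worker thread so async consumers need no second client.
    def __init__(self, sync_adapter):
        self._sync = sync_adapter

    async def create(self, **kwargs):
        return await asyncio.to_thread(self._sync.create, **kwargs)

result = asyncio.run(AsyncAdapter(SyncAdapter()).create(model="m1"))
print(result)  # {'echo': 'm1'}
```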
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-03-14 21:14:20 -07:00
|
|
|
|
class _AnthropicCompletionsAdapter:
|
|
|
|
|
|
"""OpenAI-client-compatible adapter for Anthropic Messages API."""
|
|
|
|
|
|
|
2026-03-21 17:36:25 -07:00
|
|
|
|
def __init__(self, real_client: Any, model: str, is_oauth: bool = False):
|
2026-03-14 21:14:20 -07:00
|
|
|
|
self._client = real_client
|
|
|
|
|
|
self._model = model
|
2026-03-21 17:36:25 -07:00
|
|
|
|
self._is_oauth = is_oauth
|
2026-03-14 21:14:20 -07:00
|
|
|
|
|
|
|
|
|
|
def create(self, **kwargs) -> Any:
|
2026-04-23 13:39:44 +05:30
|
|
|
|
from agent.anthropic_adapter import build_anthropic_kwargs
|
|
|
|
|
|
from agent.transports import get_transport
|
2026-03-14 21:14:20 -07:00
|
|
|
|
|
|
|
|
|
|
messages = kwargs.get("messages", [])
|
|
|
|
|
|
model = kwargs.get("model", self._model)
|
|
|
|
|
|
tools = kwargs.get("tools")
|
|
|
|
|
|
tool_choice = kwargs.get("tool_choice")
|
|
|
|
|
|
max_tokens = kwargs.get("max_tokens") or kwargs.get("max_completion_tokens") or 2000
|
|
|
|
|
|
temperature = kwargs.get("temperature")
|
|
|
|
|
|
|
|
|
|
|
|
normalized_tool_choice = None
|
|
|
|
|
|
if isinstance(tool_choice, str):
|
|
|
|
|
|
normalized_tool_choice = tool_choice
|
|
|
|
|
|
elif isinstance(tool_choice, dict):
|
|
|
|
|
|
choice_type = str(tool_choice.get("type", "")).lower()
|
|
|
|
|
|
if choice_type == "function":
|
|
|
|
|
|
normalized_tool_choice = tool_choice.get("function", {}).get("name")
|
|
|
|
|
|
elif choice_type in {"auto", "required", "none"}:
|
|
|
|
|
|
normalized_tool_choice = choice_type
|
|
|
|
|
|
|
|
|
|
|
|
        anthropic_kwargs = build_anthropic_kwargs(
            model=model,
            messages=messages,
            tools=tools,
            max_tokens=max_tokens,
            reasoning_config=None,
            tool_choice=normalized_tool_choice,
            is_oauth=self._is_oauth,
        )

        # Opus 4.7+ rejects any non-default temperature/top_p/top_k; only set
        # temperature for models that still accept it. build_anthropic_kwargs
        # additionally strips these keys as a safety net — keep both layers.
        if temperature is not None:
            from agent.anthropic_adapter import _forbids_sampling_params

            if not _forbids_sampling_params(model):
                anthropic_kwargs["temperature"] = temperature

        response = self._client.messages.create(**anthropic_kwargs)
        _transport = get_transport("anthropic_messages")
        _nr = _transport.normalize_response(
            response, strip_tool_prefix=self._is_oauth
        )

        # ToolCall already duck-types as OpenAI shape (.type, .function.name,
        # .function.arguments) via properties, so no wrapping needed.
        assistant_message = SimpleNamespace(
            content=_nr.content,
            tool_calls=_nr.tool_calls,
            reasoning=_nr.reasoning,
        )
        finish_reason = _nr.finish_reason

        usage = None
        if hasattr(response, "usage") and response.usage:
            prompt_tokens = getattr(response.usage, "input_tokens", 0) or 0
            completion_tokens = getattr(response.usage, "output_tokens", 0) or 0
            total_tokens = getattr(response.usage, "total_tokens", 0) or (
                prompt_tokens + completion_tokens
            )
            usage = SimpleNamespace(
                prompt_tokens=prompt_tokens,
                completion_tokens=completion_tokens,
                total_tokens=total_tokens,
            )

        choice = SimpleNamespace(
            index=0,
            message=assistant_message,
            finish_reason=finish_reason,
        )
        return SimpleNamespace(
            choices=[choice],
            model=model,
            usage=usage,
        )

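For reference, the OpenAI-compatible response shape this adapter emits can be exercised standalone. A minimal stdlib-only sketch (field names mirror the adapter; `fake_response` and its token counts are hypothetical stand-ins, not part of the codebase):

```python
from types import SimpleNamespace


def fake_response(text: str, model: str = "example-model"):
    # Mirror the adapter's return shape: choices[0].message.content etc.
    message = SimpleNamespace(content=text, tool_calls=None, reasoning=None)
    usage = SimpleNamespace(prompt_tokens=3, completion_tokens=5, total_tokens=8)
    choice = SimpleNamespace(index=0, message=message, finish_reason="stop")
    return SimpleNamespace(choices=[choice], model=model, usage=usage)


resp = fake_response("hello")
# Consumers read it exactly like an openai-python chat.completions response.
print(resp.choices[0].message.content)  # → hello
```

Because every consumer only dereferences attributes (never isinstance-checks against openai types), `SimpleNamespace` is enough to satisfy the contract.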
class _AnthropicChatShim:
    def __init__(self, adapter: _AnthropicCompletionsAdapter):
        self.completions = adapter


class AnthropicAuxiliaryClient:
    """OpenAI-client-compatible wrapper over a native Anthropic client."""

    def __init__(self, real_client: Any, model: str, api_key: str, base_url: str, is_oauth: bool = False):
        self._real_client = real_client
        adapter = _AnthropicCompletionsAdapter(real_client, model, is_oauth=is_oauth)
        self.chat = _AnthropicChatShim(adapter)
        self.api_key = api_key
        self.base_url = base_url

    def close(self):
        close_fn = getattr(self._real_client, "close", None)
        if callable(close_fn):
            close_fn()

class _AsyncAnthropicCompletionsAdapter:
    def __init__(self, sync_adapter: _AnthropicCompletionsAdapter):
        self._sync = sync_adapter

    async def create(self, **kwargs) -> Any:
        import asyncio

        return await asyncio.to_thread(self._sync.create, **kwargs)


class _AsyncAnthropicChatShim:
    def __init__(self, adapter: _AsyncAnthropicCompletionsAdapter):
        self.completions = adapter


class AsyncAnthropicAuxiliaryClient:
    def __init__(self, sync_wrapper: "AnthropicAuxiliaryClient"):
        sync_adapter = sync_wrapper.chat.completions
        async_adapter = _AsyncAnthropicCompletionsAdapter(sync_adapter)
        self.chat = _AsyncAnthropicChatShim(async_adapter)
        self.api_key = sync_wrapper.api_key
        self.base_url = sync_wrapper.base_url

def _read_nous_auth() -> Optional[dict]:
    """Read and validate ~/.hermes/auth.json for an active Nous provider.

    Returns the provider state dict if Nous is active with tokens,
    otherwise None.
    """
    pool_present, entry = _select_pool_entry("nous")
    if pool_present:
        if entry is None:
            return None
        return {
            "access_token": getattr(entry, "access_token", ""),
            "refresh_token": getattr(entry, "refresh_token", None),
            "agent_key": getattr(entry, "agent_key", None),
            "inference_base_url": _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL),
            "portal_base_url": getattr(entry, "portal_base_url", None),
            "client_id": getattr(entry, "client_id", None),
            "scope": getattr(entry, "scope", None),
            "token_type": getattr(entry, "token_type", "Bearer"),
            "source": "pool",
        }

    try:
        if not _AUTH_JSON_PATH.is_file():
            return None
        data = json.loads(_AUTH_JSON_PATH.read_text())
        if data.get("active_provider") != "nous":
            return None
        provider = data.get("providers", {}).get("nous", {})
        # Must have at least an access_token or agent_key
        if not provider.get("agent_key") and not provider.get("access_token"):
            return None
        return provider
    except Exception as exc:
        logger.debug("Could not read Nous auth: %s", exc)
        return None

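The auth.json gate above (active provider must match and hold at least one token) can be exercised against a synthetic file. A standalone sketch — `read_active_provider` is a hypothetical extraction of the same checks, and the schema shown is only the subset `_read_nous_auth` inspects:

```python
import json
import tempfile
from pathlib import Path


def read_active_provider(path: Path, name: str):
    # Same gate as _read_nous_auth: file exists, active provider matches,
    # and the provider entry holds an agent_key or access_token.
    if not path.is_file():
        return None
    data = json.loads(path.read_text())
    if data.get("active_provider") != name:
        return None
    provider = data.get("providers", {}).get(name, {})
    if not provider.get("agent_key") and not provider.get("access_token"):
        return None
    return provider


with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "auth.json"
    p.write_text(json.dumps({
        "active_provider": "nous",
        "providers": {"nous": {"access_token": "tok-123"}},
    }))
    print(read_active_provider(p, "nous"))   # → {'access_token': 'tok-123'}
    print(read_active_provider(p, "other"))  # → None
```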
def _nous_api_key(provider: dict) -> str:
    """Extract the best API key from a Nous provider state dict."""
    return provider.get("agent_key") or provider.get("access_token", "")


def _nous_base_url() -> str:
    """Resolve the Nous inference base URL from env or default."""
    return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)

def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
    """Return fresh Nous runtime credentials when available.

    This mirrors the main agent's 401 recovery path and keeps auxiliary
    clients aligned with the singleton auth store + mint flow instead of
    relying only on whatever raw tokens happen to be sitting in auth.json
    or the credential pool.
    """
    try:
        from hermes_cli.auth import resolve_nous_runtime_credentials

        creds = resolve_nous_runtime_credentials(
            min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
            timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
            force_mint=force_refresh,
        )
    except Exception as exc:
        logger.debug("Auxiliary Nous runtime credential resolution failed: %s", exc)
        return None

    api_key = str(creds.get("api_key") or "").strip()
    base_url = str(creds.get("base_url") or "").strip().rstrip("/")
    if not api_key or not base_url:
        return None
    return api_key, base_url

def _read_codex_access_token() -> Optional[str]:
    """Read a valid, non-expired Codex OAuth access token from Hermes auth store.

    If a credential pool exists but currently has no selectable runtime entry
    (for example all pool slots are marked exhausted), fall back to the
    profile's auth.json token instead of hard-failing. This keeps explicit
    fallback-to-Codex working when the pool state is stale but the stored OAuth
    token is still valid.
    """
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        token = _pool_runtime_api_key(entry)
        if token:
            return token

    try:
        from hermes_cli.auth import _read_codex_tokens

        data = _read_codex_tokens()
        tokens = data.get("tokens", {})
        access_token = tokens.get("access_token")
        if not isinstance(access_token, str) or not access_token.strip():
            return None

        # Check JWT expiry — expired tokens block the auto chain and
        # prevent fallback to working providers (e.g. Anthropic).
        try:
            import base64

            payload = access_token.split(".")[1]
            payload += "=" * (-len(payload) % 4)
            claims = json.loads(base64.urlsafe_b64decode(payload))
            exp = claims.get("exp", 0)
            if exp and time.time() > exp:
                logger.debug("Codex access token expired (exp=%s), skipping", exp)
                return None
        except Exception:
            pass  # Non-JWT token or decode error — use as-is

        return access_token.strip()
    except Exception as exc:
        logger.debug("Could not read Codex auth for auxiliary client: %s", exc)
        return None

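The inline JWT expiry check above — base64url-decode the unverified payload segment, restore stripped padding, compare `exp` to the clock — works standalone. A sketch with a hypothetical helper to mint fake tokens (no signature verification is attempted, matching the function above):

```python
import base64
import json
import time


def jwt_expired(token, now=None):
    # Decode the (unverified) middle JWT segment and compare its exp claim.
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64url padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    exp = claims.get("exp", 0)
    return bool(exp) and (now if now is not None else time.time()) > exp


def make_token(exp):
    # Fake unsigned JWT: header and signature segments are placeholders.
    body = base64.urlsafe_b64encode(json.dumps({"exp": exp}).encode()).decode().rstrip("=")
    return f"header.{body}.sig"


print(jwt_expired(make_token(1), now=100))    # → True (already expired)
print(jwt_expired(make_token(200), now=100))  # → False (still valid)
```

A missing or zero `exp` claim is treated as non-expiring, matching the `if exp and ...` guard in the function above.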
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
|
|
|
|
|
|
"""Try each API-key provider in PROVIDER_REGISTRY order.
|
|
|
|
|
|
|
2026-03-17 23:40:22 -07:00
|
|
|
|
Returns (client, model) for the first provider with usable runtime
|
|
|
|
|
|
credentials, or (None, None) if none are configured.
|
2026-03-06 19:08:54 -08:00
|
|
|
|
"""
|
|
|
|
|
|
try:
|
2026-03-17 23:40:22 -07:00
|
|
|
|
from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
|
2026-03-06 19:08:54 -08:00
|
|
|
|
except ImportError:
|
|
|
|
|
|
logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
|
|
|
|
|
for provider_id, pconfig in PROVIDER_REGISTRY.items():
|
|
|
|
|
|
if pconfig.auth_type != "api_key":
|
|
|
|
|
|
continue
|
2026-03-14 21:14:20 -07:00
|
|
|
|
if provider_id == "anthropic":
|
2026-04-10 15:16:18 +08:00
|
|
|
|
# Only try anthropic when the user has explicitly configured it.
|
|
|
|
|
|
# Without this gate, Claude Code credentials get silently used
|
|
|
|
|
|
# as auxiliary fallback when the user's primary provider fails.
|
|
|
|
|
|
try:
|
|
|
|
|
|
from hermes_cli.auth import is_provider_explicitly_configured
|
|
|
|
|
|
if not is_provider_explicitly_configured("anthropic"):
|
|
|
|
|
|
continue
|
|
|
|
|
|
except ImportError:
|
|
|
|
|
|
pass
|
2026-03-14 21:14:20 -07:00
|
|
|
|
return _try_anthropic()
|
|
|
|
|
|
|

        pool_present, entry = _select_pool_entry(provider_id)
        if pool_present:
            api_key = _pool_runtime_api_key(entry)
            if not api_key:
                continue

            base_url = _to_openai_base_url(
                _pool_runtime_base_url(entry, pconfig.inference_base_url)
                or pconfig.inference_base_url
            )
            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
            if model is None:
                continue  # skip provider if we don't know a valid aux model

            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            if provider_id == "gemini":
                from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url

                if is_native_gemini_base_url(base_url):
                    return GeminiNativeClient(api_key=api_key, base_url=base_url), model

            extra = {}
            if base_url_host_matches(base_url, "api.kimi.com"):
                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
            elif base_url_host_matches(base_url, "api.githubcopilot.com"):
2026-03-31 03:10:01 -07:00
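The `least_used` rotation strategy and exhaustion cooldowns described in the commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual `PooledCredential` schema: the `exhausted_until` field and the in-place `request_count` bump (standing in for `mark_used()`) are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PooledCredential:
    api_key: str
    request_count: int = 0        # per-credential usage counter
    exhausted_until: float = 0.0  # epoch seconds; 0 means available

def select_least_used(entries: List[PooledCredential]) -> Optional[PooledCredential]:
    """Pick the non-exhausted credential with the fewest recorded requests."""
    now = time.time()
    available = [e for e in entries if e.exhausted_until <= now]
    if not available:
        return None  # whole pool exhausted: caller falls through to next provider
    chosen = min(available, key=lambda e: e.request_count)
    chosen.request_count += 1  # sketch of mark_used()
    return chosen
```

The real pool additionally wraps selection in a `threading.Lock` so concurrent gateway sessions cannot race on pool state.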
|
|
|
|
from hermes_cli.models import copilot_default_headers
|
|
|
|
|
|
|
|
|
|
|
|
extra["default_headers"] = copilot_default_headers()
|
|
|
|
|
|
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
|
|
|
|
|
|
|
2026-03-17 23:40:22 -07:00
|
|
|
|
creds = resolve_api_key_provider_credentials(provider_id)
|
|
|
|
|
|
api_key = str(creds.get("api_key", "")).strip()
|
|
|
|
|
|
if not api_key:
|
|
|
|
|
|
continue
|
|
|
|
|
|
|
2026-04-07 22:23:28 -07:00
|
|
|
|
base_url = _to_openai_base_url(
|
|
|
|
|
|
str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
|
|
|
|
|
|
)
|
2026-04-11 12:37:53 +05:30
|
|
|
|
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
|
|
|
|
|
|
if model is None:
|
|
|
|
|
|
continue # skip provider if we don't know a valid aux model
|
2026-03-06 19:08:54 -08:00
|
|
|
|
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
|
2026-04-20 00:00:50 +05:30
|
|
|
|
if provider_id == "gemini":
|
2026-04-20 00:41:20 +05:30
|
|
|
|
from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
|
2026-04-20 00:00:50 +05:30
|
|
|
|
|
2026-04-20 00:41:20 +05:30
|
|
|
|
if is_native_gemini_base_url(base_url):
|
|
|
|
|
|
return GeminiNativeClient(api_key=api_key, base_url=base_url), model
|
2026-03-07 20:43:34 -05:00
|
|
|
|
extra = {}
|
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
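The `base_url_host_matches(base_url, domain)` helper introduced above can be approximated like this. A sketch only — the real implementation in `utils` may differ — but it captures the stated rule: accept hostname equality and subdomains, reject path segments, host suffixes, and prefix collisions, case-insensitively and tolerant of a trailing dot.

```python
from urllib.parse import urlparse

def base_url_host_matches(base_url: str, domain: str) -> bool:
    """Hostname-based provider check, safe counterpart to 'domain in base_url'."""
    host = (urlparse(base_url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    if not host or not domain:
        return False
    # Exact host, or a true subdomain (dot-separated), matches.
    # 'https://evil.com/api.kimi.com' (path segment) and
    # 'fakeapi.kimi.com' (prefix collision) both fail this test.
    return host == domain or host.endswith("." + domain)
```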
|
|
|
|
if base_url_host_matches(base_url, "api.kimi.com"):
|
2026-04-18 22:55:36 +08:00
|
|
|
|
extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
|
fix: sweep remaining provider-URL substring checks across codebase
2026-04-20 21:17:28 -07:00
|
|
|
|
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
|
2026-03-17 23:40:22 -07:00
|
|
|
|
from hermes_cli.models import copilot_default_headers
|
|
|
|
|
|
|
|
|
|
|
|
extra["default_headers"] = copilot_default_headers()
|
2026-03-07 20:43:34 -05:00
|
|
|
|
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
|
2026-03-06 19:08:54 -08:00
|
|
|
|
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
|
|
|
|
|
|
2026-03-07 08:52:06 -08:00
|
|
|
|
# ── Provider resolution helpers ─────────────────────────────────────────────
|
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-03-14 20:48:29 -07:00
|
|
|
|
|
|
|
|
|
|
|
2026-03-07 08:52:06 -08:00
|
|
|
|
def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
|
feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)
2026-03-31 03:10:01 -07:00
|
|
|
|
pool_present, entry = _select_pool_entry("openrouter")
|
|
|
|
|
|
if pool_present:
|
|
|
|
|
|
or_key = _pool_runtime_api_key(entry)
|
|
|
|
|
|
if not or_key:
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
|
|
|
|
|
|
logger.debug("Auxiliary client: OpenRouter via pool")
|
|
|
|
|
|
return OpenAI(api_key=or_key, base_url=base_url,
|
|
|
|
|
|
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
|
|
|
|
|
|
|
2026-02-22 02:16:11 -08:00
|
|
|
|
or_key = os.getenv("OPENROUTER_API_KEY")
|
2026-03-07 08:52:06 -08:00
|
|
|
|
if not or_key:
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
logger.debug("Auxiliary client: OpenRouter")
|
|
|
|
|
|
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
|
|
|
|
|
|
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
|
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-03-07 08:52:06 -08:00
|
|
|
|
|
2026-04-07 21:41:05 -07:00
|
|
|
|
def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
|
fix: Nous Portal rate limit guard — prevent retry amplification (#10568)
When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.
New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files
run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
as rate-limited, skip the API call entirely. Try fallback provider
first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
(sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
know they can resume
auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
fallback chain. When rate-limited, returns (None, None) so the chain
skips to the next provider instead of piling more requests onto Nous.
This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)
Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
2026-04-15 16:31:48 -07:00
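The cross-session state file that `nous_rate_guard` maintains can be sketched as below. Function names and the JSON shape here are illustrative assumptions; only the state-file location, the 5-minute default cooldown, and the tempfile-plus-rename atomic write come from the commit message.

```python
import json
import os
import tempfile
import time
from typing import Optional

STATE_PATH = os.path.expanduser("~/.hermes/rate_limits/nous.json")
DEFAULT_COOLDOWN = 300.0  # 5-minute fallback when no header data is available

def record_rate_limit(retry_after: Optional[float], path: str = STATE_PATH) -> None:
    """Atomically persist the reset time so other sessions skip Nous entirely."""
    reset_at = time.time() + (retry_after if retry_after else DEFAULT_COOLDOWN)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with os.fdopen(fd, "w") as f:
        json.dump({"reset_at": reset_at}, f)
    os.replace(tmp, path)  # atomic on POSIX: readers never see a partial file

def rate_limit_remaining(path: str = STATE_PATH) -> Optional[float]:
    """Seconds until reset, or None if no active limit is recorded."""
    try:
        with open(path) as f:
            remaining = json.load(f)["reset_at"] - time.time()
    except (OSError, ValueError, KeyError):
        return None
    if remaining <= 0:
        try:
            os.remove(path)  # auto-cleanup of expired state
        except OSError:
            pass
        return None
    return remaining
```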
|
|
|
|
# Check cross-session rate limit guard before attempting Nous —
|
|
|
|
|
|
# if another session already recorded a 429, skip Nous entirely
|
|
|
|
|
|
# to avoid piling more requests onto the tapped RPH bucket.
|
|
|
|
|
|
try:
|
|
|
|
|
|
from agent.nous_rate_guard import nous_rate_limit_remaining
|
|
|
|
|
|
_remaining = nous_rate_limit_remaining()
|
|
|
|
|
|
if _remaining is not None and _remaining > 0:
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
|
|
|
|
|
|
_remaining,
|
|
|
|
|
|
)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
except Exception:
|
|
|
|
|
|
pass
|
|
|
|
|
|
|
2026-02-22 02:16:11 -08:00
|
|
|
|
nous = _read_nous_auth()
|
2026-04-21 14:45:13 -06:00
|
|
|
|
runtime = _resolve_nous_runtime_api(force_refresh=False)
|
|
|
|
|
|
if runtime is None and not nous:
|
2026-03-07 08:52:06 -08:00
|
|
|
|
return None, None
|
|
|
|
|
|
global auxiliary_is_nous
|
|
|
|
|
|
auxiliary_is_nous = True
|
|
|
|
|
|
logger.debug("Auxiliary client: Nous Portal")
|
feat(aux): use Portal /api/nous/recommended-models for auxiliary models
Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.
hermes_cli/models.py
- fetch_nous_recommended_models(portal_base_url, force_refresh=False)
10-minute TTL cache, keyed per portal URL (staging vs prod don't
collide). Public endpoint, no auth required. Returns {} on any
failure so callers always get a dict.
- get_nous_recommended_aux_model(vision, free_tier=None, ...)
Tier-aware pick from the payload:
- Paid tier → paidRecommended{Vision,Compaction}Model, falling back
to freeRecommended* when the paid field is null (common during
staged rollouts of new paid models).
- Free tier → freeRecommended* only, never leaks paid models.
When free_tier is None, auto-detects via the existing
check_nous_free_tier() helper (already cached 3 min against
/api/oauth/account). Detection errors default to paid so we never
silently downgrade a paying user.
agent/auxiliary_client.py — _try_nous()
- Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
to get_nous_recommended_aux_model(vision=vision).
- Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
Portal is unreachable or returns a null recommendation.
- The Portal is now the source of truth for aux model selection; the
xiaomi allowlist we used to carry is effectively dead.
Tests (15 new)
- tests/hermes_cli/test_models.py::TestNousRecommendedModels
Fetch caching, per-portal keying, network failure, force_refresh;
paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
auto-detect, detection-error → paid default, null/blank modelName
handling.
- tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
_try_nous honors Portal recommendation for text + vision, falls
back to google/gemini-3-flash-preview on None or exception.
Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.
2026-04-21 22:53:45 -04:00
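The 10-minute, per-portal-URL TTL cache that `fetch_nous_recommended_models` relies on can be sketched generically. `cached_fetch` is a hypothetical name; the real function also performs the auth-free HTTP fetch itself, which is elided here, and only the TTL, per-URL keying, and return-a-dict-on-failure behavior come from the commit message.

```python
import time
from typing import Callable, Dict, Tuple

_TTL_SECONDS = 600.0  # 10-minute TTL, per the commit message
_cache: Dict[str, Tuple[float, dict]] = {}  # portal_url -> (fetched_at, payload)

def cached_fetch(portal_url: str, fetch: Callable[[str], dict],
                 force_refresh: bool = False) -> dict:
    """Per-portal TTL cache: staging and prod payloads never collide."""
    now = time.time()
    hit = _cache.get(portal_url)
    if hit is not None and not force_refresh and now - hit[0] < _TTL_SECONDS:
        return hit[1]
    try:
        payload = fetch(portal_url) or {}
    except Exception:
        return {}  # callers always get a dict; failures are not cached here
    _cache[portal_url] = (now, payload)
    return payload
```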
|
|
|
|
|
|
|
|
|
|
# Ask the Portal which model it currently recommends for this task type.
|
|
|
|
|
|
# The /api/nous/recommended-models endpoint is the authoritative source:
|
|
|
|
|
|
# it distinguishes paid vs free tier recommendations, and get_nous_recommended_aux_model
|
|
|
|
|
|
# auto-detects the caller's tier via check_nous_free_tier(). Fall back to
|
|
|
|
|
|
# _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable
|
|
|
|
|
|
# or returns a null recommendation for this task type.
|
|
|
|
|
|
model = _NOUS_MODEL
|
2026-04-07 02:17:14 -04:00
|
|
|
|
try:
|
feat(aux): use Portal /api/nous/recommended-models for auxiliary models
2026-04-21 22:53:45 -04:00
|
|
|
|
from hermes_cli.models import get_nous_recommended_aux_model
|
|
|
|
|
|
recommended = get_nous_recommended_aux_model(vision=vision)
|
|
|
|
|
|
if recommended:
|
|
|
|
|
|
model = recommended
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Auxiliary/%s: using Portal-recommended model %s",
|
|
|
|
|
|
"vision" if vision else "text", model,
|
|
|
|
|
|
)
|
|
|
|
|
|
else:
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Auxiliary/%s: no Portal recommendation, falling back to %s",
|
|
|
|
|
|
"vision" if vision else "text", model,
|
|
|
|
|
|
)
|
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"Auxiliary/%s: recommended-models lookup failed (%s); "
|
|
|
|
|
|
"falling back to %s",
|
|
|
|
|
|
"vision" if vision else "text", exc, model,
|
|
|
|
|
|
)
|
|
|
|
|
|
|
2026-04-21 14:45:13 -06:00
|
|
|
|
if runtime is not None:
|
|
|
|
|
|
api_key, base_url = runtime
|
|
|
|
|
|
else:
|
|
|
|
|
|
api_key = _nous_api_key(nous or {})
|
|
|
|
|
|
base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
|
2026-03-07 08:52:06 -08:00
|
|
|
|
return (
|
feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)
2026-03-31 03:10:01 -07:00
|
|
|
|
        OpenAI(
            api_key=api_key,
            base_url=base_url,
        ),
        model,
    )


def _read_main_model() -> str:
    """Read the user's configured main model from config.yaml.

    config.yaml model.default is the single source of truth for the active
    model. Environment variables are no longer consulted.
    """
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        model_cfg = cfg.get("model", {})
        if isinstance(model_cfg, str) and model_cfg.strip():
            return model_cfg.strip()
        if isinstance(model_cfg, dict):
            default = model_cfg.get("default", "")
            if isinstance(default, str) and default.strip():
                return default.strip()
    except Exception:
        pass
    return ""
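For illustration, the precedence above can be exercised on a plain dict. This is a standalone sketch; `read_main_model` here is a hypothetical local copy of the logic, not the module's `_read_main_model` (which loads config.yaml itself):

```python
# Standalone sketch of the lookup performed by _read_main_model.
# read_main_model is a hypothetical helper operating on a plain dict.
def read_main_model(cfg: dict) -> str:
    model_cfg = cfg.get("model", {})
    # Legacy shape: "model" may be a bare string.
    if isinstance(model_cfg, str) and model_cfg.strip():
        return model_cfg.strip()
    # Current shape: model.default holds the active model id.
    if isinstance(model_cfg, dict):
        default = model_cfg.get("default", "")
        if isinstance(default, str) and default.strip():
            return default.strip()
    return ""

print(read_main_model({"model": {"default": " qwen3-coder "}}))  # qwen3-coder
print(read_main_model({"model": "glm-5"}))                       # glm-5
print(read_main_model({}))                                       # (empty string)
```

Note that whitespace is stripped and a missing or malformed `model` section degrades to `""`, matching the caller's `or "gpt-4o-mini"` fallback.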


def _read_main_provider() -> str:
    """Read the user's configured main provider from config.yaml.

    Returns the lowercase provider id (e.g. "alibaba", "openrouter") or ""
    if not configured.
    """
    try:
        from hermes_cli.config import load_config

        cfg = load_config()
        model_cfg = cfg.get("model", {})
        if isinstance(model_cfg, dict):
            provider = model_cfg.get("provider", "")
            if isinstance(provider, str) and provider.strip():
                return provider.strip().lower()
    except Exception:
        pass
    return ""


def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Resolve the active custom/main endpoint the same way the main CLI does.

    This covers both env-driven OPENAI_BASE_URL setups and config-saved custom
    endpoints where the base URL lives in config.yaml instead of the live
    environment.
    """
    try:
        from hermes_cli.runtime_provider import resolve_runtime_provider

        runtime = resolve_runtime_provider(requested="custom")
    except Exception as exc:
        logger.debug("Auxiliary client: custom runtime resolution failed: %s", exc)
        runtime = None

    if not isinstance(runtime, dict):
        openai_base = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
        openai_key = os.getenv("OPENAI_API_KEY", "").strip()
        if not openai_base:
            return None, None, None
        runtime = {
            "base_url": openai_base,
            "api_key": openai_key,
        }

    custom_base = runtime.get("base_url")
    custom_key = runtime.get("api_key")
    custom_mode = runtime.get("api_mode")
    if not isinstance(custom_base, str) or not custom_base.strip():
        return None, None, None

    custom_base = custom_base.strip().rstrip("/")
    if base_url_host_matches(custom_base, "openrouter.ai"):
        # requested='custom' falls back to OpenRouter when no custom endpoint is
        # configured. Treat that as "no custom endpoint" for auxiliary routing.
        return None, None, None

    # Local servers (Ollama, llama.cpp, vLLM, LM Studio) don't require auth.
    # Use a placeholder key — the OpenAI SDK requires a non-empty string but
    # local servers ignore the Authorization header. Same fix as cli.py
    # _ensure_runtime_credentials() (PR #2556).
    if not isinstance(custom_key, str) or not custom_key.strip():
        custom_key = "no-key-required"

    if not isinstance(custom_mode, str) or not custom_mode.strip():
        custom_mode = None

    return custom_base, custom_key.strip(), custom_mode
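The placeholder-key step can be shown in isolation. This is a hedged sketch; `normalize_runtime` is a hypothetical standalone helper, not part of the module:

```python
# Sketch of the base-URL trimming and placeholder-key normalization above.
# normalize_runtime is a hypothetical helper for illustration only.
def normalize_runtime(base_url: str, api_key) -> tuple:
    base = base_url.strip().rstrip("/")
    # The OpenAI SDK rejects an empty api_key string, but local servers
    # (Ollama, llama.cpp, vLLM, LM Studio) ignore Authorization entirely,
    # so any non-empty placeholder works.
    if not isinstance(api_key, str) or not api_key.strip():
        api_key = "no-key-required"
    return base, api_key.strip()

print(normalize_runtime("http://localhost:11434/v1/", None))
# ('http://localhost:11434/v1', 'no-key-required')
```

A real key passes through unchanged apart from whitespace stripping, so the same path serves both authenticated gateways and keyless local servers.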


def _current_custom_base_url() -> str:
    custom_base, _, _ = _resolve_custom_runtime()
    return custom_base or ""


def _validate_proxy_env_urls() -> None:
    """Fail fast with a clear error when proxy env vars have malformed URLs.

    Common cause: shell config (e.g. .zshrc) with a typo like
    ``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
    which concatenates 'export' into the port number. Without this
    check the OpenAI/httpx client raises a cryptic ``Invalid port``
    error that doesn't name the offending env var.
    """
    from urllib.parse import urlparse

    normalize_proxy_env_vars()

    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
                "https_proxy", "http_proxy", "all_proxy"):
        value = str(os.environ.get(key) or "").strip()
        if not value:
            continue
        try:
            parsed = urlparse(value)
            if parsed.scheme:
                _ = parsed.port  # raises ValueError for e.g. '6153export'
        except ValueError as exc:
            raise RuntimeError(
                f"Malformed proxy environment variable {key}={value!r}. "
                "Fix or unset your proxy settings and try again."
            ) from exc
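The failure mode the docstring describes can be reproduced standalone: `urlparse` accepts the typo'd string, and only reading `.port` raises `ValueError`, which the loop above converts into a `RuntimeError` that names the offending variable.

```python
from urllib.parse import urlparse

# A shell typo that concatenates 'export' into the port number.
bad = "http://127.0.0.1:6153export NEXT_VAR=1"
parsed = urlparse(bad)  # parsing itself succeeds
try:
    _ = parsed.port  # int('6153export NEXT_VAR=1') fails here
    result = "ok"
except ValueError:
    result = "malformed port detected"
print(result)  # malformed port detected
```

This is why the check probes `parsed.port` explicitly rather than relying on `urlparse` alone.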


def _validate_base_url(base_url: str) -> None:
    """Reject obviously broken custom endpoint URLs before they reach httpx."""
    from urllib.parse import urlparse

    candidate = str(base_url or "").strip()
    if not candidate or candidate.startswith("acp://"):
        return
    try:
        parsed = urlparse(candidate)
        if parsed.scheme in {"http", "https"}:
            _ = parsed.port  # raises ValueError for malformed ports
    except ValueError as exc:
        raise RuntimeError(
            f"Malformed custom endpoint URL: {candidate!r}. "
            "Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
        ) from exc
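The same port probe can be expressed as a boolean predicate. This is a sketch under stated assumptions; `looks_valid` is a hypothetical standalone version of the check, not the module's `_validate_base_url`:

```python
from urllib.parse import urlparse

# Hypothetical predicate form of the base-URL validation above.
def looks_valid(base_url: str) -> bool:
    candidate = (base_url or "").strip()
    if not candidate or candidate.startswith("acp://"):
        return True  # empty and non-http schemes are left to other layers
    try:
        parsed = urlparse(candidate)
        if parsed.scheme in {"http", "https"}:
            _ = parsed.port  # ValueError on a malformed port
    except ValueError:
        return False
    return True

print(looks_valid("http://localhost:8080/v1"))   # True
print(looks_valid("http://localhost:80x80/v1"))  # False
```

Only http(s) URLs are port-checked, so opaque schemes pass through untouched.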


def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
    runtime = _resolve_custom_runtime()
    if len(runtime) == 2:
        custom_base, custom_key = runtime
        custom_mode = None
    else:
        custom_base, custom_key, custom_mode = runtime
    if not custom_base or not custom_key:
        return None, None
    if custom_base.lower().startswith(_CODEX_AUX_BASE_URL.lower()):
        return None, None
    model = _read_main_model() or "gpt-4o-mini"
    logger.debug(
        "Auxiliary client: custom endpoint (%s, api_mode=%s)",
        model,
        custom_mode or "chat_completions",
    )
    if custom_mode == "codex_responses":
        real_client = OpenAI(api_key=custom_key, base_url=custom_base)
        return CodexAuxiliaryClient(real_client, model), model
|
fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)
Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers. Synthesizes
eight stale community PRs into one consolidated change.
Five fixes:
- URL detection: consolidate three inline `endswith("/anthropic")`
checks in runtime_provider.py into the shared _detect_api_mode_for_url
helper. Third-party /anthropic endpoints now auto-resolve to
api_mode=anthropic_messages via one code path instead of three.
- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
(__init__, switch_model, _try_refresh_anthropic_client_credentials,
_swap_credential, _try_activate_fallback) now gate on
`provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
Claude-Code identity injection on third-party endpoints. Previously
only 2 of 5 sites were guarded.
- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
`(should_cache, use_native_layout)` per endpoint. Replaces three
inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
call-site flag. Native Anthropic and third-party Anthropic gateways
both get the native cache_control layout; OpenRouter gets envelope
layout. Layout is persisted in `_primary_runtime` so fallback
restoration preserves the per-endpoint choice.
- Auxiliary client: `_try_custom_endpoint` honors
`api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
instead of silently downgrading to an OpenAI-wire client. Degrades
gracefully to OpenAI-wire when the anthropic SDK isn't installed.
- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
clears stale `api_key`/`api_mode` when switching to a built-in
provider, so a previous MiniMax custom endpoint's credentials can't
leak into a later OpenRouter session.
- Truncation continuation: length-continuation and tool-call-truncation
retry now cover `anthropic_messages` in addition to `chat_completions`
and `bedrock_converse`. Reuses the existing `_build_assistant_message`
path via `normalize_anthropic_response()` so the interim message
shape is byte-identical to the non-truncated path.
Tests: 6 new files, 42 test cases. Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).
Synthesized from (credits preserved via Co-authored-by trailers):
#7410 @nocoo — URL detection helper
#7393 @keyuyuan — OAuth 5-site guard
#7367 @n-WN — OAuth guard (narrower cousin, kept comment)
#8636 @sgaofen — caching helper + native-vs-proxy layout split
#10954 @Only-Code-A — caching on anthropic_messages+Claude
#7648 @zhongyueming1121 — aux client anthropic_messages branch
#6096 @hansnow — /model switch clears stale api_mode
#9691 @TroyMitchell911 — anthropic_messages truncation continuation
Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects: #9621 (OpenAI-wire caching with incomplete blocklist — risky),
#7242 (superseded by #9691, stale branch),
#8321 (targets smart_model_routing which was removed in #12732).
Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
2026-04-19 22:43:09 -07:00
|
|
|
|
if custom_mode == "anthropic_messages":
|
|
|
|
|
|
# Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
|
|
|
|
|
|
# LiteLLM proxies, etc.). Must NEVER be treated as OAuth —
|
|
|
|
|
|
# Anthropic OAuth claims only apply to api.anthropic.com.
|
|
|
|
|
|
try:
|
|
|
|
|
|
from agent.anthropic_adapter import build_anthropic_client
|
|
|
|
|
|
real_client = build_anthropic_client(custom_key, custom_base)
|
|
|
|
|
|
except ImportError:
|
|
|
|
|
|
logger.warning(
|
|
|
|
|
|
"Custom endpoint declares api_mode=anthropic_messages but the "
|
|
|
|
|
|
"anthropic SDK is not installed — falling back to OpenAI-wire."
|
|
|
|
|
|
)
|
|
|
|
|
|
return OpenAI(api_key=custom_key, base_url=custom_base), model
|
|
|
|
|
|
return (
|
|
|
|
|
|
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
|
|
|
|
|
|
model,
|
|
|
|
|
|
)
|
2026-03-07 08:52:06 -08:00
|
|
|
|
return OpenAI(api_key=custom_key, base_url=custom_base), model
|
|
|
|
|
|
|
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-03-07 08:52:06 -08:00
|
|
|
|
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        codex_token = _pool_runtime_api_key(entry)
        if codex_token:
            base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
        else:
            codex_token = _read_codex_access_token()
            if not codex_token:
                return None, None
            base_url = _CODEX_AUX_BASE_URL
    else:
        codex_token = _read_codex_access_token()
        if not codex_token:
            return None, None
        base_url = _CODEX_AUX_BASE_URL

    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
    real_client = OpenAI(
        api_key=codex_token,
        base_url=base_url,
        default_headers=_codex_cloudflare_headers(codex_token),
    )
    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL


def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
    try:
        from agent.anthropic_adapter import build_anthropic_client, resolve_anthropic_token
    except ImportError:
        return None, None

    pool_present, entry = _select_pool_entry("anthropic")
    if pool_present:
        if entry is None:
            return None, None
        token = _pool_runtime_api_key(entry)
    else:
        entry = None
        token = resolve_anthropic_token()
    if not token:
        return None, None

    # Allow base URL override from config.yaml model.base_url, but only
    # when the configured provider is anthropic — otherwise a non-Anthropic
    # base_url (e.g. Codex endpoint) would leak into Anthropic requests.
architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide
* chore: remove excalidraw diagram from repo (external link only)
* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns
- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
(token_type, scope, client_id, portal_base_url, obtained_at,
expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
agent_key_obtained_at, tls) into a single extra dict with
__getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider
Net -17 lines. All 383 targeted tests pass.
---------
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-03-31 03:10:01 -07:00
    base_url = _pool_runtime_base_url(entry, _ANTHROPIC_DEFAULT_BASE_URL) if pool_present else _ANTHROPIC_DEFAULT_BASE_URL
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
        model_cfg = cfg.get("model")
        if isinstance(model_cfg, dict):
            cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
            if cfg_provider == "anthropic":
                cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
                if cfg_base_url:
                    base_url = cfg_base_url
    except Exception:
        pass

    from agent.anthropic_adapter import _is_oauth_token
    is_oauth = _is_oauth_token(token)
    model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
    logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
    try:
        real_client = build_anthropic_client(token, base_url)
    except ImportError:
        # The anthropic_adapter module imports fine but the SDK itself is
        # missing — build_anthropic_client raises ImportError at call time
        # when _anthropic_sdk is None. Treat as unavailable.
        return None, None
    return AnthropicAuxiliaryClient(real_client, model, token, base_url, is_oauth=is_oauth), model

_AUTO_PROVIDER_LABELS = {
    "_try_openrouter": "openrouter",
    "_try_nous": "nous",
    "_try_custom_endpoint": "local/custom",
    "_try_codex": "openai-codex",
    "_resolve_api_key_provider": "api-key",
}

_MAIN_RUNTIME_FIELDS = ("provider", "model", "base_url", "api_key", "api_mode")

def _normalize_main_runtime(main_runtime: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Return a sanitized copy of a live main-runtime override."""
    if not isinstance(main_runtime, dict):
        return {}
    normalized: Dict[str, str] = {}
    for field in _MAIN_RUNTIME_FIELDS:
        value = main_runtime.get(field)
        if isinstance(value, str) and value.strip():
            normalized[field] = value.strip()
    provider = normalized.get("provider")
    if provider:
        normalized["provider"] = provider.lower()
    return normalized

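A minimal standalone sketch of the normalization contract: whitelist the runtime fields, strip whitespace, drop empty or non-string values, and lowercase the provider. The constant and function names here are illustrative stand-ins for the module's `_MAIN_RUNTIME_FIELDS` and `_normalize_main_runtime`, not the real implementation.

```python
from typing import Any, Dict, Optional

# Mirrors the module constant, for illustration only.
MAIN_RUNTIME_FIELDS = ("provider", "model", "base_url", "api_key", "api_mode")


def normalize_main_runtime(main_runtime: Optional[Dict[str, Any]]) -> Dict[str, str]:
    # Non-dict inputs (None, lists, strings) normalize to an empty override.
    if not isinstance(main_runtime, dict):
        return {}
    normalized: Dict[str, str] = {}
    for field in MAIN_RUNTIME_FIELDS:
        value = main_runtime.get(field)
        # Keep only non-empty strings; unknown keys are ignored entirely.
        if isinstance(value, str) and value.strip():
            normalized[field] = value.strip()
    if "provider" in normalized:
        normalized["provider"] = normalized["provider"].lower()
    return normalized


# Unknown keys and empty strings are dropped; provider is lowercased.
print(normalize_main_runtime({"provider": " OpenRouter ", "model": "gpt-x",
                              "extra": 1, "api_key": ""}))
```

The whitelist means a caller can pass an arbitrary runtime dict without leaking unexpected keys into downstream client construction.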
def _get_provider_chain() -> List[tuple]:
    """Return the ordered provider detection chain.

    Built at call time (not module level) so that test patches
    on the ``_try_*`` functions are picked up correctly.
    """
    return [
        ("openrouter", _try_openrouter),
        ("nous", _try_nous),
        ("local/custom", _try_custom_endpoint),
        ("openai-codex", _try_codex),
        ("api-key", _resolve_api_key_provider),
    ]

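Why call-time construction matters: a chain list captured at import time keeps references to the original function objects, so monkeypatching the module attribute later has no effect on it, while a per-call lookup sees the patch. A toy demonstration, with hypothetical stand-ins for the `_try_*` resolvers:

```python
def try_openrouter():
    # Original resolver: pretends no client is available.
    return None, None


# Captured once, at "import time" — holds the original function object.
FROZEN_CHAIN = [("openrouter", try_openrouter)]


def get_provider_chain():
    # Looks the name up per call, so it sees later rebinding.
    return [("openrouter", try_openrouter)]


def fake_openrouter():
    return "client", "model"


# Simulate what a test's monkeypatch.setattr(module, "try_openrouter", ...)
# does: rebind the module-level name.
globals()["try_openrouter"] = fake_openrouter

print(FROZEN_CHAIN[0][1]())          # still the original resolver
print(get_provider_chain()[0][1]())  # sees the patched resolver
```

This is the same reason the real chain is rebuilt on every `_get_provider_chain()` call rather than stored as a module constant.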
def _is_payment_error(exc: Exception) -> bool:
    """Detect payment/credit/quota exhaustion errors.

    Returns True for HTTP 402 (Payment Required) and for 429/other errors
    whose message indicates billing exhaustion rather than rate limiting.
    """
    status = getattr(exc, "status_code", None)
    if status == 402:
        return True
    err_lower = str(exc).lower()
    # OpenRouter and other providers include "credits" or "afford" in 402 bodies,
    # but sometimes wrap them in 429 or other codes.
    if status in (402, 429, None):
        if any(kw in err_lower for kw in ("credits", "insufficient funds",
                                          "can only afford", "billing",
                                          "payment required")):
            return True
    return False

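A standalone sketch of that classification, using a stand-in exception type (the real helper reads `status_code` off OpenAI SDK exceptions). The key point is the two-tier check: a hard 402 status always counts, while keyword matching only applies to statuses that could plausibly wrap a billing failure (402, 429, or unknown).

```python
class FakeAPIError(Exception):
    """Hypothetical stand-in for an SDK error carrying a status_code."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


def is_payment_error(exc) -> bool:
    status = getattr(exc, "status_code", None)
    if status == 402:
        return True
    # Billing failures sometimes arrive wrapped in 429 or with no status.
    if status in (402, 429, None):
        msg = str(exc).lower()
        return any(kw in msg for kw in (
            "credits", "insufficient funds", "can only afford",
            "billing", "payment required"))
    return False


print(is_payment_error(FakeAPIError("Payment Required", 402)))                    # True
print(is_payment_error(FakeAPIError("This request requires more credits", 429)))  # True
print(is_payment_error(FakeAPIError("Rate limit exceeded", 429)))                 # False
```

A plain 429 with no billing keywords stays classified as a rate limit, so it is retried rather than triggering provider fallback.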
def _is_connection_error(exc: Exception) -> bool:
    """Detect connection/network errors that warrant provider fallback.

    Returns True for errors indicating the provider endpoint is unreachable
    (DNS failure, connection refused, TLS errors, timeouts). These are
    distinct from API errors (4xx/5xx) which indicate the provider IS
    reachable but returned an error.
    """
    from openai import APIConnectionError, APITimeoutError

    if isinstance(exc, (APIConnectionError, APITimeoutError)):
        return True
    # urllib3 / httpx / httpcore connection errors
    err_type = type(exc).__name__
    if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
        return True
    err_lower = str(exc).lower()
    if any(kw in err_lower for kw in (
        "connection refused", "name or service not known",
        "no route to host", "network is unreachable",
        "timed out", "connection reset",
    )):
        return True
    return False

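The heuristic portion of that check can be exercised without the openai SDK: classify first by exception type name, then by message keywords. A sketch, assuming only stdlib exception types:

```python
def is_connection_error(exc: Exception) -> bool:
    # Type-name heuristic catches urllib3/httpx/httpcore classes like
    # ConnectTimeout, SSLError, NameResolutionError without importing them.
    if any(kw in type(exc).__name__ for kw in ("Connection", "Timeout", "DNS", "SSL")):
        return True
    # Message heuristic catches wrapped OS-level errors.
    msg = str(exc).lower()
    return any(kw in msg for kw in (
        "connection refused", "name or service not known",
        "no route to host", "network is unreachable",
        "timed out", "connection reset"))


print(is_connection_error(ConnectionRefusedError("connection refused")))     # True
print(is_connection_error(TimeoutError("operation timed out")))              # True
print(is_connection_error(ValueError("Error code: 500 - upstream error")))   # False
```

A 500 from a reachable endpoint deliberately does not match: the provider answered, so falling back to a different backend would mask a real API error.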
def _is_auth_error(exc: Exception) -> bool:
    """Detect auth failures that should trigger provider-specific refresh."""
    status = getattr(exc, "status_code", None)
    if status == 401:
        return True
    err_lower = str(exc).lower()
    return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()

def _try_payment_fallback(
    failed_provider: str,
    task: Optional[str] = None,
    reason: str = "payment error",
) -> Tuple[Optional[Any], Optional[str], str]:
    """Try alternative providers after a payment/credit or connection error.

    Iterates the standard auto-detection chain, skipping the provider that
    failed.

    Returns:
        (client, model, provider_label) or (None, None, "") if no fallback.
    """
    # Normalise the failed provider label for matching.
    skip = failed_provider.lower().strip()
    # Also skip Step-1 main-provider path if it maps to the same backend.
    # (e.g. main_provider="openrouter" → skip "openrouter" in chain)
    main_provider = _read_main_provider()
    skip_labels = {skip}
    if main_provider and main_provider.lower() in skip:
        skip_labels.add(main_provider.lower())
    # Map common resolved_provider values back to chain labels.
    _alias_to_label = {"openrouter": "openrouter", "nous": "nous",
                       "openai-codex": "openai-codex", "codex": "openai-codex",
                       "custom": "local/custom", "local/custom": "local/custom"}
    skip_chain_labels = {_alias_to_label.get(s, s) for s in skip_labels}

    tried = []
    for label, try_fn in _get_provider_chain():
        if label in skip_chain_labels:
            continue
        client, model = try_fn()
        if client is not None:
            logger.info(
                "Auxiliary %s: %s on %s — falling back to %s (%s)",
                task or "call", reason, failed_provider, label, model or "default",
            )
            return client, model, label
        tried.append(label)

    logger.warning(
        "Auxiliary %s: %s on %s and no fallback available (tried: %s)",
        task or "call", reason, failed_provider, ", ".join(tried),
    )
    return None, None, ""

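The alias step is worth seeing in isolation: resolved provider names arrive from different call sites under different spellings ("codex" vs "openai-codex", "custom" vs "local/custom"), and all of them must fold onto the chain's canonical labels so the failed backend is reliably skipped. A self-contained sketch of that mapping, with a hypothetical helper name:

```python
ALIAS_TO_LABEL = {
    "openrouter": "openrouter", "nous": "nous",
    "openai-codex": "openai-codex", "codex": "openai-codex",
    "custom": "local/custom", "local/custom": "local/custom",
}


def chain_labels_to_skip(failed_provider: str, main_provider: str = "") -> set:
    # Normalise, optionally fold in the main provider, then alias-map.
    skip_labels = {failed_provider.lower().strip()}
    if main_provider and main_provider.lower() in failed_provider.lower():
        skip_labels.add(main_provider.lower())
    return {ALIAS_TO_LABEL.get(s, s) for s in skip_labels}


print(chain_labels_to_skip("codex"))                     # {'openai-codex'}
print(chain_labels_to_skip("Custom"))                    # {'local/custom'}
print(chain_labels_to_skip("openrouter", "openrouter"))  # {'openrouter'}
```

Unknown labels pass through unchanged, so a new provider added to the chain is skippable without touching the alias table.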
def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Optional[OpenAI], Optional[str]]:
    """Full auto-detection chain.

    Priority:
    1. User's main provider + main model, regardless of provider type.
       This means auxiliary tasks (compression, vision, web extraction,
       session search, etc.) use the same model the user configured for
       chat. Users on OpenRouter/Nous get their chosen chat model; users
       on DeepSeek/ZAI/Alibaba get theirs; etc. Running aux tasks on the
       user's picked model keeps behavior predictable — no surprise
       switches to a cheap fallback model for side tasks.
    2. OpenRouter → Nous → custom → Codex → API-key providers (fallback
       chain, only used when the main provider has no working client).
    """
    global auxiliary_is_nous, _stale_base_url_warned
    auxiliary_is_nous = False  # Reset — _try_nous() will set True if it wins
    runtime = _normalize_main_runtime(main_runtime)
    runtime_provider = runtime.get("provider", "")
    runtime_model = runtime.get("model", "")
    runtime_base_url = runtime.get("base_url", "")
    runtime_api_key = runtime.get("api_key", "")
    runtime_api_mode = runtime.get("api_mode", "")

    # ── Warn once if OPENAI_BASE_URL is set but config.yaml uses a named
    # provider (not 'custom'). This catches the common "env poisoning"
    # scenario where a user switches providers via `hermes model` but the
    # old OPENAI_BASE_URL lingers in ~/.hermes/.env. ──
    if not _stale_base_url_warned:
        _env_base = os.getenv("OPENAI_BASE_URL", "").strip()
        _cfg_provider = runtime_provider or _read_main_provider()
        if (_env_base and _cfg_provider
                and _cfg_provider != "custom"
                and not _cfg_provider.startswith("custom:")):
            logger.warning(
                "OPENAI_BASE_URL is set (%s) but model.provider is '%s'. "
                "Auxiliary clients may route to the wrong endpoint. "
                "Run: hermes model to reconfigure, or remove "
                "OPENAI_BASE_URL from ~/.hermes/.env",
                _env_base, _cfg_provider,
            )
        _stale_base_url_warned = True

    # ── Step 1: main provider + main model → use them directly ──
    #
    # This is the primary aux backend for every user. "auto" means
    # "use my main chat model for side tasks as well" — including users
    # on aggregators (OpenRouter, Nous) who previously got routed to a
    # cheap provider-side default. Explicit per-task overrides set via
    # config.yaml (auxiliary.<task>.provider) still win over this.
    main_provider = runtime_provider or _read_main_provider()
    main_model = runtime_model or _read_main_model()
    if (main_provider and main_model
            and main_provider not in ("auto", "")):
        resolved_provider = main_provider
        explicit_base_url = None
        explicit_api_key = None
        if runtime_base_url and (main_provider == "custom" or main_provider.startswith("custom:")):
            resolved_provider = "custom"
            explicit_base_url = runtime_base_url
            explicit_api_key = runtime_api_key or None
        client, resolved = resolve_provider_client(
            resolved_provider,
            main_model,
            explicit_base_url=explicit_base_url,
            explicit_api_key=explicit_api_key,
            api_mode=runtime_api_mode or None,
        )
        if client is not None:
            logger.info("Auxiliary auto-detect: using main provider %s (%s)",
                        main_provider, resolved or main_model)
            return client, resolved or main_model

    # ── Step 2: aggregator / fallback chain ──────────────────────────────
    tried = []
    for label, try_fn in _get_provider_chain():
        client, model = try_fn()
        if client is not None:
            if tried:
                logger.info("Auxiliary auto-detect: using %s (%s) — skipped: %s",
                            label, model or "default", ", ".join(tried))
            else:
                logger.info("Auxiliary auto-detect: using %s (%s)", label, model or "default")
            return client, model
        tried.append(label)
    logger.warning("Auxiliary auto-detect: no provider available (tried: %s). "
                   "Compression, summarization, and memory flush will not work. "
                   "Set OPENROUTER_API_KEY or configure a local model in config.yaml.",
                   ", ".join(tried))
    return None, None

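The two-step policy reduces to a small decision function: prefer the configured (main_provider, main_model) pair when a client can be built for it, and only then walk the fallback chain. A minimal sketch with `resolve` and the chain entries as hypothetical stand-ins for `resolve_provider_client` and the `_try_*` resolvers:

```python
def resolve_auto(main_provider, main_model, resolve, chain):
    # Step 1: the user's main provider + model, unless provider is "auto".
    if main_provider and main_model and main_provider != "auto":
        client = resolve(main_provider, main_model)
        if client is not None:
            return client, main_model
    # Step 2: fallback chain, first working entry wins.
    for label, try_fn in chain:
        client, model = try_fn()
        if client is not None:
            return client, model
    return None, None


fallback_chain = [("openrouter", lambda: ("or-client", "or-model"))]

# Main provider works → aux tasks run on the user's chat model.
print(resolve_auto("deepseek", "deepseek-chat",
                   resolve=lambda p, m: f"client:{p}", chain=fallback_chain))

# Main provider has no working client → the chain supplies one.
print(resolve_auto("deepseek", "deepseek-chat",
                   resolve=lambda p, m: None, chain=fallback_chain))
```

Explicit per-task overrides in config.yaml bypass this function entirely in the real module; they are a hard constraint, not part of the auto policy.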
# ── Centralized Provider Router ─────────────────────────────────────────────
#
# resolve_provider_client() is the single entry point for creating a properly
# configured client given a (provider, model) pair. It handles auth lookup,
# base URL resolution, provider-specific headers, and API format differences
# (Chat Completions vs Responses API for Codex).
#
# All auxiliary consumer code should go through this or the public helpers
# below — never look up auth env vars ad-hoc.

def _to_async_client(sync_client, model: str):
    """Convert a sync client to its async counterpart, preserving Codex routing."""
    from openai import AsyncOpenAI

    if isinstance(sync_client, CodexAuxiliaryClient):
        return AsyncCodexAuxiliaryClient(sync_client), model
    if isinstance(sync_client, AnthropicAuxiliaryClient):
        return AsyncAnthropicAuxiliaryClient(sync_client), model
    try:
        from agent.gemini_native_adapter import GeminiNativeClient, AsyncGeminiNativeClient

        if isinstance(sync_client, GeminiNativeClient):
            return AsyncGeminiNativeClient(sync_client), model
    except ImportError:
        pass
    try:
        from agent.copilot_acp_client import CopilotACPClient
        if isinstance(sync_client, CopilotACPClient):
            return sync_client, model
    except ImportError:
        pass

    async_kwargs = {
        "api_key": sync_client.api_key,
        "base_url": str(sync_client.base_url),
    }
    sync_base_url = str(sync_client.base_url)
    if base_url_host_matches(sync_base_url, "openrouter.ai"):
        async_kwargs["default_headers"] = dict(_OR_HEADERS)
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
|
|
|
|
elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
|
2026-03-17 23:40:22 -07:00
|
|
|
|
from hermes_cli.models import copilot_default_headers
|
|
|
|
|
|
|
|
|
|
|
|
async_kwargs["default_headers"] = copilot_default_headers()
|
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
|
|
|
|
elif base_url_host_matches(sync_base_url, "api.kimi.com"):
|
2026-04-18 22:55:36 +08:00
|
|
|
|
async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
|
feat: centralized provider router + fix Codex vision bypass + vision error handling
Three interconnected fixes for auxiliary client infrastructure:
1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single
entry point for creating properly configured clients. Given a provider
name and optional model, it handles auth lookup (env vars, OAuth
tokens, auth.json), base URL resolution, provider-specific headers,
and API format differences (Chat Completions vs Responses API for
Codex). All auxiliary consumers should route through this instead of
ad-hoc env var lookups.
Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
and get_vision_auxiliary_client to use the router internally.
2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the
sync vision client's api_key/base_url, completely bypassing the Codex
Responses API adapter. When the vision provider resolved to Codex,
the raw client would hit chatgpt.com/backend-api/codex with
chat.completions.create() which only supports the Responses API.
Fix: Added get_async_vision_auxiliary_client() which properly wraps
Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
instead of manual client construction.
3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client() which blindly
looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
API-key providers, users without OPENAI_BASE_URL set). Replaced
with fallback loop through resolve_provider_client() for each
known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability
failures. Returns clear message to the model when the configured
model doesn't support vision, instead of a generic error.
Addresses #886
2026-03-11 19:46:47 -07:00
|
|
|
|
return AsyncOpenAI(**async_kwargs), model
|
|
|
|
|
|
|
|
|
|
|
|
|
2026-04-09 21:20:29 -07:00
|
|
|
|
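The header cascade above keys off hostname matching rather than substring containment. A minimal sketch of the `base_url_host_matches` semantics (a hypothetical standalone helper, not the project's `utils` implementation): hostname equality and subdomain matches are accepted, while path segments, host suffixes, and prefix collisions are rejected.

```python
from urllib.parse import urlparse


def base_url_host_matches(base_url: str, domain: str) -> bool:
    """Return True when base_url's hostname is domain or a subdomain of it.

    Unlike `domain in base_url`, this rejects path segments
    (https://proxy/openrouter.ai/v1), host suffixes
    (https://openrouter.ai.evil.com), and prefix collisions
    (https://notopenrouter.ai).
    """
    if not base_url or not domain:
        return False
    host = (urlparse(base_url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


# Exact host and subdomain forms match; path/suffix/prefix forms do not.
assert base_url_host_matches("https://openrouter.ai/api/v1", "openrouter.ai")
assert base_url_host_matches("https://api.openrouter.ai/v1", "openrouter.ai")
assert not base_url_host_matches("https://proxy.example/openrouter.ai/v1", "openrouter.ai")
assert not base_url_host_matches("https://openrouter.ai.evil.com/v1", "openrouter.ai")
```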
def _normalize_resolved_model(model_name: Optional[str], provider: str) -> Optional[str]:
    """Normalize a resolved model for the provider that will receive it."""
    if not model_name:
        return model_name
    try:
        from hermes_cli.model_normalize import normalize_model_for_provider

        return normalize_model_for_provider(model_name, provider)
    except Exception:
        return model_name

def resolve_provider_client(
    provider: str,
    model: Optional[str] = None,
    async_mode: bool = False,
    raw_codex: bool = False,
    explicit_base_url: Optional[str] = None,
    explicit_api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
) -> Tuple[Optional[Any], Optional[str]]:
    """Central router: given a provider name and optional model, return a
    configured client with the correct auth, base URL, and API format.

    The returned client always exposes ``.chat.completions.create()`` — for
    Codex/Responses API providers, an adapter handles the translation
    transparently.

    Args:
        provider: Provider identifier. One of:
            "openrouter", "nous", "openai-codex" (or "codex"),
            "zai", "kimi-coding", "minimax", "minimax-cn",
            "custom" (config.yaml model.base_url + OPENAI_API_KEY),
            "auto" (full auto-detection chain).
        model: Model slug override. If None, uses the provider's default
            auxiliary model.
        async_mode: If True, return an async-compatible client.
        raw_codex: If True, return a raw OpenAI client for Codex providers
            instead of wrapping in CodexAuxiliaryClient. Use this when
            the caller needs direct access to responses.stream() (e.g.,
            the main agent loop).
        explicit_base_url: Optional direct OpenAI-compatible endpoint.
        explicit_api_key: Optional API key paired with explicit_base_url.
        api_mode: API mode override. One of "chat_completions",
            "codex_responses", or None (auto-detect). When set to
            "codex_responses", the client is wrapped in
            CodexAuxiliaryClient to route through the Responses API.
        main_runtime: Optional main-model runtime context, forwarded to
            "auto" resolution.

    Returns:
        (client, resolved_model) or (None, None) if auth is unavailable.
    """
    _validate_proxy_env_urls()

    # Normalise aliases
    provider = _normalize_aux_provider(provider)

    def _needs_codex_wrap(client_obj, base_url_str: str, model_str: str) -> bool:
        """Decide if a plain OpenAI client should be wrapped for Responses API.

        Returns True when api_mode is explicitly "codex_responses", or when
        auto-detection (api.openai.com + codex-family model) suggests it.
        Already-wrapped clients (CodexAuxiliaryClient) are skipped.
        """
        if isinstance(client_obj, CodexAuxiliaryClient):
            return False
        if raw_codex:
            return False
        if api_mode == "codex_responses":
            return True
        if api_mode and api_mode != "codex_responses":
            return False  # explicit non-codex mode
        # Auto-detect: api.openai.com + codex model name pattern
        if base_url_hostname(base_url_str) == "api.openai.com":
            model_lower = (model_str or "").lower()
            if "codex" in model_lower:
                return True
        return False

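The decision above can be isolated as a pure function for testing (a hypothetical standalone sketch; the real helper closes over `api_mode` and `raw_codex` and checks the client type rather than a flag):

```python
from urllib.parse import urlparse


def needs_codex_wrap(api_mode, base_url, model, already_wrapped=False, raw_codex=False):
    """Mirror the wrap decision: explicit api_mode wins, then auto-detect."""
    if already_wrapped or raw_codex:
        return False
    if api_mode == "codex_responses":
        return True
    if api_mode:  # any other explicit mode disables wrapping
        return False
    # Auto-detect: native OpenAI endpoint plus a codex-family model name.
    host = (urlparse(base_url or "").hostname or "").lower()
    return host == "api.openai.com" and "codex" in (model or "").lower()


assert needs_codex_wrap(None, "https://api.openai.com/v1", "gpt-5.3-codex")
assert not needs_codex_wrap("chat_completions", "https://api.openai.com/v1", "gpt-5.3-codex")
assert needs_codex_wrap("codex_responses", "https://example.com/v1", "anything")
assert not needs_codex_wrap(None, "https://proxy/api.openai.com/v1", "gpt-5.3-codex")
```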
    def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
        """Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
        if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
            logger.debug(
                "resolve_provider_client: wrapping client in CodexAuxiliaryClient "
                "(api_mode=%s, model=%s, base_url=%s)",
                api_mode or "auto-detected",
                final_model_str,
                base_url_str[:60] if base_url_str else "",
            )
            return CodexAuxiliaryClient(client_obj, final_model_str)
        return client_obj

feat: centralized provider router + fix Codex vision bypass + vision error handling
Three interconnected fixes for auxiliary client infrastructure:
1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single
entry point for creating properly configured clients. Given a provider
name and optional model, it handles auth lookup (env vars, OAuth
tokens, auth.json), base URL resolution, provider-specific headers,
and API format differences (Chat Completions vs Responses API for
Codex). All auxiliary consumers should route through this instead of
ad-hoc env var lookups.
Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
and get_vision_auxiliary_client to use the router internally.
2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the
sync vision client's api_key/base_url, completely bypassing the Codex
Responses API adapter. When the vision provider resolved to Codex,
the raw client would hit chatgpt.com/backend-api/codex with
chat.completions.create() which only supports the Responses API.
Fix: Added get_async_vision_auxiliary_client() which properly wraps
Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
instead of manual client construction.
3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client() which blindly
looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
API-key providers, users without OPENAI_BASE_URL set). Replaced
with fallback loop through resolve_provider_client() for each
known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability
failures. Returns clear message to the model when the configured
model doesn't support vision, instead of a generic error.
Addresses #886
2026-03-11 19:46:47 -07:00
|
|
|
|
# ── Auto: try all providers in priority order ────────────────────
|
|
|
|
|
|
if provider == "auto":
|
2026-04-12 00:10:19 -04:00
|
|
|
|
client, resolved = _resolve_auto(main_runtime=main_runtime)
|
feat: centralized provider router + fix Codex vision bypass + vision error handling
Three interconnected fixes for auxiliary client infrastructure:
1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single
entry point for creating properly configured clients. Given a provider
name and optional model, it handles auth lookup (env vars, OAuth
tokens, auth.json), base URL resolution, provider-specific headers,
and API format differences (Chat Completions vs Responses API for
Codex). All auxiliary consumers should route through this instead of
ad-hoc env var lookups.
Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
and get_vision_auxiliary_client to use the router internally.
2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the
sync vision client's api_key/base_url, completely bypassing the Codex
Responses API adapter. When the vision provider resolved to Codex,
the raw client would hit chatgpt.com/backend-api/codex with
chat.completions.create() which only supports the Responses API.
Fix: Added get_async_vision_auxiliary_client() which properly wraps
Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
instead of manual client construction.
3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client() which blindly
looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
API-key providers, users without OPENAI_BASE_URL set). Replaced
with fallback loop through resolve_provider_client() for each
known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability
failures. Returns clear message to the model when the configured
model doesn't support vision, instead of a generic error.
Addresses #886
2026-03-11 19:46:47 -07:00
|
|
|
|
if client is None:
|
|
|
|
|
|
return None, None
|
fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)
* fix: prevent model/provider mismatch when switching providers during active gateway
When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.
Changes:
- _update_config_for_provider() now accepts an optional default_model
parameter. When provided and the current model.default is empty or
uses OpenRouter format (contains '/'), it sets a safe default model
for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
choice: if the current model uses OpenRouter format and wouldn't work
with the new provider, it warns and switches to the provider's first
default model instead of silently keeping the incompatible name.
Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.
* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini
When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.
Changes:
- _try_custom_endpoint() now reads the user's configured main model via
_read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
formatted model override (containing '/') would be sent to a non-
OpenRouter provider (like a local server) and drops it in favor of
the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
tests to prevent host environment leakage.
2026-03-13 10:02:16 -07:00
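The "Keep current" validation can be sketched as follows (hypothetical helper name; the heuristic is the "/"-format check described above):

```python
def pick_model_for_provider(current_model, provider_default, provider_is_openrouter):
    # An OpenRouter-format name (contains "/") only resolves on OpenRouter;
    # on a direct-API provider, fall back to that provider's default model
    # instead of silently keeping the incompatible name.
    if current_model and "/" in current_model and not provider_is_openrouter:
        return provider_default
    return current_model or provider_default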
        # When auto-detection lands on a non-OpenRouter provider (e.g. a
        # local server), an OpenRouter-formatted model override like
        # "google/gemini-3-flash-preview" won't work. Drop it and use
        # the provider's own default model instead.
        if model and "/" in model and resolved and "/" not in resolved:
            logger.debug(
                "Dropping OpenRouter-format model %r for non-OpenRouter "
                "auxiliary provider (using %r instead)", model, resolved)
            model = None
        final_model = model or resolved
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))

    # ── OpenRouter ───────────────────────────────────────────────────
    if provider == "openrouter":
        client, default = _try_openrouter()
        if client is None:
            logger.warning("resolve_provider_client: openrouter requested "
                           "but OPENROUTER_API_KEY not set")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))

    # ── Nous Portal (OAuth) ──────────────────────────────────────────
    if provider == "nous":
        # Detect vision tasks: either explicit model override from
        # _PROVIDER_VISION_MODELS, or caller passed a known vision model.
        _is_vision = (
            model in _PROVIDER_VISION_MODELS.values()
            or (model or "").strip().lower() == "mimo-v2-omni"
        )
        client, default = _try_nous(vision=_is_vision)
        if client is None:
            logger.warning("resolve_provider_client: nous requested "
                           "but Nous Portal not configured (run: hermes auth)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))

    # ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
    if provider == "openai-codex":
        if raw_codex:
            # Return the raw OpenAI client for callers that need direct
            # access to responses.stream() (e.g., the main agent loop).
            codex_token = _read_codex_access_token()
            if not codex_token:
                logger.warning("resolve_provider_client: openai-codex requested "
                               "but no Codex OAuth token found (run: hermes model)")
                return None, None
            final_model = _normalize_resolved_model(model or _CODEX_AUX_MODEL, provider)
fix(codex): pin correct Cloudflare headers and extend to auxiliary client
The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:
- originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
the list, so the header had no mitigating effect on the 403 (the
account-id header alone may have been carrying the fix).
- account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).
Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.
Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:
- run_agent.py: AIAgent.__init__ (initial construction)
- run_agent.py: _apply_client_headers_for_base_url (credential rotation)
- agent/auxiliary_client.py: _try_codex (aux client)
- agent/auxiliary_client.py: resolve_provider_client raw_codex branch
Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.
Tests in tests/agent/test_codex_cloudflare_headers.py cover:
- originator value, User-Agent shape, canonical header casing
- account-ID extraction from a real JWT fixture
- graceful handling of malformed / non-string / claim-missing tokens
- wiring at all four insertion points (primary init, rotation, both aux paths)
- non-chatgpt base URLs (openrouter) do NOT get codex headers
- switching away from chatgpt.com drops the headers
2026-04-19 11:58:15 -07:00
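The consolidated header helper described above can be sketched as follows. The header names and values follow the commit message; the JWT claim path used to extract the account ID is an assumption about the ChatGPT access-token shape, not a documented API:

```python
import base64
import json

def codex_cloudflare_headers(token):
    # Sketch of _codex_cloudflare_headers (assumed shape, not the real code).
    headers = {
        "originator": "codex_cli_rs",                       # whitelisted originator
        "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",  # codex-rs shape + attribution
    }
    try:
        # A JWT's payload is the middle dot-separated segment, base64url
        # encoded with padding stripped; restore padding before decoding.
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        # Assumed claim path for the ChatGPT account id.
        account_id = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
        if isinstance(account_id, str) and account_id:
            headers["ChatGPT-Account-ID"] = account_id      # canonical casing: trailing -ID
    except Exception:
        pass  # malformed / non-JWT tokens degrade gracefully: no account header
    return headers
```

Malformed tokens fall through the `except` and still get the originator and User-Agent headers, matching the graceful-handling cases the tests cover.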
            raw_client = OpenAI(
                api_key=codex_token,
                base_url=_CODEX_AUX_BASE_URL,
                default_headers=_codex_cloudflare_headers(codex_token),
            )
            return (raw_client, final_model)

        # Standard path: wrap in CodexAuxiliaryClient adapter
        client, default = _try_codex()
        if client is None:
            logger.warning("resolve_provider_client: openai-codex requested "
                           "but no Codex OAuth token found (run: hermes model)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model) if async_mode
                else (client, final_model))

    # ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
    if provider == "custom":
        if explicit_base_url:
            custom_base = explicit_base_url.strip()
            custom_key = (
                (explicit_api_key or "").strip()
                or os.getenv("OPENAI_API_KEY", "").strip()
                or "no-key-required"  # local servers don't need auth
            )
            if not custom_base:
                logger.warning(
                    "resolve_provider_client: explicit custom endpoint requested "
                    "but base_url is empty"
                )
                return None, None
            final_model = _normalize_resolved_model(
                model or _read_main_model() or "gpt-4o-mini",
                provider,
            )
            extra = {}
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
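The helper this sweep introduces can be sketched as follows: an assumed implementation matching the semantics described above (hostname equality and subdomain matches accepted; path segments, host suffixes, and prefix collisions rejected), not the actual `utils.base_url_host_matches`:

```python
from urllib.parse import urlparse

def base_url_host_matches(base_url, domain):
    # Match on the parsed hostname only: exact host or a subdomain of the
    # target domain. A substring anywhere else in the URL does not count,
    # which is the false-positive class the sweep closes.
    host = (urlparse(base_url).hostname or "").lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    return bool(host) and (host == domain or host.endswith("." + domain))
```

Compare this to the old `'api.kimi.com' in base_url`, which would also match a URL whose *path* contains that string.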
            if base_url_host_matches(custom_base, "api.kimi.com"):
                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
            elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
                from hermes_cli.models import copilot_default_headers
                extra["default_headers"] = copilot_default_headers()
            client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
            client = _wrap_if_needed(client, final_model, custom_base)
            return (_to_async_client(client, final_model) if async_mode
                    else (client, final_model))
        # Try custom first, then codex, then API-key providers
        for try_fn in (_try_custom_endpoint, _try_codex,
                       _resolve_api_key_provider):
            client, default = try_fn()
            if client is not None:
                final_model = _normalize_resolved_model(model or default, provider)
                _cbase = str(getattr(client, "base_url", "") or "")
                client = _wrap_if_needed(client, final_model, _cbase)
return (_to_async_client(client, final_model) if async_mode
|
|
|
|
|
|
else (client, final_model))
|
|
|
|
|
|
logger.warning("resolve_provider_client: custom/main requested "
|
|
|
|
|
|
"but no endpoint credentials found")
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
    # ── Named custom providers (config.yaml custom_providers list) ───
    try:
        from hermes_cli.runtime_provider import _get_named_custom_provider

        custom_entry = _get_named_custom_provider(provider)
        if custom_entry:
            custom_base = custom_entry.get("base_url", "").strip()
            custom_key = custom_entry.get("api_key", "").strip()
            custom_key_env = custom_entry.get("key_env", "").strip()
            if not custom_key and custom_key_env:
                custom_key = os.getenv(custom_key_env, "").strip()
            custom_key = custom_key or "no-key-required"
            if custom_base:
                final_model = _normalize_resolved_model(
                    model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
                    provider,
                )
                client = OpenAI(api_key=custom_key, base_url=custom_base)
                client = _wrap_if_needed(client, final_model, custom_base)
                logger.debug(
                    "resolve_provider_client: named custom provider %r (%s)",
                    provider, final_model)
                return (_to_async_client(client, final_model) if async_mode
                        else (client, final_model))
            logger.warning(
                "resolve_provider_client: named custom provider %r has no base_url",
                provider)
            return None, None
    except ImportError:
        pass  # runtime_provider unavailable; fall through to the registry
    # ── API-key providers from PROVIDER_REGISTRY ─────────────────────
    try:
        from hermes_cli.auth import (
            PROVIDER_REGISTRY,
            resolve_api_key_provider_credentials,
            resolve_external_process_provider_credentials,
        )
    except ImportError:
        logger.debug("hermes_cli.auth not available for provider %s", provider)
        return None, None
    pconfig = PROVIDER_REGISTRY.get(provider)
    if pconfig is None:
        logger.warning("resolve_provider_client: unknown provider %r", provider)
        return None, None

    if pconfig.auth_type == "api_key":
        if provider == "anthropic":
            client, default_model = _try_anthropic()
            if client is None:
                logger.warning("resolve_provider_client: anthropic requested "
                               "but no Anthropic credentials found")
                return None, None
            final_model = _normalize_resolved_model(model or default_model, provider)
            return (_to_async_client(client, final_model) if async_mode
                    else (client, final_model))
        creds = resolve_api_key_provider_credentials(provider)
        api_key = str(creds.get("api_key", "")).strip()
        if not api_key:
            tried_sources = list(pconfig.api_key_env_vars)
            if provider == "copilot":
                tried_sources.append("gh auth token")
            logger.debug("resolve_provider_client: provider %s has no API "
                         "key configured (tried: %s)",
                         provider, ", ".join(tried_sources))
            return None, None
        base_url = _to_openai_base_url(
            str(creds.get("base_url", "")).strip().rstrip("/")
            or pconfig.inference_base_url
        )

        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
        final_model = _normalize_resolved_model(model or default_model, provider)
        if provider == "gemini":
            from agent.gemini_native_adapter import (
                GeminiNativeClient,
                is_native_gemini_base_url,
            )

            if is_native_gemini_base_url(base_url):
                client = GeminiNativeClient(api_key=api_key, base_url=base_url)
                logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
                return (_to_async_client(client, final_model) if async_mode
                        else (client, final_model))
# Provider-specific headers
|
|
|
|
|
|
headers = {}
|
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
|
|
|
|
if base_url_host_matches(base_url, "api.kimi.com"):
|
2026-04-18 22:55:36 +08:00
|
|
|
|
headers["User-Agent"] = "claude-code/0.1.0"
|
fix: sweep remaining provider-URL substring checks across codebase
Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.
New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
'domain in base_url'. Accepts hostname equality and subdomain matches;
rejects path segments, host suffixes, and prefix collisions.
Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):
run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection
agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
(resolve custom, resolve auto, OpenRouter-fallback-to-custom,
_async_client_from_sync, resolve_provider_client explicit-custom,
resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff
agent/usage_pricing.py:
- resolve_billing_route openrouter branch
agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup
hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic
hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)
hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes
tools/delegate_tool.py:
- subagent Codex endpoint detection
trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
kimi-coding, arcee, minimax-cn, minimax)
cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)
Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.
Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
suite (exact match, subdomain, path-segment rejection, host-suffix
rejection, host-prefix rejection, empty-input, case-insensitivity,
trailing dot).
Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.
2026-04-20 21:17:28 -07:00
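The `base_url_host_matches` helper described in the commit message above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the shipped implementation: it assumes `base_url` carries a scheme so `urlparse` can extract a hostname, and it covers the documented cases (exact match, subdomain, path-segment rejection, host-suffix rejection, prefix-collision rejection, case-insensitivity, trailing dot):

```python
from urllib.parse import urlparse

def base_url_host_matches(base_url: str, domain: str) -> bool:
    """Hostname-safe counterpart to `domain in base_url` (sketch)."""
    if not base_url or not domain:
        return False
    # Compare only the parsed hostname — path segments can never match,
    # and suffix/prefix collisions on the host string are rejected below.
    host = (urlparse(base_url).hostname or "").rstrip(".").lower()
    target = domain.rstrip(".").lower()
    if not host or not target:
        return False
    # Exact hostname match, or a true subdomain of the target domain.
    return host == target or host.endswith("." + target)
```

Substring checks like `"api.kimi.com" in base_url` accept `https://evil.example/api.kimi.com`; comparing the parsed hostname closes that false-positive class.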
|
|
|
|
elif base_url_host_matches(base_url, "api.githubcopilot.com"):
|
2026-03-17 23:40:22 -07:00
|
|
|
|
from hermes_cli.models import copilot_default_headers
|
|
|
|
|
|
|
|
|
|
|
|
headers.update(copilot_default_headers())
|
feat: centralized provider router + fix Codex vision bypass + vision error handling
Three interconnected fixes for auxiliary client infrastructure:
1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
Add resolve_provider_client(provider, model, async_mode) — a single
entry point for creating properly configured clients. Given a provider
name and optional model, it handles auth lookup (env vars, OAuth
tokens, auth.json), base URL resolution, provider-specific headers,
and API format differences (Chat Completions vs Responses API for
Codex). All auxiliary consumers should route through this instead of
ad-hoc env var lookups.
Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
and get_vision_auxiliary_client to use the router internally.
2. FIX CODEX VISION BYPASS (vision_tools.py)
vision_tools.py was constructing a raw AsyncOpenAI client from the
sync vision client's api_key/base_url, completely bypassing the Codex
Responses API adapter. When the vision provider resolved to Codex,
the raw client would hit chatgpt.com/backend-api/codex with
chat.completions.create() which only supports the Responses API.
Fix: Added get_async_vision_auxiliary_client() which properly wraps
Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
instead of manual client construction.
3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
- context_compressor.py: Removed _get_fallback_client() which blindly
looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
API-key providers, users without OPENAI_BASE_URL set). Replaced
with fallback loop through resolve_provider_client() for each
known provider, with same-provider dedup.
- vision_tools.py: Added error detection for vision capability
failures. Returns clear message to the model when the configured
model doesn't support vision, instead of a generic error.
Addresses #886
2026-03-11 19:46:47 -07:00
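The compression-fallback replacement described in item 3 above — a loop through `resolve_provider_client()` for each known provider, with same-provider dedup — can be sketched like this. The names here (`pick_fallback_client`, the injected `resolve` callable, the provider list) are illustrative assumptions, not the real module API:

```python
def pick_fallback_client(resolve, providers, skip_provider=None):
    # Sketch of the fallback loop: walk the known-provider list in order,
    # skip the provider that already failed (same-provider dedup), and
    # return the first provider that yields a working client.
    for name in providers:
        if name == skip_provider:
            continue
        client, model = resolve(name, None)
        if client is not None:
            return name, client, model
    return None, None, None
```

Routing every candidate through the same resolver is what makes the fallback work for Codex OAuth and API-key providers, which the old `OPENAI_API_KEY` + `OPENAI_BASE_URL` lookup could not reach.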
|
|
|
|
client = OpenAI(api_key=api_key, base_url=base_url,
|
|
|
|
|
|
**({"default_headers": headers} if headers else {}))
|
2026-04-10 04:35:07 +00:00
|
|
|
|
|
|
|
|
|
|
# Copilot GPT-5+ models (except gpt-5-mini) require the Responses
|
|
|
|
|
|
# API — they are not accessible via /chat/completions. Wrap the
|
|
|
|
|
|
# plain client in CodexAuxiliaryClient so call_llm() transparently
|
|
|
|
|
|
# routes through responses.stream().
|
|
|
|
|
|
if provider == "copilot" and final_model and not raw_codex:
|
|
|
|
|
|
try:
|
|
|
|
|
|
from hermes_cli.models import _should_use_copilot_responses_api
|
|
|
|
|
|
if _should_use_copilot_responses_api(final_model):
|
|
|
|
|
|
logger.debug(
|
|
|
|
|
|
"resolve_provider_client: copilot model %s needs "
|
|
|
|
|
|
"Responses API — wrapping with CodexAuxiliaryClient",
|
|
|
|
|
|
final_model)
|
|
|
|
|
|
client = CodexAuxiliaryClient(client, final_model)
|
|
|
|
|
|
except ImportError:
|
|
|
|
|
|
pass
|
|
|
|
|
|
|
2026-04-11 13:50:43 +05:30
|
|
|
|
# Honor api_mode for any API-key provider (e.g. direct OpenAI with
|
|
|
|
|
|
# codex-family models). The copilot-specific wrapping above handles
|
|
|
|
|
|
# copilot; this covers the general case (#6800).
|
|
|
|
|
|
client = _wrap_if_needed(client, final_model, base_url)
|
|
|
|
|
|
|
2026-03-11 19:46:47 -07:00
|
|
|
|
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
|
|
|
|
|
|
return (_to_async_client(client, final_model) if async_mode
|
|
|
|
|
|
else (client, final_model))
|
|
|
|
|
|
|
2026-04-12 18:47:14 -06:00
|
|
|
|
if pconfig.auth_type == "external_process":
|
|
|
|
|
|
creds = resolve_external_process_provider_credentials(provider)
|
|
|
|
|
|
final_model = _normalize_resolved_model(model or _read_main_model(), provider)
|
|
|
|
|
|
if provider == "copilot-acp":
|
|
|
|
|
|
api_key = str(creds.get("api_key", "")).strip()
|
|
|
|
|
|
base_url = str(creds.get("base_url", "")).strip()
|
|
|
|
|
|
command = str(creds.get("command", "")).strip() or None
|
|
|
|
|
|
args = list(creds.get("args") or [])
|
|
|
|
|
|
if not final_model:
|
|
|
|
|
|
logger.warning(
|
|
|
|
|
|
"resolve_provider_client: copilot-acp requested but no model "
|
|
|
|
|
|
"was provided or configured"
|
|
|
|
|
|
)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
if not api_key or not base_url:
|
|
|
|
|
|
logger.warning(
|
|
|
|
|
|
"resolve_provider_client: copilot-acp requested but external "
|
|
|
|
|
|
"process credentials are incomplete"
|
|
|
|
|
|
)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
from agent.copilot_acp_client import CopilotACPClient
|
|
|
|
|
|
|
|
|
|
|
|
client = CopilotACPClient(
|
|
|
|
|
|
api_key=api_key,
|
|
|
|
|
|
base_url=base_url,
|
|
|
|
|
|
command=command,
|
|
|
|
|
|
args=args,
|
|
|
|
|
|
)
|
|
|
|
|
|
logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
|
|
|
|
|
|
return (_to_async_client(client, final_model) if async_mode
|
|
|
|
|
|
else (client, final_model))
|
|
|
|
|
|
logger.warning("resolve_provider_client: external-process provider %s not "
|
|
|
|
|
|
"directly supported", provider)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
2026-03-11 19:46:47 -07:00
|
|
|
|
elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
|
|
|
|
|
|
# OAuth providers — route through their specific try functions
|
|
|
|
|
|
if provider == "nous":
|
|
|
|
|
|
return resolve_provider_client("nous", model, async_mode)
|
|
|
|
|
|
if provider == "openai-codex":
|
|
|
|
|
|
return resolve_provider_client("openai-codex", model, async_mode)
|
2026-03-11 20:14:44 -07:00
|
|
|
|
# Other OAuth providers not directly supported
|
2026-03-11 19:46:47 -07:00
|
|
|
|
logger.warning("resolve_provider_client: OAuth provider %s not "
|
|
|
|
|
|
"directly supported, try 'auto'", provider)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
|
|
|
|
|
logger.warning("resolve_provider_client: unhandled auth_type %s for %s",
|
|
|
|
|
|
pconfig.auth_type, provider)
|
|
|
|
|
|
return None, None
|
|
|
|
|
|
|
|
|
|
|
|
|
2026-03-07 08:52:06 -08:00
|
|
|
|
# ── Public API ──────────────────────────────────────────────────────────────
|
|
|
|
|
|
|
2026-04-12 00:10:19 -04:00
|
|
|
|
def get_text_auxiliary_client(
|
|
|
|
|
|
task: str = "",
|
|
|
|
|
|
*,
|
|
|
|
|
|
main_runtime: Optional[Dict[str, Any]] = None,
|
|
|
|
|
|
) -> Tuple[Optional[OpenAI], Optional[str]]:
|
2026-03-07 08:52:06 -08:00
|
|
|
|
"""Return (client, default_model_slug) for text-only auxiliary tasks.
|
|
|
|
|
|
|
|
|
|
|
|
Args:
|
|
|
|
|
|
task: Optional task name ("compression", "web_extract") to check
|
|
|
|
|
|
for a task-specific provider override.
|
|
|
|
|
|
|
2026-04-13 04:59:26 -07:00
|
|
|
|
Callers may override the returned model via config.yaml
|
|
|
|
|
|
(e.g. auxiliary.compression.model, auxiliary.web_extract.model).
|
2026-03-07 08:52:06 -08:00
|
|
|
|
"""
|
2026-04-11 13:50:43 +05:30
|
|
|
|
provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
|
2026-03-14 20:48:29 -07:00
|
|
|
|
return resolve_provider_client(
|
|
|
|
|
|
provider,
|
|
|
|
|
|
model=model,
|
|
|
|
|
|
explicit_base_url=base_url,
|
|
|
|
|
|
explicit_api_key=api_key,
|
2026-04-11 13:50:43 +05:30
|
|
|
|
api_mode=api_mode,
|
2026-04-12 00:10:19 -04:00
|
|
|
|
main_runtime=main_runtime,
|
2026-03-14 20:48:29 -07:00
|
|
|
|
)
|
2026-03-07 08:52:06 -08:00
|
|
|
|
|
|
|
|
|
|
|
2026-04-12 00:10:19 -04:00
|
|
|
|
def get_async_text_auxiliary_client(task: str = "", *, main_runtime: Optional[Dict[str, Any]] = None):
|
2026-02-28 21:47:51 -08:00
|
|
|
|
"""Return (async_client, model_slug) for async consumers.
|
|
|
|
|
|
|
|
|
|
|
|
For standard providers returns (AsyncOpenAI, model). For Codex returns
|
|
|
|
|
|
(AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
|
|
|
|
|
|
Returns (None, None) when no provider is available.
|
|
|
|
|
|
"""
|
2026-04-11 13:50:43 +05:30
|
|
|
|
provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
|
2026-03-14 20:48:29 -07:00
|
|
|
|
return resolve_provider_client(
|
|
|
|
|
|
provider,
|
|
|
|
|
|
model=model,
|
|
|
|
|
|
async_mode=True,
|
|
|
|
|
|
explicit_base_url=base_url,
|
|
|
|
|
|
explicit_api_key=api_key,
|
2026-04-11 13:50:43 +05:30
|
|
|
|
api_mode=api_mode,
|
2026-04-12 00:10:19 -04:00
|
|
|
|
main_runtime=main_runtime,
|
2026-03-14 20:48:29 -07:00
|
|
|
|
)
|
2026-02-28 21:47:51 -08:00
|
|
|
|
|
|
|
|
|
|
|
2026-03-14 20:22:13 -07:00
|
|
|
|
_VISION_AUTO_PROVIDER_ORDER = (
|
|
|
|
|
|
"openrouter",
|
|
|
|
|
|
"nous",
|
|
|
|
|
|
)
|
2026-02-22 02:16:11 -08:00
|
|
|
|
|
2026-03-08 18:06:40 -07:00
|
|
|
|
|
2026-03-14 20:22:13 -07:00
|
|
|
|
def _normalize_vision_provider(provider: Optional[str]) -> str:
|
2026-04-13 16:08:19 +08:00
|
|
|
|
return _normalize_aux_provider(provider)
|
2026-03-14 20:22:13 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
|
|
|
|
|
|
provider = _normalize_vision_provider(provider)
|
|
|
|
|
|
if provider == "openrouter":
|
|
|
|
|
|
return _try_openrouter()
|
|
|
|
|
|
if provider == "nous":
|
2026-04-07 21:41:05 -07:00
|
|
|
|
return _try_nous(vision=True)
|
2026-03-14 20:22:13 -07:00
|
|
|
|
if provider == "openai-codex":
|
|
|
|
|
|
return _try_codex()
|
2026-03-14 21:14:20 -07:00
|
|
|
|
if provider == "anthropic":
|
|
|
|
|
|
return _try_anthropic()
|
2026-03-14 20:22:13 -07:00
|
|
|
|
if provider == "custom":
|
|
|
|
|
|
return _try_custom_endpoint()
|
2026-03-08 18:06:40 -07:00
|
|
|
|
return None, None
|
2026-02-25 18:39:36 -08:00
|
|
|
|
|
|
|
|
|
|
|
2026-03-14 20:22:13 -07:00
|
|
|
|
def _strict_vision_backend_available(provider: str) -> bool:
|
|
|
|
|
|
return _resolve_strict_vision_backend(provider)[0] is not None
|
2026-03-11 19:46:47 -07:00
|
|
|
|
|
|
|
|
|
|
|
2026-03-14 20:22:13 -07:00
|
|
|
|
def get_available_vision_backends() -> List[str]:
|
|
|
|
|
|
"""Return the currently available vision backends in auto-selection order.
|
|
|
|
|
|
|
2026-04-08 16:37:05 -07:00
|
|
|
|
Order: active provider → OpenRouter → Nous → stop. This is the single
|
|
|
|
|
|
source of truth for setup, tool gating, and runtime auto-routing of
|
|
|
|
|
|
vision tasks.
|
2026-03-11 19:46:47 -07:00
|
|
|
|
"""
|
2026-04-08 16:37:05 -07:00
|
|
|
|
available: List[str] = []
|
|
|
|
|
|
# 1. Active provider — if the user configured a provider, try it first.
|
2026-04-07 22:24:36 -07:00
|
|
|
|
main_provider = _read_main_provider()
|
2026-04-08 16:37:05 -07:00
|
|
|
|
if main_provider and main_provider not in ("auto", ""):
|
|
|
|
|
|
if main_provider in _VISION_AUTO_PROVIDER_ORDER:
|
|
|
|
|
|
if _strict_vision_backend_available(main_provider):
|
|
|
|
|
|
available.append(main_provider)
|
|
|
|
|
|
else:
|
|
|
|
|
|
client, _ = resolve_provider_client(main_provider, _read_main_model())
|
|
|
|
|
|
if client is not None:
|
|
|
|
|
|
available.append(main_provider)
|
|
|
|
|
|
# 2. OpenRouter, 3. Nous — skip if already covered by main provider.
|
|
|
|
|
|
for p in _VISION_AUTO_PROVIDER_ORDER:
|
|
|
|
|
|
if p not in available and _strict_vision_backend_available(p):
|
|
|
|
|
|
available.append(p)
|
2026-04-07 22:24:36 -07:00
|
|
|
|
return available
|
2026-03-14 20:22:13 -07:00
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def resolve_vision_provider_client(
|
|
|
|
|
|
provider: Optional[str] = None,
|
|
|
|
|
|
model: Optional[str] = None,
|
|
|
|
|
|
*,
|
2026-03-14 20:48:29 -07:00
|
|
|
|
base_url: Optional[str] = None,
|
|
|
|
|
|
api_key: Optional[str] = None,
|
2026-03-14 20:22:13 -07:00
|
|
|
|
async_mode: bool = False,
|
|
|
|
|
|
) -> Tuple[Optional[str], Optional[Any], Optional[str]]:
|
|
|
|
|
|
"""Resolve the client actually used for vision tasks.
|
|
|
|
|
|
|
2026-03-14 20:48:29 -07:00
|
|
|
|
Direct endpoint overrides take precedence over provider selection. Explicit
|
|
|
|
|
|
provider overrides still use the generic provider router for non-standard
|
|
|
|
|
|
backends, so users can intentionally force experimental providers. Auto mode
|
|
|
|
|
|
stays conservative and only tries vision backends known to work today.
|
2026-03-14 20:22:13 -07:00
|
|
|
|
"""
|
2026-04-11 13:50:43 +05:30
|
|
|
|
requested, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
|
2026-03-14 20:48:29 -07:00
|
|
|
|
"vision", provider, model, base_url, api_key
|
|
|
|
|
|
)
|
|
|
|
|
|
requested = _normalize_vision_provider(requested)
|
2026-03-14 20:22:13 -07:00
|
|
|
|
|
|
|
|
|
|
def _finalize(resolved_provider: str, sync_client: Any, default_model: Optional[str]):
|
|
|
|
|
|
if sync_client is None:
|
|
|
|
|
|
return resolved_provider, None, None
|
2026-03-14 20:48:29 -07:00
|
|
|
|
final_model = resolved_model or default_model
|
2026-03-14 20:22:13 -07:00
|
|
|
|
if async_mode:
|
|
|
|
|
|
async_client, async_model = _to_async_client(sync_client, final_model)
|
|
|
|
|
|
return resolved_provider, async_client, async_model
|
|
|
|
|
|
return resolved_provider, sync_client, final_model
|
|
|
|
|
|
|
2026-03-14 20:48:29 -07:00
|
|
|
|
if resolved_base_url:
|
|
|
|
|
|
client, final_model = resolve_provider_client(
|
|
|
|
|
|
"custom",
|
|
|
|
|
|
model=resolved_model,
|
|
|
|
|
|
async_mode=async_mode,
|
|
|
|
|
|
explicit_base_url=resolved_base_url,
|
|
|
|
|
|
explicit_api_key=resolved_api_key,
|
2026-04-13 16:08:19 +08:00
|
|
|
|
api_mode=resolved_api_mode,
|
2026-03-14 20:48:29 -07:00
|
|
|
|
)
|
|
|
|
|
|
if client is None:
|
|
|
|
|
|
return "custom", None, None
|
|
|
|
|
|
return "custom", client, final_model
|
|
|
|
|
|
|
2026-03-14 20:22:13 -07:00
|
|
|
|
if requested == "auto":
|
2026-04-07 22:24:36 -07:00
|
|
|
|
# Vision auto-detection order:
|
feat(auxiliary): default 'auto' routing to main model for all users (#11900)
Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash). Non-aggregator users already got their main
model. Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.
After: 'auto' means "use my main chat model" for every user,
regardless of provider type. Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers). Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.
Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model). Aggregator strict vision
backends are fallbacks, not the primary path.
Changes:
- agent/auxiliary_client.py: _resolve_auto() drops the
`_AGGREGATOR_PROVIDERS` guard. resolve_vision_provider_client()
auto branch unifies aggregator and exotic-provider paths —
everyone goes through resolve_provider_client() with main_model.
Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
the guard we just removed).
- hermes_cli/main.py: aux config menu copy updated to reflect
the new semantics ("'auto' means 'use my main model'").
- tests/agent/test_auxiliary_main_first.py: 12 regression tests
covering OpenRouter/Nous/DeepSeek main paths, runtime-override
wins, explicit-config wins, vision override preservation for
exotic providers, and fallback-chain activation when the main
provider has no working client.
Co-authored-by: teknium1 <teknium@nousresearch.com>
2026-04-17 19:13:23 -07:00
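The 'auto' policy described above can be condensed into a small sketch: an explicit per-task override is a hard constraint; otherwise 'auto' means the user's main provider with the main model, and the fallback chain runs only when the main provider yields no client. The function and parameter names here are illustrative, not the real helpers:

```python
def resolve_auto(main_provider, main_model, explicit_override, resolve, fallback_chain):
    # Explicit per-task override (config.yaml auxiliary.<task>.*) wins
    # outright — it is not subject to the auto policy.
    if explicit_override is not None:
        return explicit_override
    # 'auto': try the main provider with the main chat model first.
    client, model = resolve(main_provider, main_model)
    if client is not None:
        return (main_provider, model or main_model)
    # Only a dead main provider activates the fallback chain.
    for candidate in fallback_chain:
        client, model = resolve(candidate, None)
        if client is not None:
            return (candidate, model)
    return (None, None)
```

This is what makes side tasks follow the model the user actually picked, instead of silently dropping aggregator users onto a cheap provider-side default.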
|
|
|
|
# 1. User's main provider + main model (including aggregators).
|
|
|
|
|
|
# _PROVIDER_VISION_MODELS provides per-provider vision model
|
|
|
|
|
|
# overrides when the provider has a dedicated multimodal model
|
|
|
|
|
|
# that differs from the chat model (e.g. xiaomi → mimo-v2-omni,
|
2026-04-21 15:19:00 -06:00
|
|
|
|
# zai → glm-5v-turbo). Nous is the exception: it has a dedicated
|
|
|
|
|
|
# strict vision backend with tier-aware defaults, so it must not
|
|
|
|
|
|
# fall through to the user's text chat model here.
|
2026-04-17 19:13:23 -07:00
|
|
|
|
# 2. OpenRouter (vision-capable aggregator fallback)
|
|
|
|
|
|
# 3. Nous Portal (vision-capable aggregator fallback)
|
2026-04-07 22:24:36 -07:00
|
|
|
|
# 4. Stop
|
|
|
|
|
|
main_provider = _read_main_provider()
|
|
|
|
|
|
main_model = _read_main_model()
|
|
|
|
|
|
if main_provider and main_provider not in ("auto", ""):
|
2026-04-21 15:19:00 -06:00
|
|
|
|
if main_provider == "nous":
|
|
|
|
|
|
sync_client, default_model = _resolve_strict_vision_backend(main_provider)
|
|
|
|
|
|
if sync_client is not None:
|
|
|
|
|
|
logger.info(
|
|
|
|
|
|
"Vision auto-detect: using main provider %s (%s)",
|
|
|
|
|
|
main_provider, default_model or resolved_model or main_model,
|
|
|
|
|
|
)
|
|
|
|
|
|
return _finalize(main_provider, sync_client, default_model)
|
|
|
|
|
|
else:
|
|
|
|
|
|
vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
|
|
|
|
|
|
rpc_client, rpc_model = resolve_provider_client(
|
|
|
|
|
|
main_provider, vision_model,
|
|
|
|
|
|
api_mode=resolved_api_mode)
|
|
|
|
|
|
if rpc_client is not None:
|
|
|
|
|
|
logger.info(
|
|
|
|
|
|
"Vision auto-detect: using main provider %s (%s)",
|
|
|
|
|
|
main_provider, rpc_model or vision_model,
|
|
|
|
|
|
)
|
|
|
|
|
|
return _finalize(
|
|
|
|
|
|
main_provider, rpc_client, rpc_model or vision_model)
|
2026-04-08 16:37:05 -07:00
|
|
|
|
|
2026-04-17 19:13:23 -07:00
|
|
|
|
        # Fall back through aggregators (uses their dedicated vision model,
        # not the user's main model) when main provider has no client.
        for candidate in _VISION_AUTO_PROVIDER_ORDER:
            if candidate == main_provider:
                continue  # already tried above
            sync_client, default_model = _resolve_strict_vision_backend(candidate)
            if sync_client is not None:
                return _finalize(candidate, sync_client, default_model)

        logger.debug("Auxiliary vision client: none available")
        return None, None, None

    if requested in _VISION_AUTO_PROVIDER_ORDER:
        sync_client, default_model = _resolve_strict_vision_backend(requested)
        return _finalize(requested, sync_client, default_model)

    client, final_model = _get_cached_client(requested, resolved_model, async_mode,
                                             api_mode=resolved_api_mode)
    if client is None:
        return requested, None, None
    return requested, client, final_model


def get_auxiliary_extra_body() -> dict:
    """Return extra_body kwargs for auxiliary API calls.

    Includes Nous Portal product tags when the auxiliary client is backed
    by Nous Portal. Returns empty dict otherwise.
    """
    return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}


def auxiliary_max_tokens_param(value: int) -> dict:
    """Return the correct max tokens kwarg for the auxiliary client's provider.

    OpenRouter and local models use 'max_tokens'. Direct OpenAI with newer
    models (gpt-4o, o-series, gpt-5+) requires 'max_completion_tokens'.

    The Codex adapter translates max_tokens internally, so we use max_tokens
    for it as well.
    """
    custom_base = _current_custom_base_url()
    or_key = os.getenv("OPENROUTER_API_KEY")
    # Only use max_completion_tokens for direct OpenAI custom endpoints
    if (not or_key
            and _read_nous_auth() is None
            and base_url_hostname(custom_base) == "api.openai.com"):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}
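The kwarg dict returned above is meant to be splatted straight into a `chat.completions.create()` call. A minimal standalone sketch of the same gate (hypothetical names, not the module's API — the real function derives `native_openai` from the env and auth state):

```python
def max_tokens_param(value: int, *, native_openai: bool) -> dict:
    # Native OpenAI endpoints take 'max_completion_tokens';
    # every other backend takes plain 'max_tokens'.
    if native_openai:
        return {"max_completion_tokens": value}
    return {"max_tokens": value}

# Splat the result into the request kwargs:
kwargs = {"model": "gpt-5", "messages": [], **max_tokens_param(512, native_openai=True)}
```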


# ── Centralized LLM Call API ────────────────────────────────────────────────
#
# call_llm() and async_call_llm() own the full request lifecycle:
# 1. Resolve provider + model from task config (or explicit args)
# 2. Get or create a cached client for that provider
# 3. Format request args for the provider + model (max_tokens handling, etc.)
# 4. Make the API call
# 5. Return the response
#
# Every auxiliary LLM consumer should use these instead of manually
# constructing clients and calling .chat.completions.create().

# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
# NOTE: loop identity is NOT part of the key. On async cache hits we check
# whether the cached loop is the *current* loop; if not, the stale entry is
# replaced in-place. This bounds cache growth to one entry per unique
# provider config rather than one per (config × event-loop), which previously
# caused unbounded fd accumulation in long-running gateway processes (#10200).
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
_CLIENT_CACHE_MAX_SIZE = 64  # safety belt — evict oldest when exceeded


def _client_cache_key(
    provider: str,
    *,
    async_mode: bool,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
) -> tuple:
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
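The key shape matters: the main-runtime fingerprint only participates for `"auto"`, so an explicit provider hashes the same way no matter what the main chat model is. A standalone sketch (with a stand-in field list, since `_MAIN_RUNTIME_FIELDS` is defined elsewhere in the module):

```python
_RUNTIME_FIELDS = ("provider", "model", "base_url")  # stand-in for _MAIN_RUNTIME_FIELDS

def cache_key(provider, *, async_mode, base_url=None, api_key=None, api_mode=None, runtime=None):
    runtime = runtime or {}
    # Only 'auto' keys incorporate the main runtime fingerprint.
    runtime_key = tuple(runtime.get(f, "") for f in _RUNTIME_FIELDS) if provider == "auto" else ()
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)

# Explicit provider: same key regardless of main runtime.
k1 = cache_key("openrouter", async_mode=False, runtime={"model": "a"})
k2 = cache_key("openrouter", async_mode=False, runtime={"model": "b"})
# 'auto': key changes when the main runtime changes.
k3 = cache_key("auto", async_mode=False, runtime={"model": "a"})
k4 = cache_key("auto", async_mode=False, runtime={"model": "b"})
```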


def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
    with _client_cache_lock:
        old_entry = _client_cache.get(cache_key)
        if old_entry is not None and old_entry[0] is not client:
            _force_close_async_httpx(old_entry[0])
            try:
                close_fn = getattr(old_entry[0], "close", None)
                if callable(close_fn):
                    close_fn()
            except Exception:
                pass
        _client_cache[cache_key] = (client, default_model, bound_loop)


def _refresh_nous_auxiliary_client(
    *,
    cache_provider: str,
    model: Optional[str],
    async_mode: bool,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
) -> Tuple[Optional[Any], Optional[str]]:
    """Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
    runtime = _resolve_nous_runtime_api(force_refresh=True)
    if runtime is None:
        return None, model

    fresh_key, fresh_base_url = runtime
    sync_client = OpenAI(api_key=fresh_key, base_url=fresh_base_url)
    final_model = model

    current_loop = None
    if async_mode:
        try:
            import asyncio as _aio
            current_loop = _aio.get_event_loop()
        except RuntimeError:
            pass
        client, final_model = _to_async_client(sync_client, final_model or "")
    else:
        client = sync_client

    cache_key = _client_cache_key(
        cache_provider,
        async_mode=async_mode,
        base_url=base_url,
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
    )
    _store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
    return client, final_model


def neuter_async_httpx_del() -> None:
    """Monkey-patch ``AsyncHttpxClientWrapper.__del__`` to be a no-op.

    The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
    ``self.aclose()`` via ``asyncio.get_running_loop().create_task()``.
    When an ``AsyncOpenAI`` client is garbage-collected while
    prompt_toolkit's event loop is running (the common CLI idle state),
    the ``aclose()`` task runs on prompt_toolkit's loop but the
    underlying TCP transport is bound to a *different* loop (the worker
    thread's loop that the client was originally created on). If that
    loop is closed or its thread is dead, the transport's
    ``self._loop.call_soon()`` raises ``RuntimeError("Event loop is
    closed")``, which prompt_toolkit surfaces as "Unhandled exception
    in event loop ... Press ENTER to continue...".

    Neutering ``__del__`` is safe because:
    - Cached clients are explicitly cleaned via ``_force_close_async_httpx``
      on stale-loop detection and ``shutdown_cached_clients`` on exit.
    - Uncached clients' TCP connections are cleaned up by the OS when the
      process exits.
    - The OpenAI SDK itself marks this as a TODO (``# TODO(someday):
      support non asyncio runtimes here``).

    Call this once at CLI startup, before any ``AsyncOpenAI`` clients are
    created.
    """
    try:
        from openai._base_client import AsyncHttpxClientWrapper
        AsyncHttpxClientWrapper.__del__ = lambda self: None  # type: ignore[assignment]
    except (ImportError, AttributeError):
        pass  # Graceful degradation if the SDK changes its internals


def _force_close_async_httpx(client: Any) -> None:
    """Mark the httpx AsyncClient inside an AsyncOpenAI client as closed.

    This prevents ``AsyncHttpxClientWrapper.__del__`` from scheduling
    ``aclose()`` on a (potentially closed) event loop, which causes
    ``RuntimeError: Event loop is closed`` → prompt_toolkit's
    "Press ENTER to continue..." handler.

    We intentionally do NOT run the full async close path — the
    connections will be dropped by the OS when the process exits.
    """
    try:
        from httpx._client import ClientState
        inner = getattr(client, "_client", None)
        if inner is not None and not getattr(inner, "is_closed", True):
            inner._state = ClientState.CLOSED
    except Exception:
        pass


def shutdown_cached_clients() -> None:
    """Close all cached clients (sync and async) to prevent event-loop errors.

    Call this during CLI shutdown, *before* the event loop is closed, to
    avoid ``AsyncHttpxClientWrapper.__del__`` raising on a dead loop.
    """
    import inspect

    with _client_cache_lock:
        for key, entry in list(_client_cache.items()):
            client = entry[0]
            if client is None:
                continue
            # Mark any async httpx transport as closed first (prevents __del__
            # from scheduling aclose() on a dead event loop).
            _force_close_async_httpx(client)
            # Sync clients: close the httpx connection pool cleanly.
            # Async clients: skip — we already neutered __del__ above.
            try:
                close_fn = getattr(client, "close", None)
                if close_fn and not inspect.iscoroutinefunction(close_fn):
                    close_fn()
            except Exception:
                pass
        _client_cache.clear()


def cleanup_stale_async_clients() -> None:
    """Force-close cached async clients whose event loop is closed.

    Call this after each agent turn to proactively clean up stale clients
    before GC can trigger ``AsyncHttpxClientWrapper.__del__`` on them.
    This is defense-in-depth — the primary fix is ``neuter_async_httpx_del``
    which disables ``__del__`` entirely.
    """
    with _client_cache_lock:
        stale_keys = []
        for key, entry in _client_cache.items():
            client, _default, cached_loop = entry
            if cached_loop is not None and cached_loop.is_closed():
                _force_close_async_httpx(client)
                stale_keys.append(key)
        for key in stale_keys:
            del _client_cache[key]


def _is_openrouter_client(client: Any) -> bool:
    for obj in (client, getattr(client, "_client", None), getattr(client, "client", None)):
        if obj and base_url_host_matches(str(getattr(obj, "base_url", "") or ""), "openrouter.ai"):
            return True
    return False


def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
    """Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.

    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
    """
    if model and "/" in model and not _is_openrouter_client(client):
        return cached_default
    return model or cached_default


def _get_cached_client(
    provider: str,
    model: Optional[str] = None,
    async_mode: bool = False,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
) -> Tuple[Optional[Any], Optional[str]]:
    """Get or create a cached client for the given provider.

    Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
    binds to the event loop that was current when the client was created.
    Using such a client on a *different* loop causes deadlocks or
    RuntimeError. To prevent cross-loop issues, the cache validates on
    every async hit that the cached loop is the *current, open* loop.
    If the loop changed (e.g. a new gateway worker-thread loop), the stale
    entry is replaced in-place rather than creating an additional entry.

    This keeps cache size bounded to one entry per unique provider config,
    preventing the fd-exhaustion that previously occurred in long-running
    gateways where recycled worker threads created unbounded entries (#10200).
    """
    # Resolve the current event loop for async clients so we can validate
    # cached entries. Loop identity is NOT in the cache key — instead we
    # check at hit time whether the cached loop is still current and open.
    # This prevents unbounded cache growth from recycled worker-thread loops
    # while still guaranteeing we never reuse a client on the wrong loop
    # (which causes deadlocks, see #2681).
    current_loop = None
    if async_mode:
        try:
            import asyncio as _aio
            current_loop = _aio.get_event_loop()
        except RuntimeError:
            pass
    runtime = _normalize_main_runtime(main_runtime)
    cache_key = _client_cache_key(
        provider,
        async_mode=async_mode,
        base_url=base_url,
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
    )
    with _client_cache_lock:
        if cache_key in _client_cache:
            cached_client, cached_default, cached_loop = _client_cache[cache_key]
            if async_mode:
                # Validate: the cached client must be bound to the CURRENT,
                # OPEN loop. If the loop changed or was closed, the httpx
                # transport inside is dead — force-close and replace.
                loop_ok = (
                    cached_loop is not None
                    and cached_loop is current_loop
                    and not cached_loop.is_closed()
                )
                if loop_ok:
                    effective = _compat_model(cached_client, model, cached_default)
                    return cached_client, effective
                # Stale — evict and fall through to create a new client.
                _force_close_async_httpx(cached_client)
                del _client_cache[cache_key]
            else:
                effective = _compat_model(cached_client, model, cached_default)
                return cached_client, effective
    # Build outside the lock
    client, default_model = resolve_provider_client(
        provider,
        model,
        async_mode,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
        api_mode=api_mode,
        main_runtime=runtime,
    )
    if client is not None:
        # For async clients, remember which loop they were created on so we
        # can detect stale entries later.
        bound_loop = current_loop
        with _client_cache_lock:
            if cache_key not in _client_cache:
                # Safety belt: if the cache has grown beyond the max, evict
                # the oldest entries (FIFO — dict preserves insertion order).
                while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
                    evict_key, evict_entry = next(iter(_client_cache.items()))
                    _force_close_async_httpx(evict_entry[0])
                    del _client_cache[evict_key]
                _client_cache[cache_key] = (client, default_model, bound_loop)
            else:
                client, default_model, _ = _client_cache[cache_key]
    return client, model or default_model
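The FIFO safety belt above relies on Python dicts preserving insertion order, so `next(iter(cache))` is always the oldest surviving key. A minimal standalone sketch of that eviction policy (illustrative names, not the module's API):

```python
cache: dict = {}
MAX_SIZE = 3

def put(key: str, value: str) -> None:
    # Evict the oldest entries until there is room, then insert.
    while len(cache) >= MAX_SIZE:
        oldest = next(iter(cache))
        del cache[oldest]
    cache[key] = value

for k in ("a", "b", "c", "d"):
    put(k, k.upper())
# After inserting a fourth key, the oldest ("a") has been evicted.
```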


def _resolve_task_provider_model(
    task: Optional[str] = None,
    provider: Optional[str] = None,
    model: Optional[str] = None,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
    """Determine provider + model for a call.

    Priority:
    1. Explicit provider/model/base_url/api_key args (always win)
    2. Config file (auxiliary.{task}.provider/model/base_url)
    3. "auto" (full auto-detection chain)

    Returns (provider, model, base_url, api_key, api_mode) where model may
    be None (use provider default). When base_url is set, provider is forced
    to "custom" and the task uses that direct endpoint. api_mode is one of
    "chat_completions", "codex_responses", or None (auto-detect).
    """
    cfg_provider = None
    cfg_model = None
    cfg_base_url = None
    cfg_api_key = None
    cfg_api_mode = None

    if task:
        task_config = _get_auxiliary_task_config(task)
        cfg_provider = str(task_config.get("provider", "")).strip() or None
        cfg_model = str(task_config.get("model", "")).strip() or None
        cfg_base_url = str(task_config.get("base_url", "")).strip() or None
        cfg_api_key = str(task_config.get("api_key", "")).strip() or None
        cfg_api_mode = str(task_config.get("api_mode", "")).strip() or None

    resolved_model = model or cfg_model
    resolved_api_mode = cfg_api_mode

    if base_url:
        return "custom", resolved_model, base_url, api_key, resolved_api_mode
    if provider:
        return provider, resolved_model, base_url, api_key, resolved_api_mode

    if task:
        # Config.yaml is the primary source for per-task overrides.
        if cfg_base_url:
            return "custom", resolved_model, cfg_base_url, cfg_api_key, resolved_api_mode
        if cfg_provider and cfg_provider != "auto":
            return cfg_provider, resolved_model, None, None, resolved_api_mode

        return "auto", resolved_model, None, None, resolved_api_mode

    return "auto", resolved_model, None, None, resolved_api_mode


_DEFAULT_AUX_TIMEOUT = 30.0


def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
    """Return the config dict for auxiliary.<task>, or {} when unavailable."""
    if not task:
        return {}
    try:
        from hermes_cli.config import load_config
        config = load_config()
    except ImportError:
        return {}
    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
    return task_config if isinstance(task_config, dict) else {}
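Each level of the lookup above is isinstance-guarded, so a malformed config degrades to `{}` instead of raising. A standalone sketch of that pattern (hypothetical helper name, not the module's API):

```python
def lookup_task_config(config: object, task: str) -> dict:
    # Any non-dict level — the whole config, the 'auxiliary' section, or the
    # per-task entry — collapses to an empty dict rather than raising.
    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
    return task_config if isinstance(task_config, dict) else {}
```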


def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
    if not task:
        return default
    task_config = _get_auxiliary_task_config(task)
    raw = task_config.get("timeout")
    if raw is not None:
        try:
            return float(raw)
        except (ValueError, TypeError):
            pass
    return default


def _get_task_extra_body(task: str) -> Dict[str, Any]:
    """Read auxiliary.<task>.extra_body and return a shallow copy when valid."""
    task_config = _get_auxiliary_task_config(task)
    raw = task_config.get("extra_body")
    if isinstance(raw, dict):
        return dict(raw)
    return {}


# ---------------------------------------------------------------------------
# Anthropic-compatible endpoint detection + image block conversion
# ---------------------------------------------------------------------------

# Providers that use Anthropic-compatible endpoints (via OpenAI SDK wrapper).
# Their image content blocks must use Anthropic format, not OpenAI format.
_ANTHROPIC_COMPAT_PROVIDERS = frozenset({"minimax", "minimax-cn"})


def _is_anthropic_compat_endpoint(provider: str, base_url: str) -> bool:
    """Detect if an endpoint expects Anthropic-format content blocks.

    Returns True for known Anthropic-compatible providers (MiniMax) and
    any endpoint whose URL contains ``/anthropic`` in the path.
    """
    if provider in _ANTHROPIC_COMPAT_PROVIDERS:
        return True
    url_lower = (base_url or "").lower()
    return "/anthropic" in url_lower


def _convert_openai_images_to_anthropic(messages: list) -> list:
    """Convert OpenAI ``image_url`` content blocks to Anthropic ``image`` blocks.

    Only touches messages that have list-type content with ``image_url`` blocks;
    plain text messages pass through unchanged.
    """
    converted = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            converted.append(msg)
            continue
        new_content = []
        changed = False
        for block in content:
            if block.get("type") == "image_url":
                image_url_val = (block.get("image_url") or {}).get("url", "")
                if image_url_val.startswith("data:"):
                    # Parse data URI: data:<media_type>;base64,<data>
                    header, _, b64data = image_url_val.partition(",")
                    media_type = "image/png"
                    if ":" in header and ";" in header:
                        media_type = header.split(":", 1)[1].split(";", 1)[0]
                    new_content.append({
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": b64data,
                        },
                    })
                else:
                    # URL-based image
                    new_content.append({
                        "type": "image",
                        "source": {
                            "type": "url",
                            "url": image_url_val,
                        },
                    })
                changed = True
            else:
                new_content.append(block)
        converted.append({**msg, "content": new_content} if changed else msg)
    return converted
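The data-URI branch above hinges on two string splits: `partition(",")` separates the header from the base64 payload, and the nested `split` pulls the media type out from between `data:` and `;base64`. A minimal worked example of just that parsing:

```python
data_uri = "data:image/jpeg;base64,AAAA"

# Split the header off at the first comma, then extract the media type.
header, _, b64data = data_uri.partition(",")
media_type = header.split(":", 1)[1].split(";", 1)[0]

# The resulting Anthropic-format block:
block = {
    "type": "image",
    "source": {"type": "base64", "media_type": media_type, "data": b64data},
}
```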


def _build_call_kwargs(
    provider: str,
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    tools: Optional[list] = None,
    timeout: float = 30.0,
    extra_body: Optional[dict] = None,
    base_url: Optional[str] = None,
) -> dict:
    """Build kwargs for .chat.completions.create() with model/provider adjustments."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "timeout": timeout,
    }

    fixed_temperature = _fixed_temperature_for_model(model, base_url)
    if fixed_temperature is OMIT_TEMPERATURE:
        temperature = None  # strip — let server choose
    elif fixed_temperature is not None:
        temperature = fixed_temperature
|
|
|
|
|
|
|
fix(agent): complete Claude Opus 4.7 API migration
Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide
Fixes NousResearch/hermes-agent#11137
Breaking-change coverage:
1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
_supports_adaptive_thinking() (extends previous 4.6-only gate).
2. Sampling parameter stripping — 4.7 returns 400 for any non-default
temperature / top_p / top_k. build_anthropic_kwargs drops them as a
safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
and AnthropicCompletionsAdapter.create() both early-exit before
setting temperature for 4.7+ models. This keeps flush_memories and
structured-JSON aux paths that hardcode temperature from 400ing
when the aux model is flipped to 4.7.
3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
which silently hides reasoning text from Hermes's CLI activity feed
during long tool runs. Restoring "summarized" preserves 4.6 UX.
4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
silently over-efforted every coding/agentic request). max is now a
distinct ceiling per Anthropic's 5-level effort model.
5. New stop_reason values — refusal and model_context_window_exceeded
were silently collapsed to "stop" (end_turn) by the adapter's
stop_reason_map. Now mapped to "content_filter" and "length"
respectively, matching upstream finish-reason handling already in
bedrock_adapter.
6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
catalog (recommended), claude-opus-4-7 added to model_metadata
DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).
7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
prefill (400).
8. Tests — 4 new tests in test_anthropic_adapter covering display
default, xhigh preservation, max on 4.7, refusal / context-overflow
stop_reason mapping, plus the sampling-param predicate. test_model_metadata
accepts 4.7 at 1M context.
Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.
2026-04-16 12:35:43 -05:00
|
|
|
|
# Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
|
|
|
|
|
|
# drop here so auxiliary callers that hardcode temperature (e.g. 0.3 on
|
|
|
|
|
|
# flush_memories, 0 on structured-JSON extraction) don't 400 the moment
|
|
|
|
|
|
# the aux model is flipped to 4.7.
|
|
|
|
|
|
if temperature is not None:
|
|
|
|
|
|
from agent.anthropic_adapter import _forbids_sampling_params
|
|
|
|
|
|
if _forbids_sampling_params(model):
|
|
|
|
|
|
temperature = None
|
|
|
|
|
|
|
2026-03-11 20:52:19 -07:00
|
|
|
|
if temperature is not None:
|
|
|
|
|
|
kwargs["temperature"] = temperature
|
|
|
|
|
|
|
|
|
|
|
|
if max_tokens is not None:
|
|
|
|
|
|
# Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
|
|
|
|
|
|
# Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
|
|
|
|
|
|
if provider == "custom":
|
2026-03-14 21:16:29 -07:00
|
|
|
|
custom_base = base_url or _current_custom_base_url()
|
fix: extend hostname-match provider detection across remaining call sites
Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.
- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
from utils. Reuse the cached AIAgent._base_url_hostname attribute
everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
heuristics for custom/unknown providers (the /anthropic path-suffix
convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
native-OpenAI detection (paired with deduping the repeated check into
a single is_native_openai boolean per branch).
Tests:
- tests/test_base_url_hostname.py covers the helper directly
(path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
regression class for determine_api_mode, plus a test that the
/anthropic third-party gateway convention still wins.
Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.
2026-04-20 20:58:01 -07:00
|
|
|
|
if base_url_hostname(custom_base) == "api.openai.com":
|
2026-03-11 20:52:19 -07:00
|
|
|
|
kwargs["max_completion_tokens"] = max_tokens
|
|
|
|
|
|
else:
|
|
|
|
|
|
kwargs["max_tokens"] = max_tokens
|
|
|
|
|
|
else:
|
|
|
|
|
|
kwargs["max_tokens"] = max_tokens
|
|
|
|
|
|
|
|
|
|
|
|
if tools:
|
|
|
|
|
|
kwargs["tools"] = tools
|
|
|
|
|
|
|
|
|
|
|
|
# Provider-specific extra_body
|
|
|
|
|
|
merged_extra = dict(extra_body or {})
|
|
|
|
|
|
if provider == "nous" or auxiliary_is_nous:
|
|
|
|
|
|
merged_extra.setdefault("tags", []).extend(["product=hermes-agent"])
|
|
|
|
|
|
if merged_extra:
|
|
|
|
|
|
kwargs["extra_body"] = merged_extra
|
|
|
|
|
|
|
|
|
|
|
|
return kwargs
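The `max_completion_tokens` gate above relies on exact-hostname comparison rather than substring matching, so URLs like `https://api.openai.com.evil/v1` cannot masquerade as the real endpoint. A standalone sketch of that selection (the real helper is `base_url_hostname` in `utils`; `hostname_of` and `max_tokens_param` here are hypothetical names for illustration):

```python
from urllib.parse import urlparse


def hostname_of(base_url: str) -> str:
    # Stand-in for utils.base_url_hostname: exact-host compare,
    # trailing dot stripped (urlparse already lowercases the host).
    return (urlparse(base_url or "").hostname or "").rstrip(".")


def max_tokens_param(provider: str, base_url: str) -> str:
    # Mirrors the kwarg selection in _build_call_kwargs: only a custom
    # endpoint whose host is exactly api.openai.com gets the newer
    # max_completion_tokens parameter; everything else uses max_tokens.
    if provider == "custom" and hostname_of(base_url) == "api.openai.com":
        return "max_completion_tokens"
    return "max_tokens"
```

A host-suffix or path-containing URL falls through to `max_tokens`, which is the safe default for OpenAI-compatible proxies.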


def _validate_llm_response(response: Any, task: str = None) -> Any:
    """Validate that an LLM response has the expected .choices[0].message shape.

    Fails fast with a clear error instead of letting malformed payloads
    propagate to downstream consumers where they crash with misleading
    AttributeError (e.g. "'str' object has no attribute 'choices'").

    See #7264.
    """
    if response is None:
        raise RuntimeError(
            f"Auxiliary {task or 'call'}: LLM returned None response"
        )
    # Allow SimpleNamespace responses from adapters (CodexAuxiliaryClient,
    # AnthropicAuxiliaryClient) — they have .choices[0].message.
    try:
        choices = response.choices
        if not choices or not hasattr(choices[0], "message"):
            raise AttributeError("missing choices[0].message")
    except (AttributeError, TypeError, IndexError) as exc:
        response_type = type(response).__name__
        response_preview = str(response)[:120]
        raise RuntimeError(
            f"Auxiliary {task or 'call'}: LLM returned invalid response "
            f"(type={response_type}): {response_preview!r}. "
            f"Expected object with .choices[0].message — check provider "
            f"adapter or custom endpoint compatibility."
        ) from exc
    return response
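The check is deliberately duck-typed: any object exposing `.choices[0].message` passes, which is what lets adapter-built `SimpleNamespace` responses through. A trimmed sketch of just the shape predicate (`looks_like_chat_completion` is a hypothetical name, not part of this module):

```python
from types import SimpleNamespace


def looks_like_chat_completion(response) -> bool:
    # Trimmed version of the shape check in _validate_llm_response:
    # anything with a non-empty .choices whose first element has a
    # .message attribute counts, including SimpleNamespace adapters.
    try:
        choices = response.choices
        return bool(choices) and hasattr(choices[0], "message")
    except (AttributeError, TypeError, IndexError):
        return False


good = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="ok"))])
```

Raw strings, `None`, and empty `choices` all fail the predicate, which is exactly the class of payload that used to surface as a confusing downstream `AttributeError`.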


def call_llm(
    task: str = None,
    *,
    provider: str = None,
    model: str = None,
    base_url: str = None,
    api_key: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    messages: list,
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
    timeout: float = None,
    extra_body: dict = None,
) -> Any:
    """Centralized synchronous LLM call.

    Resolves provider + model (from task config, explicit args, or auto-detect),
    handles auth, request formatting, and model-specific arg adjustments.

    Args:
        task: Auxiliary task name ("compression", "vision", "web_extract",
            "session_search", "skills_hub", "mcp", "flush_memories").
            Reads provider:model from config/env. Ignored if provider is set.
        provider: Explicit provider override.
        model: Explicit model override.
        base_url: Explicit endpoint base URL override.
        api_key: Explicit API key override.
        main_runtime: Main-agent runtime info forwarded to the client cache
            (e.g. for credential reuse/refresh).
        messages: Chat messages list.
        temperature: Sampling temperature (None = provider default).
        max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
        tools: Tool definitions (for function calling).
        timeout: Request timeout in seconds (None = read from auxiliary.{task}.timeout config).
        extra_body: Additional request body fields.

    Returns:
        Response object with .choices[0].message.content

    Raises:
        RuntimeError: If no provider is configured.
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
    effective_extra_body = _get_task_extra_body(task)
    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
            provider=resolved_provider if resolved_provider != "auto" else provider,
            model=resolved_model or model,
            base_url=resolved_base_url or base_url,
            api_key=resolved_api_key or api_key,
            async_mode=False,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
            logger.warning(
                "Vision provider %s unavailable, falling back to auto vision backends",
                resolved_provider,
            )
            effective_provider, client, final_model = resolve_vision_provider_client(
                provider="auto",
                model=resolved_model,
                async_mode=False,
            )
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup"
            )
        resolved_provider = effective_provider or resolved_provider
    else:
        client, final_model = _get_cached_client(
            resolved_provider,
            resolved_model,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
            api_mode=resolved_api_mode,
            main_runtime=main_runtime,
        )
        if client is None:
            # When the user explicitly chose a non-OpenRouter provider but no
            # credentials were found, fail fast instead of silently routing
            # through OpenRouter (which causes confusing 404s).
            _explicit = (resolved_provider or "").strip().lower()
            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
                raise RuntimeError(
                    f"Provider '{_explicit}' is set in config.yaml but no API key "
                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
                    f"variable, or switch to a different provider with `hermes model`."
                )
            # For auto/custom with no credentials, try the full auto chain
            # rather than hardcoding OpenRouter (which may be depleted).
            # Pass model=None so each provider uses its own default —
            # resolved_model may be an OpenRouter-format slug that doesn't
            # work on other providers.
            if not resolved_base_url:
                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
                            task or "call", resolved_provider)
                client, final_model = _get_cached_client("auto", main_runtime=main_runtime)
            if client is None:
                raise RuntimeError(
                    f"No LLM provider configured for task={task} provider={resolved_provider}. "
                    f"Run: hermes setup")

    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)

    # Log what we're about to do — makes auxiliary operations visible
    _base_info = str(getattr(client, "base_url", resolved_base_url) or "")
    if task:
        logger.info("Auxiliary %s: using %s (%s)%s",
                    task, resolved_provider or "auto", final_model or "default",
                    f" at {_base_info}" if _base_info and "openrouter" not in _base_info else "")

    # Pass the client's actual base_url (not just resolved_base_url) so
    # endpoint-specific temperature overrides can distinguish
    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
        base_url=_base_info or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
    _client_base = str(getattr(client, "base_url", "") or "")
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

    # Handle max_tokens vs max_completion_tokens retry, then payment fallback.
    try:
        return _validate_llm_response(
            client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        err_str = str(first_err)
        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
                return _validate_llm_response(
                    client.chat.completions.create(**kwargs), task)
            except Exception as retry_err:
                # If the max_tokens retry also hits a payment or connection
                # error, fall through to the fallback chain below.
                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
                    raise
                first_err = retry_err

        # ── Nous auth refresh parity with main agent ──────────────────
        client_is_nous = (
            resolved_provider == "nous"
            or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
        )
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
                model=final_model,
                async_mode=False,
                base_url=resolved_base_url,
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
                main_runtime=main_runtime,
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
                            task or "call")
                if refreshed_model and refreshed_model != kwargs.get("model"):
                    kwargs["model"] = refreshed_model
                return _validate_llm_response(
                    refreshed_client.chat.completions.create(**kwargs), task)

        # ── Payment / credit exhaustion fallback ──────────────────────
        # When the resolved provider returns 402 or a credit-related error,
        # try alternative providers instead of giving up. This handles the
        # common case where a user runs out of OpenRouter credits but has
        # Codex OAuth or another provider available.
        #
        # ── Connection error fallback ────────────────────────────────
        # When a provider endpoint is unreachable (DNS failure, connection
        # refused, timeout), try alternative providers. This handles stale
        # Codex/OAuth tokens that authenticate but whose endpoint is down,
        # and providers the user never configured that got picked up by
        # the auto-detection chain.
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
        # Only try alternative providers when the user didn't explicitly
        # configure this task's provider. Explicit provider = hard constraint;
        # auto (the default) = best-effort fallback chain. (#7559)
        is_auto = resolved_provider in ("auto", "", None)
        if should_fallback and is_auto:
            reason = "payment error" if _is_payment_error(first_err) else "connection error"
            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
                        task or "call", reason, resolved_provider, first_err)
            fb_client, fb_model, fb_label = _try_payment_fallback(
                resolved_provider, task, reason=reason)
            if fb_client is not None:
                fb_kwargs = _build_call_kwargs(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
                    extra_body=effective_extra_body,
                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                return _validate_llm_response(
                    fb_client.chat.completions.create(**fb_kwargs), task)

        raise
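The fallback gate at the end of `call_llm` collapses to one predicate: only payment or connection errors qualify, and only when the provider was auto-resolved rather than explicitly configured. A standalone sketch of that decision (`should_try_fallback` is a hypothetical name; the real code inlines this logic):

```python
def should_try_fallback(provider, is_payment: bool, is_connection: bool) -> bool:
    # Mirrors the gate in call_llm: an explicitly configured provider is a
    # hard constraint (no silent rerouting); auto/empty provider gets a
    # best-effort fallback, but only for payment or connection failures.
    is_auto = provider in ("auto", "", None)
    return (is_payment or is_connection) and is_auto
```

Auth errors deliberately do not trip this gate; they go through the Nous credential-refresh path instead, and everything else propagates via the final `raise`.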


def extract_content_or_reasoning(response) -> str:
    """Extract content from an LLM response, falling back to reasoning fields.

    Mirrors the main agent loop's behavior when a reasoning model (DeepSeek-R1,
    Qwen-QwQ, etc.) returns ``content=None`` with reasoning in structured fields.

    Resolution order:
    1. ``message.content`` — strip inline think/reasoning blocks, check for
       remaining non-whitespace text.
    2. ``message.reasoning`` / ``message.reasoning_content`` — direct
       structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
    3. ``message.reasoning_details`` — OpenRouter unified array format.

    Returns the best available text, or ``""`` if nothing found.
    """
    import re

    msg = response.choices[0].message
    content = (msg.content or "").strip()

    if content:
        # Strip inline think/reasoning blocks (mirrors _strip_think_blocks)
        cleaned = re.sub(
            r"<(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>"
            r".*?"
            r"</(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>",
            "", content, flags=re.DOTALL | re.IGNORECASE,
        ).strip()
        if cleaned:
            return cleaned

    # Content is empty or reasoning-only — try structured reasoning fields
    reasoning_parts: list[str] = []
    for field in ("reasoning", "reasoning_content"):
        val = getattr(msg, field, None)
        if val and isinstance(val, str) and val.strip() and val not in reasoning_parts:
            reasoning_parts.append(val.strip())

    details = getattr(msg, "reasoning_details", None)
    if details and isinstance(details, list):
        for detail in details:
            if isinstance(detail, dict):
                summary = (
                    detail.get("summary")
                    or detail.get("content")
                    or detail.get("text")
                )
                if summary and summary not in reasoning_parts:
                    reasoning_parts.append(summary.strip() if isinstance(summary, str) else str(summary))

    if reasoning_parts:
        return "\n\n".join(reasoning_parts)

    return ""
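Step 1 of the resolution order hinges on one regex: drop any paired think/reasoning tags (case-insensitive, spanning newlines) and keep whatever visible answer remains. A standalone sketch of just that strip (`strip_think_blocks` is a hypothetical name here; the in-tree helper it mirrors is `_strip_think_blocks`):

```python
import re

_THINK_BLOCK = re.compile(
    r"<(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>"
    r".*?"
    r"</(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>",
    re.DOTALL | re.IGNORECASE,
)


def strip_think_blocks(text: str) -> str:
    # Remove every paired inline-reasoning block, then trim; the lazy
    # .*? keeps multiple separate blocks from being merged into one match.
    return _THINK_BLOCK.sub("", text).strip()
```

When the result is empty, the model put its entire answer inside the think tags, which is exactly the case that falls through to the structured `reasoning` / `reasoning_details` fields.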


async def async_call_llm(
    task: str = None,
    *,
    provider: str = None,
    model: str = None,
    base_url: str = None,
    api_key: str = None,
    messages: list,
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
    timeout: float = None,
    extra_body: dict = None,
) -> Any:
    """Centralized asynchronous LLM call.

    Same as call_llm() but async. See call_llm() for full documentation.
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
    effective_extra_body = _get_task_extra_body(task)
    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
            provider=resolved_provider if resolved_provider != "auto" else provider,
            model=resolved_model or model,
            base_url=resolved_base_url or base_url,
            api_key=resolved_api_key or api_key,
            async_mode=True,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
            logger.warning(
                "Vision provider %s unavailable, falling back to auto vision backends",
                resolved_provider,
            )
            effective_provider, client, final_model = resolve_vision_provider_client(
                provider="auto",
                model=resolved_model,
                async_mode=True,
            )
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup"
            )
        resolved_provider = effective_provider or resolved_provider
    else:
        client, final_model = _get_cached_client(
            resolved_provider,
            resolved_model,
            async_mode=True,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
            api_mode=resolved_api_mode,
        )
        if client is None:
            _explicit = (resolved_provider or "").strip().lower()
            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
                raise RuntimeError(
                    f"Provider '{_explicit}' is set in config.yaml but no API key "
                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
                    f"variable, or switch to a different provider with `hermes model`."
                )
            if not resolved_base_url:
                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
                            task or "call", resolved_provider)
                client, final_model = _get_cached_client("auto", async_mode=True)
            if client is None:
                raise RuntimeError(
                    f"No LLM provider configured for task={task} provider={resolved_provider}. "
                    f"Run: hermes setup")

    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)

    # Pass the client's actual base_url (not just resolved_base_url) so
    # endpoint-specific temperature overrides can distinguish
    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
    _client_base = str(getattr(client, "base_url", "") or "")
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
        base_url=_client_base or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

    try:
        return _validate_llm_response(
            await client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        err_str = str(first_err)
        if "max_tokens" in err_str or "unsupported_parameter" in err_str:
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
                return _validate_llm_response(
                    await client.chat.completions.create(**kwargs), task)
            except Exception as retry_err:
                # If the max_tokens retry also hits a payment or connection
                # error, fall through to the fallback chain below.
                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
                    raise
                first_err = retry_err

        # ── Nous auth refresh parity with main agent ──────────────────
        client_is_nous = (
            resolved_provider == "nous"
            or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
        )
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
                model=final_model,
                async_mode=True,
                base_url=resolved_base_url,
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
                            task or "call")
                if refreshed_model and refreshed_model != kwargs.get("model"):
                    kwargs["model"] = refreshed_model
                return _validate_llm_response(
                    await refreshed_client.chat.completions.create(**kwargs), task)

        # ── Payment / connection fallback (mirrors sync call_llm) ─────
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
        is_auto = resolved_provider in ("auto", "", None)
        if should_fallback and is_auto:
            reason = "payment error" if _is_payment_error(first_err) else "connection error"
            logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",
                        task or "call", reason, resolved_provider, first_err)
            fb_client, fb_model, fb_label = _try_payment_fallback(
                resolved_provider, task, reason=reason)
            if fb_client is not None:
                fb_kwargs = _build_call_kwargs(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
                    extra_body=effective_extra_body,
                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                # Convert sync fallback client to async
                async_fb, async_fb_model = _to_async_client(fb_client, fb_model or "")
                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
                    fb_kwargs["model"] = async_fb_model
                return _validate_llm_response(
                    await async_fb.chat.completions.create(**fb_kwargs), task)

        raise