agent/auxiliary_client.py

"""Shared auxiliary client router for side tasks.

Provides a single resolution chain so every consumer (context compression,
session search, web extraction, vision analysis, browser vision) picks up
the best available backend without duplicating fallback logic.

Resolution order for text tasks (auto mode):
  1. OpenRouter  (OPENROUTER_API_KEY)
  2. Nous Portal (~/.hermes/auth.json active provider)
  3. Custom endpoint (config.yaml model.base_url + OPENAI_API_KEY)
  4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
     wrapped to look like a chat.completions client)
  5. Native Anthropic
  6. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
  7. None

Resolution order for vision/multimodal tasks (auto mode):
  1. Selected main provider, if it is one of the supported vision backends below
  2. OpenRouter
  3. Nous Portal
  4. Codex OAuth (gpt-5.3-codex supports vision via Responses API)
  5. Native Anthropic
  6. Custom endpoint (for local vision models: Qwen-VL, LLaVA, Pixtral, etc.)
  7. None

Per-task overrides are configured in config.yaml under the ``auxiliary:`` section
(e.g. ``auxiliary.vision.provider``, ``auxiliary.compression.model``).
Default "auto" follows the chains above.

Payment / credit exhaustion fallback:
  When a resolved provider returns HTTP 402 or a credit-related error,
  call_llm() automatically retries with the next available provider in the
  auto-detection chain.  This handles the common case where a user depletes
  their OpenRouter balance but has Codex OAuth or another provider available.
"""

import json
import logging
import os
import threading
import time
from pathlib import Path  # noqa: F401 — used by test mocks
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse

from openai import OpenAI

from agent.credential_pool import load_pool
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_vars

logger = logging.getLogger(__name__)


def _extract_url_query_params(url: str):
    """Extract query params from URL, return (clean_url, default_query dict or None)."""
    parsed = urlparse(url)
    if parsed.query:
        clean = urlunparse(parsed._replace(query=""))
        params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
        return clean, params
    return url, None


# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
_stale_base_url_warned = False

_PROVIDER_ALIASES = {
    "google": "gemini",
    "google-gemini": "gemini",
    "google-ai-studio": "gemini",
    "x-ai": "xai",
    "x.ai": "xai",
    "grok": "xai",
    "glm": "zai",
    "z-ai": "zai",
    "z.ai": "zai",
    "zhipu": "zai",
    "kimi": "kimi-coding",
    "moonshot": "kimi-coding",
    "kimi-cn": "kimi-coding-cn",
    "moonshot-cn": "kimi-coding-cn",
    "gmi-cloud": "gmi",
    "gmicloud": "gmi",
    "minimax-china": "minimax-cn",
    "minimax_cn": "minimax-cn",
    "claude": "anthropic",
    "claude-code": "anthropic",
    "github": "copilot",
    "github-copilot": "copilot",
    "github-model": "copilot",
    "github-models": "copilot",
    "github-copilot-acp": "copilot-acp",
    "copilot-acp-agent": "copilot-acp",
}


def _normalize_aux_provider(provider: Optional[str]) -> str:
    normalized = (provider or "auto").strip().lower()
    if normalized.startswith("custom:"):
        suffix = normalized.split(":", 1)[1].strip()
        if not suffix:
            return "custom"
        normalized = suffix
    if normalized == "codex":
        return "openai-codex"
    if normalized == "main":
        # Resolve to the user's actual main provider so named custom providers
        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
        main_prov = (_read_main_provider() or "").strip().lower()
        if main_prov and main_prov not in ("auto", "main", ""):
            normalized = main_prov
        else:
            return "custom"
    return _PROVIDER_ALIASES.get(normalized, normalized)


# Sentinel: when returned by _fixed_temperature_for_model(), callers must
# strip the ``temperature`` key from API kwargs entirely so the provider's
# server-side default applies.  Kimi/Moonshot models manage temperature
# internally — sending *any* value (even the "correct" one) can conflict
# with gateway-side mode selection (thinking → 1.0, non-thinking → 0.6).
OMIT_TEMPERATURE: object = object()


def _is_kimi_model(model: Optional[str]) -> bool:
    """True for any Kimi / Moonshot model that manages temperature server-side."""
    bare = (model or "").strip().lower().rsplit("/", 1)[-1]
    return bare.startswith("kimi-") or bare == "kimi"


def _fixed_temperature_for_model(
    model: Optional[str],
    base_url: Optional[str] = None,
) -> "Optional[float] | object":
    """Return a temperature directive for models with strict contracts.

    Returns:
        ``OMIT_TEMPERATURE`` — caller must remove the ``temperature`` key so the
            provider chooses its own default.  Used for all Kimi / Moonshot
            models whose gateway selects temperature server-side.
        ``float`` — a specific value the caller must use (reserved for future
            models with fixed-temperature contracts).
        ``None`` — no override; caller should use its own default.
    """
    if _is_kimi_model(model):
        logger.debug("Omitting temperature for Kimi model %r (server-managed)", model)
        return OMIT_TEMPERATURE
    return None

# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
    "gemini": "gemini-3-flash-preview",
    "zai": "glm-4.5-flash",
    "kimi-coding": "kimi-k2-turbo-preview",
    "stepfun": "step-3.5-flash",
    "kimi-coding-cn": "kimi-k2-turbo-preview",
    "gmi": "google/gemini-3.1-flash-lite-preview",
    "minimax": "MiniMax-M2.7",
    "minimax-cn": "MiniMax-M2.7",
    "anthropic": "claude-haiku-4-5-20251001",
    "ai-gateway": "google/gemini-3-flash",
    "opencode-zen": "gemini-3-flash",
    "opencode-go": "glm-5",
    "kilocode": "google/gemini-3-flash-preview",
    "ollama-cloud": "nemotron-3-nano:30b",
}

# Vision-specific model overrides for direct providers.
# When the user's main provider has a dedicated vision/multimodal model that
# differs from their main chat model, map it here.  The vision auto-detect
# "exotic provider" branch checks this before falling back to the main model.
_PROVIDER_VISION_MODELS: Dict[str, str] = {
    "xiaomi": "mimo-v2.5",
    "zai": "glm-5v-turbo",
}

# OpenRouter app attribution headers
_OR_HEADERS = {
    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
    "X-OpenRouter-Title": "Hermes Agent",
    "X-OpenRouter-Categories": "productivity,cli-agent",
}

# Vercel AI Gateway app attribution headers. HTTP-Referer maps to
# referrerUrl and X-Title maps to appName in the gateway's analytics.
from hermes_cli import __version__ as _HERMES_VERSION

_AI_GATEWAY_HEADERS = {
    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
    "X-Title": "Hermes Agent",
    "User-Agent": f"HermesAgent/{_HERMES_VERSION}",
}

# Nous Portal extra_body for product attribution.
# Callers should pass this as extra_body in chat.completions.create()
# when the auxiliary client is backed by Nous Portal.
NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent"]}

# Set at resolve time — True if the auxiliary client points to Nous Portal
auxiliary_is_nous: bool = False

# Default auxiliary models per provider
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
_AUTH_JSON_PATH = get_hermes_home() / "auth.json"

# Codex fallback: uses the Responses API (the only endpoint the Codex
# OAuth token can access) with a fast model for auxiliary tasks.
# ChatGPT-backed Codex accounts currently reject gpt-5.3-codex for these
# auxiliary flows, while gpt-5.2-codex remains broadly available and supports
# vision via Responses.
_CODEX_AUX_MODEL = "gpt-5.2-codex"
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"


def _codex_cloudflare_headers(access_token: str) -> Dict[str, str]:
    """Headers required to avoid Cloudflare 403s on chatgpt.com/backend-api/codex.

    The Cloudflare layer in front of the Codex endpoint whitelists a small set of
    first-party originators (``codex_cli_rs``, ``codex_vscode``, ``codex_sdk_ts``,
    anything starting with ``Codex``). Requests from non-residential IPs (VPS,
    server-hosted agents) that don't advertise an allowed originator are served
    a 403 with ``cf-mitigated: challenge`` regardless of auth correctness.

    We pin ``originator: codex_cli_rs`` to match the upstream codex-rs CLI, set
    ``User-Agent`` to a codex_cli_rs-shaped string (beats SDK fingerprinting),
    and extract ``ChatGPT-Account-ID`` (canonical casing, from codex-rs
    ``auth.rs``) out of the OAuth JWT's ``chatgpt_account_id`` claim.

    Malformed tokens are tolerated — we drop the account-ID header rather than
    raise, so a bad token still surfaces as an auth error (401) instead of a
    crash at client construction.
    """
    headers = {
        "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
        "originator": "codex_cli_rs",
    }
    if not isinstance(access_token, str) or not access_token.strip():
        return headers
    try:
        import base64
        parts = access_token.split(".")
        if len(parts) < 2:
            return headers
        payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
        acct_id = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
        if isinstance(acct_id, str) and acct_id:
            headers["ChatGPT-Account-ID"] = acct_id
    except Exception:
        pass
    return headers


def _to_openai_base_url(base_url: str) -> str:
    """Normalize an Anthropic-style base URL to OpenAI-compatible format.

    Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
    the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
    completions.  The auxiliary client uses the OpenAI SDK, so it must hit the
    ``/v1`` surface.  Passing the raw ``inference_base_url`` causes requests to
    land on ``/anthropic/chat/completions`` — a 404.
    """
    url = str(base_url or "").strip().rstrip("/")
    if url.endswith("/anthropic"):
        rewritten = url[: -len("/anthropic")] + "/v1"
        logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
        return rewritten
    return url


def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
    """Return (pool_exists_for_provider, selected_entry)."""
    try:
        pool = load_pool(provider)
    except Exception as exc:
        logger.debug("Auxiliary client: could not load pool for %s: %s", provider, exc)
        return False, None
    if not pool or not pool.has_credentials():
        return False, None
    try:
        return True, pool.select()
    except Exception as exc:
        logger.debug("Auxiliary client: could not select pool entry for %s: %s", provider, exc)
        return True, None


def _pool_runtime_api_key(entry: Any) -> str:
    if entry is None:
        return ""
    # Use the PooledCredential.runtime_api_key property which handles
    # provider-specific fallback (e.g. agent_key for nous).
    key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
    return str(key or "").strip()


def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
    if entry is None:
        return str(fallback or "").strip().rstrip("/")
    # runtime_base_url handles provider-specific logic (e.g. nous prefers inference_base_url).
    # Fall back through inference_base_url and base_url for non-PooledCredential entries.
    url = (
        getattr(entry, "runtime_base_url", None)
        or getattr(entry, "inference_base_url", None)
        or getattr(entry, "base_url", None)
        or fallback
    )
    return str(url or "").strip().rstrip("/")


# ── Codex Responses → chat.completions adapter ─────────────────────────────
# All auxiliary consumers call client.chat.completions.create(**kwargs) and
# read response.choices[0].message.content. This adapter translates those
# calls to the Codex Responses API so callers don't need any changes.


def _convert_content_for_responses(content: Any) -> Any:
    """Convert chat.completions content to Responses API format.

    chat.completions uses:
      {"type": "text", "text": "..."}
      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}

    Responses API uses:
      {"type": "input_text", "text": "..."}
      {"type": "input_image", "image_url": "data:image/png;base64,..."}

    If content is a plain string, it's returned as-is (the Responses API
    accepts strings directly for text-only messages).
    """
    if isinstance(content, str):
        return content
    if not isinstance(content, list):
        return str(content) if content else ""

    converted: List[Dict[str, Any]] = []
    for part in content:
        if not isinstance(part, dict):
            continue
        ptype = part.get("type", "")
        if ptype == "text":
            converted.append({"type": "input_text", "text": part.get("text", "")})
        elif ptype == "image_url":
            # chat.completions nests the URL: {"image_url": {"url": "..."}}
            image_data = part.get("image_url", {})
            url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
            # Preserve detail if specified
            detail = image_data.get("detail") if isinstance(image_data, dict) else None
            if detail:
                entry["detail"] = detail
            converted.append(entry)
        elif ptype in ("input_text", "input_image"):
            # Already in Responses format — pass through
            converted.append(part)
        else:
            # Unknown content type — try to preserve as text
            text = part.get("text", "")
            if text:
                converted.append({"type": "input_text", "text": text})

    return converted or ""


class _CodexCompletionsAdapter:
    """Drop-in shim that accepts chat.completions.create() kwargs and
    routes them through the Codex Responses streaming API."""

    def __init__(self, real_client: OpenAI, model: str):
        self._client = real_client
        self._model = model

    def create(self, **kwargs) -> Any:
        messages = kwargs.get("messages", [])
        model = kwargs.get("model", self._model)

        # Separate system/instructions from conversation messages.
        # Convert chat.completions multimodal content blocks to Responses
        # API format (input_text / input_image instead of text / image_url).
        instructions = "You are a helpful assistant."
        input_msgs: List[Dict[str, Any]] = []
        for msg in messages:
            role = msg.get("role", "user")
            content = msg.get("content") or ""
            if role == "system":
                instructions = content if isinstance(content, str) else str(content)
            else:
                input_msgs.append({
                    "role": role,
                    "content": _convert_content_for_responses(content),
                })

        resp_kwargs: Dict[str, Any] = {
            "model": model,
            "instructions": instructions,
            "input": input_msgs or [{"role": "user", "content": ""}],
            "store": False,
        }

        # Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
        # support max_output_tokens or temperature — omit to avoid 400 errors.

        # Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
        tools = kwargs.get("tools")
        if tools:
            converted = []
            for t in tools:
                fn = t.get("function", {}) if isinstance(t, dict) else {}
                name = fn.get("name")
                if not name:
                    continue
                converted.append({
                    "type": "function",
                    "name": name,
                    "description": fn.get("description", ""),
                    "parameters": fn.get("parameters", {}),
                })
            if converted:
                resp_kwargs["tools"] = converted

        # Stream and collect the response
        text_parts: List[str] = []
        tool_calls_raw: List[Any] = []
        usage = None

        try:
            # Collect output items and text deltas during streaming —
            # the Codex backend can return empty response.output from
            # get_final_response() even when items were streamed.
            collected_output_items: List[Any] = []
            collected_text_deltas: List[str] = []
            has_function_calls = False
            with self._client.responses.stream(**resp_kwargs) as stream:
                for _event in stream:
                    _etype = getattr(_event, "type", "")
                    if _etype == "response.output_item.done":
                        _done = getattr(_event, "item", None)
                        if _done is not None:
                            collected_output_items.append(_done)
                    elif "output_text.delta" in _etype:
                        _delta = getattr(_event, "delta", "")
                        if _delta:
                            collected_text_deltas.append(_delta)
                    elif "function_call" in _etype:
                        has_function_calls = True
                final = stream.get_final_response()

            # Backfill empty output from collected stream events
            _output = getattr(final, "output", None)
            if isinstance(_output, list) and not _output:
                if collected_output_items:
                    final.output = list(collected_output_items)
                    logger.debug(
                        "Codex auxiliary: backfilled %d output items from stream events",
                        len(collected_output_items),
                    )
                elif collected_text_deltas and not has_function_calls:
                    # Only synthesize text when no tool calls were streamed —
                    # a function_call response with incidental text should not
                    # be collapsed into a plain-text message.
                    assembled = "".join(collected_text_deltas)
                    final.output = [SimpleNamespace(
                        type="message", role="assistant", status="completed",
                        content=[SimpleNamespace(type="output_text", text=assembled)],
                    )]
                    logger.debug(
                        "Codex auxiliary: synthesized from %d deltas (%d chars)",
                        len(collected_text_deltas), len(assembled),
                    )

            # Extract text and tool calls from the Responses output.
            # Items may be SDK objects (attrs) or dicts (raw/fallback paths),
            # so use a helper that handles both shapes.
            def _item_get(obj: Any, key: str, default: Any = None) -> Any:
                val = getattr(obj, key, None)
                if val is None and isinstance(obj, dict):
                    val = obj.get(key, default)
                return val if val is not None else default

            for item in getattr(final, "output", []):
                item_type = _item_get(item, "type")
                if item_type == "message":
                    for part in (_item_get(item, "content") or []):
                        ptype = _item_get(part, "type")
                        if ptype in ("output_text", "text"):
                            text_parts.append(_item_get(part, "text", ""))
                elif item_type == "function_call":
                    tool_calls_raw.append(SimpleNamespace(
                        id=_item_get(item, "call_id", ""),
                        type="function",
                        function=SimpleNamespace(
                            name=_item_get(item, "name", ""),
                            arguments=_item_get(item, "arguments", "{}"),
                        ),
                    ))

            resp_usage = getattr(final, "usage", None)
            if resp_usage:
                usage = SimpleNamespace(
                    prompt_tokens=getattr(resp_usage, "input_tokens", 0),
                    completion_tokens=getattr(resp_usage, "output_tokens", 0),
                    total_tokens=getattr(resp_usage, "total_tokens", 0),
                )
        except Exception as exc:
            logger.debug("Codex auxiliary Responses API call failed: %s", exc)
            raise

        content = "".join(text_parts).strip() or None

        # Build a response that looks like chat.completions
        message = SimpleNamespace(
            role="assistant",
            content=content,
            tool_calls=tool_calls_raw or None,
        )
        choice = SimpleNamespace(
            index=0,
            message=message,
            finish_reason="stop" if not tool_calls_raw else "tool_calls",
        )
        return SimpleNamespace(
            choices=[choice],
            model=model,
            usage=usage,
        )


class _CodexChatShim:
    """Wraps the adapter to provide client.chat.completions.create()."""

    def __init__(self, adapter: _CodexCompletionsAdapter):
        self.completions = adapter


class CodexAuxiliaryClient:
    """OpenAI-client-compatible wrapper that routes through Codex Responses API.

    Consumers can call client.chat.completions.create(**kwargs) as normal.
    Also exposes .api_key and .base_url for introspection by async wrappers.
    """

    def __init__(self, real_client: OpenAI, model: str):
        self._real_client = real_client
        adapter = _CodexCompletionsAdapter(real_client, model)
        self.chat = _CodexChatShim(adapter)
        self.api_key = real_client.api_key
        self.base_url = real_client.base_url

    def close(self):
        self._real_client.close()


class _AsyncCodexCompletionsAdapter:
    """Async version of the Codex Responses adapter.

    Wraps the sync adapter via asyncio.to_thread() so async consumers
    (web_tools, session_search) can await it as normal.
    """

    def __init__(self, sync_adapter: _CodexCompletionsAdapter):
        self._sync = sync_adapter

    async def create(self, **kwargs) -> Any:
        import asyncio
        return await asyncio.to_thread(self._sync.create, **kwargs)


class _AsyncCodexChatShim:
    def __init__(self, adapter: _AsyncCodexCompletionsAdapter):
        self.completions = adapter


class AsyncCodexAuxiliaryClient:
    """Async-compatible wrapper matching AsyncOpenAI.chat.completions.create()."""

    def __init__(self, sync_wrapper: "CodexAuxiliaryClient"):
        sync_adapter = sync_wrapper.chat.completions
        async_adapter = _AsyncCodexCompletionsAdapter(sync_adapter)
        self.chat = _AsyncCodexChatShim(async_adapter)
        self.api_key = sync_wrapper.api_key
        self.base_url = sync_wrapper.base_url


class _AnthropicCompletionsAdapter:
    """OpenAI-client-compatible adapter for Anthropic Messages API."""

    def __init__(self, real_client: Any, model: str, is_oauth: bool = False):
        self._client = real_client
        self._model = model
        self._is_oauth = is_oauth

    def create(self, **kwargs) -> Any:
        from agent.anthropic_adapter import build_anthropic_kwargs
        from agent.transports import get_transport

        messages = kwargs.get("messages", [])
        model = kwargs.get("model", self._model)
        tools = kwargs.get("tools")
        tool_choice = kwargs.get("tool_choice")
        max_tokens = kwargs.get("max_tokens") or kwargs.get("max_completion_tokens") or 2000
        temperature = kwargs.get("temperature")

        normalized_tool_choice = None
        if isinstance(tool_choice, str):
            normalized_tool_choice = tool_choice
        elif isinstance(tool_choice, dict):
            choice_type = str(tool_choice.get("type", "")).lower()
            if choice_type == "function":
                normalized_tool_choice = tool_choice.get("function", {}).get("name")
            elif choice_type in {"auto", "required", "none"}:
                normalized_tool_choice = choice_type

        anthropic_kwargs = build_anthropic_kwargs(
            model=model,
            messages=messages,
            tools=tools,
            max_tokens=max_tokens,
            reasoning_config=None,
            tool_choice=normalized_tool_choice,
            is_oauth=self._is_oauth,
        )
        # Opus 4.7+ rejects any non-default temperature/top_p/top_k; only set
        # temperature for models that still accept it. build_anthropic_kwargs
        # additionally strips these keys as a safety net — keep both layers.
        if temperature is not None:
            from agent.anthropic_adapter import _forbids_sampling_params
            if not _forbids_sampling_params(model):
                anthropic_kwargs["temperature"] = temperature

        response = self._client.messages.create(**anthropic_kwargs)
        _transport = get_transport("anthropic_messages")
        _nr = _transport.normalize_response(
            response, strip_tool_prefix=self._is_oauth
        )

        # ToolCall already duck-types as OpenAI shape (.type, .function.name,
        # .function.arguments) via properties, so no wrapping needed.
        assistant_message = SimpleNamespace(
            content=_nr.content,
            tool_calls=_nr.tool_calls,
            reasoning=_nr.reasoning,
        )
        finish_reason = _nr.finish_reason

        usage = None
        if hasattr(response, "usage") and response.usage:
            prompt_tokens = getattr(response.usage, "input_tokens", 0) or 0
            completion_tokens = getattr(response.usage, "output_tokens", 0) or 0
            total_tokens = getattr(response.usage, "total_tokens", 0) or (prompt_tokens + completion_tokens)
            usage = SimpleNamespace(
                prompt_tokens=prompt_tokens,
                completion_tokens=completion_tokens,
                total_tokens=total_tokens,
            )

        choice = SimpleNamespace(
            index=0,
            message=assistant_message,
            finish_reason=finish_reason,
        )
        return SimpleNamespace(
            choices=[choice],
            model=model,
            usage=usage,
        )


class _AnthropicChatShim:
    def __init__(self, adapter: _AnthropicCompletionsAdapter):
        self.completions = adapter


class AnthropicAuxiliaryClient:
    """OpenAI-client-compatible wrapper over a native Anthropic client."""

    def __init__(self, real_client: Any, model: str, api_key: str, base_url: str, is_oauth: bool = False):
        self._real_client = real_client
        adapter = _AnthropicCompletionsAdapter(real_client, model, is_oauth=is_oauth)
        self.chat = _AnthropicChatShim(adapter)
        self.api_key = api_key
        self.base_url = base_url

    def close(self):
        close_fn = getattr(self._real_client, "close", None)
        if callable(close_fn):
            close_fn()


class _AsyncAnthropicCompletionsAdapter:
    def __init__(self, sync_adapter: _AnthropicCompletionsAdapter):
        self._sync = sync_adapter

    async def create(self, **kwargs) -> Any:
        import asyncio
        return await asyncio.to_thread(self._sync.create, **kwargs)


class _AsyncAnthropicChatShim:
    def __init__(self, adapter: _AsyncAnthropicCompletionsAdapter):
        self.completions = adapter


class AsyncAnthropicAuxiliaryClient:
    def __init__(self, sync_wrapper: "AnthropicAuxiliaryClient"):
        sync_adapter = sync_wrapper.chat.completions
        async_adapter = _AsyncAnthropicCompletionsAdapter(sync_adapter)
        self.chat = _AsyncAnthropicChatShim(async_adapter)
        self.api_key = sync_wrapper.api_key
        self.base_url = sync_wrapper.base_url


def _read_nous_auth() -> Optional[dict]:
    """Read and validate ~/.hermes/auth.json for an active Nous provider.

    Returns the provider state dict if Nous is active with tokens,
    otherwise None.
    """
    pool_present, entry = _select_pool_entry("nous")
    if pool_present:
        if entry is None:
            return None
        return {
            "access_token": getattr(entry, "access_token", ""),
            "refresh_token": getattr(entry, "refresh_token", None),
            "agent_key": getattr(entry, "agent_key", None),
            "inference_base_url": _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL),
            "portal_base_url": getattr(entry, "portal_base_url", None),
            "client_id": getattr(entry, "client_id", None),
            "scope": getattr(entry, "scope", None),
            "token_type": getattr(entry, "token_type", "Bearer"),
            "source": "pool",
        }

    try:
        if not _AUTH_JSON_PATH.is_file():
            return None
        data = json.loads(_AUTH_JSON_PATH.read_text())
        if data.get("active_provider") != "nous":
            return None
        provider = data.get("providers", {}).get("nous", {})
        # Must have at least an access_token or agent_key
        if not provider.get("agent_key") and not provider.get("access_token"):
            return None
        return provider
    except Exception as exc:
        logger.debug("Could not read Nous auth: %s", exc)
        return None


def _nous_api_key(provider: dict) -> str:
    """Extract the best API key from a Nous provider state dict."""
    return provider.get("agent_key") or provider.get("access_token", "")


def _nous_base_url() -> str:
    """Resolve the Nous inference base URL from env or default."""
    return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)


def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
    """Return fresh Nous runtime credentials when available.

    This mirrors the main agent's 401 recovery path and keeps auxiliary
    clients aligned with the singleton auth store + mint flow instead of
    relying only on whatever raw tokens happen to be sitting in auth.json
    or the credential pool.
    """
    try:
        from hermes_cli.auth import resolve_nous_runtime_credentials

        creds = resolve_nous_runtime_credentials(
            min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
            timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
            force_mint=force_refresh,
        )
    except Exception as exc:
        logger.debug("Auxiliary Nous runtime credential resolution failed: %s", exc)
        return None

    api_key = str(creds.get("api_key") or "").strip()
    base_url = str(creds.get("base_url") or "").strip().rstrip("/")
    if not api_key or not base_url:
        return None
    return api_key, base_url


def _read_codex_access_token() -> Optional[str]:
    """Read a valid, non-expired Codex OAuth access token from Hermes auth store.

    If a credential pool exists but currently has no selectable runtime entry
    (for example all pool slots are marked exhausted), fall back to the
    profile's auth.json token instead of hard-failing. This keeps explicit
    fallback-to-Codex working when the pool state is stale but the stored OAuth
    token is still valid.
    """
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        token = _pool_runtime_api_key(entry)
        if token:
            return token

    try:
        from hermes_cli.auth import _read_codex_tokens
        data = _read_codex_tokens()
        tokens = data.get("tokens", {})
        access_token = tokens.get("access_token")
        if not isinstance(access_token, str) or not access_token.strip():
            return None

        # Check JWT expiry — expired tokens block the auto chain and
        # prevent fallback to working providers (e.g. Anthropic).
        try:
            import base64
            payload = access_token.split(".")[1]
            payload += "=" * (-len(payload) % 4)
            claims = json.loads(base64.urlsafe_b64decode(payload))
            exp = claims.get("exp", 0)
            if exp and time.time() > exp:
                logger.debug("Codex access token expired (exp=%s), skipping", exp)
                return None
        except Exception:
            pass  # Non-JWT token or decode error — use as-is

        return access_token.strip()
    except Exception as exc:
        logger.debug("Could not read Codex auth for auxiliary client: %s", exc)
        return None


def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
    """Try each API-key provider in PROVIDER_REGISTRY order.

    Returns (client, model) for the first provider with usable runtime
    credentials, or (None, None) if none are configured.
    """
    try:
        from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
    except ImportError:
        logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
        return None, None

    for provider_id, pconfig in PROVIDER_REGISTRY.items():
        if pconfig.auth_type != "api_key":
            continue
        if provider_id == "anthropic":
            # Only try anthropic when the user has explicitly configured it.
            # Without this gate, Claude Code credentials get silently used
            # as auxiliary fallback when the user's primary provider fails.
            try:
                from hermes_cli.auth import is_provider_explicitly_configured
                if not is_provider_explicitly_configured("anthropic"):
                    continue
            except ImportError:
                pass
            return _try_anthropic()

        pool_present, entry = _select_pool_entry(provider_id)
        if pool_present:
            api_key = _pool_runtime_api_key(entry)
            if not api_key:
                continue

            base_url = _to_openai_base_url(
                _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
            )
            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
            if model is None:
                continue  # skip provider if we don't know a valid aux model
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            if provider_id == "gemini":
                from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url

                if is_native_gemini_base_url(base_url):
                    return GeminiNativeClient(api_key=api_key, base_url=base_url), model
            extra = {}
            if base_url_host_matches(base_url, "api.kimi.com"):
                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
            elif base_url_host_matches(base_url, "api.githubcopilot.com"):
                from hermes_cli.models import copilot_default_headers

                extra["default_headers"] = copilot_default_headers()
            return OpenAI(api_key=api_key, base_url=base_url, **extra), model

        creds = resolve_api_key_provider_credentials(provider_id)
        api_key = str(creds.get("api_key", "")).strip()
        if not api_key:
            continue

        base_url = _to_openai_base_url(
            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
        )
        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
        if model is None:
            continue  # skip provider if we don't know a valid aux model
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
        if provider_id == "gemini":
            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url

            if is_native_gemini_base_url(base_url):
                return GeminiNativeClient(api_key=api_key, base_url=base_url), model
        extra = {}
        if base_url_host_matches(base_url, "api.kimi.com"):
            extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
        elif base_url_host_matches(base_url, "api.githubcopilot.com"):
            from hermes_cli.models import copilot_default_headers

            extra["default_headers"] = copilot_default_headers()
        return OpenAI(api_key=api_key, base_url=base_url, **extra), model

    return None, None


# ── Provider resolution helpers ─────────────────────────────────────────────


def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
    pool_present, entry = _select_pool_entry("openrouter")
    if pool_present:
        or_key = _pool_runtime_api_key(entry)
        if not or_key:
            return None, None
        base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
        logger.debug("Auxiliary client: OpenRouter via pool")
        return OpenAI(api_key=or_key, base_url=base_url,
                       default_headers=_OR_HEADERS), _OPENROUTER_MODEL

    or_key = os.getenv("OPENROUTER_API_KEY")
    if not or_key:
        return None, None
    logger.debug("Auxiliary client: OpenRouter")
    return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
                   default_headers=_OR_HEADERS), _OPENROUTER_MODEL


def _describe_openrouter_unavailable() -> str:
    """Return a more precise OpenRouter auth failure reason for logs."""
    pool_present, entry = _select_pool_entry("openrouter")
    if pool_present:
        if entry is None:
            return "OpenRouter credential pool has no usable entries (credentials may be exhausted)"
        if not _pool_runtime_api_key(entry):
            return "OpenRouter credential pool entry is missing a runtime API key"
    if not str(os.getenv("OPENROUTER_API_KEY") or "").strip():
        return "OPENROUTER_API_KEY not set"
    return "no usable OpenRouter credentials found"


def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
    # Check cross-session rate limit guard before attempting Nous —
    # if another session already recorded a 429, skip Nous entirely
    # to avoid piling more requests onto the tapped RPH bucket.
    try:
        from agent.nous_rate_guard import nous_rate_limit_remaining
        _remaining = nous_rate_limit_remaining()
        if _remaining is not None and _remaining > 0:
            logger.debug(
                "Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
                _remaining,
            )
            return None, None
    except Exception:
        pass

    nous = _read_nous_auth()
    runtime = _resolve_nous_runtime_api(force_refresh=False)
    if runtime is None and not nous:
        return None, None
    global auxiliary_is_nous
    auxiliary_is_nous = True
    logger.debug("Auxiliary client: Nous Portal")

    # Ask the Portal which model it currently recommends for this task type.
    # The /api/nous/recommended-models endpoint is the authoritative source:
    # it distinguishes paid vs free tier recommendations, and get_nous_recommended_aux_model
    # auto-detects the caller's tier via check_nous_free_tier().  Fall back to
    # _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable
    # or returns a null recommendation for this task type.
    model = _NOUS_MODEL
    try:
        from hermes_cli.models import get_nous_recommended_aux_model
        recommended = get_nous_recommended_aux_model(vision=vision)
        if recommended:
            model = recommended
            logger.debug(
                "Auxiliary/%s: using Portal-recommended model %s",
                "vision" if vision else "text", model,
            )
        else:
            logger.debug(
                "Auxiliary/%s: no Portal recommendation, falling back to %s",
                "vision" if vision else "text", model,
            )
    except Exception as exc:
        logger.debug(
            "Auxiliary/%s: recommended-models lookup failed (%s); "
            "falling back to %s",
            "vision" if vision else "text", exc, model,
        )

    if runtime is not None:
        api_key, base_url = runtime
    else:
        api_key = _nous_api_key(nous or {})
        base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
    return (
        OpenAI(
            api_key=api_key,
            base_url=base_url,
        ),
        model,
    )


def _read_main_model() -> str:
    """Read the user's configured main model from config.yaml.

    config.yaml model.default is the single source of truth for the active
    model. Environment variables are no longer consulted.
    """
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
        model_cfg = cfg.get("model", {})
        if isinstance(model_cfg, str) and model_cfg.strip():
            return model_cfg.strip()
        if isinstance(model_cfg, dict):
            default = model_cfg.get("default", "")
            if isinstance(default, str) and default.strip():
                return default.strip()
    except Exception:
        pass
    return ""


def _read_main_provider() -> str:
    """Read the user's configured main provider from config.yaml.

    Returns the lowercase provider id (e.g. "alibaba", "openrouter") or ""
    if not configured.
    """
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
        model_cfg = cfg.get("model", {})
        if isinstance(model_cfg, dict):
            provider = model_cfg.get("provider", "")
            if isinstance(provider, str) and provider.strip():
                return provider.strip().lower()
    except Exception:
        pass
    return ""


def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Resolve the active custom/main endpoint the same way the main CLI does.

    This covers both env-driven OPENAI_BASE_URL setups and config-saved custom
    endpoints where the base URL lives in config.yaml instead of the live
    environment.
    """
    try:
        from hermes_cli.runtime_provider import resolve_runtime_provider

        runtime = resolve_runtime_provider(requested="custom")
    except Exception as exc:
        logger.debug("Auxiliary client: custom runtime resolution failed: %s", exc)
        runtime = None

    if not isinstance(runtime, dict):
        openai_base = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
        openai_key = os.getenv("OPENAI_API_KEY", "").strip()
        if not openai_base:
            return None, None, None
        runtime = {
            "base_url": openai_base,
            "api_key": openai_key,
        }

    custom_base = runtime.get("base_url")
    custom_key = runtime.get("api_key")
    custom_mode = runtime.get("api_mode")
    if not isinstance(custom_base, str) or not custom_base.strip():
        return None, None, None

    custom_base = custom_base.strip().rstrip("/")
    if base_url_host_matches(custom_base, "openrouter.ai"):
        # requested='custom' falls back to OpenRouter when no custom endpoint is
        # configured. Treat that as "no custom endpoint" for auxiliary routing.
        return None, None, None

    # Local servers (Ollama, llama.cpp, vLLM, LM Studio) don't require auth.
    # Use a placeholder key — the OpenAI SDK requires a non-empty string but
    # local servers ignore the Authorization header.  Same fix as cli.py
    # _ensure_runtime_credentials() (PR #2556).
    if not isinstance(custom_key, str) or not custom_key.strip():
        custom_key = "no-key-required"

    if not isinstance(custom_mode, str) or not custom_mode.strip():
        custom_mode = None

    return custom_base, custom_key.strip(), custom_mode


def _current_custom_base_url() -> str:
    custom_base, _, _ = _resolve_custom_runtime()
    return custom_base or ""


def _validate_proxy_env_urls() -> None:
    """Fail fast with a clear error when proxy env vars have malformed URLs.

    Common cause: shell config (e.g. .zshrc) with a typo like
    ``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
    which concatenates 'export' into the port number.  Without this
    check the OpenAI/httpx client raises a cryptic ``Invalid port``
    error that doesn't name the offending env var.
    """
    from urllib.parse import urlparse

    normalize_proxy_env_vars()

    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
                "https_proxy", "http_proxy", "all_proxy"):
        value = str(os.environ.get(key) or "").strip()
        if not value:
            continue
        try:
            parsed = urlparse(value)
            if parsed.scheme:
                _ = parsed.port          # raises ValueError for e.g. '6153export'
        except ValueError as exc:
            raise RuntimeError(
                f"Malformed proxy environment variable {key}={value!r}. "
                "Fix or unset your proxy settings and try again."
            ) from exc


def _validate_base_url(base_url: str) -> None:
    """Reject obviously broken custom endpoint URLs before they reach httpx."""
    from urllib.parse import urlparse

    candidate = str(base_url or "").strip()
    if not candidate or candidate.startswith("acp://"):
        return
    try:
        parsed = urlparse(candidate)
        if parsed.scheme in {"http", "https"}:
            _ = parsed.port              # raises ValueError for malformed ports
    except ValueError as exc:
        raise RuntimeError(
            f"Malformed custom endpoint URL: {candidate!r}. "
            "Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
        ) from exc


def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
    runtime = _resolve_custom_runtime()
    if len(runtime) == 2:
        custom_base, custom_key = runtime
        custom_mode = None
    else:
        custom_base, custom_key, custom_mode = runtime
    if not custom_base or not custom_key:
        return None, None
    if custom_base.lower().startswith(_CODEX_AUX_BASE_URL.lower()):
        return None, None
    model = _read_main_model() or "gpt-4o-mini"
    logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
    _clean_base, _dq = _extract_url_query_params(custom_base)
    _extra = {"default_query": _dq} if _dq else {}
    if custom_mode == "codex_responses":
        real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
        return CodexAuxiliaryClient(real_client, model), model
    if custom_mode == "anthropic_messages":
        # Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
        # LiteLLM proxies, etc.).  Must NEVER be treated as OAuth —
        # Anthropic OAuth claims only apply to api.anthropic.com.
        try:
            from agent.anthropic_adapter import build_anthropic_client
            real_client = build_anthropic_client(custom_key, custom_base)
        except ImportError:
            logger.warning(
                "Custom endpoint declares api_mode=anthropic_messages but the "
                "anthropic SDK is not installed — falling back to OpenAI-wire."
            )
            return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
        return (
            AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
            model,
        )
    return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model


def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
    pool_present, entry = _select_pool_entry("openai-codex")
    if pool_present:
        codex_token = _pool_runtime_api_key(entry)
        if codex_token:
            base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
        else:
            codex_token = _read_codex_access_token()
            if not codex_token:
                return None, None
            base_url = _CODEX_AUX_BASE_URL
    else:
        codex_token = _read_codex_access_token()
        if not codex_token:
            return None, None
        base_url = _CODEX_AUX_BASE_URL
    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
    real_client = OpenAI(
        api_key=codex_token,
        base_url=base_url,
        default_headers=_codex_cloudflare_headers(codex_token),
    )
    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL


def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
    try:
        from agent.anthropic_adapter import build_anthropic_client, resolve_anthropic_token
    except ImportError:
        return None, None

    pool_present, entry = _select_pool_entry("anthropic")
    if pool_present:
        if entry is None:
            return None, None
        token = _pool_runtime_api_key(entry)
    else:
        entry = None
        token = resolve_anthropic_token()
    if not token:
        return None, None

    # Allow base URL override from config.yaml model.base_url, but only
    # when the configured provider is anthropic — otherwise a non-Anthropic
    # base_url (e.g. Codex endpoint) would leak into Anthropic requests.
    base_url = _pool_runtime_base_url(entry, _ANTHROPIC_DEFAULT_BASE_URL) if pool_present else _ANTHROPIC_DEFAULT_BASE_URL
    try:
        from hermes_cli.config import load_config
        cfg = load_config()
        model_cfg = cfg.get("model")
        if isinstance(model_cfg, dict):
            cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
            if cfg_provider == "anthropic":
                cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
                if cfg_base_url:
                    base_url = cfg_base_url
    except Exception:
        pass

    from agent.anthropic_adapter import _is_oauth_token
    is_oauth = _is_oauth_token(token)
    model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
    logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
    try:
        real_client = build_anthropic_client(token, base_url)
    except ImportError:
        # The anthropic_adapter module imports fine but the SDK itself is
        # missing — build_anthropic_client raises ImportError at call time
        # when _anthropic_sdk is None.  Treat as unavailable.
        return None, None
    return AnthropicAuxiliaryClient(real_client, model, token, base_url, is_oauth=is_oauth), model


_AUTO_PROVIDER_LABELS = {
    "_try_openrouter": "openrouter",
    "_try_nous": "nous",
    "_try_custom_endpoint": "local/custom",
    "_try_codex": "openai-codex",
    "_resolve_api_key_provider": "api-key",
}

_MAIN_RUNTIME_FIELDS = ("provider", "model", "base_url", "api_key", "api_mode")


def _normalize_main_runtime(main_runtime: Optional[Dict[str, Any]]) -> Dict[str, str]:
    """Return a sanitized copy of a live main-runtime override."""
    if not isinstance(main_runtime, dict):
        return {}
    normalized: Dict[str, str] = {}
    for field in _MAIN_RUNTIME_FIELDS:
        value = main_runtime.get(field)
        if isinstance(value, str) and value.strip():
            normalized[field] = value.strip()
    provider = normalized.get("provider")
    if provider:
        normalized["provider"] = provider.lower()
    return normalized


def _get_provider_chain() -> List[tuple]:
    """Return the ordered provider detection chain.

    Built at call time (not module level) so that test patches
    on the ``_try_*`` functions are picked up correctly.
    """
    return [
        ("openrouter", _try_openrouter),
        ("nous", _try_nous),
        ("local/custom", _try_custom_endpoint),
        ("openai-codex", _try_codex),
        ("api-key", _resolve_api_key_provider),
    ]


def _is_payment_error(exc: Exception) -> bool:
    """Detect payment/credit/quota exhaustion errors.

    Returns True for HTTP 402 (Payment Required) and for 429/other errors
    whose message indicates billing exhaustion rather than rate limiting.
    """
    status = getattr(exc, "status_code", None)
    if status == 402:
        return True
    err_lower = str(exc).lower()
    # OpenRouter and other providers include "credits" or "afford" in 402 bodies,
    # but sometimes wrap them in 429 or other codes.
    if status in (402, 429, None):
        if any(kw in err_lower for kw in ("credits", "insufficient funds",
                                           "can only afford", "billing",
                                           "payment required")):
            return True
    return False


def _is_connection_error(exc: Exception) -> bool:
    """Detect connection/network errors that warrant provider fallback.

    Returns True for errors indicating the provider endpoint is unreachable
    (DNS failure, connection refused, TLS errors, timeouts).  These are
    distinct from API errors (4xx/5xx) which indicate the provider IS
    reachable but returned an error.
    """
    from openai import APIConnectionError, APITimeoutError

    if isinstance(exc, (APIConnectionError, APITimeoutError)):
        return True
    # urllib3 / httpx / httpcore connection errors
    err_type = type(exc).__name__
    if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
        return True
    err_lower = str(exc).lower()
    if any(kw in err_lower for kw in (
        "connection refused", "name or service not known",
        "no route to host", "network is unreachable",
        "timed out", "connection reset",
    )):
        return True
    return False


def _is_auth_error(exc: Exception) -> bool:
    """Detect auth failures that should trigger provider-specific refresh."""
    status = getattr(exc, "status_code", None)
    if status == 401:
        return True
    err_lower = str(exc).lower()
    return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()


def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    """Detect provider 400s for an unsupported request parameter.

    Different OpenAI-compatible endpoints phrase the same class of error a few
    ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
    ``param`` field, ``X is not supported``, ``unknown parameter: X``,
    ``unrecognized request argument: X``.  We match on both the parameter
    name and a generic "unsupported/unknown/unrecognized parameter" marker so
    call sites can reactively retry without the offending key instead of
    surfacing a noisy auxiliary failure.

    Generalizes the temperature-specific detector that originally shipped
    with PR #15621 so the same retry strategy can cover ``max_tokens``,
    ``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
    for the generalization pattern.
    """
    param_lower = (param or "").lower()
    if not param_lower:
        return False
    err_lower = str(exc).lower()
    if param_lower not in err_lower:
        return False
    return any(marker in err_lower for marker in (
        "unsupported parameter",
        "unsupported_parameter",
        "not supported",
        "does not support",
        "unknown parameter",
        "unrecognized request argument",
        "unrecognized parameter",
        "invalid parameter",
    ))


def _is_unsupported_temperature_error(exc: Exception) -> bool:
    """Back-compat wrapper: detect API errors where the model rejects ``temperature``.

    Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
    public symbol because existing tests and call sites import it by name.
    """
    return _is_unsupported_parameter_error(exc, "temperature")


def _evict_cached_clients(provider: str) -> None:
    """Drop cached auxiliary clients for a provider so fresh creds are used."""
    normalized = _normalize_aux_provider(provider)
    with _client_cache_lock:
        stale_keys = [
            key for key in _client_cache
            if _normalize_aux_provider(str(key[0])) == normalized
        ]
        for key in stale_keys:
            client = _client_cache.get(key, (None, None, None))[0]
            if client is not None:
                _force_close_async_httpx(client)
                try:
                    close_fn = getattr(client, "close", None)
                    if callable(close_fn):
                        close_fn()
                except Exception:
                    pass
            _client_cache.pop(key, None)


def _refresh_provider_credentials(provider: str) -> bool:
    """Refresh short-lived credentials for OAuth-backed auxiliary providers."""
    normalized = _normalize_aux_provider(provider)
    try:
        if normalized == "openai-codex":
            from hermes_cli.auth import resolve_codex_runtime_credentials

            creds = resolve_codex_runtime_credentials(force_refresh=True)
            if not str(creds.get("api_key", "") or "").strip():
                return False
            _evict_cached_clients(normalized)
            return True
        if normalized == "nous":
            from hermes_cli.auth import resolve_nous_runtime_credentials

            creds = resolve_nous_runtime_credentials(
                min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
                force_mint=True,
            )
            if not str(creds.get("api_key", "") or "").strip():
                return False
            _evict_cached_clients(normalized)
            return True
        if normalized == "anthropic":
            from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token

            creds = read_claude_code_credentials()
            token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
            if not str(token or "").strip():
                token = resolve_anthropic_token()
            if not str(token or "").strip():
                return False
            _evict_cached_clients(normalized)
            return True
    except Exception as exc:
        logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
        return False
    return False


def _try_payment_fallback(
    failed_provider: str,
    task: str = None,
    reason: str = "payment error",
) -> Tuple[Optional[Any], Optional[str], str]:
    """Try alternative providers after a payment/credit or connection error.

    Iterates the standard auto-detection chain, skipping the provider that
    failed.

    Returns:
        (client, model, provider_label) or (None, None, "") if no fallback.
    """
    # Normalise the failed provider label for matching.
    skip = failed_provider.lower().strip()
    # Also skip Step-1 main-provider path if it maps to the same backend.
    # (e.g. main_provider="openrouter" → skip "openrouter" in chain)
    main_provider = _read_main_provider()
    skip_labels = {skip}
    if main_provider and main_provider.lower() in skip:
        skip_labels.add(main_provider.lower())
    # Map common resolved_provider values back to chain labels.
    _alias_to_label = {"openrouter": "openrouter", "nous": "nous",
                       "openai-codex": "openai-codex", "codex": "openai-codex",
                       "custom": "local/custom", "local/custom": "local/custom"}
    skip_chain_labels = {_alias_to_label.get(s, s) for s in skip_labels}

    tried = []
    for label, try_fn in _get_provider_chain():
        if label in skip_chain_labels:
            continue
        client, model = try_fn()
        if client is not None:
            logger.info(
                "Auxiliary %s: %s on %s — falling back to %s (%s)",
                task or "call", reason, failed_provider, label, model or "default",
            )
            return client, model, label
        tried.append(label)

    logger.warning(
        "Auxiliary %s: %s on %s and no fallback available (tried: %s)",
        task or "call", reason, failed_provider, ", ".join(tried),
    )
    return None, None, ""


def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Optional[OpenAI], Optional[str]]:
    """Full auto-detection chain.

    Priority:
      1. User's main provider + main model, regardless of provider type.
         This means auxiliary tasks (compression, vision, web extraction,
         session search, etc.) use the same model the user configured for
         chat.  Users on OpenRouter/Nous get their chosen chat model; users
         on DeepSeek/ZAI/Alibaba get theirs; etc.  Running aux tasks on the
         user's picked model keeps behavior predictable — no surprise
         switches to a cheap fallback model for side tasks.
      2. OpenRouter → Nous → custom → Codex → API-key providers (fallback
         chain, only used when the main provider has no working client).
    """
    global auxiliary_is_nous, _stale_base_url_warned
    auxiliary_is_nous = False  # Reset — _try_nous() will set True if it wins
    runtime = _normalize_main_runtime(main_runtime)
    runtime_provider = runtime.get("provider", "")
    runtime_model = runtime.get("model", "")
    runtime_base_url = runtime.get("base_url", "")
    runtime_api_key = runtime.get("api_key", "")
    runtime_api_mode = runtime.get("api_mode", "")

    # ── Warn once if OPENAI_BASE_URL is set but config.yaml uses a named
    #    provider (not 'custom').  This catches the common "env poisoning"
    #    scenario where a user switches providers via `hermes model` but the
    #    old OPENAI_BASE_URL lingers in ~/.hermes/.env. ──
    if not _stale_base_url_warned:
        _env_base = os.getenv("OPENAI_BASE_URL", "").strip()
        _cfg_provider = runtime_provider or _read_main_provider()
        if (_env_base and _cfg_provider
                and _cfg_provider != "custom"
                and not _cfg_provider.startswith("custom:")):
            logger.warning(
                "OPENAI_BASE_URL is set (%s) but model.provider is '%s'. "
                "Auxiliary clients may route to the wrong endpoint. "
                "Run: hermes model to reconfigure, or remove "
                "OPENAI_BASE_URL from ~/.hermes/.env",
                _env_base, _cfg_provider,
            )
            _stale_base_url_warned = True

    # ── Step 1: main provider + main model → use them directly ──
    #
    # This is the primary aux backend for every user.  "auto" means
    # "use my main chat model for side tasks as well" — including users
    # on aggregators (OpenRouter, Nous) who previously got routed to a
    # cheap provider-side default.  Explicit per-task overrides set via
    # config.yaml (auxiliary.<task>.provider) still win over this.
    main_provider = runtime_provider or _read_main_provider()
    main_model = runtime_model or _read_main_model()
    if (main_provider and main_model
            and main_provider not in ("auto", "")):
        resolved_provider = main_provider
        explicit_base_url = None
        explicit_api_key = None
        if runtime_base_url and (main_provider == "custom" or main_provider.startswith("custom:")):
            resolved_provider = "custom"
            explicit_base_url = runtime_base_url
            explicit_api_key = runtime_api_key or None
        client, resolved = resolve_provider_client(
            resolved_provider,
            main_model,
            explicit_base_url=explicit_base_url,
            explicit_api_key=explicit_api_key,
            api_mode=runtime_api_mode or None,
        )
        if client is not None:
            logger.info("Auxiliary auto-detect: using main provider %s (%s)",
                        main_provider, resolved or main_model)
            return client, resolved or main_model

    # ── Step 2: aggregator / fallback chain ──────────────────────────────
    tried = []
    for label, try_fn in _get_provider_chain():
        client, model = try_fn()
        if client is not None:
            if tried:
                logger.info("Auxiliary auto-detect: using %s (%s) — skipped: %s",
                            label, model or "default", ", ".join(tried))
            else:
                logger.info("Auxiliary auto-detect: using %s (%s)", label, model or "default")
            return client, model
        tried.append(label)
    logger.warning("Auxiliary auto-detect: no provider available (tried: %s). "
                   "Compression, summarization, and memory flush will not work. "
                   "Set OPENROUTER_API_KEY or configure a local model in config.yaml.",
                   ", ".join(tried))
    return None, None


# ── Centralized Provider Router ─────────────────────────────────────────────
#
# resolve_provider_client() is the single entry point for creating a properly
# configured client given a (provider, model) pair.  It handles auth lookup,
# base URL resolution, provider-specific headers, and API format differences
# (Chat Completions vs Responses API for Codex).
#
# All auxiliary consumer code should go through this or the public helpers
# below — never look up auth env vars ad-hoc.


def _to_async_client(sync_client, model: str, is_vision: bool = False):
    """Convert a sync client to its async counterpart, preserving Codex routing.

    When ``is_vision=True`` and the underlying base URL is Copilot, the
    resulting async client carries the ``Copilot-Vision-Request: true``
    header so the request is routed to Copilot's vision-capable
    infrastructure (otherwise vision payloads silently time out).
    """
    from openai import AsyncOpenAI

    if isinstance(sync_client, CodexAuxiliaryClient):
        return AsyncCodexAuxiliaryClient(sync_client), model
    if isinstance(sync_client, AnthropicAuxiliaryClient):
        return AsyncAnthropicAuxiliaryClient(sync_client), model
    try:
        from agent.gemini_native_adapter import GeminiNativeClient, AsyncGeminiNativeClient

        if isinstance(sync_client, GeminiNativeClient):
            return AsyncGeminiNativeClient(sync_client), model
    except ImportError:
        pass
    try:
        from agent.copilot_acp_client import CopilotACPClient
        if isinstance(sync_client, CopilotACPClient):
            return sync_client, model
    except ImportError:
        pass

    async_kwargs = {
        "api_key": sync_client.api_key,
        "base_url": str(sync_client.base_url),
    }
    sync_base_url = str(sync_client.base_url)
    if base_url_host_matches(sync_base_url, "openrouter.ai"):
        async_kwargs["default_headers"] = dict(_OR_HEADERS)
    elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
        from hermes_cli.copilot_auth import copilot_request_headers

        async_kwargs["default_headers"] = copilot_request_headers(
            is_agent_turn=True, is_vision=is_vision
        )
    elif base_url_host_matches(sync_base_url, "api.kimi.com"):
        async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
    return AsyncOpenAI(**async_kwargs), model


def _normalize_resolved_model(model_name: Optional[str], provider: str) -> Optional[str]:
    """Normalize a resolved model for the provider that will receive it."""
    if not model_name:
        return model_name
    try:
        from hermes_cli.model_normalize import normalize_model_for_provider

        return normalize_model_for_provider(model_name, provider)
    except Exception:
        return model_name


def resolve_provider_client(
    provider: str,
    model: str = None,
    async_mode: bool = False,
    raw_codex: bool = False,
    explicit_base_url: str = None,
    explicit_api_key: str = None,
    api_mode: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
    """Central router: given a provider name and optional model, return a
    configured client with the correct auth, base URL, and API format.

    The returned client always exposes ``.chat.completions.create()`` — for
    Codex/Responses API providers, an adapter handles the translation
    transparently.

    Args:
        provider: Provider identifier.  One of:
            "openrouter", "nous", "openai-codex" (or "codex"),
            "zai", "kimi-coding", "minimax", "minimax-cn",
            "custom" (OPENAI_BASE_URL + OPENAI_API_KEY),
            "auto" (full auto-detection chain).
        model: Model slug override.  If None, uses the provider's default
               auxiliary model.
        async_mode: If True, return an async-compatible client.
        raw_codex: If True, return a raw OpenAI client for Codex providers
            instead of wrapping in CodexAuxiliaryClient.  Use this when
            the caller needs direct access to responses.stream() (e.g.,
            the main agent loop).
        explicit_base_url: Optional direct OpenAI-compatible endpoint.
        explicit_api_key: Optional API key paired with explicit_base_url.
        api_mode: API mode override.  One of "chat_completions",
            "codex_responses", or None (auto-detect).  When set to
            "codex_responses", the client is wrapped in
            CodexAuxiliaryClient to route through the Responses API.

    Returns:
        (client, resolved_model) or (None, None) if auth is unavailable.
    """
    _validate_proxy_env_urls()
    # Normalise aliases
    provider = _normalize_aux_provider(provider)

    def _needs_codex_wrap(client_obj, base_url_str: str, model_str: str) -> bool:
        """Decide if a plain OpenAI client should be wrapped for Responses API.

        Returns True when api_mode is explicitly "codex_responses", or when
        auto-detection (api.openai.com + codex-family model) suggests it.
        Already-wrapped clients (CodexAuxiliaryClient) are skipped.
        """
        if isinstance(client_obj, CodexAuxiliaryClient):
            return False
        if raw_codex:
            return False
        if api_mode == "codex_responses":
            return True
        # Auto-detect: api.openai.com + codex model name pattern
        if api_mode and api_mode != "codex_responses":
            return False  # explicit non-codex mode
        if base_url_hostname(base_url_str) == "api.openai.com":
            model_lower = (model_str or "").lower()
            if "codex" in model_lower:
                return True
        return False

    def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
        """Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
        if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
            logger.debug(
                "resolve_provider_client: wrapping client in CodexAuxiliaryClient "
                "(api_mode=%s, model=%s, base_url=%s)",
                api_mode or "auto-detected", final_model_str,
                base_url_str[:60] if base_url_str else "")
            return CodexAuxiliaryClient(client_obj, final_model_str)
        return client_obj

    # ── Auto: try all providers in priority order ────────────────────
    if provider == "auto":
        client, resolved = _resolve_auto(main_runtime=main_runtime)
        if client is None:
            return None, None
        # When auto-detection lands on a non-OpenRouter provider (e.g. a
        # local server), an OpenRouter-formatted model override like
        # "google/gemini-3-flash-preview" won't work.  Drop it and use
        # the provider's own default model instead.
        if model and "/" in model and resolved and "/" not in resolved:
            logger.debug(
                "Dropping OpenRouter-format model %r for non-OpenRouter "
                "auxiliary provider (using %r instead)", model, resolved)
            model = None
        final_model = model or resolved
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── OpenRouter ───────────────────────────────────────────────────
    if provider == "openrouter":
        client, default = _try_openrouter()
        if client is None:
            logger.warning(
                "resolve_provider_client: openrouter requested but %s",
                _describe_openrouter_unavailable(),
            )
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── Nous Portal (OAuth) ──────────────────────────────────────────
    if provider == "nous":
        # Detect vision tasks: either explicit model override from
        # _PROVIDER_VISION_MODELS, or caller passed a known vision model.
        _is_vision = (
            model in _PROVIDER_VISION_MODELS.values()
            or (model or "").strip().lower() == "mimo-v2-omni"
        )
        client, default = _try_nous(vision=_is_vision)
        if client is None:
            logger.warning("resolve_provider_client: nous requested "
                           "but Nous Portal not configured (run: hermes auth)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
    if provider == "openai-codex":
        if raw_codex:
            # Return the raw OpenAI client for callers that need direct
            # access to responses.stream() (e.g., the main agent loop).
            codex_token = _read_codex_access_token()
            if not codex_token:
                logger.warning("resolve_provider_client: openai-codex requested "
                               "but no Codex OAuth token found (run: hermes model)")
                return None, None
            final_model = _normalize_resolved_model(model or _CODEX_AUX_MODEL, provider)
            raw_client = OpenAI(
                api_key=codex_token,
                base_url=_CODEX_AUX_BASE_URL,
                default_headers=_codex_cloudflare_headers(codex_token),
            )
            return (raw_client, final_model)
        # Standard path: wrap in CodexAuxiliaryClient adapter
        client, default = _try_codex()
        if client is None:
            logger.warning("resolve_provider_client: openai-codex requested "
                           "but no Codex OAuth token found (run: hermes model)")
            return None, None
        final_model = _normalize_resolved_model(model or default, provider)
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    # ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
    if provider == "custom":
        if explicit_base_url:
            custom_base = explicit_base_url.strip()
            custom_key = (
                (explicit_api_key or "").strip()
                or os.getenv("OPENAI_API_KEY", "").strip()
                or "no-key-required"  # local servers don't need auth
            )
            if not custom_base:
                logger.warning(
                    "resolve_provider_client: explicit custom endpoint requested "
                    "but base_url is empty"
                )
                return None, None
            final_model = _normalize_resolved_model(
                model or _read_main_model() or "gpt-4o-mini",
                provider,
            )
            extra = {}
            _clean_base, _dq = _extract_url_query_params(custom_base)
            if _dq:
                extra["default_query"] = _dq
            if base_url_host_matches(custom_base, "api.kimi.com"):
                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
            elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
                from hermes_cli.copilot_auth import copilot_request_headers
                extra["default_headers"] = copilot_request_headers(
                    is_agent_turn=True, is_vision=is_vision
                )
            client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
            client = _wrap_if_needed(client, final_model, custom_base)
            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                    else (client, final_model))
        # Try custom first, then codex, then API-key providers
        for try_fn in (_try_custom_endpoint, _try_codex,
                       _resolve_api_key_provider):
            client, default = try_fn()
            if client is not None:
                final_model = _normalize_resolved_model(model or default, provider)
                _cbase = str(getattr(client, "base_url", "") or "")
                client = _wrap_if_needed(client, final_model, _cbase)
                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))
        logger.warning("resolve_provider_client: custom/main requested "
                       "but no endpoint credentials found")
        return None, None

    # ── Named custom providers (config.yaml providers dict / custom_providers list) ───
    try:
        from hermes_cli.runtime_provider import _get_named_custom_provider
        custom_entry = _get_named_custom_provider(provider)
        if custom_entry:
            custom_base = custom_entry.get("base_url", "").strip()
            custom_key = custom_entry.get("api_key", "").strip()
            custom_key_env = custom_entry.get("key_env", "").strip()
            if not custom_key and custom_key_env:
                custom_key = os.getenv(custom_key_env, "").strip()
            custom_key = custom_key or "no-key-required"
            # An explicit per-task api_mode override (from _resolve_task_provider_model)
            # wins; otherwise fall back to what the provider entry declared.
            entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
            if custom_base:
                final_model = _normalize_resolved_model(
                    model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
                    provider,
                )
                _clean_base2, _dq2 = _extract_url_query_params(custom_base)
                _extra2 = {"default_query": _dq2} if _dq2 else {}
                logger.debug(
                    "resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
                    provider, final_model, entry_api_mode or "chat_completions")
                # anthropic_messages: route through the Anthropic Messages API
                # via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
                # branch in _try_custom_endpoint(). See #15033.
                if entry_api_mode == "anthropic_messages":
                    try:
                        from agent.anthropic_adapter import build_anthropic_client
                        real_client = build_anthropic_client(custom_key, custom_base)
                    except ImportError:
                        logger.warning(
                            "Named custom provider %r declares api_mode="
                            "anthropic_messages but the anthropic SDK is not "
                            "installed — falling back to OpenAI-wire.",
                            provider,
                        )
                        client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
                        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                                else (client, final_model))
                    sync_anthropic = AnthropicAuxiliaryClient(
                        real_client, final_model, custom_key, custom_base, is_oauth=False,
                    )
                    if async_mode:
                        return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
                    return sync_anthropic, final_model
                client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
                # codex_responses or inherited auto-detect (via _wrap_if_needed).
                # _wrap_if_needed reads the closed-over `api_mode` (the task-level
                # override). Named-provider entry api_mode=codex_responses also
                # flows through here.
                if entry_api_mode == "codex_responses" and not isinstance(
                    client, CodexAuxiliaryClient
                ):
                    client = CodexAuxiliaryClient(client, final_model)
                else:
                    client = _wrap_if_needed(client, final_model, custom_base)
                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))
            logger.warning(
                "resolve_provider_client: named custom provider %r has no base_url",
                provider)
            return None, None
    except ImportError:
        pass

    # ── API-key providers from PROVIDER_REGISTRY ─────────────────────
    try:
        from hermes_cli.auth import (
            PROVIDER_REGISTRY,
            resolve_api_key_provider_credentials,
            resolve_external_process_provider_credentials,
        )
    except ImportError:
        logger.debug("hermes_cli.auth not available for provider %s", provider)
        return None, None

    pconfig = PROVIDER_REGISTRY.get(provider)
    if pconfig is None:
        logger.warning("resolve_provider_client: unknown provider %r", provider)
        return None, None

    if pconfig.auth_type == "api_key":
        if provider == "anthropic":
            client, default_model = _try_anthropic()
            if client is None:
                logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
                return None, None
            final_model = _normalize_resolved_model(model or default_model, provider)
            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))

        creds = resolve_api_key_provider_credentials(provider)
        api_key = str(creds.get("api_key", "")).strip()
        if not api_key:
            tried_sources = list(pconfig.api_key_env_vars)
            if provider == "copilot":
                tried_sources.append("gh auth token")
            logger.debug("resolve_provider_client: provider %s has no API "
                         "key configured (tried: %s)",
                         provider, ", ".join(tried_sources))
            return None, None

        base_url = _to_openai_base_url(
            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
        )

        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
        final_model = _normalize_resolved_model(model or default_model, provider)

        if provider == "gemini":
            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url

            if is_native_gemini_base_url(base_url):
                client = GeminiNativeClient(api_key=api_key, base_url=base_url)
                logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                        else (client, final_model))

        # Provider-specific headers
        headers = {}
        if base_url_host_matches(base_url, "api.kimi.com"):
            headers["User-Agent"] = "claude-code/0.1.0"
        elif base_url_host_matches(base_url, "api.githubcopilot.com"):
            from hermes_cli.copilot_auth import copilot_request_headers

            headers.update(copilot_request_headers(
                is_agent_turn=True, is_vision=is_vision
            ))
        client = OpenAI(api_key=api_key, base_url=base_url,
                        **({"default_headers": headers} if headers else {}))

        # Copilot GPT-5+ models (except gpt-5-mini) require the Responses
        # API — they are not accessible via /chat/completions.  Wrap the
        # plain client in CodexAuxiliaryClient so call_llm() transparently
        # routes through responses.stream().
        if provider == "copilot" and final_model and not raw_codex:
            try:
                from hermes_cli.models import _should_use_copilot_responses_api
                if _should_use_copilot_responses_api(final_model):
                    logger.debug(
                        "resolve_provider_client: copilot model %s needs "
                        "Responses API — wrapping with CodexAuxiliaryClient",
                        final_model)
                    client = CodexAuxiliaryClient(client, final_model)
            except ImportError:
                pass

        # Honor api_mode for any API-key provider (e.g. direct OpenAI with
        # codex-family models).  The copilot-specific wrapping above handles
        # copilot; this covers the general case (#6800).
        client = _wrap_if_needed(client, final_model, base_url)

        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    if pconfig.auth_type == "external_process":
        creds = resolve_external_process_provider_credentials(provider)
        final_model = _normalize_resolved_model(model or _read_main_model(), provider)
        if provider == "copilot-acp":
            api_key = str(creds.get("api_key", "")).strip()
            base_url = str(creds.get("base_url", "")).strip()
            command = str(creds.get("command", "")).strip() or None
            args = list(creds.get("args") or [])
            if not final_model:
                logger.warning(
                    "resolve_provider_client: copilot-acp requested but no model "
                    "was provided or configured"
                )
                return None, None
            if not api_key or not base_url:
                logger.warning(
                    "resolve_provider_client: copilot-acp requested but external "
                    "process credentials are incomplete"
                )
                return None, None
            from agent.copilot_acp_client import CopilotACPClient

            client = CopilotACPClient(
                api_key=api_key,
                base_url=base_url,
                command=command,
                args=args,
            )
            logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                    else (client, final_model))
        logger.warning("resolve_provider_client: external-process provider %s not "
                       "directly supported", provider)
        return None, None

    elif pconfig.auth_type == "aws_sdk":
        # AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
        # boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
        try:
            from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
            from agent.anthropic_adapter import build_anthropic_bedrock_client
        except ImportError:
            logger.warning("resolve_provider_client: bedrock requested but "
                           "boto3 or anthropic SDK not installed")
            return None, None

        if not has_aws_credentials():
            logger.debug("resolve_provider_client: bedrock requested but "
                         "no AWS credentials found")
            return None, None

        region = resolve_bedrock_region()
        default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
        final_model = _normalize_resolved_model(model or default_model, provider)
        try:
            real_client = build_anthropic_bedrock_client(region)
        except ImportError as exc:
            logger.warning("resolve_provider_client: cannot create Bedrock "
                           "client: %s", exc)
            return None, None
        client = AnthropicAuxiliaryClient(
            real_client, final_model, api_key="aws-sdk",
            base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
        )
        logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
                else (client, final_model))

    elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
        # OAuth providers — route through their specific try functions
        if provider == "nous":
            return resolve_provider_client("nous", model, async_mode)
        if provider == "openai-codex":
            return resolve_provider_client("openai-codex", model, async_mode)
        # Other OAuth providers not directly supported
        logger.warning("resolve_provider_client: OAuth provider %s not "
                       "directly supported, try 'auto'", provider)
        return None, None

    logger.warning("resolve_provider_client: unhandled auth_type %s for %s",
                   pconfig.auth_type, provider)
    return None, None


# ── Public API ──────────────────────────────────────────────────────────────

def get_text_auxiliary_client(
    task: str = "",
    *,
    main_runtime: Optional[Dict[str, Any]] = None,
) -> Tuple[Optional[OpenAI], Optional[str]]:
    """Return (client, default_model_slug) for text-only auxiliary tasks.

    Args:
        task: Optional task name ("compression", "web_extract") to check
              for a task-specific provider override.

    Callers may override the returned model via config.yaml
    (e.g. auxiliary.compression.model, auxiliary.web_extract.model).
    """
    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
    return resolve_provider_client(
        provider,
        model=model,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
    )


def get_async_text_auxiliary_client(task: str = "", *, main_runtime: Optional[Dict[str, Any]] = None):
    """Return (async_client, model_slug) for async consumers.

    For standard providers returns (AsyncOpenAI, model). For Codex returns
    (AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
    Returns (None, None) when no provider is available.
    """
    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
    return resolve_provider_client(
        provider,
        model=model,
        async_mode=True,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
    )


_VISION_AUTO_PROVIDER_ORDER = (
    "openrouter",
    "nous",
)


def _normalize_vision_provider(provider: Optional[str]) -> str:
    return _normalize_aux_provider(provider)


def _resolve_strict_vision_backend(
    provider: str,
    model: Optional[str] = None,
) -> Tuple[Optional[Any], Optional[str]]:
    provider = _normalize_vision_provider(provider)
    if provider == "copilot":
        return resolve_provider_client("copilot", model, is_vision=True)
    if provider == "openrouter":
        return _try_openrouter()
    if provider == "nous":
        return _try_nous(vision=True)
    if provider == "openai-codex":
        return _try_codex()
    if provider == "anthropic":
        return _try_anthropic()
    if provider == "custom":
        return _try_custom_endpoint()
    return None, None


def _strict_vision_backend_available(provider: str) -> bool:
    return _resolve_strict_vision_backend(provider)[0] is not None


def get_available_vision_backends() -> List[str]:
    """Return the currently available vision backends in auto-selection order.

    Order: active provider → OpenRouter → Nous → stop.  This is the single
    source of truth for setup, tool gating, and runtime auto-routing of
    vision tasks.
    """
    available: List[str] = []
    # 1. Active provider — if the user configured a provider, try it first.
    main_provider = _read_main_provider()
    if main_provider and main_provider not in ("auto", ""):
        if main_provider in _VISION_AUTO_PROVIDER_ORDER:
            if _strict_vision_backend_available(main_provider):
                available.append(main_provider)
        else:
            client, _ = resolve_provider_client(main_provider, _read_main_model())
            if client is not None:
                available.append(main_provider)
    # 2. OpenRouter, 3. Nous — skip if already covered by main provider.
    for p in _VISION_AUTO_PROVIDER_ORDER:
        if p not in available and _strict_vision_backend_available(p):
            available.append(p)
    return available


def resolve_vision_provider_client(
    provider: Optional[str] = None,
    model: Optional[str] = None,
    *,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    async_mode: bool = False,
) -> Tuple[Optional[str], Optional[Any], Optional[str]]:
    """Resolve the client actually used for vision tasks.

    Direct endpoint overrides take precedence over provider selection. Explicit
    provider overrides still use the generic provider router for non-standard
    backends, so users can intentionally force experimental providers. Auto mode
    stays conservative and only tries vision backends known to work today.
    """
    requested, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        "vision", provider, model, base_url, api_key
    )
    requested = _normalize_vision_provider(requested)

    def _finalize(resolved_provider: str, sync_client: Any, default_model: Optional[str]):
        if sync_client is None:
            return resolved_provider, None, None
        final_model = resolved_model or default_model
        if async_mode:
            async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
            return resolved_provider, async_client, async_model
        return resolved_provider, sync_client, final_model

    if resolved_base_url:
        client, final_model = resolve_provider_client(
            "custom",
            model=resolved_model,
            async_mode=async_mode,
            explicit_base_url=resolved_base_url,
            explicit_api_key=resolved_api_key,
            api_mode=resolved_api_mode,
        )
        if client is None:
            return "custom", None, None
        return "custom", client, final_model

    if requested == "auto":
        # Vision auto-detection order:
        #   1. User's main provider + main model (including aggregators).
        #      _PROVIDER_VISION_MODELS provides per-provider vision model
        #      overrides when the provider has a dedicated multimodal model
        #      that differs from the chat model (e.g. xiaomi → mimo-v2-omni,
        #      zai → glm-5v-turbo). Nous is the exception: it has a dedicated
        #      strict vision backend with tier-aware defaults, so it must not
        #      fall through to the user's text chat model here.
        #   2. OpenRouter  (vision-capable aggregator fallback)
        #   3. Nous Portal (vision-capable aggregator fallback)
        #   4. Stop
        main_provider = _read_main_provider()
        main_model = _read_main_model()
        if main_provider and main_provider not in ("auto", ""):
            vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
            if main_provider == "nous":
                sync_client, default_model = _resolve_strict_vision_backend(
                    main_provider, vision_model
                )
                if sync_client is not None:
                    logger.info(
                        "Vision auto-detect: using main provider %s (%s)",
                        main_provider, default_model or resolved_model or main_model,
                    )
                    return _finalize(main_provider, sync_client, default_model)
            else:
                rpc_client, rpc_model = resolve_provider_client(
                    main_provider, vision_model,
                    api_mode=resolved_api_mode,
                    is_vision=True)
                if rpc_client is not None:
                    logger.info(
                        "Vision auto-detect: using main provider %s (%s)",
                        main_provider, rpc_model or vision_model,
                    )
                    return _finalize(
                        main_provider, rpc_client, rpc_model or vision_model)

        # Fall back through aggregators (uses their dedicated vision model,
        # not the user's main model) when main provider has no client.
        for candidate in _VISION_AUTO_PROVIDER_ORDER:
            if candidate == main_provider:
                continue  # already tried above
            sync_client, default_model = _resolve_strict_vision_backend(candidate)
            if sync_client is not None:
                return _finalize(candidate, sync_client, default_model)

        logger.debug("Auxiliary vision client: none available")
        return None, None, None

    if requested in _VISION_AUTO_PROVIDER_ORDER:
        sync_client, default_model = _resolve_strict_vision_backend(
            requested, resolved_model
        )
        return _finalize(requested, sync_client, default_model)

    client, final_model = _get_cached_client(requested, resolved_model, async_mode,
                                             api_mode=resolved_api_mode,
                                             is_vision=True)
    if client is None:
        return requested, None, None
    return requested, client, final_model


def get_auxiliary_extra_body() -> dict:
    """Return extra_body kwargs for auxiliary API calls.
    
    Includes Nous Portal product tags when the auxiliary client is backed
    by Nous Portal. Returns empty dict otherwise.
    """
    return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}


def auxiliary_max_tokens_param(value: int) -> dict:
    """Return the correct max tokens kwarg for the auxiliary client's provider.
    
    OpenRouter and local models use 'max_tokens'. Direct OpenAI with newer
    models (gpt-4o, o-series, gpt-5+) requires 'max_completion_tokens'.
    The Codex adapter translates max_tokens internally, so we use max_tokens
    for it as well.
    """
    custom_base = _current_custom_base_url()
    or_key = os.getenv("OPENROUTER_API_KEY")
    # Only use max_completion_tokens for direct OpenAI custom endpoints
    if (not or_key
            and _read_nous_auth() is None
            and base_url_hostname(custom_base) == "api.openai.com"):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}


# ── Centralized LLM Call API ────────────────────────────────────────────────
#
# call_llm() and async_call_llm() own the full request lifecycle:
#   1. Resolve provider + model from task config (or explicit args)
#   2. Get or create a cached client for that provider
#   3. Format request args for the provider + model (max_tokens handling, etc.)
#   4. Make the API call
#   5. Return the response
#
# Every auxiliary LLM consumer should use these instead of manually
# constructing clients and calling .chat.completions.create().

# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
# NOTE: loop identity is NOT part of the key.  On async cache hits we check
# whether the cached loop is the *current* loop; if not, the stale entry is
# replaced in-place.  This bounds cache growth to one entry per unique
# provider config rather than one per (config × event-loop), which previously
# caused unbounded fd accumulation in long-running gateway processes (#10200).
_client_cache: Dict[tuple, tuple] = {}
_client_cache_lock = threading.Lock()
_CLIENT_CACHE_MAX_SIZE = 64  # safety belt — evict oldest when exceeded


def _client_cache_key(
    provider: str,
    *,
    async_mode: bool,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    is_vision: bool = False,
) -> tuple:
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)


def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
    with _client_cache_lock:
        old_entry = _client_cache.get(cache_key)
        if old_entry is not None and old_entry[0] is not client:
            _force_close_async_httpx(old_entry[0])
            try:
                close_fn = getattr(old_entry[0], "close", None)
                if callable(close_fn):
                    close_fn()
            except Exception:
                pass
        _client_cache[cache_key] = (client, default_model, bound_loop)


def _refresh_nous_auxiliary_client(
    *,
    cache_provider: str,
    model: Optional[str],
    async_mode: bool,
    base_url: Optional[str] = None,
    api_key: Optional[str] = None,
    api_mode: Optional[str] = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
    """Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
    runtime = _resolve_nous_runtime_api(force_refresh=True)
    if runtime is None:
        return None, model

    fresh_key, fresh_base_url = runtime
    sync_client = OpenAI(api_key=fresh_key, base_url=fresh_base_url)
    final_model = model

    current_loop = None
    if async_mode:
        try:
            import asyncio as _aio
            current_loop = _aio.get_event_loop()
        except RuntimeError:
            pass
        client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
    else:
        client = sync_client

    cache_key = _client_cache_key(
        cache_provider,
        async_mode=async_mode,
        base_url=base_url,
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
        is_vision=is_vision,
    )
    _store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
    return client, final_model


def neuter_async_httpx_del() -> None:
    """Monkey-patch ``AsyncHttpxClientWrapper.__del__`` to be a no-op.

    The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
    ``self.aclose()`` via ``asyncio.get_running_loop().create_task()``.
    When an ``AsyncOpenAI`` client is garbage-collected while
    prompt_toolkit's event loop is running (the common CLI idle state),
    the ``aclose()`` task runs on prompt_toolkit's loop but the
    underlying TCP transport is bound to a *different* loop (the worker
    thread's loop that the client was originally created on).  If that
    loop is closed or its thread is dead, the transport's
    ``self._loop.call_soon()`` raises ``RuntimeError("Event loop is
    closed")``, which prompt_toolkit surfaces as "Unhandled exception
    in event loop ... Press ENTER to continue...".

    Neutering ``__del__`` is safe because:
    - Cached clients are explicitly cleaned via ``_force_close_async_httpx``
      on stale-loop detection and ``shutdown_cached_clients`` on exit.
    - Uncached clients' TCP connections are cleaned up by the OS when the
      process exits.
    - The OpenAI SDK itself marks this as a TODO (``# TODO(someday):
      support non asyncio runtimes here``).

    Call this once at CLI startup, before any ``AsyncOpenAI`` clients are
    created.
    """
    try:
        from openai._base_client import AsyncHttpxClientWrapper
        AsyncHttpxClientWrapper.__del__ = lambda self: None  # type: ignore[assignment]
    except (ImportError, AttributeError):
        pass  # Graceful degradation if the SDK changes its internals


def _force_close_async_httpx(client: Any) -> None:
    """Mark the httpx AsyncClient inside an AsyncOpenAI client as closed.

    This prevents ``AsyncHttpxClientWrapper.__del__`` from scheduling
    ``aclose()`` on a (potentially closed) event loop, which causes
    ``RuntimeError: Event loop is closed`` → prompt_toolkit's
    "Press ENTER to continue..." handler.

    We intentionally do NOT run the full async close path — the
    connections will be dropped by the OS when the process exits.
    """
    try:
        from httpx._client import ClientState
        inner = getattr(client, "_client", None)
        if inner is not None and not getattr(inner, "is_closed", True):
            inner._state = ClientState.CLOSED
    except Exception:
        pass


def shutdown_cached_clients() -> None:
    """Close all cached clients (sync and async) to prevent event-loop errors.

    Call this during CLI shutdown, *before* the event loop is closed, to
    avoid ``AsyncHttpxClientWrapper.__del__`` raising on a dead loop.
    """
    import inspect

    with _client_cache_lock:
        for key, entry in list(_client_cache.items()):
            client = entry[0]
            if client is None:
                continue
            # Mark any async httpx transport as closed first (prevents __del__
            # from scheduling aclose() on a dead event loop).
            _force_close_async_httpx(client)
            # Sync clients: close the httpx connection pool cleanly.
            # Async clients: skip — we already neutered __del__ above.
            try:
                close_fn = getattr(client, "close", None)
                if close_fn and not inspect.iscoroutinefunction(close_fn):
                    close_fn()
            except Exception:
                pass
        _client_cache.clear()


def cleanup_stale_async_clients() -> None:
    """Force-close cached async clients whose event loop is closed.

    Call this after each agent turn to proactively clean up stale clients
    before GC can trigger ``AsyncHttpxClientWrapper.__del__`` on them.
    This is defense-in-depth — the primary fix is ``neuter_async_httpx_del``
    which disables ``__del__`` entirely.
    """
    with _client_cache_lock:
        stale_keys = []
        for key, entry in _client_cache.items():
            client, _default, cached_loop = entry
            if cached_loop is not None and cached_loop.is_closed():
                _force_close_async_httpx(client)
                stale_keys.append(key)
        for key in stale_keys:
            del _client_cache[key]


def _is_openrouter_client(client: Any) -> bool:
    for obj in (client, getattr(client, "_client", None), getattr(client, "client", None)):
        if obj and base_url_host_matches(str(getattr(obj, "base_url", "") or ""), "openrouter.ai"):
            return True
    return False


def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
    """Best-effort check for cached clients that accept ``vendor/model`` IDs."""
    if _is_openrouter_client(client):
        return True
    return bool(cached_default and "/" in cached_default)


def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
    """Keep slash-bearing model IDs only for cached clients that support them.

    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
    """
    if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
        return cached_default
    return model or cached_default


def _get_cached_client(
    provider: str,
    model: str = None,
    async_mode: bool = False,
    base_url: str = None,
    api_key: str = None,
    api_mode: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    is_vision: bool = False,
) -> Tuple[Optional[Any], Optional[str]]:
    """Get or create a cached client for the given provider.

    Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
    binds to the event loop that was current when the client was created.
    Using such a client on a *different* loop causes deadlocks or
    RuntimeError.  To prevent cross-loop issues, the cache validates on
    every async hit that the cached loop is the *current, open* loop.
    If the loop changed (e.g. a new gateway worker-thread loop), the stale
    entry is replaced in-place rather than creating an additional entry.

    This keeps cache size bounded to one entry per unique provider config,
    preventing the fd-exhaustion that previously occurred in long-running
    gateways where recycled worker threads created unbounded entries (#10200).
    """
    # Resolve the current event loop for async clients so we can validate
    # cached entries.  Loop identity is NOT in the cache key — instead we
    # check at hit time whether the cached loop is still current and open.
    # This prevents unbounded cache growth from recycled worker-thread loops
    # while still guaranteeing we never reuse a client on the wrong loop
    # (which causes deadlocks, see #2681).
    current_loop = None
    if async_mode:
        try:
            import asyncio as _aio
            current_loop = _aio.get_event_loop()
        except RuntimeError:
            pass
    runtime = _normalize_main_runtime(main_runtime)
    cache_key = _client_cache_key(
        provider,
        async_mode=async_mode,
        base_url=base_url,
        api_key=api_key,
        api_mode=api_mode,
        main_runtime=main_runtime,
        is_vision=is_vision,
    )
    with _client_cache_lock:
        if cache_key in _client_cache:
            cached_client, cached_default, cached_loop = _client_cache[cache_key]
            if async_mode:
                # Validate: the cached client must be bound to the CURRENT,
                # OPEN loop.  If the loop changed or was closed, the httpx
                # transport inside is dead — force-close and replace.
                loop_ok = (
                    cached_loop is not None
                    and cached_loop is current_loop
                    and not cached_loop.is_closed()
                )
                if loop_ok:
                    effective = _compat_model(cached_client, model, cached_default)
                    return cached_client, effective
                # Stale — evict and fall through to create a new client.
                _force_close_async_httpx(cached_client)
                del _client_cache[cache_key]
            else:
                effective = _compat_model(cached_client, model, cached_default)
                return cached_client, effective
    # Build outside the lock
    client, default_model = resolve_provider_client(
        provider,
        model,
        async_mode,
        explicit_base_url=base_url,
        explicit_api_key=api_key,
        api_mode=api_mode,
        main_runtime=runtime,
        is_vision=is_vision,
    )
    if client is not None:
        # For async clients, remember which loop they were created on so we
        # can detect stale entries later.
        bound_loop = current_loop
        with _client_cache_lock:
            if cache_key not in _client_cache:
                # Safety belt: if the cache has grown beyond the max, evict
                # the oldest entries (FIFO — dict preserves insertion order).
                while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
                    evict_key, evict_entry = next(iter(_client_cache.items()))
                    _force_close_async_httpx(evict_entry[0])
                    del _client_cache[evict_key]
                _client_cache[cache_key] = (client, default_model, bound_loop)
            else:
                client, default_model, _ = _client_cache[cache_key]
    return client, model or default_model


def _resolve_task_provider_model(
    task: str = None,
    provider: str = None,
    model: str = None,
    base_url: str = None,
    api_key: str = None,
) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
    """Determine provider + model for a call.

    Priority:
      1. Explicit provider/model/base_url/api_key args (always win)
      2. Config file (auxiliary.{task}.provider/model/base_url)
      3. "auto" (full auto-detection chain)

    Returns (provider, model, base_url, api_key, api_mode) where model may
    be None (use provider default). When base_url is set, provider is forced
    to "custom" and the task uses that direct endpoint. api_mode is one of
    "chat_completions", "codex_responses", or None (auto-detect).
    """
    cfg_provider = None
    cfg_model = None
    cfg_base_url = None
    cfg_api_key = None
    cfg_api_mode = None

    if task:
        task_config = _get_auxiliary_task_config(task)
        cfg_provider = str(task_config.get("provider", "")).strip() or None
        cfg_model = str(task_config.get("model", "")).strip() or None
        cfg_base_url = str(task_config.get("base_url", "")).strip() or None
        cfg_api_key = str(task_config.get("api_key", "")).strip() or None
        cfg_api_mode = str(task_config.get("api_mode", "")).strip() or None

    resolved_model = model or cfg_model
    resolved_api_mode = cfg_api_mode

    if base_url:
        return "custom", resolved_model, base_url, api_key, resolved_api_mode
    if provider:
        return provider, resolved_model, base_url, api_key, resolved_api_mode

    if task:
        # Config.yaml is the primary source for per-task overrides.
        if cfg_base_url:
            return "custom", resolved_model, cfg_base_url, cfg_api_key, resolved_api_mode
        if cfg_provider and cfg_provider != "auto":
            return cfg_provider, resolved_model, None, None, resolved_api_mode

        return "auto", resolved_model, None, None, resolved_api_mode

    return "auto", resolved_model, None, None, resolved_api_mode


_DEFAULT_AUX_TIMEOUT = 30.0


def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
    """Return the config dict for auxiliary.<task>, or {} when unavailable."""
    if not task:
        return {}
    try:
        from hermes_cli.config import load_config
        config = load_config()
    except ImportError:
        return {}
    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
    return task_config if isinstance(task_config, dict) else {}


def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
    if not task:
        return default
    task_config = _get_auxiliary_task_config(task)
    raw = task_config.get("timeout")
    if raw is not None:
        try:
            return float(raw)
        except (ValueError, TypeError):
            pass
    return default


def _get_task_extra_body(task: str) -> Dict[str, Any]:
    """Read auxiliary.<task>.extra_body and return a shallow copy when valid."""
    task_config = _get_auxiliary_task_config(task)
    raw = task_config.get("extra_body")
    if isinstance(raw, dict):
        return dict(raw)
    return {}


# ---------------------------------------------------------------------------
# Anthropic-compatible endpoint detection + image block conversion
# ---------------------------------------------------------------------------

# Providers that use Anthropic-compatible endpoints (via OpenAI SDK wrapper).
# Their image content blocks must use Anthropic format, not OpenAI format.
_ANTHROPIC_COMPAT_PROVIDERS = frozenset({"minimax", "minimax-cn"})


def _is_anthropic_compat_endpoint(provider: str, base_url: str) -> bool:
    """Detect if an endpoint expects Anthropic-format content blocks.

    Returns True for known Anthropic-compatible providers (MiniMax) and
    any endpoint whose URL contains ``/anthropic`` in the path.
    """
    if provider in _ANTHROPIC_COMPAT_PROVIDERS:
        return True
    url_lower = (base_url or "").lower()
    return "/anthropic" in url_lower


def _convert_openai_images_to_anthropic(messages: list) -> list:
    """Convert OpenAI ``image_url`` content blocks to Anthropic ``image`` blocks.

    Only touches messages that have list-type content with ``image_url`` blocks;
    plain text messages pass through unchanged.
    """
    converted = []
    for msg in messages:
        content = msg.get("content")
        if not isinstance(content, list):
            converted.append(msg)
            continue
        new_content = []
        changed = False
        for block in content:
            if block.get("type") == "image_url":
                image_url_val = (block.get("image_url") or {}).get("url", "")
                if image_url_val.startswith("data:"):
                    # Parse data URI: data:<media_type>;base64,<data>
                    header, _, b64data = image_url_val.partition(",")
                    media_type = "image/png"
                    if ":" in header and ";" in header:
                        media_type = header.split(":", 1)[1].split(";", 1)[0]
                    new_content.append({
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": b64data,
                        },
                    })
                else:
                    # URL-based image
                    new_content.append({
                        "type": "image",
                        "source": {
                            "type": "url",
                            "url": image_url_val,
                        },
                    })
                changed = True
            else:
                new_content.append(block)
        converted.append({**msg, "content": new_content} if changed else msg)
    return converted


def _build_call_kwargs(
    provider: str,
    model: str,
    messages: list,
    temperature: Optional[float] = None,
    max_tokens: Optional[int] = None,
    tools: Optional[list] = None,
    timeout: float = 30.0,
    extra_body: Optional[dict] = None,
    base_url: Optional[str] = None,
) -> dict:
    """Build kwargs for .chat.completions.create() with model/provider adjustments."""
    kwargs: Dict[str, Any] = {
        "model": model,
        "messages": messages,
        "timeout": timeout,
    }

    fixed_temperature = _fixed_temperature_for_model(model, base_url)
    if fixed_temperature is OMIT_TEMPERATURE:
        temperature = None  # strip — let server choose
    elif fixed_temperature is not None:
        temperature = fixed_temperature

    # Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
    # drop here so auxiliary callers that hardcode temperature (e.g. 0 on
    # structured-JSON extraction) don't 400 the moment
    # the aux model is flipped to 4.7.
    if temperature is not None:
        from agent.anthropic_adapter import _forbids_sampling_params
        if _forbids_sampling_params(model):
            temperature = None

    if temperature is not None:
        kwargs["temperature"] = temperature

    if max_tokens is not None:
        # Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
        # Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
        if provider == "custom":
            custom_base = base_url or _current_custom_base_url()
            if base_url_hostname(custom_base) == "api.openai.com":
                kwargs["max_completion_tokens"] = max_tokens
            else:
                kwargs["max_tokens"] = max_tokens
        else:
            kwargs["max_tokens"] = max_tokens

    if tools:
        kwargs["tools"] = tools

    # Provider-specific extra_body
    merged_extra = dict(extra_body or {})
    if provider == "nous" or auxiliary_is_nous:
        merged_extra.setdefault("tags", []).extend(["product=hermes-agent"])
    if merged_extra:
        kwargs["extra_body"] = merged_extra

    return kwargs


def _validate_llm_response(response: Any, task: str = None) -> Any:
    """Validate that an LLM response has the expected .choices[0].message shape.

    Fails fast with a clear error instead of letting malformed payloads
    propagate to downstream consumers where they crash with misleading
    AttributeError (e.g. "'str' object has no attribute 'choices'").

    See #7264.
    """
    if response is None:
        raise RuntimeError(
            f"Auxiliary {task or 'call'}: LLM returned None response"
        )
    # Allow SimpleNamespace responses from adapters (CodexAuxiliaryClient,
    # AnthropicAuxiliaryClient) — they have .choices[0].message.
    try:
        choices = response.choices
        if not choices or not hasattr(choices[0], "message"):
            raise AttributeError("missing choices[0].message")
    except (AttributeError, TypeError, IndexError) as exc:
        response_type = type(response).__name__
        response_preview = str(response)[:120]
        raise RuntimeError(
            f"Auxiliary {task or 'call'}: LLM returned invalid response "
            f"(type={response_type}): {response_preview!r}. "
            f"Expected object with .choices[0].message — check provider "
            f"adapter or custom endpoint compatibility."
        ) from exc
    return response


def call_llm(
    task: str = None,
    *,
    provider: str = None,
    model: str = None,
    base_url: str = None,
    api_key: str = None,
    main_runtime: Optional[Dict[str, Any]] = None,
    messages: list,
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
    timeout: float = None,
    extra_body: dict = None,
) -> Any:
    """Centralized synchronous LLM call.

    Resolves provider + model (from task config, explicit args, or auto-detect),
    handles auth, request formatting, and model-specific arg adjustments.

    Args:
        task: Auxiliary task name ("compression", "vision", "web_extract",
              "session_search", "skills_hub", "mcp", "title_generation").
              Reads provider:model from config/env. Ignored if provider is set.
        provider: Explicit provider override.
        model: Explicit model override.
        messages: Chat messages list.
        temperature: Sampling temperature (None = provider default).
        max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
        tools: Tool definitions (for function calling).
        timeout: Request timeout in seconds (None = read from auxiliary.{task}.timeout config).
        extra_body: Additional request body fields.

    Returns:
        Response object with .choices[0].message.content

    Raises:
        RuntimeError: If no provider is configured.
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
    effective_extra_body = _get_task_extra_body(task)
    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
            provider=resolved_provider if resolved_provider != "auto" else provider,
            model=resolved_model or model,
            base_url=resolved_base_url or base_url,
            api_key=resolved_api_key or api_key,
            async_mode=False,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
            logger.warning(
                "Vision provider %s unavailable, falling back to auto vision backends",
                resolved_provider,
            )
            effective_provider, client, final_model = resolve_vision_provider_client(
                provider="auto",
                model=resolved_model,
                async_mode=False,
            )
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup"
            )
        resolved_provider = effective_provider or resolved_provider
    else:
        client, final_model = _get_cached_client(
            resolved_provider,
            resolved_model,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
            api_mode=resolved_api_mode,
            main_runtime=main_runtime,
        )
        if client is None:
            # When the user explicitly chose a non-OpenRouter provider but no
            # credentials were found, fail fast instead of silently routing
            # through OpenRouter (which causes confusing 404s).
            _explicit = (resolved_provider or "").strip().lower()
            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
                raise RuntimeError(
                    f"Provider '{_explicit}' is set in config.yaml but no API key "
                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
                    f"variable, or switch to a different provider with `hermes model`."
                )
            # For auto/custom with no credentials, try the full auto chain
            # rather than hardcoding OpenRouter (which may be depleted).
            # Pass model=None so each provider uses its own default —
            # resolved_model may be an OpenRouter-format slug that doesn't
            # work on other providers.
            if not resolved_base_url:
                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
                            task or "call", resolved_provider)
                client, final_model = _get_cached_client("auto", main_runtime=main_runtime)
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup")

    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)

    # Log what we're about to do — makes auxiliary operations visible
    _base_info = str(getattr(client, "base_url", resolved_base_url) or "")
    if task:
        logger.info("Auxiliary %s: using %s (%s)%s",
                     task, resolved_provider or "auto", final_model or "default",
                     f" at {_base_info}" if _base_info and "openrouter" not in _base_info else "")

    # Pass the client's actual base_url (not just resolved_base_url) so
    # endpoint-specific temperature overrides can distinguish
    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
        base_url=_base_info or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
    _client_base = str(getattr(client, "base_url", "") or "")
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
    # then payment fallback.
    try:
        return _validate_llm_response(
            client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
            retry_kwargs = dict(kwargs)
            retry_kwargs.pop("temperature", None)
            logger.info(
                "Auxiliary %s: provider rejected temperature; retrying once without it",
                task or "call",
            )
            try:
                return _validate_llm_response(
                    client.chat.completions.create(**retry_kwargs), task)
            except Exception as retry_err:
                retry_err_str = str(retry_err)
                # If retry still fails, fall through to the max_tokens /
                # payment / auth chains below using the temperature-stripped
                # kwargs.  Re-raise only if the retry hit something those
                # chains won't handle.
                if not (
                    _is_payment_error(retry_err)
                    or _is_connection_error(retry_err)
                    or _is_auth_error(retry_err)
                    or "max_tokens" in retry_err_str
                    or "unsupported_parameter" in retry_err_str
                ):
                    raise
                first_err = retry_err
                kwargs = retry_kwargs

        err_str = str(first_err)
        if max_tokens is not None and (
            "max_tokens" in err_str
            or "unsupported_parameter" in err_str
            or _is_unsupported_parameter_error(first_err, "max_tokens")
        ):
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
                return _validate_llm_response(
                    client.chat.completions.create(**kwargs), task)
            except Exception as retry_err:
                # If the max_tokens retry also hits a payment or connection
                # error, fall through to the fallback chain below.
                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
                    raise
                first_err = retry_err

        # ── Nous auth refresh parity with main agent ──────────────────
        client_is_nous = (
            resolved_provider == "nous"
            or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
        )
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
                model=final_model,
                async_mode=False,
                base_url=resolved_base_url,
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
                main_runtime=main_runtime,
                is_vision=(task == "vision"),
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
                            task or "call")
                if refreshed_model and refreshed_model != kwargs.get("model"):
                    kwargs["model"] = refreshed_model
                return _validate_llm_response(
                    refreshed_client.chat.completions.create(**kwargs), task)

        # ── Auth refresh retry ───────────────────────────────────────
        if (_is_auth_error(first_err)
                and resolved_provider not in ("auto", "", None)
                and not client_is_nous):
            if _refresh_provider_credentials(resolved_provider):
                logger.info(
                    "Auxiliary %s: refreshed %s credentials after auth error, retrying",
                    task or "call", resolved_provider,
                )
                retry_client, retry_model = (
                    resolve_vision_provider_client(
                        provider=resolved_provider,
                        model=final_model,
                        async_mode=False,
                    )[1:]
                    if task == "vision"
                    else _get_cached_client(
                        resolved_provider,
                        resolved_model,
                        base_url=resolved_base_url,
                        api_key=resolved_api_key,
                        api_mode=resolved_api_mode,
                        main_runtime=main_runtime,
                    )
                )
                if retry_client is not None:
                    retry_kwargs = _build_call_kwargs(
                        resolved_provider,
                        retry_model or final_model,
                        messages,
                        temperature=temperature,
                        max_tokens=max_tokens,
                        tools=tools,
                        timeout=effective_timeout,
                        extra_body=effective_extra_body,
                        base_url=resolved_base_url,
                    )
                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
                    return _validate_llm_response(
                        retry_client.chat.completions.create(**retry_kwargs), task)

        # ── Payment / credit exhaustion fallback ──────────────────────
        # When the resolved provider returns 402 or a credit-related error,
        # try alternative providers instead of giving up.  This handles the
        # common case where a user runs out of OpenRouter credits but has
        # Codex OAuth or another provider available.
        #
        # ── Connection error fallback ────────────────────────────────
        # When a provider endpoint is unreachable (DNS failure, connection
        # refused, timeout), try alternative providers.  This handles stale
        # Codex/OAuth tokens that authenticate but whose endpoint is down,
        # and providers the user never configured that got picked up by
        # the auto-detection chain.
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
        # Only try alternative providers when the user didn't explicitly
        # configure this task's provider.  Explicit provider = hard constraint;
        # auto (the default) = best-effort fallback chain.  (#7559)
        is_auto = resolved_provider in ("auto", "", None)
        if should_fallback and is_auto:
            reason = "payment error" if _is_payment_error(first_err) else "connection error"
            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
                        task or "call", reason, resolved_provider, first_err)
            fb_client, fb_model, fb_label = _try_payment_fallback(
                resolved_provider, task, reason=reason)
            if fb_client is not None:
                fb_kwargs = _build_call_kwargs(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
                    extra_body=effective_extra_body,
                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                return _validate_llm_response(
                    fb_client.chat.completions.create(**fb_kwargs), task)
        raise


def extract_content_or_reasoning(response) -> str:
    """Extract content from an LLM response, falling back to reasoning fields.

    Mirrors the main agent loop's behavior when a reasoning model (DeepSeek-R1,
    Qwen-QwQ, etc.) returns ``content=None`` with reasoning in structured fields.

    Resolution order:
      1. ``message.content`` — strip inline think/reasoning blocks, check for
         remaining non-whitespace text.
      2. ``message.reasoning`` / ``message.reasoning_content`` — direct
         structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
      3. ``message.reasoning_details`` — OpenRouter unified array format.

    Returns the best available text, or ``""`` if nothing found.
    """
    import re

    msg = response.choices[0].message
    content = (msg.content or "").strip()

    if content:
        # Strip inline think/reasoning blocks (mirrors _strip_think_blocks)
        cleaned = re.sub(
            r"<(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>"
            r".*?"
            r"</(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>",
            "", content, flags=re.DOTALL | re.IGNORECASE,
        ).strip()
        if cleaned:
            return cleaned

    # Content is empty or reasoning-only — try structured reasoning fields
    reasoning_parts: list[str] = []
    for field in ("reasoning", "reasoning_content"):
        val = getattr(msg, field, None)
        if val and isinstance(val, str) and val.strip() and val not in reasoning_parts:
            reasoning_parts.append(val.strip())

    details = getattr(msg, "reasoning_details", None)
    if details and isinstance(details, list):
        for detail in details:
            if isinstance(detail, dict):
                summary = (
                    detail.get("summary")
                    or detail.get("content")
                    or detail.get("text")
                )
                if summary and summary not in reasoning_parts:
                    reasoning_parts.append(summary.strip() if isinstance(summary, str) else str(summary))

    if reasoning_parts:
        return "\n\n".join(reasoning_parts)

    return ""


async def async_call_llm(
    task: str = None,
    *,
    provider: str = None,
    model: str = None,
    base_url: str = None,
    api_key: str = None,
    messages: list,
    temperature: float = None,
    max_tokens: int = None,
    tools: list = None,
    timeout: float = None,
    extra_body: dict = None,
) -> Any:
    """Centralized asynchronous LLM call.

    Same as call_llm() but async. See call_llm() for full documentation.
    """
    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
        task, provider, model, base_url, api_key)
    effective_extra_body = _get_task_extra_body(task)
    effective_extra_body.update(extra_body or {})

    if task == "vision":
        effective_provider, client, final_model = resolve_vision_provider_client(
            provider=resolved_provider if resolved_provider != "auto" else provider,
            model=resolved_model or model,
            base_url=resolved_base_url or base_url,
            api_key=resolved_api_key or api_key,
            async_mode=True,
        )
        if client is None and resolved_provider != "auto" and not resolved_base_url:
            logger.warning(
                "Vision provider %s unavailable, falling back to auto vision backends",
                resolved_provider,
            )
            effective_provider, client, final_model = resolve_vision_provider_client(
                provider="auto",
                model=resolved_model,
                async_mode=True,
            )
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup"
            )
        resolved_provider = effective_provider or resolved_provider
    else:
        client, final_model = _get_cached_client(
            resolved_provider,
            resolved_model,
            async_mode=True,
            base_url=resolved_base_url,
            api_key=resolved_api_key,
            api_mode=resolved_api_mode,
        )
        if client is None:
            _explicit = (resolved_provider or "").strip().lower()
            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
                raise RuntimeError(
                    f"Provider '{_explicit}' is set in config.yaml but no API key "
                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
                    f"variable, or switch to a different provider with `hermes model`."
                )
            if not resolved_base_url:
                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
                            task or "call", resolved_provider)
                client, final_model = _get_cached_client("auto", async_mode=True)
        if client is None:
            raise RuntimeError(
                f"No LLM provider configured for task={task} provider={resolved_provider}. "
                f"Run: hermes setup")

    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)

    # Pass the client's actual base_url (not just resolved_base_url) so
    # endpoint-specific temperature overrides can distinguish
    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
    _client_base = str(getattr(client, "base_url", "") or "")
    kwargs = _build_call_kwargs(
        resolved_provider, final_model, messages,
        temperature=temperature, max_tokens=max_tokens,
        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
        base_url=_client_base or resolved_base_url)

    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])

    try:
        return _validate_llm_response(
            await client.chat.completions.create(**kwargs), task)
    except Exception as first_err:
        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
            retry_kwargs = dict(kwargs)
            retry_kwargs.pop("temperature", None)
            logger.info(
                "Auxiliary %s (async): provider rejected temperature; retrying once without it",
                task or "call",
            )
            try:
                return _validate_llm_response(
                    await client.chat.completions.create(**retry_kwargs), task)
            except Exception as retry_err:
                retry_err_str = str(retry_err)
                if not (
                    _is_payment_error(retry_err)
                    or _is_connection_error(retry_err)
                    or _is_auth_error(retry_err)
                    or "max_tokens" in retry_err_str
                    or "unsupported_parameter" in retry_err_str
                ):
                    raise
                first_err = retry_err
                kwargs = retry_kwargs

        err_str = str(first_err)
        if max_tokens is not None and (
            "max_tokens" in err_str
            or "unsupported_parameter" in err_str
            or _is_unsupported_parameter_error(first_err, "max_tokens")
        ):
            kwargs.pop("max_tokens", None)
            kwargs["max_completion_tokens"] = max_tokens
            try:
                return _validate_llm_response(
                    await client.chat.completions.create(**kwargs), task)
            except Exception as retry_err:
                # If the max_tokens retry also hits a payment or connection
                # error, fall through to the fallback chain below.
                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
                    raise
                first_err = retry_err

        # ── Nous auth refresh parity with main agent ──────────────────
        client_is_nous = (
            resolved_provider == "nous"
            or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
        )
        if _is_auth_error(first_err) and client_is_nous:
            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
                cache_provider=resolved_provider or "nous",
                model=final_model,
                async_mode=True,
                base_url=resolved_base_url,
                api_key=resolved_api_key,
                api_mode=resolved_api_mode,
                is_vision=(task == "vision"),
            )
            if refreshed_client is not None:
                logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
                            task or "call")
                if refreshed_model and refreshed_model != kwargs.get("model"):
                    kwargs["model"] = refreshed_model
                return _validate_llm_response(
                    await refreshed_client.chat.completions.create(**kwargs), task)

        # ── Auth refresh retry (mirrors sync call_llm) ───────────────
        if (_is_auth_error(first_err)
                and resolved_provider not in ("auto", "", None)
                and not client_is_nous):
            if _refresh_provider_credentials(resolved_provider):
                logger.info(
                    "Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
                    task or "call", resolved_provider,
                )
                if task == "vision":
                    _, retry_client, retry_model = resolve_vision_provider_client(
                        provider=resolved_provider,
                        model=final_model,
                        async_mode=True,
                    )
                else:
                    retry_client, retry_model = _get_cached_client(
                        resolved_provider,
                        resolved_model,
                        async_mode=True,
                        base_url=resolved_base_url,
                        api_key=resolved_api_key,
                        api_mode=resolved_api_mode,
                    )
                if retry_client is not None:
                    retry_kwargs = _build_call_kwargs(
                        resolved_provider,
                        retry_model or final_model,
                        messages,
                        temperature=temperature,
                        max_tokens=max_tokens,
                        tools=tools,
                        timeout=effective_timeout,
                        extra_body=effective_extra_body,
                        base_url=resolved_base_url,
                    )
                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
                    return _validate_llm_response(
                        await retry_client.chat.completions.create(**retry_kwargs), task)

        # ── Payment / connection fallback (mirrors sync call_llm) ─────
        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
        is_auto = resolved_provider in ("auto", "", None)
        if should_fallback and is_auto:
            reason = "payment error" if _is_payment_error(first_err) else "connection error"
            logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",
                        task or "call", reason, resolved_provider, first_err)
            fb_client, fb_model, fb_label = _try_payment_fallback(
                resolved_provider, task, reason=reason)
            if fb_client is not None:
                fb_kwargs = _build_call_kwargs(
                    fb_label, fb_model, messages,
                    temperature=temperature, max_tokens=max_tokens,
                    tools=tools, timeout=effective_timeout,
                    extra_body=effective_extra_body,
                    base_url=str(getattr(fb_client, "base_url", "") or ""))
                # Convert sync fallback client to async
                async_fb, async_fb_model = _to_async_client(
                    fb_client, fb_model or "", is_vision=(task == "vision")
                )
                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
                    fb_kwargs["model"] = async_fb_model
                return _validate_llm_response(
                    await async_fb.chat.completions.create(**fb_kwargs), task)
        raise
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								"""Shared auxiliary client router for side tasks.
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
 								Provides a single resolution chain so every consumer (context compression,
 								session search, web extraction, vision analysis, browser vision) picks up
 								the best available backend without duplicating fallback logic.
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
+								Resolution order for text tasks (auto mode):
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+. OpenRouter  (OPENROUTER_API_KEY)
 . Nous Portal (~/.hermes/auth.json active provider)
-												refactor: make config.yaml the single source of truth for endpoint URLs (#4165)

OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.

Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
  setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
  models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
  auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
  (both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars

Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
											
										
										
											2026-03-30 22:02:53 -07:00
+. Custom endpoint (config.yaml model.base_url + OPENAI_API_KEY)
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
 								     wrapped to look like a chat.completions client)
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+. Native Anthropic
 . Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
 . None
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
+								Resolution order for vision/multimodal tasks (auto mode):
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+. Selected main provider, if it is one of the supported vision backends below
 . OpenRouter
 . Nous Portal
 . Codex OAuth (gpt-5.3-codex supports vision via Responses API)
 . Native Anthropic
 . Custom endpoint (for local vision models: Qwen-VL, LLaVA, Pixtral, etc.)
 . None
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
-												refactor(auxiliary): config.yaml takes priority over env vars for aux task settings (#7889)

The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.

Flipped the resolution order in _resolve_task_provider_model():
  1. Explicit args (always win)
  2. config.yaml auxiliary.{task}.* (PRIMARY)
  3. Env var overrides (backward-compat fallback only)
  4. 'auto' (full auto-detection chain)

Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.

Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
											
										
										
											2026-04-11 11:21:59 -07:00
+								Per-task overrides are configured in config.yaml under the ``auxiliary:`` section
 								(e.g. ``auxiliary.vision.provider``, ``auxiliary.compression.model``).
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
+								Default "auto" follows the chains above.
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								Payment / credit exhaustion fallback:
 								  When a resolved provider returns HTTP 402 or a credit-related error,
 								  call_llm() automatically retries with the next available provider in the
 								  auto-detection chain.  This handles the common case where a user depletes
 								  their OpenRouter balance but has Codex OAuth or another provider available.
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								"""
 								import json
 								import logging
 								import os
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								import threading
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								import time
-												chore: remove ~100 unused imports across 55 files (#3016)

Automated cleanup via pyflakes + autoflake with manual review.

Changes:
- Removed unused stdlib imports (os, sys, json, pathlib.Path, etc.)
- Removed unused typing imports (List, Dict, Any, Optional, Tuple, Set, etc.)
- Removed unused internal imports (hermes_cli.auth, hermes_cli.config, etc.)
- Fixed cli.py: removed 8 shadowed banner imports (imported from hermes_cli.banner
  then immediately redefined locally — only build_welcome_banner is actually used)
- Added noqa comments to imports that appear unused but serve a purpose:
  - Re-exports (gateway/session.py SessionResetPolicy, tools/terminal_tool.py
    is_interrupted/_interrupt_event)
  - SDK presence checks in try/except (daytona, fal_client, discord)
  - Test mock targets (auxiliary_client.py Path, mcp_config.py get_hermes_home)

Zero behavioral changes. Full test suite passes (6162/6162, 2 pre-existing
streaming test failures unrelated to this change).
											
										
										
											2026-03-25 15:02:03 -07:00
+								from pathlib import Path  # noqa: F401 — used by test mocks
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								from types import SimpleNamespace
 								from typing import Any, Dict, List, Optional, Tuple
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								from urllib.parse import urlparse, parse_qs, urlunparse
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
 								from openai import OpenAI
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								from agent.credential_pool import load_pool
-												fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths

Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.

Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.

Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)

Tests updated to use HERMES_HOME env var instead of patching Path.home().

Closes #892

(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)

											
										
										
											2026-03-11 07:31:41 +01:00
+								from hermes_cli.config import get_hermes_home
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								from hermes_constants import OPENROUTER_BASE_URL
-												fix(agent): normalize socks:// env proxies for httpx/anthropic

WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed.

Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them:
  - run_agent._get_proxy_from_env
  - agent.auxiliary_client._validate_proxy_env_urls
  - agent.anthropic_adapter.build_anthropic_client
  - gateway.platforms.base.resolve_proxy_url

Regression coverage:
  - run_agent proxy env resolution
  - auxiliary proxy env normalization
  - gateway proxy URL resolution

Verified with:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py

39 passed.

											
										
										
											2026-04-21 17:55:04 +08:00
+								from utils import base_url_host_matches, base_url_hostname, normalize_proxy_env_vars
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
 								logger = logging.getLogger(__name__)
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
 								def _extract_url_query_params(url: str):
 								    """Extract query params from URL, return (clean_url, default_query dict or None)."""
 								    parsed = urlparse(url)
 								    if parsed.query:
 								        clean = urlunparse(parsed._replace(query=""))
 								        params = {k: v[0] for k, v in parse_qs(parsed.query).items()}
 								        return clean, params
 								    return url, None
-												fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)

											
										
										
											2026-04-11 12:48:09 +05:30
+								# Module-level flag: only warn once per process about stale OPENAI_BASE_URL.
 								_stale_base_url_warned = False
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								_PROVIDER_ALIASES = {
 								    "google": "gemini",
 								    "google-gemini": "gemini",
 								    "google-ai-studio": "gemini",
-												feat(xai): upgrade to Responses API, add TTS provider

Cherry-picked and trimmed from PR #10600 by Jaaneek.

- Switch xAI transport from openai_chat to codex_responses (Responses API)
- Add codex_responses detection for xAI in all runtime_provider resolution paths
- Add xAI api_mode detection in AIAgent.__init__ (provider name + URL auto-detect)
- Add extra_headers passthrough for codex_responses requests
- Add x-grok-conv-id session header for xAI prompt caching
- Add xAI reasoning support (encrypted_content include, no effort param)
- Move x-grok-conv-id from chat_completions path to codex_responses path
- Add xAI TTS provider (dedicated /v1/tts endpoint with Opus conversion)
- Add xAI provider aliases (grok, x-ai, x.ai) across auth, models, providers, auxiliary
- Trim xAI model list to agentic models (grok-4.20-reasoning, grok-4-1-fast-reasoning)
- Add XAI_API_KEY/XAI_BASE_URL to OPTIONAL_ENV_VARS
- Add xAI TTS config section, setup wizard entry, tools_config provider option
- Add shared xai_http.py helper for User-Agent string

Co-authored-by: Jaaneek <Jaaneek@users.noreply.github.com>

											
										
										
											2026-04-15 22:27:26 -07:00
+								    "x-ai": "xai",
 								    "x.ai": "xai",
 								    "grok": "xai",
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    "glm": "zai",
 								    "z-ai": "zai",
 								    "z.ai": "zai",
 								    "zhipu": "zai",
 								    "kimi": "kimi-coding",
 								    "moonshot": "kimi-coding",
-												feat(providers): add kimi-coding-cn provider for mainland China users

Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.

											
										
										
											2026-04-13 11:13:09 -07:00
+								    "kimi-cn": "kimi-coding-cn",
 								    "moonshot-cn": "kimi-coding-cn",
-												feat(providers): add GMI Cloud as a first-class API-key provider (#11955)

Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>

											
										
										
											2026-04-17 11:19:56 -07:00
+								    "gmi-cloud": "gmi",
 								    "gmicloud": "gmi",
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    "minimax-china": "minimax-cn",
 								    "minimax_cn": "minimax-cn",
 								    "claude": "anthropic",
 								    "claude-code": "anthropic",
-												fix(aux): normalize GitHub Copilot provider slugs

Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-24 17:27:26 +08:00
+								    "github": "copilot",
 								    "github-copilot": "copilot",
 								    "github-model": "copilot",
 								    "github-models": "copilot",
 								    "github-copilot-acp": "copilot-acp",
 								    "copilot-acp-agent": "copilot-acp",
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								}
-												fix(agent): propagate api_mode to vision provider resolution

resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-13 16:08:19 +08:00
+								def _normalize_aux_provider(provider: Optional[str]) -> str:
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    normalized = (provider or "auto").strip().lower()
 								    if normalized.startswith("custom:"):
 								        suffix = normalized.split(":", 1)[1].strip()
 								        if not suffix:
 								            return "custom"
-												fix(agent): propagate api_mode to vision provider resolution

resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-13 16:08:19 +08:00
+								        normalized = suffix
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    if normalized == "codex":
 								        return "openai-codex"
 								    if normalized == "main":
 								        # Resolve to the user's actual main provider so named custom providers
 								        # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
-												fix(aux): normalize GitHub Copilot provider slugs

Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-24 17:27:26 +08:00
+								        main_prov = (_read_main_provider() or "").strip().lower()
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								        if main_prov and main_prov not in ("auto", "main", ""):
-												fix(aux): normalize GitHub Copilot provider slugs

Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

											
										
										
											2026-04-24 17:27:26 +08:00
+								            normalized = main_prov
 								        else:
 								            return "custom"
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    return _PROVIDER_ALIASES.get(normalized, normalized)
-												fix(kimi): force kimi-for-coding temperature to 0.6

											
										
										
											2026-04-17 16:17:15 -06:00
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
+								# Sentinel: when returned by _fixed_temperature_for_model(), callers must
 								# strip the ``temperature`` key from API kwargs entirely so the provider's
 								# server-side default applies.  Kimi/Moonshot models manage temperature
 								# internally — sending *any* value (even the "correct" one) can conflict
 								# with gateway-side mode selection (thinking → 1.0, non-thinking → 0.6).
 								OMIT_TEMPERATURE: object = object()
-												fix(kimi): force kimi-for-coding temperature to 0.6

											
										
										
											2026-04-17 16:17:15 -06:00
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
 								def _is_kimi_model(model: Optional[str]) -> bool:
 								    """True for any Kimi / Moonshot model that manages temperature server-side."""
 								    bare = (model or "").strip().lower().rsplit("/", 1)[-1]
 								    return bare.startswith("kimi-") or bare == "kimi"
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
-												fix(kimi): force kimi-for-coding temperature to 0.6

											
										
										
											2026-04-17 16:17:15 -06:00
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								def _fixed_temperature_for_model(
 								    model: Optional[str],
 								    base_url: Optional[str] = None,
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
+								) -> "Optional[float] | object":
 								    """Return a temperature directive for models with strict contracts.
-												fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)

* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
											
										
										
											2026-04-18 09:35:51 -07:00
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
+								    Returns:
 								        ``OMIT_TEMPERATURE`` — caller must remove the ``temperature`` key so the
 								            provider chooses its own default.  Used for all Kimi / Moonshot
 								            models whose gateway selects temperature server-side.
 								        ``float`` — a specific value the caller must use (reserved for future
 								            models with fixed-temperature contracts).
 								        ``None`` — no override; caller should use its own default.
-												fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)

* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
											
										
										
											2026-04-18 09:35:51 -07:00
+								    """
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
+								    if _is_kimi_model(model):
 								        logger.debug("Omitting temperature for Kimi model %r (server-managed)", model)
 								        return OMIT_TEMPERATURE
-												fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo) (#12144)

* fix(kimi): force fixed temperature on kimi-k2.* models (k2.5, thinking, turbo)

The prior override only matched the literal model name "kimi-for-coding",
but Moonshot's coding endpoint is hit with real model IDs such as
`kimi-k2.5`, `kimi-k2-turbo-preview`, `kimi-k2-thinking`, etc.  Those
requests bypassed the override and kept the caller's temperature, so
Moonshot returns HTTP 400 "invalid temperature: only 0.6 is allowed for
this model" (or 1.0 for thinking variants).

Match the whole kimi-k2.* family:
  * kimi-k2-thinking / kimi-k2-thinking-turbo -> 1.0 (thinking mode)
  * all other kimi-k2.* -> 0.6 (non-thinking / instant mode)

Also accept an optional vendor prefix (e.g. `moonshotai/kimi-k2.5`) so
aggregator routings are covered.

* refactor(kimi): whitelist-match kimi coding models instead of prefix

Addresses review feedback on PR #12144.

- Replace `startswith("kimi-k2")` with explicit frozensets sourced from
  Moonshot's kimi-for-coding model list.  The prefix match would have also
  clamped `kimi-k2-instruct` / `kimi-k2-instruct-0905`, which are the
  separate non-coding K2 family with variable temperature (recommended 0.6
  but not enforced — see huggingface.co/moonshotai/Kimi-K2-Instruct).
- Confirmed via platform.kimi.ai docs that all five coding models
  (k2.5, k2-turbo-preview, k2-0905-preview, k2-thinking, k2-thinking-turbo)
  share the fixed-temperature lock, so the preview-model mapping is no
  longer an assumption.
- Drop the fragile `"thinking" in bare` substring test for a set lookup.
- Log a debug line on each override so operators can see when Hermes
  silently rewrites temperature.
- Update class docstring.  Extend the negative test to parametrize over
  kimi-k2-instruct, Kimi-K2-Instruct-0905, and a hypothetical future
  kimi-k2-experimental name — all must keep the caller's temperature.
											
										
										
											2026-04-18 09:35:51 -07:00
+								    return None
-												fix(kimi): force kimi-for-coding temperature to 0.6

											
										
										
											2026-04-17 16:17:15 -06:00
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
 								_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
-												fix: update Gemini model catalog + wire models.dev as live model source

Follow-up for salvaged PR #5494:
- Update model catalog to Gemini 3.x + Gemma 4 (drop deprecated 2.0)
- Add list_agentic_models() to models_dev.py with noise filter
- Wire models.dev into _model_flow_api_key_provider as primary source
  (static curated list serves as offline fallback)
- Add gemini -> google mapping in PROVIDER_TO_MODELS_DEV
- Fix Gemma 4 context lengths to 256K (models.dev values)
- Update auxiliary model to gemini-3-flash-preview
- Expand tests: 3.x catalog, context lengths, models.dev integration

											
										
										
											2026-04-06 10:19:19 -07:00
+								    "gemini": "gemini-3-flash-preview",
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								    "zai": "glm-4.5-flash",
 								    "kimi-coding": "kimi-k2-turbo-preview",
-												feat: add Step Plan provider support (salvage #6005)

Adds a first-class 'stepfun' API-key provider surfaced as Step Plan:

- Support Step Plan setup for both International and China regions
- Discover Step Plan models live from /step_plan/v1/models, with a
  small coding-focused fallback catalog when discovery is unavailable
- Thread StepFun through provider metadata, setup persistence, status
  and doctor output, auxiliary routing, and model normalization
- Add tests for provider resolution, model validation, metadata
  mapping, and StepFun region/model persistence

Based on #6005 by @hengm3467.

Co-authored-by: hengm3467 <100685635+hengm3467@users.noreply.github.com>

											
										
										
											2026-04-22 13:28:01 +05:30
+								    "stepfun": "step-3.5-flash",
-												feat(providers): add kimi-coding-cn provider for mainland China users

Cherry-picked from PR #7637 by hcshen0111.
Adds kimi-coding-cn provider with dedicated KIMI_CN_API_KEY env var
and api.moonshot.cn/v1 endpoint for China-region Moonshot users.

											
										
										
											2026-04-13 11:13:09 -07:00
+								    "kimi-coding-cn": "kimi-k2-turbo-preview",
-												fix(providers/gmi): post-salvage review fixes

- config.py: remove dead ENV_VARS_BY_VERSION[17] entry (current _config_version
  is 22, so all users are past version 17 and would never be prompted for
  GMI_API_KEY on upgrade — consistent with how arcee was added)
- auxiliary_client.py: use google/gemini-3.1-flash-lite-preview as GMI aux
  model instead of anthropic/claude-opus-4.6 (matches cheap fast-model pattern
  used by all other providers: zai→glm-4.5-flash, kimi→kimi-k2-turbo-preview,
  stepfun→step-3.5-flash, kilocode→google/gemini-3-flash-preview)
- test_gmi_provider.py: fix malformed write_text() call in doctor test
  (was: write_text("GMI_API_KEY=*** encoding="utf-8") → missing closing quote,
  wrote literal string 'GMI_API_KEY=*** encoding=' to .env file)
- test_gmi_provider.py + test_auxiliary_client.py: update aux model assertions
  to match new cheaper default
- docs/integrations/providers.md: add 'gmi' to inline 'Supported providers'
  fallback list (was only in the table, not the inline list at line ~1181)
- docs/reference/cli-commands.md: add 'gmi' to --provider choices list

											
										
										
											2026-04-27 23:46:05 +05:30
+								    "gmi": "google/gemini-3.1-flash-lite-preview",
-												fix(minimax): correct context lengths, model catalog, thinking guard, aux model, and config base_url

Cherry-picked from PR #6046 by kshitijk4poor with dead code stripped.

- Context lengths: 204800 → 1M (M1) / 1048576 (M2.5/M2.7) per official docs
- Model catalog: add M1 family, remove deprecated M2.1 and highspeed variants
- Thinking guard: skip extended thinking for MiniMax (Anthropic-compat endpoint)
- Aux model: MiniMax-M2.7-highspeed → MiniMax-M2.7 (same model, half price)
- Config base_url: honour model.base_url for API-key providers (fixes China users)
- Stripped unused get_minimax_max_output() / _MINIMAX_MAX_OUTPUT (no consumer)

Fixes #5777, #4082, #6039. Closes #3895.

											
										
										
											2026-04-08 01:39:28 -07:00
+								    "minimax": "MiniMax-M2.7",
 								    "minimax-cn": "MiniMax-M2.7",
-												feat: native Anthropic provider with Claude Code credential auto-discovery

Add Anthropic as a first-class inference provider, bypassing OpenRouter
for direct API access. Uses the native Anthropic SDK with a full format
adapter (same pattern as the codex_responses api_mode).

## Auth (three methods, priority order)
1. ANTHROPIC_API_KEY env var (regular API key, sk-ant-api-*)
2. ANTHROPIC_TOKEN / CLAUDE_CODE_OAUTH_TOKEN env var (setup-token, sk-ant-oat-*)
3. Auto-discovery from ~/.claude/.credentials.json (Claude Code subscription)
   - Reads Claude Code's OAuth credentials
   - Checks token expiry with 60s buffer
   - Setup tokens use Bearer auth + anthropic-beta: oauth-2025-04-20 header
   - Regular API keys use standard x-api-key header

## Changes by file

### New files
- agent/anthropic_adapter.py — Client builder, message/tool/response
  format conversion, Claude Code credential reader, token resolver.
  Handles system prompt extraction, tool_use/tool_result blocks,
  thinking/reasoning, orphaned tool_use cleanup, cache_control.
- tests/test_anthropic_adapter.py — 36 tests covering all adapter logic

### Modified files
- pyproject.toml — Add anthropic>=0.39.0 dependency
- hermes_cli/auth.py — Add 'anthropic' to PROVIDER_REGISTRY with
  three env vars, plus 'claude'/'claude-code' aliases
- hermes_cli/models.py — Add model catalog, labels, aliases, provider order
- hermes_cli/main.py — Add 'anthropic' to --provider CLI choices
- hermes_cli/runtime_provider.py — Add Anthropic branch returning
  api_mode='anthropic_messages' (before generic api_key fallthrough)
- hermes_cli/setup.py — Add Anthropic setup wizard with Claude Code
  credential auto-discovery, model selection, OpenRouter tools prompt
- agent/auxiliary_client.py — Add claude-haiku-4-5 as aux model
- agent/model_metadata.py — Add bare Claude model context lengths
- run_agent.py — Add anthropic_messages api_mode:
  * Client init (Anthropic SDK instead of OpenAI)
  * API call dispatch (_anthropic_client.messages.create)
  * Response validation (content blocks)
  * finish_reason mapping (stop_reason -> finish_reason)
  * Token usage (input_tokens/output_tokens)
  * Response normalization (normalize_anthropic_response)
  * Client interrupt/rebuild
  * Prompt caching auto-enabled for native Anthropic
- tests/test_run_agent.py — Update test_anthropic_base_url_accepted to
  expect native routing, add test_prompt_caching_native_anthropic

											
										
										
											2026-03-12 15:47:45 -07:00
+								    "anthropic": "claude-haiku-4-5-20251001",
-												feat: add Vercel AI Gateway provider (#1628)

* feat: add Vercel AI Gateway as a first-class provider

Adds AI Gateway (ai-gateway.vercel.sh) as a new inference provider
with AI_GATEWAY_API_KEY authentication, live model discovery, and
reasoning support via extra_body.reasoning.

Based on PR #1492 by jerilynzheng.

* feat: add AI Gateway to setup wizard, doctor, and fallback providers

* test: add AI Gateway to api_key_providers test suite

* feat: add AI Gateway to hermes model CLI and model metadata

Wire AI Gateway into the interactive model selection menu and add
context lengths for AI Gateway model IDs in model_metadata.py.

* feat: use claude-haiku-4.5 as AI Gateway auxiliary model

* revert: use gemini-3-flash as AI Gateway auxiliary model

* fix: move AI Gateway below established providers in selection order

---------

Co-authored-by: jerilynzheng <jerilynzheng@users.noreply.github.com>
Co-authored-by: jerilynzheng <zheng.jerilyn@gmail.com>
											
										
										
											2026-03-17 00:12:16 -07:00
+								    "ai-gateway": "google/gemini-3-flash",
-												feat(provider): add OpenCode Zen and OpenCode Go providers

Add support for OpenCode Zen (pay-as-you-go, 35+ curated models) and
OpenCode Go ($10/month subscription, open models) as first-class providers.

Both are OpenAI-compatible endpoints resolved via the generic api_key
provider flow — no custom adapter needed.

Files changed:
- hermes_cli/auth.py — ProviderConfig entries + aliases
- hermes_cli/config.py — OPENCODE_ZEN/GO API key env vars
- hermes_cli/models.py — model catalogs, labels, aliases, provider order
- hermes_cli/main.py — provider labels, menu entries, model flow dispatch
- hermes_cli/setup.py — setup wizard branches (idx 10, 11)
- agent/model_metadata.py — context lengths for all OpenCode models
- agent/auxiliary_client.py — default aux models
- .env.example — documentation

Co-authored-by: DevAgarwal2 <DevAgarwal2@users.noreply.github.com>
											
										
										
											2026-03-17 02:02:43 -07:00
+								    "opencode-zen": "gemini-3-flash",
 								    "opencode-go": "glm-5",
-												feat: add Kilo Code (kilocode) as first-class inference provider (#1666)

Add Kilo Gateway (kilo.ai) as an API-key provider with OpenAI-compatible
endpoint at https://api.kilo.ai/api/gateway. Supports 500+ models from
Anthropic, OpenAI, Google, xAI, Mistral, MiniMax via a single API key.

- Register kilocode in PROVIDER_REGISTRY with aliases (kilo, kilo-code,
  kilo-gateway) and KILOCODE_API_KEY / KILOCODE_BASE_URL env vars
- Add to model catalog, CLI provider menu, setup wizard, doctor checks
- Add google/gemini-3-flash-preview as default aux model
- 12 new tests covering registration, aliases, credential resolution,
  runtime config
- Documentation updates (env vars, config, fallback providers)
- Fix setup test index shift from provider insertion

Inspired by PR #1473 by @amanning3390.

Co-authored-by: amanning3390 <amanning3390@users.noreply.github.com>
											
										
										
											2026-03-17 02:40:34 -07:00
+								    "kilocode": "google/gemini-3-flash-preview",
-												feat: add Ollama Cloud as built-in provider

Add ollama-cloud as a first-class provider with full parity to existing
API-key providers (gemini, zai, minimax, etc.):

- PROVIDER_REGISTRY entry with OLLAMA_API_KEY env var
- Provider aliases: ollama -> custom (local), ollama_cloud -> ollama-cloud
- models.dev integration for accurate context lengths
- URL-to-provider mapping (ollama.com -> ollama-cloud)
- Passthrough model normalization (preserves Ollama model:tag format)
- Default auxiliary model (nemotron-3-nano:30b)
- HermesOverlay in providers.py
- CLI --provider choices, CANONICAL_PROVIDERS entry
- Dynamic model discovery with disk caching (1hr TTL)
- 37 provider-specific tests

Cherry-picked from PR #6038 by kshitijk4poor. Closes #3926

											
										
										
											2026-04-15 22:32:05 -07:00
+								    "ollama-cloud": "nemotron-3-nano:30b",
-												feat(xiaomi): add Xiaomi MiMo as first-class provider

Cherry-picked from PR #7702 by kshitijk4poor.

Adds Xiaomi MiMo as a direct provider (XIAOMI_API_KEY) with models:
- mimo-v2-pro (1M context), mimo-v2-omni (256K, multimodal), mimo-v2-flash (256K, cheapest)

Standard OpenAI-compatible provider checklist: auth.py, config.py, models.py,
main.py, providers.py, doctor.py, model_normalize.py, model_metadata.py,
models_dev.py, auxiliary_client.py, .env.example, cli-config.yaml.example.

Follow-up: vision tasks use mimo-v2-omni (multimodal) instead of the user's
main model. Non-vision aux uses the user's selected model. Added
_PROVIDER_VISION_MODELS dict for provider-specific vision model overrides.
On failure, falls back to aggregators (gemini flash) via existing fallback chain.

Corrects pre-existing context lengths: mimo-v2-pro 1048576→1000000,
mimo-v2-omni 1048576→256000, adds mimo-v2-flash 256000.

36 tests covering registry, aliases, auto-detect, credentials, models.dev,
normalization, URL mapping, providers module, doctor, aux client, vision
model override, and agent init.

											
										
										
											2026-04-11 10:10:31 -07:00
+								}
 								# Vision-specific model overrides for direct providers.
 								# When the user's main provider has a dedicated vision/multimodal model that
 								# differs from their main chat model, map it here.  The vision auto-detect
 								# "exotic provider" branch checks this before falling back to the main model.
 								_PROVIDER_VISION_MODELS: Dict[str, str] = {
-												feat: add Xiaomi MiMo v2.5-pro and v2.5 model support (#14635)

## Merged

Adds MiMo v2.5-pro and v2.5 support to Xiaomi native provider, OpenCode Go, and setup wizard.

### Changes
- Context lengths: added v2.5-pro (1M) and v2.5 (1M), corrected existing MiMo entries to exact values (262144)
- Provider lists: xiaomi, opencode-go, setup wizard
- Vision: upgraded from mimo-v2-omni to mimo-v2.5 (omnimodal)
- Config description updated for XIAOMI_API_KEY
- Tests updated for new vision model preference

### Verification
- 4322 tests passed, 0 new regressions
- Live API tested on Xiaomi portal: basic, reasoning, tool calling, multi-tool, file ops, system prompt, vision — all pass
- Self-review found and fixed 2 issues (redundant vision check, stale HuggingFace context length)
											
										
										
											2026-04-23 10:06:25 -07:00
+								    "xiaomi": "mimo-v2.5",
-												feat(zai): add GLM-5V-Turbo support for coding plan (#9907)

- Add glm-5v-turbo to OpenRouter, Nous, and native Z.AI model lists
- Add glm-5v context length entry (200K tokens) to model metadata
- Update Z.AI endpoint probe to try multiple candidate models per
  endpoint (glm-5.1, glm-5v-turbo, glm-4.7) — fixes detection for
  newer coding plan accounts that lack older models
- Add zai to _PROVIDER_VISION_MODELS so auxiliary vision tasks
  (vision_analyze, browser screenshots) route through 5v

Fixes #9888
											
										
										
											2026-04-14 16:26:01 -07:00
+								    "zai": "glm-5v-turbo",
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								}
-												add identifier for openrouter calls

											
										
										
											2026-02-25 16:34:47 -08:00
+								# OpenRouter app attribution headers
 								_OR_HEADERS = {
-												fix: use hermes-agent.nousresearch.com as OpenRouter HTTP-Referer

* fix: stop rejecting unlisted models + auto-detect from /models endpoint

validate_requested_model() now accepts models not in the provider's API
listing with a warning instead of blocking. Removes hardcoded catalog
fallback for validation — if API is unreachable, accepts with a warning.

Model selection flows (setup + /model command) now probe the provider's
/models endpoint to get the real available models. Falls back to
hardcoded defaults with a clear warning when auto-detection fails:
'Could not auto-detect models — use Custom model if yours isn't listed.'

Z.AI setup no longer excludes GLM-5 on coding plans.

* fix: use hermes-agent.nousresearch.com as HTTP-Referer for OpenRouter

OpenRouter scrapes the favicon/logo from the HTTP-Referer URL for app
rankings. We were sending the GitHub repo URL, which gives us a generic
GitHub logo. Changed to the proper website URL so our actual branding
shows up in rankings.

Changed in run_agent.py (main agent client) and auxiliary_client.py
(vision/summarization clients).
											
										
										
											2026-03-12 16:20:22 -07:00
+								    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
-												add identifier for openrouter calls

											
										
										
											2026-02-25 16:34:47 -08:00
+								    "X-OpenRouter-Title": "Hermes Agent",
-												fix(headers): update X-OpenRouter-Categories to include 'productivity'

											
										
										
											2026-02-28 10:38:49 -08:00
+								    "X-OpenRouter-Categories": "productivity,cli-agent",
-												add identifier for openrouter calls

											
										
										
											2026-02-25 16:34:47 -08:00
+								}
-												feat: attribution default_headers for ai-gateway provider

Requests through Vercel AI Gateway now carry referrerUrl / appName /
User-Agent attribution so traffic shows up in the gateway's analytics.
Adds _AI_GATEWAY_HEADERS in auxiliary_client and a new
ai-gateway.vercel.sh branch in _apply_client_headers_for_base_url.

											
										
										
											2026-04-19 22:00:45 -07:00
+								# Vercel AI Gateway app attribution headers. HTTP-Referer maps to
 								# referrerUrl and X-Title maps to appName in the gateway's analytics.
 								from hermes_cli import __version__ as _HERMES_VERSION
 								_AI_GATEWAY_HEADERS = {
 								    "HTTP-Referer": "https://hermes-agent.nousresearch.com",
 								    "X-Title": "Hermes Agent",
 								    "User-Agent": f"HermesAgent/{_HERMES_VERSION}",
 								}
-												refactor: integrate Nous Portal support in auxiliary client

- Added functionality to include product attribution tags for Nous Portal in auxiliary API calls.
- Introduced a mechanism to determine if the auxiliary client is backed by Nous Portal, affecting the extra body of requests.
- Updated various tools to utilize the new extra body configuration for enhanced tracking in API calls.

											
										
										
											2026-02-25 18:39:36 -08:00
+								# Nous Portal extra_body for product attribution.
 								# Callers should pass this as extra_body in chat.completions.create()
 								# when the auxiliary client is backed by Nous Portal.
 								NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent"]}
 								# Set at resolve time — True if the auxiliary client points to Nous Portal
 								auxiliary_is_nous: bool = False
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								# Default auxiliary models per provider
 								_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
-												fix: align Nous Portal model slugs with OpenRouter naming (#3253)

Nous Portal now passes through OpenRouter model names and routes from
there. Update the static fallback model list and auxiliary client default
to use OpenRouter-format slugs (provider/model) instead of bare names.

- _PROVIDER_MODELS['nous']: full OpenRouter catalog
- _NOUS_MODEL: google/gemini-3-flash-preview (was gemini-3-flash)
- Updated 4 test assertions for the new default model name
											
										
										
											2026-03-26 13:49:43 -07:00
+								_NOUS_MODEL = "google/gemini-3-flash-preview"
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
-												fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths

Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.

Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.

Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)

Tests updated to use HERMES_HOME env var instead of patching Path.home().

Closes #892

(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)

											
										
										
											2026-03-11 07:31:41 +01:00
+								_AUTH_JSON_PATH = get_hermes_home() / "auth.json"
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								# Codex fallback: uses the Responses API (the only endpoint the Codex
 								# OAuth token can access) with a fast model for auxiliary tasks.
-												fix: convert anthropic image content blocks

											
										
										
											2026-03-14 23:21:09 -07:00
+								# ChatGPT-backed Codex accounts currently reject gpt-5.3-codex for these
 								# auxiliary flows, while gpt-5.2-codex remains broadly available and supports
 								# vision via Responses.
 								_CODEX_AUX_MODEL = "gpt-5.2-codex"
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
-												fix(codex): pin correct Cloudflare headers and extend to auxiliary client

The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers

											
										
										
											2026-04-19 11:58:15 -07:00
+								def _codex_cloudflare_headers(access_token: str) -> Dict[str, str]:
 								    """Headers required to avoid Cloudflare 403s on chatgpt.com/backend-api/codex.
 								    The Cloudflare layer in front of the Codex endpoint whitelists a small set of
 								    first-party originators (``codex_cli_rs``, ``codex_vscode``, ``codex_sdk_ts``,
 								    anything starting with ``Codex``). Requests from non-residential IPs (VPS,
 								    server-hosted agents) that don't advertise an allowed originator are served
 								    a 403 with ``cf-mitigated: challenge`` regardless of auth correctness.
 								    We pin ``originator: codex_cli_rs`` to match the upstream codex-rs CLI, set
 								    ``User-Agent`` to a codex_cli_rs-shaped string (beats SDK fingerprinting),
 								    and extract ``ChatGPT-Account-ID`` (canonical casing, from codex-rs
 								    ``auth.rs``) out of the OAuth JWT's ``chatgpt_account_id`` claim.
 								    Malformed tokens are tolerated — we drop the account-ID header rather than
 								    raise, so a bad token still surfaces as an auth error (401) instead of a
 								    crash at client construction.
 								    """
 								    headers = {
 								        "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
 								        "originator": "codex_cli_rs",
 								    }
 								    if not isinstance(access_token, str) or not access_token.strip():
 								        return headers
 								    try:
 								        import base64
 								        parts = access_token.split(".")
 								        if len(parts) < 2:
 								            return headers
 								        payload_b64 = parts[1] + "=" * (-len(parts[1]) % 4)
 								        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
 								        acct_id = claims.get("https://api.openai.com/auth", {}).get("chatgpt_account_id")
 								        if isinstance(acct_id, str) and acct_id:
 								            headers["ChatGPT-Account-ID"] = acct_id
 								    except Exception:
 								        pass
 								    return headers
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								def _to_openai_base_url(base_url: str) -> str:
 								    """Normalize an Anthropic-style base URL to OpenAI-compatible format.
 								    Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
 								    the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
 								    completions.  The auxiliary client uses the OpenAI SDK, so it must hit the
 								    ``/v1`` surface.  Passing the raw ``inference_base_url`` causes requests to
 								    land on ``/anthropic/chat/completions`` — a 404.
 								    """
 								    url = str(base_url or "").strip().rstrip("/")
 								    if url.endswith("/anthropic"):
 								        rewritten = url[: -len("/anthropic")] + "/v1"
 								        logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
 								        return rewritten
 								    return url
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
 								    """Return (pool_exists_for_provider, selected_entry)."""
 								    try:
 								        pool = load_pool(provider)
 								    except Exception as exc:
 								        logger.debug("Auxiliary client: could not load pool for %s: %s", provider, exc)
 								        return False, None
 								    if not pool or not pool.has_credentials():
 								        return False, None
 								    try:
 								        return True, pool.select()
 								    except Exception as exc:
 								        logger.debug("Auxiliary client: could not select pool entry for %s: %s", provider, exc)
 								        return True, None
 								def _pool_runtime_api_key(entry: Any) -> str:
 								    if entry is None:
 								        return ""
 								    # Use the PooledCredential.runtime_api_key property which handles
 								    # provider-specific fallback (e.g. agent_key for nous).
 								    key = getattr(entry, "runtime_api_key", None) or getattr(entry, "access_token", "")
 								    return str(key or "").strip()
 								def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
 								    if entry is None:
 								        return str(fallback or "").strip().rstrip("/")
 								    # runtime_base_url handles provider-specific logic (e.g. nous prefers inference_base_url).
 								    # Fall back through inference_base_url and base_url for non-PooledCredential entries.
 								    url = (
 								        getattr(entry, "runtime_base_url", None)
 								        or getattr(entry, "inference_base_url", None)
 								        or getattr(entry, "base_url", None)
 								        or fallback
 								    )
 								    return str(url or "").strip().rstrip("/")
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								# ── Codex Responses → chat.completions adapter ─────────────────────────────
 								# All auxiliary consumers call client.chat.completions.create(**kwargs) and
 								# read response.choices[0].message.content. This adapter translates those
 								# calls to the Codex Responses API so callers don't need any changes.
-												feat: Codex OAuth vision support + multimodal content adapter

The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

											
										
										
											2026-03-08 18:44:25 -07:00
 								def _convert_content_for_responses(content: Any) -> Any:
 								    """Convert chat.completions content to Responses API format.
 								    chat.completions uses:
 								      {"type": "text", "text": "..."}
 								      {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
 								    Responses API uses:
 								      {"type": "input_text", "text": "..."}
 								      {"type": "input_image", "image_url": "data:image/png;base64,..."}
 								    If content is a plain string, it's returned as-is (the Responses API
 								    accepts strings directly for text-only messages).
 								    """
 								    if isinstance(content, str):
 								        return content
 								    if not isinstance(content, list):
 								        return str(content) if content else ""
 								    converted: List[Dict[str, Any]] = []
 								    for part in content:
 								        if not isinstance(part, dict):
 								            continue
 								        ptype = part.get("type", "")
 								        if ptype == "text":
 								            converted.append({"type": "input_text", "text": part.get("text", "")})
 								        elif ptype == "image_url":
 								            # chat.completions nests the URL: {"image_url": {"url": "..."}}
 								            image_data = part.get("image_url", {})
 								            url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
 								            entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
 								            # Preserve detail if specified
 								            detail = image_data.get("detail") if isinstance(image_data, dict) else None
 								            if detail:
 								                entry["detail"] = detail
 								            converted.append(entry)
 								        elif ptype in ("input_text", "input_image"):
 								            # Already in Responses format — pass through
 								            converted.append(part)
 								        else:
 								            # Unknown content type — try to preserve as text
 								            text = part.get("text", "")
 								            if text:
 								                converted.append({"type": "input_text", "text": text})
 								    return converted or ""
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								class _CodexCompletionsAdapter:
 								    """Drop-in shim that accepts chat.completions.create() kwargs and
 								    routes them through the Codex Responses streaming API."""
 								    def __init__(self, real_client: OpenAI, model: str):
 								        self._client = real_client
 								        self._model = model
 								    def create(self, **kwargs) -> Any:
 								        messages = kwargs.get("messages", [])
 								        model = kwargs.get("model", self._model)
-												feat: Codex OAuth vision support + multimodal content adapter

The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

											
										
										
											2026-03-08 18:44:25 -07:00
+								        # Separate system/instructions from conversation messages.
 								        # Convert chat.completions multimodal content blocks to Responses
 								        # API format (input_text / input_image instead of text / image_url).
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								        instructions = "You are a helpful assistant."
 								        input_msgs: List[Dict[str, Any]] = []
 								        for msg in messages:
 								            role = msg.get("role", "user")
-												fix: handle None message content across codebase (fixes #276)

The OpenAI API returns content: null on assistant messages with tool
calls. msg.get('content', '') returns None when the key exists with
value None, causing TypeError on len(), string concatenation, and
.strip() in downstream code paths.

Fixed 4 locations that process conversation messages:
- agent/auxiliary_client.py:84 — None passed to API calls
- cli.py:1288 — crash on content[:200] and len(content)
- run_agent.py:3444 — crash on None.strip()
- honcho_integration/session.py:445 — 'None' rendered in transcript

13 other instances were verified safe (already protected, only process
user/tool messages, or use the safe pattern).

Pattern: msg.get('content', '') → msg.get('content') or ''

Fixes #276

											
										
										
											2026-03-02 02:23:53 -08:00
+								            content = msg.get("content") or ""
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								            if role == "system":
-												feat: Codex OAuth vision support + multimodal content adapter

The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

											
										
										
											2026-03-08 18:44:25 -07:00
+								                instructions = content if isinstance(content, str) else str(content)
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								            else:
-												feat: Codex OAuth vision support + multimodal content adapter

The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

											
										
										
											2026-03-08 18:44:25 -07:00
+								                input_msgs.append({
 								                    "role": role,
 								                    "content": _convert_content_for_responses(content),
 								                })
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
 								        resp_kwargs: Dict[str, Any] = {
 								            "model": model,
 								            "instructions": instructions,
 								            "input": input_msgs or [{"role": "user", "content": ""}],
 								            "store": False,
 								        }
-												feat: Codex OAuth vision support + multimodal content adapter

The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.

											
										
										
											2026-03-08 18:44:25 -07:00
+								        # Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
 								        # support max_output_tokens or temperature — omit to avoid 400 errors.
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
-												refactor(memory): remove flush_memories entirely (#15696)

The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
											
										
										
											2026-04-25 08:21:14 -07:00
+								        # Tools support for auxiliary callers (e.g. skills_hub) that pass function schemas
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								        tools = kwargs.get("tools")
 								        if tools:
 								            converted = []
 								            for t in tools:
 								                fn = t.get("function", {}) if isinstance(t, dict) else {}
 								                name = fn.get("name")
 								                if not name:
 								                    continue
 								                converted.append({
 								                    "type": "function",
 								                    "name": name,
 								                    "description": fn.get("description", ""),
 								                    "parameters": fn.get("parameters", {}),
 								                })
 								            if converted:
 								                resp_kwargs["tools"] = converted
 								        # Stream and collect the response
 								        text_parts: List[str] = []
 								        tool_calls_raw: List[Any] = []
 								        usage = None
 								        try:
-												fix: backfill empty codex output in auxiliary client (#5730)

The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.

Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().

Tested live against chatgpt.com/backend-api/codex with OAuth.
											
										
										
											2026-04-06 21:13:22 -07:00
+								            # Collect output items and text deltas during streaming —
 								            # the Codex backend can return empty response.output from
 								            # get_final_response() even when items were streamed.
 								            collected_output_items: List[Any] = []
 								            collected_text_deltas: List[str] = []
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								            has_function_calls = False
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								            with self._client.responses.stream(**resp_kwargs) as stream:
 								                for _event in stream:
-												fix: backfill empty codex output in auxiliary client (#5730)

The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.

Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().

Tested live against chatgpt.com/backend-api/codex with OAuth.
											
										
										
											2026-04-06 21:13:22 -07:00
+								                    _etype = getattr(_event, "type", "")
 								                    if _etype == "response.output_item.done":
 								                        _done = getattr(_event, "item", None)
 								                        if _done is not None:
 								                            collected_output_items.append(_done)
 								                    elif "output_text.delta" in _etype:
 								                        _delta = getattr(_event, "delta", "")
 								                        if _delta:
 								                            collected_text_deltas.append(_delta)
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                    elif "function_call" in _etype:
 								                        has_function_calls = True
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                final = stream.get_final_response()
-												fix: backfill empty codex output in auxiliary client (#5730)

The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.

Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().

Tested live against chatgpt.com/backend-api/codex with OAuth.
											
										
										
											2026-04-06 21:13:22 -07:00
+								            # Backfill empty output from collected stream events
 								            _output = getattr(final, "output", None)
 								            if isinstance(_output, list) and not _output:
 								                if collected_output_items:
 								                    final.output = list(collected_output_items)
 								                    logger.debug(
 								                        "Codex auxiliary: backfilled %d output items from stream events",
 								                        len(collected_output_items),
 								                    )
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                elif collected_text_deltas and not has_function_calls:
 								                    # Only synthesize text when no tool calls were streamed —
 								                    # a function_call response with incidental text should not
 								                    # be collapsed into a plain-text message.
-												fix: backfill empty codex output in auxiliary client (#5730)

The _CodexCompletionsAdapter (used for compression, vision, web_extract,
session_search, and memory flush when on the codex provider) streamed
responses but discarded all events with 'for _event in stream: pass'.
When get_final_response() returned empty output (the same chatgpt.com
backend-api shape change), auxiliary calls silently returned None content.

Now collects response.output_item.done and text deltas during streaming
and backfills empty output — same pattern as _run_codex_stream().

Tested live against chatgpt.com/backend-api/codex with OAuth.
											
										
										
											2026-04-06 21:13:22 -07:00
+								                    assembled = "".join(collected_text_deltas)
 								                    final.output = [SimpleNamespace(
 								                        type="message", role="assistant", status="completed",
 								                        content=[SimpleNamespace(type="output_text", text=assembled)],
 								                    )]
 								                    logger.debug(
 								                        "Codex auxiliary: synthesized from %d deltas (%d chars)",
 								                        len(collected_text_deltas), len(assembled),
 								                    )
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								            # Extract text and tool calls from the Responses output.
 								            # Items may be SDK objects (attrs) or dicts (raw/fallback paths),
 								            # so use a helper that handles both shapes.
 								            def _item_get(obj: Any, key: str, default: Any = None) -> Any:
 								                val = getattr(obj, key, None)
 								                if val is None and isinstance(obj, dict):
 								                    val = obj.get(key, default)
 								                return val if val is not None else default
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								            for item in getattr(final, "output", []):
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                item_type = _item_get(item, "type")
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                if item_type == "message":
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                    for part in (_item_get(item, "content") or []):
 								                        ptype = _item_get(part, "type")
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                        if ptype in ("output_text", "text"):
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                            text_parts.append(_item_get(part, "text", ""))
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                elif item_type == "function_call":
 								                    tool_calls_raw.append(SimpleNamespace(
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                        id=_item_get(item, "call_id", ""),
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                        type="function",
 								                        function=SimpleNamespace(
-												fix: harden auxiliary codex adapter — dict-shaped items + tool call guard (#5734)

Two remaining gaps from the codex empty-output spec:

1. Normalize dict-shaped streamed items: output_item.done events may
   yield dicts (raw/fallback paths) instead of SDK objects. The
   extraction loop now uses _item_get() that handles both getattr
   and dict .get() access.

2. Avoid plain-text synthesis when function_call events were streamed:
   tracks has_function_calls during streaming and skips text-delta
   synthesis when tool calls are present — prevents collapsing a
   tool-call response into a fake text message.
											
										
										
											2026-04-06 21:35:33 -07:00
+								                            name=_item_get(item, "name", ""),
 								                            arguments=_item_get(item, "arguments", "{}"),
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								                        ),
 								                    ))
 								            resp_usage = getattr(final, "usage", None)
 								            if resp_usage:
 								                usage = SimpleNamespace(
 								                    prompt_tokens=getattr(resp_usage, "input_tokens", 0),
 								                    completion_tokens=getattr(resp_usage, "output_tokens", 0),
 								                    total_tokens=getattr(resp_usage, "total_tokens", 0),
 								                )
 								        except Exception as exc:
 								            logger.debug("Codex auxiliary Responses API call failed: %s", exc)
 								            raise
 								        content = "".join(text_parts).strip() or None
 								        # Build a response that looks like chat.completions
 								        message = SimpleNamespace(
 								            role="assistant",
 								            content=content,
 								            tool_calls=tool_calls_raw or None,
 								        )
 								        choice = SimpleNamespace(
 								            index=0,
 								            message=message,
 								            finish_reason="stop" if not tool_calls_raw else "tool_calls",
 								        )
 								        return SimpleNamespace(
 								            choices=[choice],
 								            model=model,
 								            usage=usage,
 								        )
 								class _CodexChatShim:
 								    """Wraps the adapter to provide client.chat.completions.create()."""
 								    def __init__(self, adapter: _CodexCompletionsAdapter):
 								        self.completions = adapter
 								class CodexAuxiliaryClient:
 								    """OpenAI-client-compatible wrapper that routes through Codex Responses API.
 								    Consumers can call client.chat.completions.create(**kwargs) as normal.
 								    Also exposes .api_key and .base_url for introspection by async wrappers.
 								    """
 								    def __init__(self, real_client: OpenAI, model: str):
 								        self._real_client = real_client
 								        adapter = _CodexCompletionsAdapter(real_client, model)
 								        self.chat = _CodexChatShim(adapter)
 								        self.api_key = real_client.api_key
 								        self.base_url = real_client.base_url
 								    def close(self):
 								        self._real_client.close()
 								class _AsyncCodexCompletionsAdapter:
 								    """Async version of the Codex Responses adapter.
 								    Wraps the sync adapter via asyncio.to_thread() so async consumers
 								    (web_tools, session_search) can await it as normal.
 								    """
 								    def __init__(self, sync_adapter: _CodexCompletionsAdapter):
 								        self._sync = sync_adapter
 								    async def create(self, **kwargs) -> Any:
 								        import asyncio
 								        return await asyncio.to_thread(self._sync.create, **kwargs)
 								class _AsyncCodexChatShim:
 								    def __init__(self, adapter: _AsyncCodexCompletionsAdapter):
 								        self.completions = adapter
 								class AsyncCodexAuxiliaryClient:
 								    """Async-compatible wrapper matching AsyncOpenAI.chat.completions.create()."""
 								    def __init__(self, sync_wrapper: "CodexAuxiliaryClient"):
 								        sync_adapter = sync_wrapper.chat.completions
 								        async_adapter = _AsyncCodexCompletionsAdapter(sync_adapter)
 								        self.chat = _AsyncCodexChatShim(async_adapter)
 								        self.api_key = sync_wrapper.api_key
 								        self.base_url = sync_wrapper.base_url
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								class _AnthropicCompletionsAdapter:
 								    """OpenAI-client-compatible adapter for Anthropic Messages API."""
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								    def __init__(self, real_client: Any, model: str, is_oauth: bool = False):
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        self._client = real_client
 								        self._model = model
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								        self._is_oauth = is_oauth
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
 								    def create(self, **kwargs) -> Any:
-												refactor: migrate auxiliary_client Anthropic path to use transport

Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().

Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.

After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.

The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).

WS1 item 2 of Cycle 2 (#14418).

											
										
										
											2026-04-23 13:39:44 +05:30
+								        from agent.anthropic_adapter import build_anthropic_kwargs
 								        from agent.transports import get_transport
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
 								        messages = kwargs.get("messages", [])
 								        model = kwargs.get("model", self._model)
 								        tools = kwargs.get("tools")
 								        tool_choice = kwargs.get("tool_choice")
 								        max_tokens = kwargs.get("max_tokens") or kwargs.get("max_completion_tokens") or 2000
 								        temperature = kwargs.get("temperature")
 								        normalized_tool_choice = None
 								        if isinstance(tool_choice, str):
 								            normalized_tool_choice = tool_choice
 								        elif isinstance(tool_choice, dict):
 								            choice_type = str(tool_choice.get("type", "")).lower()
 								            if choice_type == "function":
 								                normalized_tool_choice = tool_choice.get("function", {}).get("name")
 								            elif choice_type in {"auto", "required", "none"}:
 								                normalized_tool_choice = choice_type
 								        anthropic_kwargs = build_anthropic_kwargs(
 								            model=model,
 								            messages=messages,
 								            tools=tools,
 								            max_tokens=max_tokens,
 								            reasoning_config=None,
 								            tool_choice=normalized_tool_choice,
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								            is_oauth=self._is_oauth,
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        )
-												fix(agent): complete Claude Opus 4.7 API migration

Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide

Fixes NousResearch/hermes-agent#11137

Breaking-change coverage:

1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
   _supports_adaptive_thinking() (extends previous 4.6-only gate).

2. Sampling parameter stripping — 4.7 returns 400 for any non-default
   temperature / top_p / top_k. build_anthropic_kwargs drops them as a
   safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
   and AnthropicCompletionsAdapter.create() both early-exit before
   setting temperature for 4.7+ models. This keeps flush_memories and
   structured-JSON aux paths that hardcode temperature from 400ing
   when the aux model is flipped to 4.7.

3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
   which silently hides reasoning text from Hermes's CLI activity feed
   during long tool runs. Restoring "summarized" preserves 4.6 UX.

4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
   silently over-efforted every coding/agentic request). max is now a
   distinct ceiling per Anthropic's 5-level effort model.

5. New stop_reason values — refusal and model_context_window_exceeded
   were silently collapsed to "stop" (end_turn) by the adapter's
   stop_reason_map. Now mapped to "content_filter" and "length"
   respectively, matching upstream finish-reason handling already in
   bedrock_adapter.

6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
   list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
   catalog (recommended), claude-opus-4-7 added to model_metadata
   DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).

7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
   that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
   prefill (400).

8. Tests — 4 new tests in test_anthropic_adapter covering display
   default, xhigh preservation, max on 4.7, refusal / context-overflow
   stop_reason mapping, plus the sampling-param predicate. test_model_metadata
   accepts 4.7 at 1M context.

Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.

											
										
										
											2026-04-16 12:35:43 -05:00
+								        # Opus 4.7+ rejects any non-default temperature/top_p/top_k; only set
 								        # temperature for models that still accept it. build_anthropic_kwargs
 								        # additionally strips these keys as a safety net — keep both layers.
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        if temperature is not None:
-												fix(agent): complete Claude Opus 4.7 API migration

Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide

Fixes NousResearch/hermes-agent#11137

Breaking-change coverage:

1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
   _supports_adaptive_thinking() (extends previous 4.6-only gate).

2. Sampling parameter stripping — 4.7 returns 400 for any non-default
   temperature / top_p / top_k. build_anthropic_kwargs drops them as a
   safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
   and AnthropicCompletionsAdapter.create() both early-exit before
   setting temperature for 4.7+ models. This keeps flush_memories and
   structured-JSON aux paths that hardcode temperature from 400ing
   when the aux model is flipped to 4.7.

3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
   which silently hides reasoning text from Hermes's CLI activity feed
   during long tool runs. Restoring "summarized" preserves 4.6 UX.

4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
   silently over-efforted every coding/agentic request). max is now a
   distinct ceiling per Anthropic's 5-level effort model.

5. New stop_reason values — refusal and model_context_window_exceeded
   were silently collapsed to "stop" (end_turn) by the adapter's
   stop_reason_map. Now mapped to "content_filter" and "length"
   respectively, matching upstream finish-reason handling already in
   bedrock_adapter.

6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
   list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
   catalog (recommended), claude-opus-4-7 added to model_metadata
   DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).

7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
   that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
   prefill (400).

8. Tests — 4 new tests in test_anthropic_adapter covering display
   default, xhigh preservation, max on 4.7, refusal / context-overflow
   stop_reason mapping, plus the sampling-param predicate. test_model_metadata
   accepts 4.7 at 1M context.

Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.

											
										
										
											2026-04-16 12:35:43 -05:00
+								            from agent.anthropic_adapter import _forbids_sampling_params
 								            if not _forbids_sampling_params(model):
 								                anthropic_kwargs["temperature"] = temperature
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
 								        response = self._client.messages.create(**anthropic_kwargs)
-												refactor: migrate auxiliary_client Anthropic path to use transport

Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().

Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.

After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.

The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).

WS1 item 2 of Cycle 2 (#14418).

											
										
										
											2026-04-23 13:39:44 +05:30
+								        _transport = get_transport("anthropic_messages")
 								        _nr = _transport.normalize_response(
 								            response, strip_tool_prefix=self._is_oauth
 								        )
-												refactor: remove _nr_to_assistant_message shim + fix flush_memories guard

NormalizedResponse and ToolCall now have backward-compat properties
so the agent loop can read them directly without the shim:

  ToolCall: .type, .function (returns self), .call_id, .response_item_id
  NormalizedResponse: .reasoning_content, .reasoning_details,
                      .codex_reasoning_items

This eliminates the 35-line shim and its 4 call sites in run_agent.py.

Also changes flush_memories guard from hasattr(response, 'choices')
to self.api_mode in ('chat_completions', 'bedrock_converse') so it
works with raw boto3 dicts too.

WS1 items 3+4 of Cycle 2 (#14418).

											
										
										
											2026-04-23 14:06:36 +05:30
+								        # ToolCall already duck-types as OpenAI shape (.type, .function.name,
 								        # .function.arguments) via properties, so no wrapping needed.
-												refactor: migrate auxiliary_client Anthropic path to use transport

Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().

Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.

After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.

The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).

WS1 item 2 of Cycle 2 (#14418).

											
										
										
											2026-04-23 13:39:44 +05:30
+								        assistant_message = SimpleNamespace(
 								            content=_nr.content,
-												refactor: remove _nr_to_assistant_message shim + fix flush_memories guard

NormalizedResponse and ToolCall now have backward-compat properties
so the agent loop can read them directly without the shim:

  ToolCall: .type, .function (returns self), .call_id, .response_item_id
  NormalizedResponse: .reasoning_content, .reasoning_details,
                      .codex_reasoning_items

This eliminates the 35-line shim and its 4 call sites in run_agent.py.

Also changes flush_memories guard from hasattr(response, 'choices')
to self.api_mode in ('chat_completions', 'bedrock_converse') so it
works with raw boto3 dicts too.

WS1 items 3+4 of Cycle 2 (#14418).

											
										
										
											2026-04-23 14:06:36 +05:30
+								            tool_calls=_nr.tool_calls,
-												refactor: migrate auxiliary_client Anthropic path to use transport

Replace direct normalize_anthropic_response() call in
_AnthropicCompletionsAdapter.create() with
AnthropicTransport.normalize_response() via get_transport().

Before: auxiliary_client called adapter v1 directly, bypassing
the transport layer entirely.

After: auxiliary_client → get_transport('anthropic_messages') →
transport.normalize_response() → adapter v1 → NormalizedResponse.

The adapter v1 function (normalize_anthropic_response) now has
zero callers outside agent/anthropic_adapter.py and the transport.
This unblocks collapsing v1 to return NormalizedResponse directly
in a follow-up (the remaining 2-layer chain becomes 1-layer).

WS1 item 2 of Cycle 2 (#14418).

											
										
										
											2026-04-23 13:39:44 +05:30
+								            reasoning=_nr.reasoning,
 								        )
 								        finish_reason = _nr.finish_reason
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
 								        usage = None
 								        if hasattr(response, "usage") and response.usage:
 								            prompt_tokens = getattr(response.usage, "input_tokens", 0) or 0
 								            completion_tokens = getattr(response.usage, "output_tokens", 0) or 0
 								            total_tokens = getattr(response.usage, "total_tokens", 0) or (prompt_tokens + completion_tokens)
 								            usage = SimpleNamespace(
 								                prompt_tokens=prompt_tokens,
 								                completion_tokens=completion_tokens,
 								                total_tokens=total_tokens,
 								            )
 								        choice = SimpleNamespace(
 								            index=0,
 								            message=assistant_message,
 								            finish_reason=finish_reason,
 								        )
 								        return SimpleNamespace(
 								            choices=[choice],
 								            model=model,
 								            usage=usage,
 								        )
 								class _AnthropicChatShim:
 								    def __init__(self, adapter: _AnthropicCompletionsAdapter):
 								        self.completions = adapter
 								class AnthropicAuxiliaryClient:
 								    """OpenAI-client-compatible wrapper over a native Anthropic client."""
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								    def __init__(self, real_client: Any, model: str, api_key: str, base_url: str, is_oauth: bool = False):
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        self._real_client = real_client
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								        adapter = _AnthropicCompletionsAdapter(real_client, model, is_oauth=is_oauth)
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        self.chat = _AnthropicChatShim(adapter)
 								        self.api_key = api_key
 								        self.base_url = base_url
 								    def close(self):
 								        close_fn = getattr(self._real_client, "close", None)
 								        if callable(close_fn):
 								            close_fn()
 								class _AsyncAnthropicCompletionsAdapter:
 								    def __init__(self, sync_adapter: _AnthropicCompletionsAdapter):
 								        self._sync = sync_adapter
 								    async def create(self, **kwargs) -> Any:
 								        import asyncio
 								        return await asyncio.to_thread(self._sync.create, **kwargs)
 								class _AsyncAnthropicChatShim:
 								    def __init__(self, adapter: _AsyncAnthropicCompletionsAdapter):
 								        self.completions = adapter
 								class AsyncAnthropicAuxiliaryClient:
 								    def __init__(self, sync_wrapper: "AnthropicAuxiliaryClient"):
 								        sync_adapter = sync_wrapper.chat.completions
 								        async_adapter = _AsyncAnthropicCompletionsAdapter(sync_adapter)
 								        self.chat = _AsyncAnthropicChatShim(async_adapter)
 								        self.api_key = sync_wrapper.api_key
 								        self.base_url = sync_wrapper.base_url
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								def _read_nous_auth() -> Optional[dict]:
 								    """Read and validate ~/.hermes/auth.json for an active Nous provider.
 								    Returns the provider state dict if Nous is active with tokens,
 								    otherwise None.
 								    """
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    pool_present, entry = _select_pool_entry("nous")
 								    if pool_present:
 								        if entry is None:
 								            return None
 								        return {
 								            "access_token": getattr(entry, "access_token", ""),
 								            "refresh_token": getattr(entry, "refresh_token", None),
 								            "agent_key": getattr(entry, "agent_key", None),
 								            "inference_base_url": _pool_runtime_base_url(entry, _NOUS_DEFAULT_BASE_URL),
 								            "portal_base_url": getattr(entry, "portal_base_url", None),
 								            "client_id": getattr(entry, "client_id", None),
 								            "scope": getattr(entry, "scope", None),
 								            "token_type": getattr(entry, "token_type", "Bearer"),
 								            "source": "pool",
 								        }
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								    try:
 								        if not _AUTH_JSON_PATH.is_file():
 								            return None
 								        data = json.loads(_AUTH_JSON_PATH.read_text())
 								        if data.get("active_provider") != "nous":
 								            return None
 								        provider = data.get("providers", {}).get("nous", {})
 								        # Must have at least an access_token or agent_key
 								        if not provider.get("agent_key") and not provider.get("access_token"):
 								            return None
 								        return provider
 								    except Exception as exc:
 								        logger.debug("Could not read Nous auth: %s", exc)
 								        return None
 								def _nous_api_key(provider: dict) -> str:
 								    """Extract the best API key from a Nous provider state dict."""
 								    return provider.get("agent_key") or provider.get("access_token", "")
 								def _nous_base_url() -> str:
 								    """Resolve the Nous inference base URL from env or default."""
 								    return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[str, str]]:
 								    """Return fresh Nous runtime credentials when available.
 								    This mirrors the main agent's 401 recovery path and keeps auxiliary
 								    clients aligned with the singleton auth store + mint flow instead of
 								    relying only on whatever raw tokens happen to be sitting in auth.json
 								    or the credential pool.
 								    """
 								    try:
 								        from hermes_cli.auth import resolve_nous_runtime_credentials
 								        creds = resolve_nous_runtime_credentials(
 								            min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
 								            timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
 								            force_mint=force_refresh,
 								        )
 								    except Exception as exc:
 								        logger.debug("Auxiliary Nous runtime credential resolution failed: %s", exc)
 								        return None
 								    api_key = str(creds.get("api_key") or "").strip()
 								    base_url = str(creds.get("base_url") or "").strip().rstrip("/")
 								    if not api_key or not base_url:
 								        return None
 								    return api_key, base_url
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								def _read_codex_access_token() -> Optional[str]:
-												fix: restore codex fallback auth-store lookup

											
										
										
											2026-04-09 00:04:30 -07:00
+								    """Read a valid, non-expired Codex OAuth access token from Hermes auth store.
 								    If a credential pool exists but currently has no selectable runtime entry
 								    (for example all pool slots are marked exhausted), fall back to the
 								    profile's auth.json token instead of hard-failing. This keeps explicit
 								    fallback-to-Codex working when the pool state is stale but the stored OAuth
 								    token is still valid.
 								    """
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    pool_present, entry = _select_pool_entry("openai-codex")
 								    if pool_present:
 								        token = _pool_runtime_api_key(entry)
-												fix: restore codex fallback auth-store lookup

											
										
										
											2026-04-09 00:04:30 -07:00
+								        if token:
 								            return token
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								    try:
-												refactor(auth): transition Codex OAuth tokens to Hermes auth store

Updated the authentication mechanism to store Codex OAuth tokens in the Hermes auth store located at ~/.hermes/auth.json instead of the previous ~/.codex/auth.json. This change includes refactoring related functions for reading and saving tokens, ensuring better management of authentication states and preventing conflicts between different applications. Adjusted tests to reflect the new storage structure and improved error handling for missing or malformed tokens.

											
										
										
											2026-03-01 19:59:24 -08:00
+								        from hermes_cli.auth import _read_codex_tokens
 								        data = _read_codex_tokens()
 								        tokens = data.get("tokens", {})
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								        access_token = tokens.get("access_token")
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								        if not isinstance(access_token, str) or not access_token.strip():
 								            return None
 								        # Check JWT expiry — expired tokens block the auto chain and
 								        # prevent fallback to working providers (e.g. Anthropic).
 								        try:
 								            import base64
 								            payload = access_token.split(".")[1]
 								            payload += "=" * (-len(payload) % 4)
 								            claims = json.loads(base64.urlsafe_b64decode(payload))
 								            exp = claims.get("exp", 0)
 								            if exp and time.time() > exp:
 								                logger.debug("Codex access token expired (exp=%s), skipping", exp)
 								                return None
 								        except Exception:
 								            pass  # Non-JWT token or decode error — use as-is
 								        return access_token.strip()
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								    except Exception as exc:
 								        logger.debug("Could not read Codex auth for auxiliary client: %s", exc)
 								        return None
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
 								    """Try each API-key provider in PROVIDER_REGISTRY order.
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								    Returns (client, model) for the first provider with usable runtime
 								    credentials, or (None, None) if none are configured.
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								    """
 								    try:
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								        from hermes_cli.auth import PROVIDER_REGISTRY, resolve_api_key_provider_credentials
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								    except ImportError:
 								        logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
 								        return None, None
 								    for provider_id, pconfig in PROVIDER_REGISTRY.items():
 								        if pconfig.auth_type != "api_key":
 								            continue
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        if provider_id == "anthropic":
-												fix(auxiliary): skip anthropic in fallback chain when not explicitly configured

_resolve_api_key_provider() now checks is_provider_explicitly_configured
before calling _try_anthropic().  Previously, any auxiliary fallback
(e.g. when kimi-coding key was invalid) would silently discover and use
Claude Code OAuth tokens — consuming the user's Claude Max subscription
without their knowledge.

This is the auxiliary-client counterpart of the setup-wizard gate in
PR #4210.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-10 15:16:18 +08:00
+								            # Only try anthropic when the user has explicitly configured it.
 								            # Without this gate, Claude Code credentials get silently used
 								            # as auxiliary fallback when the user's primary provider fails.
 								            try:
 								                from hermes_cli.auth import is_provider_explicitly_configured
 								                if not is_provider_explicitly_configured("anthropic"):
 								                    continue
 								            except ImportError:
 								                pass
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								            return _try_anthropic()
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								        pool_present, entry = _select_pool_entry(provider_id)
 								        if pool_present:
 								            api_key = _pool_runtime_api_key(entry)
 								            if not api_key:
 								                continue
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								            base_url = _to_openai_base_url(
 								                _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
 								            )
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								            model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
 								            if model is None:
 								                continue  # skip provider if we don't know a valid aux model
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
+								            if provider_id == "gemini":
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								                from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								                if is_native_gemini_base_url(base_url):
 								                    return GeminiNativeClient(api_key=api_key, base_url=base_url), model
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								            extra = {}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								            if base_url_host_matches(base_url, "api.kimi.com"):
-												fix: Update Kimi Coding API endpoint and User-Agent

											
										
										
											2026-04-18 22:55:36 +08:00
+								                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								            elif base_url_host_matches(base_url, "api.githubcopilot.com"):
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								                from hermes_cli.models import copilot_default_headers
 								                extra["default_headers"] = copilot_default_headers()
 								            return OpenAI(api_key=api_key, base_url=base_url, **extra), model
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								        creds = resolve_api_key_provider_credentials(provider_id)
 								        api_key = str(creds.get("api_key", "")).strip()
 								        if not api_key:
 								            continue
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								        base_url = _to_openai_base_url(
 								            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
 								        )
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								        model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id)
 								        if model is None:
 								            continue  # skip provider if we don't know a valid aux model
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
+								        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
+								        if provider_id == "gemini":
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								            if is_native_gemini_base_url(base_url):
 								                return GeminiNativeClient(api_key=api_key, base_url=base_url), model
-												fix: add Kimi Code API support (api.kimi.com/coding/v1)

Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent

Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403
('only available for Coding Agents') errors.

Changes:
- Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed doctor.py health check for Kimi Code keys

Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)

											
										
										
											2026-03-07 20:43:34 -05:00
+								        extra = {}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								        if base_url_host_matches(base_url, "api.kimi.com"):
-												fix: Update Kimi Coding API endpoint and User-Agent

											
										
										
											2026-04-18 22:55:36 +08:00
+								            extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								        elif base_url_host_matches(base_url, "api.githubcopilot.com"):
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								            from hermes_cli.models import copilot_default_headers
 								            extra["default_headers"] = copilot_default_headers()
-												fix: add Kimi Code API support (api.kimi.com/coding/v1)

Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent

Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403
('only available for Coding Agents') errors.

Changes:
- Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed doctor.py health check for Kimi Code keys

Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)

											
										
										
											2026-03-07 20:43:34 -05:00
+								        return OpenAI(api_key=api_key, base_url=base_url, **extra), model
-												feat: add direct API-key providers as auxiliary client fallbacks

When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.

											
										
										
											2026-03-06 19:08:54 -08:00
 								    return None, None
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								# ── Provider resolution helpers ─────────────────────────────────────────────
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    pool_present, entry = _select_pool_entry("openrouter")
 								    if pool_present:
 								        or_key = _pool_runtime_api_key(entry)
 								        if not or_key:
 								            return None, None
 								        base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
 								        logger.debug("Auxiliary client: OpenRouter via pool")
 								        return OpenAI(api_key=or_key, base_url=base_url,
 								                       default_headers=_OR_HEADERS), _OPENROUTER_MODEL
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								    or_key = os.getenv("OPENROUTER_API_KEY")
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    if not or_key:
 								        return None, None
 								    logger.debug("Auxiliary client: OpenRouter")
 								    return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
 								                   default_headers=_OR_HEADERS), _OPENROUTER_MODEL
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
-												fix(agent): clarify exhausted OpenRouter auxiliary credentials

											
										
										
											2026-04-23 09:54:12 -07:00
+								def _describe_openrouter_unavailable() -> str:
 								    """Return a more precise OpenRouter auth failure reason for logs."""
 								    pool_present, entry = _select_pool_entry("openrouter")
 								    if pool_present:
 								        if entry is None:
 								            return "OpenRouter credential pool has no usable entries (credentials may be exhausted)"
 								        if not _pool_runtime_api_key(entry):
 								            return "OpenRouter credential pool entry is missing a runtime API key"
 								    if not str(os.getenv("OPENROUTER_API_KEY") or "").strip():
 								        return "OPENROUTER_API_KEY not set"
 								    return "no usable OpenRouter credentials found"
-												feat: use mimo-v2-pro for non-vision auxiliary tasks on Nous free tier (#6018)

Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal
model) for all auxiliary tasks including compression, session search,
and web extraction. Now routes non-vision tasks to mimo-v2-pro (a
text model) which is better suited for those workloads.

- Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks
- _try_nous() accepts vision=False param to select the right model
- Vision path (_resolve_strict_vision_backend) passes vision=True
- All other callers default to vision=False → mimo-v2-pro
											
										
										
											2026-04-07 21:41:05 -07:00
+								def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
-												fix: Nous Portal rate limit guard — prevent retry amplification (#10568)

When Nous returns a 429, the retry amplification chain burns up to 9
API requests per conversation turn (3 SDK retries × 3 Hermes retries),
each counting against RPH and deepening the rate limit. With multiple
concurrent sessions (cron + gateway + auxiliary), this creates a spiral
where retries keep the limit tapped indefinitely.

New module: agent/nous_rate_guard.py
- Shared file-based rate limit state (~/.hermes/rate_limits/nous.json)
- Parses reset time from x-ratelimit-reset-requests-1h, x-ratelimit-
  reset-requests, retry-after headers, or error context
- Falls back to 5-minute default cooldown if no header data
- Atomic writes (tempfile + rename) for cross-process safety
- Auto-cleanup of expired state files

run_agent.py changes:
- Top-of-retry-loop guard: when another session already recorded Nous
  as rate-limited, skip the API call entirely. Try fallback provider
  first, then return a clear message with the reset time.
- On 429 from Nous: record rate limit state and skip further retries
  (sets retry_count = max_retries to trigger fallback path)
- On success from Nous: clear the rate limit state so other sessions
  know they can resume

auxiliary_client.py changes:
- _try_nous() checks rate guard before attempting Nous in the auxiliary
  fallback chain. When rate-limited, returns (None, None) so the chain
  skips to the next provider instead of piling more requests onto Nous.

This eliminates three sources of amplification:
1. Hermes-level retries (saves 6 of 9 calls per turn)
2. Cross-session retries (cron + gateway all skip Nous)
3. Auxiliary fallback to Nous (compression/session_search skip too)

Includes 24 tests covering the rate guard module, header parsing,
state lifecycle, and auxiliary client integration.
											
										
										
											2026-04-15 16:31:48 -07:00
+								    # Check cross-session rate limit guard before attempting Nous —
 								    # if another session already recorded a 429, skip Nous entirely
 								    # to avoid piling more requests onto the tapped RPH bucket.
 								    try:
 								        from agent.nous_rate_guard import nous_rate_limit_remaining
 								        _remaining = nous_rate_limit_remaining()
 								        if _remaining is not None and _remaining > 0:
 								            logger.debug(
 								                "Auxiliary: skipping Nous Portal (rate-limited, resets in %.0fs)",
 								                _remaining,
 								            )
 								            return None, None
 								    except Exception:
 								        pass
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
+								    nous = _read_nous_auth()
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    runtime = _resolve_nous_runtime_api(force_refresh=False)
 								    if runtime is None and not nous:
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								        return None, None
 								    global auxiliary_is_nous
 								    auxiliary_is_nous = True
 								    logger.debug("Auxiliary client: Nous Portal")
-												feat(aux): use Portal /api/nous/recommended-models for auxiliary models

Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.

											
										
										
											2026-04-21 22:53:45 -04:00
 								    # Ask the Portal which model it currently recommends for this task type.
 								    # The /api/nous/recommended-models endpoint is the authoritative source:
 								    # it distinguishes paid vs free tier recommendations, and get_nous_recommended_aux_model
 								    # auto-detects the caller's tier via check_nous_free_tier().  Fall back to
 								    # _NOUS_MODEL (google/gemini-3-flash-preview) when the Portal is unreachable
 								    # or returns a null recommendation for this task type.
 								    model = _NOUS_MODEL
-												feat(nous): free-tier model gating, pricing display, and vision fallback

- Show pricing during initial Nous Portal login (was missing from
  _login_nous, only shown in the already-logged-in hermes model path)

- Filter free models for paid subscribers: non-allowlisted free models
  are hidden; allowlisted models (xiaomi/mimo-v2-pro, xiaomi/mimo-v2-omni)
  only appear when actually priced as free

- Detect free-tier accounts via portal api/oauth/account endpoint
  (monthly_charge == 0); free-tier users see only free models as
  selectable, with paid models shown dimmed and unselectable

- Use xiaomi/mimo-v2-omni as the auxiliary vision model for free-tier
  Nous users so vision_analyze and browser_vision work without paid
  model access (replaces the default google/gemini-3-flash-preview)

- Unavailable models rendered via print() before TerminalMenu to avoid
  simple_term_menu line-width padding artifacts; upgrade URL resolved
  from auth state portal_base_url (supports staging/custom portals)

- Add 21 tests covering filter_nous_free_models, is_nous_free_tier,
  and partition_nous_models_by_tier

											
										
										
											2026-04-07 02:17:14 -04:00
+								    try:
-												feat(aux): use Portal /api/nous/recommended-models for auxiliary models

Wire the auxiliary client (compaction, vision, session search, web extract)
to the Nous Portal's curated recommended-models endpoint when running on
Nous Portal, with a TTL-cached fetch that mirrors how we pull /models for
pricing.

hermes_cli/models.py
  - fetch_nous_recommended_models(portal_base_url, force_refresh=False)
    10-minute TTL cache, keyed per portal URL (staging vs prod don't
    collide).  Public endpoint, no auth required.  Returns {} on any
    failure so callers always get a dict.
  - get_nous_recommended_aux_model(vision, free_tier=None, ...)
    Tier-aware pick from the payload:
      - Paid tier → paidRecommended{Vision,Compaction}Model, falling back
        to freeRecommended* when the paid field is null (common during
        staged rollouts of new paid models).
      - Free tier → freeRecommended* only, never leaks paid models.
    When free_tier is None, auto-detects via the existing
    check_nous_free_tier() helper (already cached 3 min against
    /api/oauth/account).  Detection errors default to paid so we never
    silently downgrade a paying user.

agent/auxiliary_client.py — _try_nous()
  - Replaces the hardcoded xiaomi/mimo free-tier branch with a single call
    to get_nous_recommended_aux_model(vision=vision).
  - Falls back to _NOUS_MODEL (google/gemini-3-flash-preview) when the
    Portal is unreachable or returns a null recommendation.
  - The Portal is now the source of truth for aux model selection; the
    xiaomi allowlist we used to carry is effectively dead.

Tests (15 new)
  - tests/hermes_cli/test_models.py::TestNousRecommendedModels
    Fetch caching, per-portal keying, network failure, force_refresh;
    paid-prefers-paid, paid-falls-to-free, free-never-leaks-paid,
    auto-detect, detection-error → paid default, null/blank modelName
    handling.
  - tests/agent/test_auxiliary_client.py::TestNousAuxiliaryRefresh
    _try_nous honors Portal recommendation for text + vision, falls
    back to google/gemini-3-flash-preview on None or exception.

Behavior won't visibly change today — both tier recommendations currently
point at google/gemini-3-flash-preview — but the moment the Portal ships
a better paid recommendation, subscribers pick it up within 10 minutes
without a Hermes release.

											
										
										
											2026-04-21 22:53:45 -04:00
+								        from hermes_cli.models import get_nous_recommended_aux_model
 								        recommended = get_nous_recommended_aux_model(vision=vision)
 								        if recommended:
 								            model = recommended
 								            logger.debug(
 								                "Auxiliary/%s: using Portal-recommended model %s",
 								                "vision" if vision else "text", model,
 								            )
 								        else:
 								            logger.debug(
 								                "Auxiliary/%s: no Portal recommendation, falling back to %s",
 								                "vision" if vision else "text", model,
 								            )
 								    except Exception as exc:
 								        logger.debug(
 								            "Auxiliary/%s: recommended-models lookup failed (%s); "
 								            "falling back to %s",
 								            "vision" if vision else "text", exc, model,
 								        )
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    if runtime is not None:
 								        api_key, base_url = runtime
 								    else:
 								        api_key = _nous_api_key(nous or {})
 								        base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    return (
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								        OpenAI(
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								            api_key=api_key,
 								            base_url=base_url,
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								        ),
 								        model,
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    )
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)

* fix: prevent model/provider mismatch when switching providers during active gateway

When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.

Changes:
- _update_config_for_provider() now accepts an optional default_model
  parameter. When provided and the current model.default is empty or
  uses OpenRouter format (contains '/'), it sets a safe default model
  for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
  minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
  choice: if the current model uses OpenRouter format and wouldn't work
  with the new provider, it warns and switches to the provider's first
  default model instead of silently keeping the incompatible name.

Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.

* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini

When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.

Changes:
- _try_custom_endpoint() now reads the user's configured main model via
  _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
  config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
  formatted model override (containing '/') would be sent to a non-
  OpenRouter provider (like a local server) and drops it in favor of
  the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
  tests to prevent host environment leakage.
											
										
										
											2026-03-13 10:02:16 -07:00
+								def _read_main_model() -> str:
-												refactor: make config.yaml the single source of truth for endpoint URLs (#4165)

OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.

Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
  setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
  models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
  auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
  (both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars

Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
											
										
										
											2026-03-30 22:02:53 -07:00
+								    """Read the user's configured main model from config.yaml.
-												fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)

* fix: prevent model/provider mismatch when switching providers during active gateway

When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.

Changes:
- _update_config_for_provider() now accepts an optional default_model
  parameter. When provided and the current model.default is empty or
  uses OpenRouter format (contains '/'), it sets a safe default model
  for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
  minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
  choice: if the current model uses OpenRouter format and wouldn't work
  with the new provider, it warns and switches to the provider's first
  default model instead of silently keeping the incompatible name.

Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.

* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini

When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.

Changes:
- _try_custom_endpoint() now reads the user's configured main model via
  _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
  config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
  formatted model override (containing '/') would be sent to a non-
  OpenRouter provider (like a local server) and drops it in favor of
  the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
  tests to prevent host environment leakage.
											
										
										
											2026-03-13 10:02:16 -07:00
-												refactor: make config.yaml the single source of truth for endpoint URLs (#4165)

OPENAI_BASE_URL was written to .env AND config.yaml, creating a dual-source
confusion. Users (especially Docker) would see the URL in .env and assume
that's where all config lives, then wonder why LLM_MODEL in .env didn't work.

Changes:
- Remove all 27 save_env_value("OPENAI_BASE_URL", ...) calls across main.py,
  setup.py, and tools_config.py
- Remove OPENAI_BASE_URL env var reading from runtime_provider.py, cli.py,
  models.py, and gateway/run.py
- Remove LLM_MODEL/HERMES_MODEL env var reading from gateway/run.py and
  auxiliary_client.py — config.yaml model.default is authoritative
- Vision base URL now saved to config.yaml auxiliary.vision.base_url
  (both setup wizard and tools_config paths)
- Tests updated to set config values instead of env vars

Convention enforced: .env is for SECRETS only (API keys). All other
configuration (model names, base URLs, provider selection) lives
exclusively in config.yaml.
											
										
										
											2026-03-30 22:02:53 -07:00
+								    config.yaml model.default is the single source of truth for the active
 								    model. Environment variables are no longer consulted.
-												fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)

* fix: prevent model/provider mismatch when switching providers during active gateway

When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.

Changes:
- _update_config_for_provider() now accepts an optional default_model
  parameter. When provided and the current model.default is empty or
  uses OpenRouter format (contains '/'), it sets a safe default model
  for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
  minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
  choice: if the current model uses OpenRouter format and wouldn't work
  with the new provider, it warns and switches to the provider's first
  default model instead of silently keeping the incompatible name.

Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.

* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini

When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.

Changes:
- _try_custom_endpoint() now reads the user's configured main model via
  _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
  config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
  formatted model override (containing '/') would be sent to a non-
  OpenRouter provider (like a local server) and drops it in favor of
  the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
  tests to prevent host environment leakage.
											
										
										
											2026-03-13 10:02:16 -07:00
+								    """
 								    try:
 								        from hermes_cli.config import load_config
 								        cfg = load_config()
 								        model_cfg = cfg.get("model", {})
 								        if isinstance(model_cfg, str) and model_cfg.strip():
 								            return model_cfg.strip()
 								        if isinstance(model_cfg, dict):
 								            default = model_cfg.get("default", "")
 								            if isinstance(default, str) and default.strip():
 								                return default.strip()
 								    except Exception:
 								        pass
 								    return ""
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								def _read_main_provider() -> str:
 								    """Read the user's configured main provider from config.yaml.
 								    Returns the lowercase provider id (e.g. "alibaba", "openrouter") or ""
 								    if not configured.
 								    """
 								    try:
 								        from hermes_cli.config import load_config
 								        cfg = load_config()
 								        model_cfg = cfg.get("model", {})
 								        if isinstance(model_cfg, dict):
 								            provider = model_cfg.get("provider", "")
 								            if isinstance(provider, str) and provider.strip():
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								                return provider.strip().lower()
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								    except Exception:
 								        pass
 								    return ""
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
+								    """Resolve the active custom/main endpoint the same way the main CLI does.
 								    This covers both env-driven OPENAI_BASE_URL setups and config-saved custom
 								    endpoints where the base URL lives in config.yaml instead of the live
 								    environment.
 								    """
 								    try:
 								        from hermes_cli.runtime_provider import resolve_runtime_provider
 								        runtime = resolve_runtime_provider(requested="custom")
 								    except Exception as exc:
 								        logger.debug("Auxiliary client: custom runtime resolution failed: %s", exc)
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								        runtime = None
 								    if not isinstance(runtime, dict):
 								        openai_base = os.getenv("OPENAI_BASE_URL", "").strip().rstrip("/")
 								        openai_key = os.getenv("OPENAI_API_KEY", "").strip()
 								        if not openai_base:
 								            return None, None, None
 								        runtime = {
 								            "base_url": openai_base,
 								            "api_key": openai_key,
 								        }
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
 								    custom_base = runtime.get("base_url")
 								    custom_key = runtime.get("api_key")
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    custom_mode = runtime.get("api_mode")
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
+								    if not isinstance(custom_base, str) or not custom_base.strip():
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								        return None, None, None
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
 								    custom_base = custom_base.strip().rstrip("/")
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								    if base_url_host_matches(custom_base, "openrouter.ai"):
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
+								        # requested='custom' falls back to OpenRouter when no custom endpoint is
 								        # configured. Treat that as "no custom endpoint" for auxiliary routing.
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								        return None, None, None
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
-												fix: auxiliary client uses placeholder key for local servers without auth (#3842)

Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely.  This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.

The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.

Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
  of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
  for explicit_base_url without explicit_api_key

Updates 2 tests that expected the old (broken) behavior.
											
										
										
											2026-03-29 21:05:36 -07:00
+								    # Local servers (Ollama, llama.cpp, vLLM, LM Studio) don't require auth.
 								    # Use a placeholder key — the OpenAI SDK requires a non-empty string but
 								    # local servers ignore the Authorization header.  Same fix as cli.py
 								    # _ensure_runtime_credentials() (PR #2556).
 								    if not isinstance(custom_key, str) or not custom_key.strip():
 								        custom_key = "no-key-required"
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    if not isinstance(custom_mode, str) or not custom_mode.strip():
 								        custom_mode = None
 								    return custom_base, custom_key.strip(), custom_mode
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
 								def _current_custom_base_url() -> str:
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    custom_base, _, _ = _resolve_custom_runtime()
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
+								    return custom_base or ""
-												fix(runtime): surface malformed proxy env and base URL before client init

When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.

Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.

Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>

											
										
										
											2026-04-15 15:07:11 -07:00
+								def _validate_proxy_env_urls() -> None:
 								    """Fail fast with a clear error when proxy env vars have malformed URLs.
 								    Common cause: shell config (e.g. .zshrc) with a typo like
 								    ``export HTTP_PROXY=http://127.0.0.1:6153export NEXT_VAR=...``
 								    which concatenates 'export' into the port number.  Without this
 								    check the OpenAI/httpx client raises a cryptic ``Invalid port``
 								    error that doesn't name the offending env var.
 								    """
 								    from urllib.parse import urlparse
-												fix(agent): normalize socks:// env proxies for httpx/anthropic

WSL2 / Clash-style setups often export ALL_PROXY=socks://127.0.0.1:PORT. httpx and the Anthropic SDK reject that alias and expect socks5://, so agent startup failed early with "Unknown scheme for proxy URL" before any provider request could proceed.

Add shared normalize_proxy_url()/normalize_proxy_env_vars() helpers in utils.py and route all proxy entry points through them:
  - run_agent._get_proxy_from_env
  - agent.auxiliary_client._validate_proxy_env_urls
  - agent.anthropic_adapter.build_anthropic_client
  - gateway.platforms.base.resolve_proxy_url

Regression coverage:
  - run_agent proxy env resolution
  - auxiliary proxy env normalization
  - gateway proxy URL resolution

Verified with:
PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 /home/nonlinear/.hermes/hermes-agent/venv/bin/pytest -o addopts='' -p pytest_asyncio.plugin tests/run_agent/test_create_openai_client_proxy_env.py tests/agent/test_proxy_and_url_validation.py tests/gateway/test_proxy_mode.py

39 passed.

											
										
										
											2026-04-21 17:55:04 +08:00
+								    normalize_proxy_env_vars()
-												fix(runtime): surface malformed proxy env and base URL before client init

When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.

Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.

Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>

											
										
										
											2026-04-15 15:07:11 -07:00
+								    for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
 								                "https_proxy", "http_proxy", "all_proxy"):
 								        value = str(os.environ.get(key) or "").strip()
 								        if not value:
 								            continue
 								        try:
 								            parsed = urlparse(value)
 								            if parsed.scheme:
 								                _ = parsed.port          # raises ValueError for e.g. '6153export'
 								        except ValueError as exc:
 								            raise RuntimeError(
 								                f"Malformed proxy environment variable {key}={value!r}. "
 								                "Fix or unset your proxy settings and try again."
 								            ) from exc
 								def _validate_base_url(base_url: str) -> None:
 								    """Reject obviously broken custom endpoint URLs before they reach httpx."""
 								    from urllib.parse import urlparse
 								    candidate = str(base_url or "").strip()
 								    if not candidate or candidate.startswith("acp://"):
 								        return
 								    try:
 								        parsed = urlparse(candidate)
 								        if parsed.scheme in {"http", "https"}:
 								            _ = parsed.port              # raises ValueError for malformed ports
 								    except ValueError as exc:
 								        raise RuntimeError(
 								            f"Malformed custom endpoint URL: {candidate!r}. "
 								            "Run `hermes setup` or `hermes model` and enter a valid http(s) base URL."
 								        ) from exc
-												fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)

Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers.  Synthesizes
eight stale community PRs into one consolidated change.

Five fixes:

- URL detection: consolidate three inline `endswith("/anthropic")`
  checks in runtime_provider.py into the shared _detect_api_mode_for_url
  helper.  Third-party /anthropic endpoints now auto-resolve to
  api_mode=anthropic_messages via one code path instead of three.

- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
  (__init__, switch_model, _try_refresh_anthropic_client_credentials,
  _swap_credential, _try_activate_fallback) now gate on
  `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
  Claude-Code identity injection on third-party endpoints.  Previously
  only 2 of 5 sites were guarded.

- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
  `(should_cache, use_native_layout)` per endpoint.  Replaces three
  inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
  call-site flag.  Native Anthropic and third-party Anthropic gateways
  both get the native cache_control layout; OpenRouter gets envelope
  layout.  Layout is persisted in `_primary_runtime` so fallback
  restoration preserves the per-endpoint choice.

- Auxiliary client: `_try_custom_endpoint` honors
  `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
  instead of silently downgrading to an OpenAI-wire client.  Degrades
  gracefully to OpenAI-wire when the anthropic SDK isn't installed.

- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
  clears stale `api_key`/`api_mode` when switching to a built-in
  provider, so a previous MiniMax custom endpoint's credentials can't
  leak into a later OpenRouter session.

- Truncation continuation: length-continuation and tool-call-truncation
  retry now cover `anthropic_messages` in addition to `chat_completions`
  and `bedrock_converse`.  Reuses the existing `_build_assistant_message`
  path via `normalize_anthropic_response()` so the interim message
  shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases.  Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):
  #7410  @nocoo           — URL detection helper
  #7393  @keyuyuan        — OAuth 5-site guard
  #7367  @n-WN            — OAuth guard (narrower cousin, kept comment)
  #8636  @sgaofen         — caching helper + native-vs-proxy layout split
  #10954 @Only-Code-A     — caching on anthropic_messages+Claude
  #7648  @zhongyueming1121 — aux client anthropic_messages branch
  #6096  @hansnow         — /model switch clears stale api_mode
  #9691  @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects:    #9621 (OpenAI-wire caching with incomplete blocklist — risky),
            #7242 (superseded by #9691, stale branch),
            #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
											
										
										
											2026-04-19 22:43:09 -07:00
+								def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    runtime = _resolve_custom_runtime()
 								    if len(runtime) == 2:
 								        custom_base, custom_key = runtime
 								        custom_mode = None
 								    else:
 								        custom_base, custom_key, custom_mode = runtime
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    if not custom_base or not custom_key:
 								        return None, None
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    if custom_base.lower().startswith(_CODEX_AUX_BASE_URL.lower()):
 								        return None, None
-												fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)

* fix: prevent model/provider mismatch when switching providers during active gateway

When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.

Changes:
- _update_config_for_provider() now accepts an optional default_model
  parameter. When provided and the current model.default is empty or
  uses OpenRouter format (contains '/'), it sets a safe default model
  for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
  minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
  choice: if the current model uses OpenRouter format and wouldn't work
  with the new provider, it warns and switches to the provider's first
  default model instead of silently keeping the incompatible name.

Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.

* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini

When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.

Changes:
- _try_custom_endpoint() now reads the user's configured main model via
  _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
  config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
  formatted model override (containing '/') would be sent to a non-
  OpenRouter provider (like a local server) and drops it in favor of
  the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
  tests to prevent host environment leakage.
											
										
										
											2026-03-13 10:02:16 -07:00
+								    model = _read_main_model() or "gpt-4o-mini"
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    logger.debug("Auxiliary client: custom endpoint (%s, api_mode=%s)", model, custom_mode or "chat_completions")
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								    _clean_base, _dq = _extract_url_query_params(custom_base)
 								    _extra = {"default_query": _dq} if _dq else {}
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								    if custom_mode == "codex_responses":
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								        real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
-												fix(aux): honor api_mode for custom auxiliary endpoints

											
										
										
											2026-04-10 00:10:32 +02:00
+								        return CodexAuxiliaryClient(real_client, model), model
-												fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)

Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers.  Synthesizes
eight stale community PRs into one consolidated change.

Five fixes:

- URL detection: consolidate three inline `endswith("/anthropic")`
  checks in runtime_provider.py into the shared _detect_api_mode_for_url
  helper.  Third-party /anthropic endpoints now auto-resolve to
  api_mode=anthropic_messages via one code path instead of three.

- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
  (__init__, switch_model, _try_refresh_anthropic_client_credentials,
  _swap_credential, _try_activate_fallback) now gate on
  `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
  Claude-Code identity injection on third-party endpoints.  Previously
  only 2 of 5 sites were guarded.

- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
  `(should_cache, use_native_layout)` per endpoint.  Replaces three
  inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
  call-site flag.  Native Anthropic and third-party Anthropic gateways
  both get the native cache_control layout; OpenRouter gets envelope
  layout.  Layout is persisted in `_primary_runtime` so fallback
  restoration preserves the per-endpoint choice.

- Auxiliary client: `_try_custom_endpoint` honors
  `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
  instead of silently downgrading to an OpenAI-wire client.  Degrades
  gracefully to OpenAI-wire when the anthropic SDK isn't installed.

- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
  clears stale `api_key`/`api_mode` when switching to a built-in
  provider, so a previous MiniMax custom endpoint's credentials can't
  leak into a later OpenRouter session.

- Truncation continuation: length-continuation and tool-call-truncation
  retry now cover `anthropic_messages` in addition to `chat_completions`
  and `bedrock_converse`.  Reuses the existing `_build_assistant_message`
  path via `normalize_anthropic_response()` so the interim message
  shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases.  Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):
  #7410  @nocoo           — URL detection helper
  #7393  @keyuyuan        — OAuth 5-site guard
  #7367  @n-WN            — OAuth guard (narrower cousin, kept comment)
  #8636  @sgaofen         — caching helper + native-vs-proxy layout split
  #10954 @Only-Code-A     — caching on anthropic_messages+Claude
  #7648  @zhongyueming1121 — aux client anthropic_messages branch
  #6096  @hansnow         — /model switch clears stale api_mode
  #9691  @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects:    #9621 (OpenAI-wire caching with incomplete blocklist — risky),
            #7242 (superseded by #9691, stale branch),
            #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
											
										
										
											2026-04-19 22:43:09 -07:00
+								    if custom_mode == "anthropic_messages":
 								        # Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
 								        # LiteLLM proxies, etc.).  Must NEVER be treated as OAuth —
 								        # Anthropic OAuth claims only apply to api.anthropic.com.
 								        try:
 								            from agent.anthropic_adapter import build_anthropic_client
 								            real_client = build_anthropic_client(custom_key, custom_base)
 								        except ImportError:
 								            logger.warning(
 								                "Custom endpoint declares api_mode=anthropic_messages but the "
 								                "anthropic SDK is not installed — falling back to OpenAI-wire."
 								            )
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								            return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
-												fix(anthropic): complete third-party Anthropic-compatible provider support (#12846)

Third-party gateways that speak the native Anthropic protocol (MiniMax,
Zhipu GLM, Alibaba DashScope, Kimi, LiteLLM proxies) now work end-to-end
with the same feature set as direct api.anthropic.com callers.  Synthesizes
eight stale community PRs into one consolidated change.

Five fixes:

- URL detection: consolidate three inline `endswith("/anthropic")`
  checks in runtime_provider.py into the shared _detect_api_mode_for_url
  helper.  Third-party /anthropic endpoints now auto-resolve to
  api_mode=anthropic_messages via one code path instead of three.

- OAuth leak-guard: all five sites that assign `_is_anthropic_oauth`
  (__init__, switch_model, _try_refresh_anthropic_client_credentials,
  _swap_credential, _try_activate_fallback) now gate on
  `provider == "anthropic"` so a stale ANTHROPIC_TOKEN never trips
  Claude-Code identity injection on third-party endpoints.  Previously
  only 2 of 5 sites were guarded.

- Prompt caching: new method `_anthropic_prompt_cache_policy()` returns
  `(should_cache, use_native_layout)` per endpoint.  Replaces three
  inline conditions and the `native_anthropic=(api_mode=='anthropic_messages')`
  call-site flag.  Native Anthropic and third-party Anthropic gateways
  both get the native cache_control layout; OpenRouter gets envelope
  layout.  Layout is persisted in `_primary_runtime` so fallback
  restoration preserves the per-endpoint choice.

- Auxiliary client: `_try_custom_endpoint` honors
  `api_mode=anthropic_messages` and builds `AnthropicAuxiliaryClient`
  instead of silently downgrading to an OpenAI-wire client.  Degrades
  gracefully to OpenAI-wire when the anthropic SDK isn't installed.

- Config hygiene: `_update_config_for_provider` (hermes_cli/auth.py)
  clears stale `api_key`/`api_mode` when switching to a built-in
  provider, so a previous MiniMax custom endpoint's credentials can't
  leak into a later OpenRouter session.

- Truncation continuation: length-continuation and tool-call-truncation
  retry now cover `anthropic_messages` in addition to `chat_completions`
  and `bedrock_converse`.  Reuses the existing `_build_assistant_message`
  path via `normalize_anthropic_response()` so the interim message
  shape is byte-identical to the non-truncated path.

Tests: 6 new files, 42 test cases.  Targeted run + tests/run_agent,
tests/agent, tests/hermes_cli all pass (4554 passed).

Synthesized from (credits preserved via Co-authored-by trailers):
  #7410  @nocoo           — URL detection helper
  #7393  @keyuyuan        — OAuth 5-site guard
  #7367  @n-WN            — OAuth guard (narrower cousin, kept comment)
  #8636  @sgaofen         — caching helper + native-vs-proxy layout split
  #10954 @Only-Code-A     — caching on anthropic_messages+Claude
  #7648  @zhongyueming1121 — aux client anthropic_messages branch
  #6096  @hansnow         — /model switch clears stale api_mode
  #9691  @TroyMitchell911 — anthropic_messages truncation continuation

Closes: #7366, #8294 (third-party Anthropic identity + caching).
Supersedes: #7410, #7367, #7393, #8636, #10954, #7648, #6096, #9691.
Rejects:    #9621 (OpenAI-wire caching with incomplete blocklist — risky),
            #7242 (superseded by #9691, stale branch),
            #8321 (targets smart_model_routing which was removed in #12732).

Co-authored-by: nocoo <nocoo@users.noreply.github.com>
Co-authored-by: Keyu Yuan <leoyuan0099@gmail.com>
Co-authored-by: Zoee <30841158+n-WN@users.noreply.github.com>
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Co-authored-by: Only-Code-A <bxzt2006@163.com>
Co-authored-by: zhongyueming <mygamez@163.com>
Co-authored-by: Xiaohan Li <hansnow@users.noreply.github.com>
Co-authored-by: Troy Mitchell <i@troy-y.org>
											
										
										
											2026-04-19 22:43:09 -07:00
+								        return (
 								            AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
 								            model,
 								        )
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								    return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    pool_present, entry = _select_pool_entry("openai-codex")
 								    if pool_present:
 								        codex_token = _pool_runtime_api_key(entry)
-												fix: restore codex fallback auth-store lookup

											
										
										
											2026-04-09 00:04:30 -07:00
+								        if codex_token:
 								            base_url = _pool_runtime_base_url(entry, _CODEX_AUX_BASE_URL) or _CODEX_AUX_BASE_URL
 								        else:
 								            codex_token = _read_codex_access_token()
 								            if not codex_token:
 								                return None, None
 								            base_url = _CODEX_AUX_BASE_URL
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    else:
 								        codex_token = _read_codex_access_token()
 								        if not codex_token:
 								            return None, None
 								        base_url = _CODEX_AUX_BASE_URL
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
-												fix(codex): pin correct Cloudflare headers and extend to auxiliary client

The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers

											
										
										
											2026-04-19 11:58:15 -07:00
+								    real_client = OpenAI(
 								        api_key=codex_token,
 								        base_url=base_url,
 								        default_headers=_codex_cloudflare_headers(codex_token),
 								    )
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								def _try_anthropic() -> Tuple[Optional[Any], Optional[str]]:
 								    try:
 								        from agent.anthropic_adapter import build_anthropic_client, resolve_anthropic_token
 								    except ImportError:
 								        return None, None
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    pool_present, entry = _select_pool_entry("anthropic")
 								    if pool_present:
 								        if entry is None:
 								            return None, None
 								        token = _pool_runtime_api_key(entry)
 								    else:
 								        entry = None
 								        token = resolve_anthropic_token()
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								    if not token:
 								        return None, None
-												fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url

Only honor config.model.base_url for Anthropic resolution when
config.model.provider is actually "anthropic". This prevents a Codex
(or other provider) base_url from leaking into Anthropic runtime and
auxiliary client paths, which would send  requests to the wrong
endpoint.

Closes #2384

											
										
										
											2026-03-21 16:16:17 -07:00
+								    # Allow base URL override from config.yaml model.base_url, but only
 								    # when the configured provider is anthropic — otherwise a non-Anthropic
 								    # base_url (e.g. Codex endpoint) would leak into Anthropic requests.
-												feat(auth): same-provider credential pools with rotation, custom endpoint support, and interactive CLI (#2647)

* feat(auth): add same-provider credential pools and rotation UX

Add same-provider credential pooling so Hermes can rotate across
multiple credentials for a single provider, recover from exhausted
credentials without jumping providers immediately, and configure
that behavior directly in hermes setup.

- agent/credential_pool.py: persisted per-provider credential pools
- hermes auth add/list/remove/reset CLI commands
- 429/402/401 recovery with pool rotation in run_agent.py
- Setup wizard integration for pool strategy configuration
- Auto-seeding from env vars and existing OAuth state

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Salvaged from PR #2647

* fix(tests): prevent pool auto-seeding from host env in credential pool tests

Tests for non-pool Anthropic paths and auth remove were failing when
host env vars (ANTHROPIC_API_KEY) or file-backed OAuth credentials
were present. The pool auto-seeding picked these up, causing unexpected
pool entries in tests.

- Mock _select_pool_entry in auxiliary_client OAuth flag tests
- Clear Anthropic env vars and mock _seed_from_singletons in auth remove test

* feat(auth): add thread safety, least_used strategy, and request counting

- Add threading.Lock to CredentialPool for gateway thread safety
  (concurrent requests from multiple gateway sessions could race on
  pool state mutations without this)
- Add 'least_used' rotation strategy that selects the credential
  with the lowest request_count, distributing load more evenly
- Add request_count field to PooledCredential for usage tracking
- Add mark_used() method to increment per-credential request counts
- Wrap select(), mark_exhausted_and_rotate(), and try_refresh_current()
  with lock acquisition
- Add tests: least_used selection, mark_used counting, concurrent
  thread safety (4 threads × 20 selects with no corruption)

* feat(auth): add interactive mode for bare 'hermes auth' command

When 'hermes auth' is called without a subcommand, it now launches an
interactive wizard that:

1. Shows full credential pool status across all providers
2. Offers a menu: add, remove, reset cooldowns, set strategy
3. For OAuth-capable providers (anthropic, nous, openai-codex), the
   add flow explicitly asks 'API key or OAuth login?' — making it
   clear that both auth types are supported for the same provider
4. Strategy picker shows all 4 options (fill_first, round_robin,
   least_used, random) with the current selection marked
5. Remove flow shows entries with indices for easy selection

The subcommand paths (hermes auth add/list/remove/reset) still work
exactly as before for scripted/non-interactive use.

* fix(tests): update runtime_provider tests for config.yaml source of truth (#4165)

Tests were using OPENAI_BASE_URL env var which is no longer consulted
after #4165. Updated to use model config (provider, base_url, api_key)
which is the new single source of truth for custom endpoint URLs.

* feat(auth): support custom endpoint credential pools keyed by provider name

Custom OpenAI-compatible endpoints all share provider='custom', making
the provider-keyed pool useless. Now pools for custom endpoints are
keyed by 'custom:<normalized_name>' where the name comes from the
custom_providers config list (auto-generated from URL hostname).

- Pool key format: 'custom:together.ai', 'custom:local-(localhost:8080)'
- load_pool('custom:name') seeds from custom_providers api_key AND
  model.api_key when base_url matches
- hermes auth add/list now shows custom endpoints alongside registry
  providers
- _resolve_openrouter_runtime and _resolve_named_custom_runtime check
  pool before falling back to single config key
- 6 new tests covering custom pool keying, seeding, and listing

* docs: add Excalidraw diagram of full credential pool flow

Comprehensive architecture diagram showing:
- Credential sources (env vars, auth.json OAuth, config.yaml, CLI)
- Pool storage and auto-seeding
- Runtime resolution paths (registry, custom, OpenRouter)
- Error recovery (429 retry-then-rotate, 402 immediate, 401 refresh)
- CLI management commands and strategy configuration

Open at: https://excalidraw.com/#json=2Ycqhqpi6f12E_3ITyiwh,c7u9jSt5BwrmiVzHGbm87g

* fix(tests): update setup wizard pool tests for unified select_provider_and_model flow

The setup wizard now delegates to select_provider_and_model() instead
of using its own prompt_choice-based provider picker. Tests needed:
- Mock select_provider_and_model as no-op (provider pre-written to config)
- Call _stub_tts BEFORE custom prompt_choice mock (it overwrites it)
- Pre-write model.provider to config so the pool step is reached

* docs: add comprehensive credential pool documentation

- New page: website/docs/user-guide/features/credential-pools.md
  Full guide covering quick start, CLI commands, rotation strategies,
  error recovery, custom endpoint pools, auto-discovery, thread safety,
  architecture, and storage format.
- Updated fallback-providers.md to reference credential pools as the
  first layer of resilience (same-provider rotation before cross-provider)
- Added hermes auth to CLI commands reference with usage examples
- Added credential_pool_strategies to configuration guide

* chore: remove excalidraw diagram from repo (external link only)

* refactor: simplify credential pool code — extract helpers, collapse extras, dedup patterns

- _load_config_safe(): replace 4 identical try/except/import blocks
- _iter_custom_providers(): shared generator for custom provider iteration
- PooledCredential.extra dict: collapse 11 round-trip-only fields
  (token_type, scope, client_id, portal_base_url, obtained_at,
  expires_in, agent_key_id, agent_key_expires_in, agent_key_reused,
  agent_key_obtained_at, tls) into a single extra dict with
  __getattr__ for backward-compatible access
- _available_entries(): shared exhaustion-check between select and peek
- Dedup anthropic OAuth seeding (hermes_pkce + claude_code identical)
- SimpleNamespace replaces class _Args boilerplate in auth_commands
- _try_resolve_from_custom_pool(): shared pool-check in runtime_provider

Net -17 lines. All 383 targeted tests pass.

---------

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
											
										
										
											2026-03-31 03:10:01 -07:00
+								    base_url = _pool_runtime_base_url(entry, _ANTHROPIC_DEFAULT_BASE_URL) if pool_present else _ANTHROPIC_DEFAULT_BASE_URL
-												fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998)

After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.

Cherry-picked from PR #1949 by @rivercrab26.

Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
											
										
										
											2026-03-18 16:51:24 -07:00
+								    try:
 								        from hermes_cli.config import load_config
 								        cfg = load_config()
 								        model_cfg = cfg.get("model")
 								        if isinstance(model_cfg, dict):
-												fix(provider): prevent Anthropic fallback from inheriting non-Anthropic base_url

Only honor config.model.base_url for Anthropic resolution when
config.model.provider is actually "anthropic". This prevents a Codex
(or other provider) base_url from leaking into Anthropic runtime and
auxiliary client paths, which would send  requests to the wrong
endpoint.

Closes #2384

											
										
										
											2026-03-21 16:16:17 -07:00
+								            cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
 								            if cfg_provider == "anthropic":
 								                cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
 								                if cfg_base_url:
 								                    base_url = cfg_base_url
-												fix: respect config.yaml model.base_url for Anthropic provider (#1948) (#1998)

After #1675 removed ANTHROPIC_BASE_URL env var support, the Anthropic
provider base URL was hardcoded to https://api.anthropic.com. Now reads
model.base_url from config.yaml as an override, falling back to the
default when not set. Also applies to the auxiliary client.

Cherry-picked from PR #1949 by @rivercrab26.

Co-authored-by: rivercrab26 <rivercrab26@users.noreply.github.com>
											
										
										
											2026-03-18 16:51:24 -07:00
+								    except Exception:
 								        pass
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								    from agent.anthropic_adapter import _is_oauth_token
 								    is_oauth = _is_oauth_token(token)
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								    model = _API_KEY_PROVIDER_AUX_MODELS.get("anthropic", "claude-haiku-4-5-20251001")
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								    logger.debug("Auxiliary client: Anthropic native (%s) at %s (oauth=%s)", model, base_url, is_oauth)
-												fix(auxiliary): catch ImportError from build_anthropic_client in vision auto-detection (#3312)

_try_anthropic() caught ImportError on the module import (line 667-669)
but not on the build_anthropic_client() call (line 696). When the
anthropic_adapter module imports fine but the anthropic SDK is missing,
build_anthropic_client() raises ImportError at call time. This escaped
_try_anthropic() entirely, killing get_available_vision_backends() and
cascading to 7 test failures:

- 4 setup wizard tests hit unexpected 'Configure vision:' prompt
- 3 codex-auth-as-vision tests failed check_vision_requirements()

The fix wraps the build_anthropic_client call in try/except ImportError,
returning (None, None) when the SDK is unavailable — consistent with the
existing guard at the top of the function.
											
										
										
											2026-03-26 18:21:59 -07:00
+								    try:
 								        real_client = build_anthropic_client(token, base_url)
 								    except ImportError:
 								        # The anthropic_adapter module imports fine but the SDK itself is
 								        # missing — build_anthropic_client raises ImportError at call time
 								        # when _anthropic_sdk is None.  Treat as unavailable.
 								        return None, None
-												fix: auxiliary client skips expired Codex JWT and propagates Anthropic OAuth flag

Two bugs in the auxiliary provider auto-detection chain:

1. Expired Codex JWT blocks the auto chain: _read_codex_access_token()
   returned any stored token without checking expiry, preventing fallback
   to working providers. Now decodes JWT exp claim and returns None for
   expired tokens.

2. Auxiliary Anthropic client missing OAuth identity transforms:
   _AnthropicCompletionsAdapter always called build_anthropic_kwargs with
   is_oauth=False, causing 400 errors for OAuth tokens. Now detects OAuth
   tokens via _is_oauth_token() and propagates the flag through the
   adapter chain.

Cherry-picked from PR #2378 by 0xbyt4. Fixed test_api_key_no_oauth_flag
to mock resolve_anthropic_token directly (env var alone was insufficient).

											
										
										
											2026-03-21 17:36:25 -07:00
+								    return AnthropicAuxiliaryClient(real_client, model, token, base_url, is_oauth=is_oauth), model
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								_AUTO_PROVIDER_LABELS = {
 								    "_try_openrouter": "openrouter",
 								    "_try_nous": "nous",
 								    "_try_custom_endpoint": "local/custom",
 								    "_try_codex": "openai-codex",
 								    "_resolve_api_key_provider": "api-key",
 								}
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								_MAIN_RUNTIME_FIELDS = ("provider", "model", "base_url", "api_key", "api_mode")
 								def _normalize_main_runtime(main_runtime: Optional[Dict[str, Any]]) -> Dict[str, str]:
 								    """Return a sanitized copy of a live main-runtime override."""
 								    if not isinstance(main_runtime, dict):
 								        return {}
 								    normalized: Dict[str, str] = {}
 								    for field in _MAIN_RUNTIME_FIELDS:
 								        value = main_runtime.get(field)
 								        if isinstance(value, str) and value.strip():
 								            normalized[field] = value.strip()
 								    provider = normalized.get("provider")
 								    if provider:
 								        normalized["provider"] = provider.lower()
 								    return normalized
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								def _get_provider_chain() -> List[tuple]:
 								    """Return the ordered provider detection chain.
 								    Built at call time (not module level) so that test patches
 								    on the ``_try_*`` functions are picked up correctly.
 								    """
 								    return [
 								        ("openrouter", _try_openrouter),
 								        ("nous", _try_nous),
 								        ("local/custom", _try_custom_endpoint),
 								        ("openai-codex", _try_codex),
 								        ("api-key", _resolve_api_key_provider),
 								    ]
 								def _is_payment_error(exc: Exception) -> bool:
 								    """Detect payment/credit/quota exhaustion errors.
 								    Returns True for HTTP 402 (Payment Required) and for 429/other errors
 								    whose message indicates billing exhaustion rather than rate limiting.
 								    """
 								    status = getattr(exc, "status_code", None)
 								    if status == 402:
 								        return True
 								    err_lower = str(exc).lower()
 								    # OpenRouter and other providers include "credits" or "afford" in 402 bodies,
 								    # but sometimes wrap them in 429 or other codes.
 								    if status in (402, 429, None):
 								        if any(kw in err_lower for kw in ("credits", "insufficient funds",
 								                                           "can only afford", "billing",
 								                                           "payment required")):
 								            return True
 								    return False
-												fix: model fallback — stale model on Nous login + connection error fallback (#6554)

Two bugs in the model fallback system:

1. Nous login leaves stale model in config (provider=nous, model=opus
   from previous OpenRouter setup). Fixed by deferring the config.yaml
   provider write until AFTER model selection completes, and passing the
   selected model atomically via _update_config_for_provider's
   default_model parameter. Previously, _update_config_for_provider was
   called before model selection — if selection failed (free tier, no
   models, exception), config stayed as nous+opus permanently.

2. Codex/stale providers in auxiliary fallback can't connect but block
   the auto-detection chain. Added _is_connection_error() detection
   (APIConnectionError, APITimeoutError, DNS failures, connection
   refused) alongside the existing _is_payment_error() check in
   call_llm(). When a provider endpoint is unreachable, the system now
   falls back to the next available provider instead of crashing.
											
										
										
											2026-04-09 10:38:53 -07:00
+								def _is_connection_error(exc: Exception) -> bool:
 								    """Detect connection/network errors that warrant provider fallback.
 								    Returns True for errors indicating the provider endpoint is unreachable
 								    (DNS failure, connection refused, TLS errors, timeouts).  These are
 								    distinct from API errors (4xx/5xx) which indicate the provider IS
 								    reachable but returned an error.
 								    """
 								    from openai import APIConnectionError, APITimeoutError
 								    if isinstance(exc, (APIConnectionError, APITimeoutError)):
 								        return True
 								    # urllib3 / httpx / httpcore connection errors
 								    err_type = type(exc).__name__
 								    if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
 								        return True
 								    err_lower = str(exc).lower()
 								    if any(kw in err_lower for kw in (
 								        "connection refused", "name or service not known",
 								        "no route to host", "network is unreachable",
 								        "timed out", "connection reset",
 								    )):
 								        return True
 								    return False
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								def _is_auth_error(exc: Exception) -> bool:
 								    """Detect auth failures that should trigger provider-specific refresh."""
 								    status = getattr(exc, "status_code", None)
 								    if status == 401:
 								        return True
 								    err_lower = str(exc).lower()
 								    return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
 								    """Detect provider 400s for an unsupported request parameter.
 								    Different OpenAI-compatible endpoints phrase the same class of error a few
 								    ways: ``Unsupported parameter: X``, ``unsupported_parameter`` with a
 								    ``param`` field, ``X is not supported``, ``unknown parameter: X``,
 								    ``unrecognized request argument: X``.  We match on both the parameter
 								    name and a generic "unsupported/unknown/unrecognized parameter" marker so
 								    call sites can reactively retry without the offending key instead of
 								    surfacing a noisy auxiliary failure.
 								    Generalizes the temperature-specific detector that originally shipped
 								    with PR #15621 so the same retry strategy can cover ``max_tokens``,
 								    ``seed``, ``top_p``, and any future quirk. Credit @nicholasrae (PR #15416)
 								    for the generalization pattern.
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								    """
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								    param_lower = (param or "").lower()
 								    if not param_lower:
 								        return False
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								    err_lower = str(exc).lower()
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								    if param_lower not in err_lower:
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								        return False
 								    return any(marker in err_lower for marker in (
 								        "unsupported parameter",
 								        "unsupported_parameter",
 								        "not supported",
 								        "does not support",
 								        "unknown parameter",
 								        "unrecognized request argument",
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								        "unrecognized parameter",
 								        "invalid parameter",
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								    ))
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								def _is_unsupported_temperature_error(exc: Exception) -> bool:
 								    """Back-compat wrapper: detect API errors where the model rejects ``temperature``.
 								    Delegates to :func:`_is_unsupported_parameter_error`; kept as a separate
 								    public symbol because existing tests and call sites import it by name.
 								    """
 								    return _is_unsupported_parameter_error(exc, "temperature")
-												fix(aux): force anthropic oauth refresh after 401

Co-Authored-By: Paperclip <noreply@paperclip.ing>

											
										
										
											2026-04-23 19:50:09 +09:00
+								def _evict_cached_clients(provider: str) -> None:
 								    """Drop cached auxiliary clients for a provider so fresh creds are used."""
 								    normalized = _normalize_aux_provider(provider)
 								    with _client_cache_lock:
 								        stale_keys = [
 								            key for key in _client_cache
 								            if _normalize_aux_provider(str(key[0])) == normalized
 								        ]
 								        for key in stale_keys:
 								            client = _client_cache.get(key, (None, None, None))[0]
 								            if client is not None:
 								                _force_close_async_httpx(client)
 								                try:
 								                    close_fn = getattr(client, "close", None)
 								                    if callable(close_fn):
 								                        close_fn()
 								                except Exception:
 								                    pass
 								            _client_cache.pop(key, None)
 								def _refresh_provider_credentials(provider: str) -> bool:
 								    """Refresh short-lived credentials for OAuth-backed auxiliary providers."""
 								    normalized = _normalize_aux_provider(provider)
 								    try:
 								        if normalized == "openai-codex":
 								            from hermes_cli.auth import resolve_codex_runtime_credentials
 								            creds = resolve_codex_runtime_credentials(force_refresh=True)
 								            if not str(creds.get("api_key", "") or "").strip():
 								                return False
 								            _evict_cached_clients(normalized)
 								            return True
 								        if normalized == "nous":
 								            from hermes_cli.auth import resolve_nous_runtime_credentials
 								            creds = resolve_nous_runtime_credentials(
 								                min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
 								                timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
 								                force_mint=True,
 								            )
 								            if not str(creds.get("api_key", "") or "").strip():
 								                return False
 								            _evict_cached_clients(normalized)
 								            return True
 								        if normalized == "anthropic":
 								            from agent.anthropic_adapter import read_claude_code_credentials, _refresh_oauth_token, resolve_anthropic_token
 								            creds = read_claude_code_credentials()
 								            token = _refresh_oauth_token(creds) if isinstance(creds, dict) and creds.get("refreshToken") else None
 								            if not str(token or "").strip():
 								                token = resolve_anthropic_token()
 								            if not str(token or "").strip():
 								                return False
 								            _evict_cached_clients(normalized)
 								            return True
 								    except Exception as exc:
 								        logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
 								        return False
 								    return False
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								def _try_payment_fallback(
 								    failed_provider: str,
 								    task: str = None,
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								    reason: str = "payment error",
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								) -> Tuple[Optional[Any], Optional[str], str]:
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								    """Try alternative providers after a payment/credit or connection error.
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
 								    Iterates the standard auto-detection chain, skipping the provider that
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								    failed.
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
 								    Returns:
 								        (client, model, provider_label) or (None, None, "") if no fallback.
 								    """
 								    # Normalise the failed provider label for matching.
 								    skip = failed_provider.lower().strip()
 								    # Also skip Step-1 main-provider path if it maps to the same backend.
 								    # (e.g. main_provider="openrouter" → skip "openrouter" in chain)
 								    main_provider = _read_main_provider()
 								    skip_labels = {skip}
 								    if main_provider and main_provider.lower() in skip:
 								        skip_labels.add(main_provider.lower())
 								    # Map common resolved_provider values back to chain labels.
 								    _alias_to_label = {"openrouter": "openrouter", "nous": "nous",
 								                       "openai-codex": "openai-codex", "codex": "openai-codex",
 								                       "custom": "local/custom", "local/custom": "local/custom"}
 								    skip_chain_labels = {_alias_to_label.get(s, s) for s in skip_labels}
 								    tried = []
 								    for label, try_fn in _get_provider_chain():
 								        if label in skip_chain_labels:
 								            continue
 								        client, model = try_fn()
 								        if client is not None:
 								            logger.info(
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                "Auxiliary %s: %s on %s — falling back to %s (%s)",
 								                task or "call", reason, failed_provider, label, model or "default",
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            )
 								            return client, model, label
 								        tried.append(label)
 								    logger.warning(
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								        "Auxiliary %s: %s on %s and no fallback available (tried: %s)",
 								        task or "call", reason, failed_provider, ", ".join(tried),
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								    )
 								    return None, None, ""
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Optional[OpenAI], Optional[str]]:
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								    """Full auto-detection chain.
 								    Priority:
-												feat(auxiliary): default 'auto' routing to main model for all users (#11900)

Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
											
										
										
											2026-04-17 19:13:23 -07:00
+. User's main provider + main model, regardless of provider type.
 								         This means auxiliary tasks (compression, vision, web extraction,
 								         session search, etc.) use the same model the user configured for
 								         chat.  Users on OpenRouter/Nous get their chosen chat model; users
 								         on DeepSeek/ZAI/Alibaba get theirs; etc.  Running aux tasks on the
 								         user's picked model keeps behavior predictable — no surprise
 								         switches to a cheap fallback model for side tasks.
 . OpenRouter → Nous → custom → Codex → API-key providers (fallback
 								         chain, only used when the main provider has no working client).
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								    """
-												fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)

											
										
										
											2026-04-11 12:48:09 +05:30
+								    global auxiliary_is_nous, _stale_base_url_warned
-												fix(aux): reset auxiliary_is_nous flag on each resolution attempt

The module-level auxiliary_is_nous was set to True by _try_nous() and
never reset. In long-running gateway processes, once Nous was resolved
as auxiliary provider, the flag stayed True forever — even if
subsequent resolutions chose a different provider (e.g. OpenRouter).
This caused Nous product tags to be sent to non-Nous providers.

Reset the flag at the start of _resolve_auto() so only the winning
provider's flag persists.

											
										
										
											2026-03-17 04:02:15 -07:00
+								    auxiliary_is_nous = False  # Reset — _try_nous() will set True if it wins
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    runtime = _normalize_main_runtime(main_runtime)
 								    runtime_provider = runtime.get("provider", "")
 								    runtime_model = runtime.get("model", "")
 								    runtime_base_url = runtime.get("base_url", "")
 								    runtime_api_key = runtime.get("api_key", "")
 								    runtime_api_mode = runtime.get("api_mode", "")
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
-												fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)

											
										
										
											2026-04-11 12:48:09 +05:30
+								    # ── Warn once if OPENAI_BASE_URL is set but config.yaml uses a named
 								    #    provider (not 'custom').  This catches the common "env poisoning"
 								    #    scenario where a user switches providers via `hermes model` but the
 								    #    old OPENAI_BASE_URL lingers in ~/.hermes/.env. ──
 								    if not _stale_base_url_warned:
 								        _env_base = os.getenv("OPENAI_BASE_URL", "").strip()
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        _cfg_provider = runtime_provider or _read_main_provider()
-												fix: warn and clear stale OPENAI_BASE_URL on provider switch (#5161)

											
										
										
											2026-04-11 12:48:09 +05:30
+								        if (_env_base and _cfg_provider
 								                and _cfg_provider != "custom"
 								                and not _cfg_provider.startswith("custom:")):
 								            logger.warning(
 								                "OPENAI_BASE_URL is set (%s) but model.provider is '%s'. "
 								                "Auxiliary clients may route to the wrong endpoint. "
 								                "Run: hermes model to reconfigure, or remove "
 								                "OPENAI_BASE_URL from ~/.hermes/.env",
 								                _env_base, _cfg_provider,
 								            )
 								            _stale_base_url_warned = True
-												feat(auxiliary): default 'auto' routing to main model for all users (#11900)

Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
											
										
										
											2026-04-17 19:13:23 -07:00
+								    # ── Step 1: main provider + main model → use them directly ──
 								    #
 								    # This is the primary aux backend for every user.  "auto" means
 								    # "use my main chat model for side tasks as well" — including users
 								    # on aggregators (OpenRouter, Nous) who previously got routed to a
 								    # cheap provider-side default.  Explicit per-task overrides set via
 								    # config.yaml (auxiliary.<task>.provider) still win over this.
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    main_provider = runtime_provider or _read_main_provider()
 								    main_model = runtime_model or _read_main_model()
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								    if (main_provider and main_model
-												fix: allow custom endpoint users to use main model for auxiliary tasks

Step 1 of _resolve_auto() explicitly excluded 'custom' providers,
forcing custom endpoint users through the fragile fallback chain
instead of using their known-working main model credentials.

This caused silent compression failures for users on local OpenAI-
compatible endpoints — the summary generation would fail, middle
turns would be silently dropped, and the agent would lose all
conversation context.

Remove 'custom' from the exclusion list so custom endpoint users
get the same main-model-first treatment as DeepSeek, Anthropic,
Gemini, and other direct providers.

											
										
										
											2026-04-09 13:23:43 -07:00
+								            and main_provider not in ("auto", "")):
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        resolved_provider = main_provider
 								        explicit_base_url = None
 								        explicit_api_key = None
 								        if runtime_base_url and (main_provider == "custom" or main_provider.startswith("custom:")):
 								            resolved_provider = "custom"
 								            explicit_base_url = runtime_base_url
 								            explicit_api_key = runtime_api_key or None
 								        client, resolved = resolve_provider_client(
 								            resolved_provider,
 								            main_model,
 								            explicit_base_url=explicit_base_url,
 								            explicit_api_key=explicit_api_key,
 								            api_mode=runtime_api_mode or None,
 								        )
-												fix: use main provider model for auxiliary tasks on non-aggregator providers (#5091)

Users on direct API-key providers (Alibaba, DeepSeek, ZAI, etc.) without
an OpenRouter or Nous key would get broken auxiliary tasks (compression,
vision, etc.) because _resolve_auto() only tried aggregator providers
first, then fell back to iterating PROVIDER_REGISTRY with wrong default
model names.

Now _resolve_auto() checks the user's main provider first. If it's not
an aggregator (OpenRouter/Nous), it uses their main model directly for
all auxiliary tasks. Aggregator users still get the cheap gemini-flash
model as before.

Adds _read_main_provider() to read model.provider from config.yaml,
mirroring the existing _read_main_model().

Reported by SkyLinx — Alibaba Coding Plan user getting 400 errors from
google/gemini-3-flash-preview being sent to DashScope.
											
										
										
											2026-04-04 12:07:43 -07:00
+								        if client is not None:
 								            logger.info("Auxiliary auto-detect: using main provider %s (%s)",
 								                        main_provider, resolved or main_model)
 								            return client, resolved or main_model
 								    # ── Step 2: aggregator / fallback chain ──────────────────────────────
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								    tried = []
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								    for label, try_fn in _get_provider_chain():
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								        client, model = try_fn()
 								        if client is not None:
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								            if tried:
 								                logger.info("Auxiliary auto-detect: using %s (%s) — skipped: %s",
 								                            label, model or "default", ", ".join(tried))
 								            else:
 								                logger.info("Auxiliary auto-detect: using %s (%s)", label, model or "default")
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								            return client, model
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								        tried.append(label)
 								    logger.warning("Auxiliary auto-detect: no provider available (tried: %s). "
 								                   "Compression, summarization, and memory flush will not work. "
 								                   "Set OPENROUTER_API_KEY or configure a local model in config.yaml.",
 								                   ", ".join(tried))
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    return None, None
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								# ── Centralized Provider Router ─────────────────────────────────────────────
 								#
 								# resolve_provider_client() is the single entry point for creating a properly
 								# configured client given a (provider, model) pair.  It handles auth lookup,
 								# base URL resolution, provider-specific headers, and API format differences
 								# (Chat Completions vs Responses API for Codex).
 								#
 								# All auxiliary consumer code should go through this or the public helpers
 								# below — never look up auth env vars ad-hoc.
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								def _to_async_client(sync_client, model: str, is_vision: bool = False):
 								    """Convert a sync client to its async counterpart, preserving Codex routing.
 								    When ``is_vision=True`` and the underlying base URL is Copilot, the
 								    resulting async client carries the ``Copilot-Vision-Request: true``
 								    header so the request is routed to Copilot's vision-capable
 								    infrastructure (otherwise vision payloads silently time out).
 								    """
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    from openai import AsyncOpenAI
 								    if isinstance(sync_client, CodexAuxiliaryClient):
 								        return AsyncCodexAuxiliaryClient(sync_client), model
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								    if isinstance(sync_client, AnthropicAuxiliaryClient):
 								        return AsyncAnthropicAuxiliaryClient(sync_client), model
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
+								    try:
 								        from agent.gemini_native_adapter import GeminiNativeClient, AsyncGeminiNativeClient
 								        if isinstance(sync_client, GeminiNativeClient):
 								            return AsyncGeminiNativeClient(sync_client), model
 								    except ImportError:
 								        pass
-												fix: improve ACP type check and restore comment accuracy

- Use isinstance() with try/except import for CopilotACPClient check
  in _to_async_client instead of fragile __class__.__name__ string check
- Restore accurate comment: GPT-5.x models *require* (not 'often require')
  the Responses API on OpenAI/OpenRouter; ACP is the exception, not a
  softening of the requirement
- Add inline comment explaining the ACP exclusion rationale

											
										
										
											2026-04-13 16:06:22 -07:00
+								    try:
 								        from agent.copilot_acp_client import CopilotACPClient
 								        if isinstance(sync_client, CopilotACPClient):
 								            return sync_client, model
 								    except ImportError:
 								        pass
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
 								    async_kwargs = {
 								        "api_key": sync_client.api_key,
 								        "base_url": str(sync_client.base_url),
 								    }
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								    sync_base_url = str(sync_client.base_url)
 								    if base_url_host_matches(sync_base_url, "openrouter.ai"):
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        async_kwargs["default_headers"] = dict(_OR_HEADERS)
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								    elif base_url_host_matches(sync_base_url, "api.githubcopilot.com"):
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        from hermes_cli.copilot_auth import copilot_request_headers
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        async_kwargs["default_headers"] = copilot_request_headers(
 								            is_agent_turn=True, is_vision=is_vision
 								        )
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								    elif base_url_host_matches(sync_base_url, "api.kimi.com"):
-												fix: Update Kimi Coding API endpoint and User-Agent

											
										
										
											2026-04-18 22:55:36 +08:00
+								        async_kwargs["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    return AsyncOpenAI(**async_kwargs), model
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								def _normalize_resolved_model(model_name: Optional[str], provider: str) -> Optional[str]:
 								    """Normalize a resolved model for the provider that will receive it."""
 								    if not model_name:
 								        return model_name
 								    try:
 								        from hermes_cli.model_normalize import normalize_model_for_provider
 								        return normalize_model_for_provider(model_name, provider)
 								    except Exception:
 								        return model_name
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								def resolve_provider_client(
 								    provider: str,
 								    model: str = None,
 								    async_mode: bool = False,
-												refactor: route main agent client + fallback through centralized router

Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.

											
										
										
											2026-03-11 21:38:29 -07:00
+								    raw_codex: bool = False,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    explicit_base_url: str = None,
 								    explicit_api_key: str = None,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    api_mode: str = None,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    main_runtime: Optional[Dict[str, Any]] = None,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    is_vision: bool = False,
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								) -> Tuple[Optional[Any], Optional[str]]:
 								    """Central router: given a provider name and optional model, return a
 								    configured client with the correct auth, base URL, and API format.
 								    The returned client always exposes ``.chat.completions.create()`` — for
 								    Codex/Responses API providers, an adapter handles the translation
 								    transparently.
 								    Args:
 								        provider: Provider identifier.  One of:
 								            "openrouter", "nous", "openai-codex" (or "codex"),
-												chore: remove nous-api provider (API key path)

Nous Portal only supports OAuth authentication. Remove the 'nous-api'
provider which allowed direct API key access via NOUS_API_KEY env var.

Removed from:
- hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases
- hermes_cli/config.py: OPTIONAL_ENV_VARS entry
- hermes_cli/setup.py: setup wizard option + model selection handler
  (reindexed remaining provider choices)
- agent/auxiliary_client.py: docstring references
- tests/test_runtime_provider_resolution.py: nous-api test
- tests/integration/test_web_tools.py: renamed dict key

											
										
										
											2026-03-11 20:14:44 -07:00
+								            "zai", "kimi-coding", "minimax", "minimax-cn",
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								            "custom" (OPENAI_BASE_URL + OPENAI_API_KEY),
 								            "auto" (full auto-detection chain).
 								        model: Model slug override.  If None, uses the provider's default
 								               auxiliary model.
 								        async_mode: If True, return an async-compatible client.
-												refactor: route main agent client + fallback through centralized router

Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.

											
										
										
											2026-03-11 21:38:29 -07:00
+								        raw_codex: If True, return a raw OpenAI client for Codex providers
 								            instead of wrapping in CodexAuxiliaryClient.  Use this when
 								            the caller needs direct access to responses.stream() (e.g.,
 								            the main agent loop).
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        explicit_base_url: Optional direct OpenAI-compatible endpoint.
 								        explicit_api_key: Optional API key paired with explicit_base_url.
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        api_mode: API mode override.  One of "chat_completions",
 								            "codex_responses", or None (auto-detect).  When set to
 								            "codex_responses", the client is wrapped in
 								            CodexAuxiliaryClient to route through the Responses API.
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
 								    Returns:
 								        (client, resolved_model) or (None, None) if auth is unavailable.
 								    """
-												fix(runtime): surface malformed proxy env and base URL before client init

When proxy env vars (HTTP_PROXY, HTTPS_PROXY, ALL_PROXY) contain
malformed URLs — e.g. 'http://127.0.0.1:6153export' from a broken
shell config — the OpenAI/httpx client throws a cryptic 'Invalid port'
error that doesn't identify the offending variable.

Add _validate_proxy_env_urls() and _validate_base_url() in
auxiliary_client.py, called from resolve_provider_client() and
_create_openai_client() to fail fast with a clear, actionable error
message naming the broken env var or URL.

Closes #6360
Co-authored-by: MestreY0d4-Uninter <MestreY0d4-Uninter@users.noreply.github.com>

											
										
										
											2026-04-15 15:07:11 -07:00
+								    _validate_proxy_env_urls()
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    # Normalise aliases
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								    provider = _normalize_aux_provider(provider)
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    def _needs_codex_wrap(client_obj, base_url_str: str, model_str: str) -> bool:
 								        """Decide if a plain OpenAI client should be wrapped for Responses API.
 								        Returns True when api_mode is explicitly "codex_responses", or when
 								        auto-detection (api.openai.com + codex-family model) suggests it.
 								        Already-wrapped clients (CodexAuxiliaryClient) are skipped.
 								        """
 								        if isinstance(client_obj, CodexAuxiliaryClient):
 								            return False
 								        if raw_codex:
 								            return False
 								        if api_mode == "codex_responses":
 								            return True
 								        # Auto-detect: api.openai.com + codex model name pattern
 								        if api_mode and api_mode != "codex_responses":
 								            return False  # explicit non-codex mode
-												fix: extend hostname-match provider detection across remaining call sites

Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.

											
										
										
											2026-04-20 20:58:01 -07:00
+								        if base_url_hostname(base_url_str) == "api.openai.com":
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								            model_lower = (model_str or "").lower()
 								            if "codex" in model_lower:
 								                return True
 								        return False
 								    def _wrap_if_needed(client_obj, final_model_str: str, base_url_str: str = ""):
 								        """Wrap a plain OpenAI client in CodexAuxiliaryClient if Responses API is needed."""
 								        if _needs_codex_wrap(client_obj, base_url_str, final_model_str):
 								            logger.debug(
 								                "resolve_provider_client: wrapping client in CodexAuxiliaryClient "
 								                "(api_mode=%s, model=%s, base_url=%s)",
 								                api_mode or "auto-detected", final_model_str,
 								                base_url_str[:60] if base_url_str else "")
 								            return CodexAuxiliaryClient(client_obj, final_model_str)
 								        return client_obj
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    # ── Auto: try all providers in priority order ────────────────────
 								    if provider == "auto":
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        client, resolved = _resolve_auto(main_runtime=main_runtime)
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        if client is None:
 								            return None, None
-												fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini (#1189)

* fix: prevent model/provider mismatch when switching providers during active gateway

When _update_config_for_provider() writes the new provider and base_url
to config.yaml, the gateway (which re-reads config per-message) can pick
up the change before model selection completes. This causes the old model
name (e.g. 'anthropic/claude-opus-4.6') to be sent to the new provider's
API (e.g. MiniMax), which fails.

Changes:
- _update_config_for_provider() now accepts an optional default_model
  parameter. When provided and the current model.default is empty or
  uses OpenRouter format (contains '/'), it sets a safe default model
  for the new provider.
- All setup.py callers for direct-API providers (zai, kimi, minimax,
  minimax-cn, anthropic) now pass a provider-appropriate default model.
- _setup_provider_model_selection() now validates the 'Keep current'
  choice: if the current model uses OpenRouter format and wouldn't work
  with the new provider, it warns and switches to the provider's first
  default model instead of silently keeping the incompatible name.

Reported by a user on Home Assistant whose gateway started sending
'anthropic/claude-opus-4.6' to MiniMax's API after running hermes setup.

* fix: auxiliary client uses main model for custom/local endpoints instead of gpt-4o-mini

When a user runs a local server (e.g. Qwen3.5-9B via OPENAI_BASE_URL),
the auxiliary client (context compression, vision, session search) would
send requests for 'gpt-4o-mini' or 'google/gemini-3-flash-preview' to
the local server, which only serves one model — causing 404 errors
mid-task.

Changes:
- _try_custom_endpoint() now reads the user's configured main model via
  _read_main_model() (checks OPENAI_MODEL → HERMES_MODEL → LLM_MODEL →
  config.yaml model.default) instead of hardcoding 'gpt-4o-mini'.
- resolve_provider_client() auto mode now detects when an OpenRouter-
  formatted model override (containing '/') would be sent to a non-
  OpenRouter provider (like a local server) and drops it in favor of
  the provider's default model.
- Test isolation fixes: properly clear env vars in 'nothing available'
  tests to prevent host environment leakage.
											
										
										
											2026-03-13 10:02:16 -07:00
+								        # When auto-detection lands on a non-OpenRouter provider (e.g. a
 								        # local server), an OpenRouter-formatted model override like
 								        # "google/gemini-3-flash-preview" won't work.  Drop it and use
 								        # the provider's own default model instead.
 								        if model and "/" in model and resolved and "/" not in resolved:
 								            logger.debug(
 								                "Dropping OpenRouter-format model %r for non-OpenRouter "
 								                "auxiliary provider (using %r instead)", model, resolved)
 								            model = None
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        final_model = model or resolved
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                else (client, final_model))
 								    # ── OpenRouter ───────────────────────────────────────────────────
 								    if provider == "openrouter":
 								        client, default = _try_openrouter()
 								        if client is None:
-												fix(agent): clarify exhausted OpenRouter auxiliary credentials

											
										
										
											2026-04-23 09:54:12 -07:00
+								            logger.warning(
 								                "resolve_provider_client: openrouter requested but %s",
 								                _describe_openrouter_unavailable(),
 								            )
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								            return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								        final_model = _normalize_resolved_model(model or default, provider)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                else (client, final_model))
 								    # ── Nous Portal (OAuth) ──────────────────────────────────────────
 								    if provider == "nous":
-												fix(vision): resolve Nous vision model correctly in auto-detect path

Two changes:
1. _PROVIDER_VISION_MODELS: add 'nous' -> 'xiaomi/mimo-v2-omni' entry
   so the vision auto-detect chain picks the correct multimodal model.

2. resolve_provider_client: detect when the requested model is a vision
   model (from _PROVIDER_VISION_MODELS or known vision model names) and
   pass vision=True to _try_nous().  Previously, _try_nous() was always
   called without vision=True in resolve_provider_client(), causing it to
   return the default text model (gemini-3-flash-preview or mimo-v2-pro)
   instead of the vision-capable mimo-v2-omni.

The _try_nous() function already handled free-tier vision correctly, but
the resolve_provider_client() path (used by the auto-detect vision chain)
never signaled that a vision task was in progress.

Verified: xiaomi/mimo-v2-omni returns HTTP 200 with image inputs on Nous
inference API. google/gemini-3-flash-preview returns 404 with images.

											
										
										
											2026-04-19 20:08:03 +00:00
+								        # Detect vision tasks: either explicit model override from
 								        # _PROVIDER_VISION_MODELS, or caller passed a known vision model.
 								        _is_vision = (
 								            model in _PROVIDER_VISION_MODELS.values()
 								            or (model or "").strip().lower() == "mimo-v2-omni"
 								        )
 								        client, default = _try_nous(vision=_is_vision)
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        if client is None:
 								            logger.warning("resolve_provider_client: nous requested "
-												fix: replace stale 'hermes login' refs with 'hermes auth' + fix credential removal re-seeding (#5670)

Two fixes:

1. Replace all stale 'hermes login' references with 'hermes auth' across
   auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
   and documentation. The 'hermes login' command was deprecated; 'hermes auth'
   now handles OAuth credential management.

2. Fix credential removal not persisting for singleton-sourced credentials
   (device_code for openai-codex/nous, hermes_pkce for anthropic).
   auth_remove_command already cleared env vars for env-sourced credentials,
   but singleton credentials stored in the auth store were re-seeded by
   _seed_from_singletons() on the next load_pool() call. Now clears the
   underlying auth store entry when removing singleton-sourced credentials.
											
										
										
											2026-04-06 17:17:57 -07:00
+								                           "but Nous Portal not configured (run: hermes auth)")
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								            return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								        final_model = _normalize_resolved_model(model or default, provider)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                else (client, final_model))
 								    # ── OpenAI Codex (OAuth → Responses API) ─────────────────────────
 								    if provider == "openai-codex":
-												refactor: route main agent client + fallback through centralized router

Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.

											
										
										
											2026-03-11 21:38:29 -07:00
+								        if raw_codex:
 								            # Return the raw OpenAI client for callers that need direct
 								            # access to responses.stream() (e.g., the main agent loop).
 								            codex_token = _read_codex_access_token()
 								            if not codex_token:
 								                logger.warning("resolve_provider_client: openai-codex requested "
 								                               "but no Codex OAuth token found (run: hermes model)")
 								                return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								            final_model = _normalize_resolved_model(model or _CODEX_AUX_MODEL, provider)
-												fix(codex): pin correct Cloudflare headers and extend to auxiliary client

The cherry-picked salvage (admin28980's commit) added codex headers only on the
primary chat client path, with two inaccuracies:

  - originator was 'hermes-agent' — Cloudflare whitelists codex_cli_rs,
    codex_vscode, codex_sdk_ts, and Codex* prefixes. 'hermes-agent' isn't on
    the list, so the header had no mitigating effect on the 403 (the
    account-id header alone may have been carrying the fix).
  - account-id header was 'ChatGPT-Account-Id' — upstream codex-rs auth.rs
    uses canonical 'ChatGPT-Account-ID' (PascalCase, trailing -ID).

Also, the auxiliary client (_try_codex + resolve_provider_client raw_codex
branch) constructs OpenAI clients against the same chatgpt.com endpoint with
no default headers at all — so compression, title generation, vision, session
search, and web_extract all still 403 from VPS IPs.

Consolidate the header set into _codex_cloudflare_headers() in
agent/auxiliary_client.py (natural home next to _read_codex_access_token and
the existing JWT decode logic) and call it from all four insertion points:

  - run_agent.py: AIAgent.__init__ (initial construction)
  - run_agent.py: _apply_client_headers_for_base_url (credential rotation)
  - agent/auxiliary_client.py: _try_codex (aux client)
  - agent/auxiliary_client.py: resolve_provider_client raw_codex branch

Net: -36/+55 lines, -25 lines of duplicated inline JWT decode replaced by a
single helper. User-Agent switched to 'codex_cli_rs/0.0.0 (Hermes Agent)' to
match the codex-rs shape while keeping product attribution.

Tests in tests/agent/test_codex_cloudflare_headers.py cover:
  - originator value, User-Agent shape, canonical header casing
  - account-ID extraction from a real JWT fixture
  - graceful handling of malformed / non-string / claim-missing tokens
  - wiring at all four insertion points (primary init, rotation, both aux paths)
  - non-chatgpt base URLs (openrouter) do NOT get codex headers
  - switching away from chatgpt.com drops the headers

											
										
										
											2026-04-19 11:58:15 -07:00
+								            raw_client = OpenAI(
 								                api_key=codex_token,
 								                base_url=_CODEX_AUX_BASE_URL,
 								                default_headers=_codex_cloudflare_headers(codex_token),
 								            )
-												refactor: route main agent client + fallback through centralized router

Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.

											
										
										
											2026-03-11 21:38:29 -07:00
+								            return (raw_client, final_model)
 								        # Standard path: wrap in CodexAuxiliaryClient adapter
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        client, default = _try_codex()
 								        if client is None:
 								            logger.warning("resolve_provider_client: openai-codex requested "
 								                           "but no Codex OAuth token found (run: hermes model)")
 								            return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								        final_model = _normalize_resolved_model(model or default, provider)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                else (client, final_model))
 								    # ── Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY) ───────────
 								    if provider == "custom":
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        if explicit_base_url:
 								            custom_base = explicit_base_url.strip()
 								            custom_key = (
 								                (explicit_api_key or "").strip()
 								                or os.getenv("OPENAI_API_KEY", "").strip()
-												fix: auxiliary client uses placeholder key for local servers without auth (#3842)

Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely.  This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.

The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.

Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
  of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
  for explicit_base_url without explicit_api_key

Updates 2 tests that expected the old (broken) behavior.
											
										
										
											2026-03-29 21:05:36 -07:00
+								                or "no-key-required"  # local servers don't need auth
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								            )
-												fix: auxiliary client uses placeholder key for local servers without auth (#3842)

Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely.  This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.

The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.

Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
  of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
  for explicit_base_url without explicit_api_key

Updates 2 tests that expected the old (broken) behavior.
											
										
										
											2026-03-29 21:05:36 -07:00
+								            if not custom_base:
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								                logger.warning(
 								                    "resolve_provider_client: explicit custom endpoint requested "
-												fix: auxiliary client uses placeholder key for local servers without auth (#3842)

Local inference servers (Ollama, llama.cpp, vLLM, LM Studio) don't
require API keys, but the auxiliary client's _resolve_custom_runtime()
rejected endpoints with empty keys — causing the auto-detection chain
to skip the user's local server entirely.  This broke compression,
summarization, and memory flush for users running local models without
an OpenRouter/cloud API key.

The main CLI already had this fix (PR #2556, 'no-key-required'
placeholder), but the auxiliary client's resolution path was missed.

Two fixes:
- _resolve_custom_runtime(): use 'no-key-required' placeholder instead
  of returning None when base_url is present but key is empty
- resolve_provider_client() custom branch: same placeholder fallback
  for explicit_base_url without explicit_api_key

Updates 2 tests that expected the old (broken) behavior.
											
										
										
											2026-03-29 21:05:36 -07:00
+								                    "but base_url is empty"
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								                )
 								                return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								            final_model = _normalize_resolved_model(
 								                model or _read_main_model() or "gpt-4o-mini",
 								                provider,
 								            )
-												fix(auxiliary_client): inject KimiCLI User-Agent for custom endpoint sync clients

When  is explicitly set to ,
the custom-endpoint path in  creates a plain
client without provider-specific headers. This means sync vision calls (e.g.
) use the generic  User-Agent and get rejected by
Kimi's coding endpoint with a 403:

    'Kimi For Coding is currently only available for Coding Agents such as Kimi CLI...'

The async converter  already injects , and the
auto-detected API-key provider path also injects it, but the explicit custom
endpoint shortcut was missing it entirely.

This patch adds the same  injection to the custom endpoint
branch, and updates all existing Kimi header sites to  for
consistency.

Fixes <issue number to be filled in>

											
										
										
											2026-04-07 17:50:42 +08:00
+								            extra = {}
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								            _clean_base, _dq = _extract_url_query_params(custom_base)
 								            if _dq:
 								                extra["default_query"] = _dq
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								            if base_url_host_matches(custom_base, "api.kimi.com"):
-												fix: Update Kimi Coding API endpoint and User-Agent

											
										
										
											2026-04-18 22:55:36 +08:00
+								                extra["default_headers"] = {"User-Agent": "claude-code/0.1.0"}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								            elif base_url_host_matches(custom_base, "api.githubcopilot.com"):
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                from hermes_cli.copilot_auth import copilot_request_headers
 								                extra["default_headers"] = copilot_request_headers(
 								                    is_agent_turn=True, is_vision=is_vision
 								                )
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								            client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								            client = _wrap_if_needed(client, final_model, custom_base)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								                    else (client, final_model))
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        # Try custom first, then codex, then API-key providers
 								        for try_fn in (_try_custom_endpoint, _try_codex,
 								                       _resolve_api_key_provider):
 								            client, default = try_fn()
 								            if client is not None:
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								                final_model = _normalize_resolved_model(model or default, provider)
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								                _cbase = str(getattr(client, "base_url", "") or "")
 								                client = _wrap_if_needed(client, final_model, _cbase)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                        else (client, final_model))
 								        logger.warning("resolve_provider_client: custom/main requested "
 								                       "but no endpoint credentials found")
 								        return None, None
-												fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)

Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
											
										
										
											2026-04-24 03:10:30 -07:00
+								    # ── Named custom providers (config.yaml providers dict / custom_providers list) ───
-												fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978)

* fix(telegram): replace substring caption check with exact line-by-line match

Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.

Adds 13 unit tests covering the fixed bug scenarios.

Cherry-picked from PR #2671 by Dilee.

* fix: extend caption substring fix to all platforms

Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)

The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).

* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing

Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:

1. 'provider: main' was hardcoded to 'custom', which only checks legacy
   OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
   to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').

2. Named custom provider names (e.g., 'beans') fell through to
   PROVIDER_REGISTRY which doesn't know about config.yaml entries.
   Now checks _get_named_custom_provider() before the registry fallback.

Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).

Adds 13 unit tests. Reported by Laura via Discord.

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
											
										
										
											2026-04-07 17:59:47 -07:00
+								    try:
 								        from hermes_cli.runtime_provider import _get_named_custom_provider
 								        custom_entry = _get_named_custom_provider(provider)
 								        if custom_entry:
 								            custom_base = custom_entry.get("base_url", "").strip()
-												fix(config): restore custom providers after v11→v12 migration

The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.

Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:

- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup

Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.

Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes #8776, salvaged from PR #8814

											
										
										
											2026-04-13 05:26:32 -07:00
+								            custom_key = custom_entry.get("api_key", "").strip()
 								            custom_key_env = custom_entry.get("key_env", "").strip()
 								            if not custom_key and custom_key_env:
 								                custom_key = os.getenv(custom_key_env, "").strip()
 								            custom_key = custom_key or "no-key-required"
-												fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)

Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
											
										
										
											2026-04-24 03:10:30 -07:00
+								            # An explicit per-task api_mode override (from _resolve_task_provider_model)
 								            # wins; otherwise fall back to what the provider entry declared.
 								            entry_api_mode = (api_mode or custom_entry.get("api_mode") or "").strip()
-												fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978)

* fix(telegram): replace substring caption check with exact line-by-line match

Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.

Adds 13 unit tests covering the fixed bug scenarios.

Cherry-picked from PR #2671 by Dilee.

* fix: extend caption substring fix to all platforms

Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)

The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).

* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing

Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:

1. 'provider: main' was hardcoded to 'custom', which only checks legacy
   OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
   to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').

2. Named custom provider names (e.g., 'beans') fell through to
   PROVIDER_REGISTRY which doesn't know about config.yaml entries.
   Now checks _get_named_custom_provider() before the registry fallback.

Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).

Adds 13 unit tests. Reported by Laura via Discord.

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
											
										
										
											2026-04-07 17:59:47 -07:00
+								            if custom_base:
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								                final_model = _normalize_resolved_model(
-												fix(config): restore custom providers after v11→v12 migration

The v11→v12 migration converts custom_providers (list) into providers
(dict), then deletes the list. But all runtime resolvers read from
custom_providers — after migration, named custom endpoints silently stop
resolving and fallback chains fail with AuthError.

Add get_compatible_custom_providers() that reads from both config schemas
(legacy custom_providers list + v12+ providers dict), normalizes entries,
deduplicates, and returns a unified list. Update ALL consumers:

- hermes_cli/runtime_provider.py: _get_named_custom_provider() + key_env
- hermes_cli/auth_commands.py: credential pool provider names
- hermes_cli/main.py: model picker + _model_flow_named_custom()
- agent/auxiliary_client.py: key_env + custom_entry model fallback
- agent/credential_pool.py: _iter_custom_providers()
- cli.py + gateway/run.py: /model switch custom_providers passthrough
- run_agent.py + gateway/run.py: per-model context_length lookup

Also: use config.pop() instead of del for safer migration, fix stale
_config_version assertions in tests, add pool mock to codex test.

Co-authored-by: 墨綠BG <s5460703@gmail.com>
Closes #8776, salvaged from PR #8814

											
										
										
											2026-04-13 05:26:32 -07:00
+								                    model or custom_entry.get("model") or _read_main_model() or "gpt-4o-mini",
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								                    provider,
 								                )
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								                _clean_base2, _dq2 = _extract_url_query_params(custom_base)
 								                _extra2 = {"default_query": _dq2} if _dq2 else {}
-												fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978)

* fix(telegram): replace substring caption check with exact line-by-line match

Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.

Adds 13 unit tests covering the fixed bug scenarios.

Cherry-picked from PR #2671 by Dilee.

* fix: extend caption substring fix to all platforms

Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)

The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).

* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing

Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:

1. 'provider: main' was hardcoded to 'custom', which only checks legacy
   OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
   to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').

2. Named custom provider names (e.g., 'beans') fell through to
   PROVIDER_REGISTRY which doesn't know about config.yaml entries.
   Now checks _get_named_custom_provider() before the registry fallback.

Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).

Adds 13 unit tests. Reported by Laura via Discord.

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
											
										
										
											2026-04-07 17:59:47 -07:00
+								                logger.debug(
-												fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)

Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
											
										
										
											2026-04-24 03:10:30 -07:00
+								                    "resolve_provider_client: named custom provider %r (%s, api_mode=%s)",
 								                    provider, final_model, entry_api_mode or "chat_completions")
 								                # anthropic_messages: route through the Anthropic Messages API
 								                # via AnthropicAuxiliaryClient. Mirrors the anonymous-custom
 								                # branch in _try_custom_endpoint(). See #15033.
 								                if entry_api_mode == "anthropic_messages":
 								                    try:
 								                        from agent.anthropic_adapter import build_anthropic_client
 								                        real_client = build_anthropic_client(custom_key, custom_base)
 								                    except ImportError:
 								                        logger.warning(
 								                            "Named custom provider %r declares api_mode="
 								                            "anthropic_messages but the anthropic SDK is not "
 								                            "installed — falling back to OpenAI-wire.",
 								                            provider,
 								                        )
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								                        client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)

Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
											
										
										
											2026-04-24 03:10:30 -07:00
+								                                else (client, final_model))
 								                    sync_anthropic = AnthropicAuxiliaryClient(
 								                        real_client, final_model, custom_key, custom_base, is_oauth=False,
 								                    )
 								                    if async_mode:
 								                        return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
 								                    return sync_anthropic, final_model
-												fix: preserve URL query params for Azure OpenAI and custom endpoints

Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.

Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.

No-op for URLs without query params — fully backward compatible.

											
										
										
											2026-04-13 11:13:46 +08:00
+								                client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
-												fix(aux-client): honor api_mode: anthropic_messages for named custom providers (#15059)

Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.

Two gaps caused this:

1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
   providers-dict branch (new-style) returned only name/base_url/api_key/
   model and dropped api_mode. The legacy custom_providers-list branch
   already propagated it correctly. The dict branch now parses and
   returns api_mode via _parse_api_mode() in both match paths.

2. agent/auxiliary_client.py::resolve_provider_client — the named
   custom provider block at ~L1740 ignored custom_entry['api_mode']
   and unconditionally built an OpenAI client (only wrapping for
   Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
   dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
   in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
   otherwise plain OpenAI. An explicit task-level api_mode override
   still wins over the provider entry's declared api_mode.

Fixes #15033

Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering

  - providers-dict preserves valid api_mode
  - invalid api_mode values are dropped
  - missing api_mode leaves the entry unchanged (no regression)
  - resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
    api_mode=anthropic_messages
  - full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
    with an auxiliary.<task> override
  - providers without api_mode still use the OpenAI-wire path
											
										
										
											2026-04-24 03:10:30 -07:00
+								                # codex_responses or inherited auto-detect (via _wrap_if_needed).
 								                # _wrap_if_needed reads the closed-over `api_mode` (the task-level
 								                # override). Named-provider entry api_mode=codex_responses also
 								                # flows through here.
 								                if entry_api_mode == "codex_responses" and not isinstance(
 								                    client, CodexAuxiliaryClient
 								                ):
 								                    client = CodexAuxiliaryClient(client, final_model)
 								                else:
 								                    client = _wrap_if_needed(client, final_model, custom_base)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing (#5978)

* fix(telegram): replace substring caption check with exact line-by-line match

Captions in photo bursts and media group albums were silently dropped when
a shorter caption happened to be a substring of an existing one (e.g.
"Meeting" lost inside "Meeting agenda"). Extract a shared _merge_caption
static helper that splits on "\n\n" and uses exact match with whitespace
normalisation, then use it in both _enqueue_photo_event and
_queue_media_group_event.

Adds 13 unit tests covering the fixed bug scenarios.

Cherry-picked from PR #2671 by Dilee.

* fix: extend caption substring fix to all platforms

Move _merge_caption helper from TelegramAdapter to BasePlatformAdapter
so all adapters inherit it. Fix the same substring-containment bug in:
- gateway/platforms/base.py (photo burst merging)
- gateway/run.py (priority photo follow-up merging)
- gateway/platforms/feishu.py (media batch merging)

The original fix only covered telegram.py. The same bug existed in base.py
and run.py (pure substring check) and feishu.py (list membership without
whitespace normalization).

* fix(auxiliary): resolve named custom providers and 'main' alias in auxiliary routing

Two bugs caused auxiliary tasks (vision, compression, etc.) to fail when
using named custom providers defined in config.yaml:

1. 'provider: main' was hardcoded to 'custom', which only checks legacy
   OPENAI_BASE_URL env vars. Now reads _read_main_provider() to resolve
   to the actual provider (e.g., 'custom:beans', 'openrouter', 'deepseek').

2. Named custom provider names (e.g., 'beans') fell through to
   PROVIDER_REGISTRY which doesn't know about config.yaml entries.
   Now checks _get_named_custom_provider() before the registry fallback.

Fixes both resolve_provider_client() and _normalize_vision_provider()
so the fix covers all auxiliary tasks (vision, compression, web_extract,
session_search, etc.).

Adds 13 unit tests. Reported by Laura via Discord.

---------

Co-authored-by: Dilee <uzmpsk.dilekakbas@gmail.com>
											
										
										
											2026-04-07 17:59:47 -07:00
+								                        else (client, final_model))
 								            logger.warning(
 								                "resolve_provider_client: named custom provider %r has no base_url",
 								                provider)
 								            return None, None
 								    except ImportError:
 								        pass
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    # ── API-key providers from PROVIDER_REGISTRY ─────────────────────
 								    try:
-												fix(copilot-acp): keep acp runtime off responses path

											
										
										
											2026-04-12 18:47:14 -06:00
+								        from hermes_cli.auth import (
 								            PROVIDER_REGISTRY,
 								            resolve_api_key_provider_credentials,
 								            resolve_external_process_provider_credentials,
 								        )
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    except ImportError:
 								        logger.debug("hermes_cli.auth not available for provider %s", provider)
 								        return None, None
 								    pconfig = PROVIDER_REGISTRY.get(provider)
 								    if pconfig is None:
 								        logger.warning("resolve_provider_client: unknown provider %r", provider)
 								        return None, None
 								    if pconfig.auth_type == "api_key":
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								        if provider == "anthropic":
 								            client, default_model = _try_anthropic()
 								            if client is None:
 								                logger.warning("resolve_provider_client: anthropic requested but no Anthropic credentials found")
 								                return None, None
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								            final_model = _normalize_resolved_model(model or default_model, provider)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode else (client, final_model))
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								        creds = resolve_api_key_provider_credentials(provider)
 								        api_key = str(creds.get("api_key", "")).strip()
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        if not api_key:
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
+								            tried_sources = list(pconfig.api_key_env_vars)
 								            if provider == "copilot":
 								                tried_sources.append("gh auth token")
-												fix: preserve allowed_users during setup reconfigure and quiet unconfigured provider warnings

Setup wizard now shows existing allowed_users when reconfiguring a
platform and preserves them if the user presses Enter. Previously the
wizard would display a misleading "No allowlist set" warning even when
the .env still held the original IDs.

Also downgrades the "provider X has no API key configured" log from
WARNING to DEBUG in resolve_provider_client — callers already handle
the None return with their own contextual messages. This eliminates
noisy startup warnings for providers in the fallback chain that the
user never configured (e.g. minimax).

											
										
										
											2026-04-02 12:58:08 +05:30
+								            logger.debug("resolve_provider_client: provider %s has no API "
 								                         "key configured (tried: %s)",
 								                         provider, ", ".join(tried_sources))
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								            return None, None
-												fix: provider/model resolution — salvage 4 PRs + MiniMax aux URL fix (#5983)

Salvaged fixes from community PRs:

- fix(model_switch): _read_auth_store → _load_auth_store + fix auth store
  key lookup (was checking top-level dict instead of store['providers']).
  OAuth providers now correctly detected in /model picker.
  Cherry-picked from PR #5911 by Xule Lin (linxule).

- fix(ollama): pass num_ctx to override 2048 default context window.
  Ollama defaults to 2048 context regardless of model capabilities. Now
  auto-detects from /api/show metadata and injects num_ctx into every
  request. Config override via model.ollama_num_ctx. Fixes #2708.
  Cherry-picked from PR #5929 by kshitij (kshitijk4poor).

- fix(aux): normalize provider aliases for vision/auxiliary routing.
  Adds _normalize_aux_provider() with 17 aliases (google→gemini,
  claude→anthropic, glm→zai, etc). Fixes vision routing failure when
  provider is set to 'google' instead of 'gemini'.
  Cherry-picked from PR #5793 by e11i (Elizabeth1979).

- fix(aux): rewrite MiniMax /anthropic base URLs to /v1 for OpenAI SDK.
  MiniMax's inference_base_url ends in /anthropic (Anthropic Messages API),
  but auxiliary client uses OpenAI SDK which appends /chat/completions →
  404 at /anthropic/chat/completions. Generic _to_openai_base_url() helper
  rewrites terminal /anthropic to /v1 for OpenAI-compatible endpoint.
  Inspired by PR #5786 by Lempkey.

Added debug logging to silent exception blocks across all fixes.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
											
										
										
											2026-04-07 22:23:28 -07:00
+								        base_url = _to_openai_base_url(
 								            str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
 								        )
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
 								        default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
-												fix(model): normalize direct provider ids in auxiliary routing

											
										
										
											2026-04-09 21:20:29 -07:00
+								        final_model = _normalize_resolved_model(model or default_model, provider)
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
+								        if provider == "gemini":
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								            from agent.gemini_native_adapter import GeminiNativeClient, is_native_gemini_base_url
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								            if is_native_gemini_base_url(base_url):
 								                client = GeminiNativeClient(api_key=api_key, base_url=base_url)
 								                logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												fix(gemini): tighten native routing and streaming replay

- only use the native adapter for the canonical Gemini native endpoint
- keep custom and /openai base URLs on the OpenAI-compatible path
- preserve Hermes keepalive transport injection for native Gemini clients
- stabilize streaming tool-call replay across repeated SSE events
- add follow-up tests for base_url precedence, async streaming, and duplicate tool-call chunks

											
										
										
											2026-04-20 00:41:20 +05:30
+								                        else (client, final_model))
-												feat(providers): route gemini through the native AI Studio API

- add a native Gemini adapter over generateContent/streamGenerateContent
- switch the built-in gemini provider off the OpenAI-compatible endpoint
- preserve thought signatures and native functionResponse replay
- route auxiliary Gemini clients through the same adapter
- add focused unit coverage plus native-provider integration checks

											
										
										
											2026-04-20 00:00:50 +05:30
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        # Provider-specific headers
 								        headers = {}
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								        if base_url_host_matches(base_url, "api.kimi.com"):
-												fix: Update Kimi Coding API endpoint and User-Agent

											
										
										
											2026-04-18 22:55:36 +08:00
+								            headers["User-Agent"] = "claude-code/0.1.0"
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								        elif base_url_host_matches(base_url, "api.githubcopilot.com"):
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            from hermes_cli.copilot_auth import copilot_request_headers
-												feat: integrate GitHub Copilot providers across Hermes

Add first-class GitHub Copilot and Copilot ACP provider support across
model selection, runtime provider resolution, CLI sessions, delegated
subagents, cron jobs, and the Telegram gateway.

This also normalizes Copilot model catalogs and API modes, introduces a
Copilot ACP OpenAI-compatible shim, and fixes service-mode auth by
resolving Homebrew-installed gh binaries under launchd.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

											
										
										
											2026-03-17 23:40:22 -07:00
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            headers.update(copilot_request_headers(
 								                is_agent_turn=True, is_vision=is_vision
 								            ))
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        client = OpenAI(api_key=api_key, base_url=base_url,
 								                        **({"default_headers": headers} if headers else {}))
-												fix: wrap copilot Responses-API models in CodexAuxiliaryClient for auxiliary tasks

GPT-5+ models (except gpt-5-mini) are only accessible via the Responses
API on Copilot. When these models were configured as the compression
summary_model (or any auxiliary task), the plain OpenAI client sent them
to /chat/completions which returned a 400 error:

    model "gpt-5.4-mini" is not accessible via the /chat/completions endpoint

resolve_provider_client() now checks _should_use_copilot_responses_api()
for the copilot provider and wraps the client in CodexAuxiliaryClient
when needed, routing calls through responses.stream() transparently.

Adds tests for both the wrapping (gpt-5.4-mini) and non-wrapping
(gpt-4.1-mini) paths.

											
										
										
											2026-04-10 04:35:07 +00:00
 								        # Copilot GPT-5+ models (except gpt-5-mini) require the Responses
 								        # API — they are not accessible via /chat/completions.  Wrap the
 								        # plain client in CodexAuxiliaryClient so call_llm() transparently
 								        # routes through responses.stream().
 								        if provider == "copilot" and final_model and not raw_codex:
 								            try:
 								                from hermes_cli.models import _should_use_copilot_responses_api
 								                if _should_use_copilot_responses_api(final_model):
 								                    logger.debug(
 								                        "resolve_provider_client: copilot model %s needs "
 								                        "Responses API — wrapping with CodexAuxiliaryClient",
 								                        final_model)
 								                    client = CodexAuxiliaryClient(client, final_model)
 								            except ImportError:
 								                pass
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        # Honor api_mode for any API-key provider (e.g. direct OpenAI with
 								        # codex-family models).  The copilot-specific wrapping above handles
 								        # copilot; this covers the general case (#6800).
 								        client = _wrap_if_needed(client, final_model, base_url)
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								                else (client, final_model))
-												fix(copilot-acp): keep acp runtime off responses path

											
										
										
											2026-04-12 18:47:14 -06:00
+								    if pconfig.auth_type == "external_process":
 								        creds = resolve_external_process_provider_credentials(provider)
 								        final_model = _normalize_resolved_model(model or _read_main_model(), provider)
 								        if provider == "copilot-acp":
 								            api_key = str(creds.get("api_key", "")).strip()
 								            base_url = str(creds.get("base_url", "")).strip()
 								            command = str(creds.get("command", "")).strip() or None
 								            args = list(creds.get("args") or [])
 								            if not final_model:
 								                logger.warning(
 								                    "resolve_provider_client: copilot-acp requested but no model "
 								                    "was provided or configured"
 								                )
 								                return None, None
 								            if not api_key or not base_url:
 								                logger.warning(
 								                    "resolve_provider_client: copilot-acp requested but external "
 								                    "process credentials are incomplete"
 								                )
 								                return None, None
 								            from agent.copilot_acp_client import CopilotACPClient
 								            client = CopilotACPClient(
 								                api_key=api_key,
 								                base_url=base_url,
 								                command=command,
 								                args=args,
 								            )
 								            logger.debug("resolve_provider_client: %s (%s)", provider, final_model)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												fix(copilot-acp): keep acp runtime off responses path

											
										
										
											2026-04-12 18:47:14 -06:00
+								                    else (client, final_model))
 								        logger.warning("resolve_provider_client: external-process provider %s not "
 								                       "directly supported", provider)
 								        return None, None
-												fix(agent): handle aws_sdk auth type in resolve_provider_client

Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None).  This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.

Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata).  Default
auxiliary model is Haiku for cost efficiency.

Closes #13919

											
										
										
											2026-04-23 21:37:20 +10:00
+								    elif pconfig.auth_type == "aws_sdk":
 								        # AWS SDK providers (Bedrock) — use the Anthropic Bedrock client via
 								        # boto3's credential chain (IAM roles, SSO, env vars, instance metadata).
 								        try:
 								            from agent.bedrock_adapter import has_aws_credentials, resolve_bedrock_region
 								            from agent.anthropic_adapter import build_anthropic_bedrock_client
 								        except ImportError:
 								            logger.warning("resolve_provider_client: bedrock requested but "
 								                           "boto3 or anthropic SDK not installed")
 								            return None, None
 								        if not has_aws_credentials():
 								            logger.debug("resolve_provider_client: bedrock requested but "
 								                         "no AWS credentials found")
 								            return None, None
 								        region = resolve_bedrock_region()
 								        default_model = "anthropic.claude-haiku-4-5-20251001-v1:0"
 								        final_model = _normalize_resolved_model(model or default_model, provider)
 								        try:
 								            real_client = build_anthropic_bedrock_client(region)
 								        except ImportError as exc:
 								            logger.warning("resolve_provider_client: cannot create Bedrock "
 								                           "client: %s", exc)
 								            return None, None
 								        client = AnthropicAuxiliaryClient(
 								            real_client, final_model, api_key="aws-sdk",
 								            base_url=f"https://bedrock-runtime.{region}.amazonaws.com",
 								        )
 								        logger.debug("resolve_provider_client: bedrock (%s, %s)", final_model, region)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
-												fix(agent): handle aws_sdk auth type in resolve_provider_client

Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None).  This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.

Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata).  Default
auxiliary model is Haiku for cost efficiency.

Closes #13919

											
										
										
											2026-04-23 21:37:20 +10:00
+								                else (client, final_model))
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    elif pconfig.auth_type in ("oauth_device_code", "oauth_external"):
 								        # OAuth providers — route through their specific try functions
 								        if provider == "nous":
 								            return resolve_provider_client("nous", model, async_mode)
 								        if provider == "openai-codex":
 								            return resolve_provider_client("openai-codex", model, async_mode)
-												chore: remove nous-api provider (API key path)

Nous Portal only supports OAuth authentication. Remove the 'nous-api'
provider which allowed direct API key access via NOUS_API_KEY env var.

Removed from:
- hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases
- hermes_cli/config.py: OPTIONAL_ENV_VARS entry
- hermes_cli/setup.py: setup wizard option + model selection handler
  (reindexed remaining provider choices)
- agent/auxiliary_client.py: docstring references
- tests/test_runtime_provider_resolution.py: nous-api test
- tests/integration/test_web_tools.py: renamed dict key

											
										
										
											2026-03-11 20:14:44 -07:00
+								        # Other OAuth providers not directly supported
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								        logger.warning("resolve_provider_client: OAuth provider %s not "
 								                       "directly supported, try 'auto'", provider)
 								        return None, None
 								    logger.warning("resolve_provider_client: unhandled auth_type %s for %s",
 								                   pconfig.auth_type, provider)
 								    return None, None
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								# ── Public API ──────────────────────────────────────────────────────────────
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								def get_text_auxiliary_client(
 								    task: str = "",
 								    *,
 								    main_runtime: Optional[Dict[str, Any]] = None,
 								) -> Tuple[Optional[OpenAI], Optional[str]]:
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    """Return (client, default_model_slug) for text-only auxiliary tasks.
 								    Args:
 								        task: Optional task name ("compression", "web_extract") to check
 								              for a task-specific provider override.
-												fix: remove legacy compression.summary_* config and env var fallbacks (#8992)

Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.

What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
  DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
  from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
  functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
  auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths

Compression model/provider is now configured exclusively via:
  auxiliary.compression.provider / auxiliary.compression.model

Closes #8923
											
										
										
											2026-04-13 04:59:26 -07:00
+								    Callers may override the returned model via config.yaml
 								    (e.g. auxiliary.compression.model, auxiliary.web_extract.model).
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
+								    """
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    return resolve_provider_client(
 								        provider,
 								        model=model,
 								        explicit_base_url=base_url,
 								        explicit_api_key=api_key,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        api_mode=api_mode,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        main_runtime=main_runtime,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    )
-												feat: enhance auxiliary model configuration and environment variable handling

- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.

											
										
										
											2026-03-07 08:52:06 -08:00
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								def get_async_text_auxiliary_client(task: str = "", *, main_runtime: Optional[Dict[str, Any]] = None):
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								    """Return (async_client, model_slug) for async consumers.
 								    For standard providers returns (AsyncOpenAI, model). For Codex returns
 								    (AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
 								    Returns (None, None) when no provider is available.
 								    """
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    provider, model, base_url, api_key, api_mode = _resolve_task_provider_model(task or None)
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    return resolve_provider_client(
 								        provider,
 								        model=model,
 								        async_mode=True,
 								        explicit_base_url=base_url,
 								        explicit_api_key=api_key,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        api_mode=api_mode,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        main_runtime=main_runtime,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    )
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								_VISION_AUTO_PROVIDER_ORDER = (
 								    "openrouter",
 								    "nous",
 								)
-												Hermes Agent UX Improvements

											
										
										
											2026-02-22 02:16:11 -08:00
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								def _normalize_vision_provider(provider: Optional[str]) -> str:
-												fix(agent): propagate api_mode to vision provider resolution

resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-13 16:08:19 +08:00
+								    return _normalize_aux_provider(provider)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								def _resolve_strict_vision_backend(
 								    provider: str,
 								    model: Optional[str] = None,
 								) -> Tuple[Optional[Any], Optional[str]]:
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    provider = _normalize_vision_provider(provider)
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    if provider == "copilot":
 								        return resolve_provider_client("copilot", model, is_vision=True)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if provider == "openrouter":
 								        return _try_openrouter()
 								    if provider == "nous":
-												feat: use mimo-v2-pro for non-vision auxiliary tasks on Nous free tier (#6018)

Free-tier Nous Portal users were getting mimo-v2-omni (a multimodal
model) for all auxiliary tasks including compression, session search,
and web extraction. Now routes non-vision tasks to mimo-v2-pro (a
text model) which is better suited for those workloads.

- Added _NOUS_FREE_TIER_AUX_MODEL constant for text auxiliary tasks
- _try_nous() accepts vision=False param to select the right model
- Vision path (_resolve_strict_vision_backend) passes vision=True
- All other callers default to vision=False → mimo-v2-pro
											
										
										
											2026-04-07 21:41:05 -07:00
+								        return _try_nous(vision=True)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if provider == "openai-codex":
 								        return _try_codex()
-												feat: add native Anthropic auxiliary vision

											
										
										
											2026-03-14 21:14:20 -07:00
+								    if provider == "anthropic":
 								        return _try_anthropic()
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if provider == "custom":
 								        return _try_custom_endpoint()
-												fix: harden auxiliary model config — gateway bridge, vision safety, tests

Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).

											
										
										
											2026-03-08 18:06:40 -07:00
+								    return None, None
-												refactor: integrate Nous Portal support in auxiliary client

- Added functionality to include product attribution tags for Nous Portal in auxiliary API calls.
- Introduced a mechanism to determine if the auxiliary client is backed by Nous Portal, affecting the extra body of requests.
- Updated various tools to utilize the new extra body configuration for enhanced tracking in API calls.

											
										
										
											2026-02-25 18:39:36 -08:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								def _strict_vision_backend_available(provider: str) -> bool:
 								    return _resolve_strict_vision_backend(provider)[0] is not None
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								def get_available_vision_backends() -> List[str]:
 								    """Return the currently available vision backends in auto-selection order.
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								    Order: active provider → OpenRouter → Nous → stop.  This is the single
 								    source of truth for setup, tool gating, and runtime auto-routing of
 								    vision tasks.
-												feat: centralized provider router + fix Codex vision bypass + vision error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886

											
										
										
											2026-03-11 19:46:47 -07:00
+								    """
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								    available: List[str] = []
 								    # 1. Active provider — if the user configured a provider, try it first.
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
+								    main_provider = _read_main_provider()
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								    if main_provider and main_provider not in ("auto", ""):
 								        if main_provider in _VISION_AUTO_PROVIDER_ORDER:
 								            if _strict_vision_backend_available(main_provider):
 								                available.append(main_provider)
 								        else:
 								            client, _ = resolve_provider_client(main_provider, _read_main_model())
 								            if client is not None:
 								                available.append(main_provider)
 								    # 2. OpenRouter, 3. Nous — skip if already covered by main provider.
 								    for p in _VISION_AUTO_PROVIDER_ORDER:
 								        if p not in available and _strict_vision_backend_available(p):
 								            available.append(p)
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
+								    return available
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
 								def resolve_vision_provider_client(
 								    provider: Optional[str] = None,
 								    model: Optional[str] = None,
 								    *,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    base_url: Optional[str] = None,
 								    api_key: Optional[str] = None,
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    async_mode: bool = False,
 								) -> Tuple[Optional[str], Optional[Any], Optional[str]]:
 								    """Resolve the client actually used for vision tasks.
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    Direct endpoint overrides take precedence over provider selection. Explicit
 								    provider overrides still use the generic provider router for non-standard
 								    backends, so users can intentionally force experimental providers. Auto mode
 								    stays conservative and only tries vision backends known to work today.
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    """
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    requested, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        "vision", provider, model, base_url, api_key
 								    )
 								    requested = _normalize_vision_provider(requested)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
 								    def _finalize(resolved_provider: str, sync_client: Any, default_model: Optional[str]):
 								        if sync_client is None:
 								            return resolved_provider, None, None
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        final_model = resolved_model or default_model
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        if async_mode:
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            async_client, async_model = _to_async_client(sync_client, final_model, is_vision=True)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								            return resolved_provider, async_client, async_model
 								        return resolved_provider, sync_client, final_model
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    if resolved_base_url:
 								        client, final_model = resolve_provider_client(
 								            "custom",
 								            model=resolved_model,
 								            async_mode=async_mode,
 								            explicit_base_url=resolved_base_url,
 								            explicit_api_key=resolved_api_key,
-												fix(agent): propagate api_mode to vision provider resolution

resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-13 16:08:19 +08:00
+								            api_mode=resolved_api_mode,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        )
 								        if client is None:
 								            return "custom", None, None
 								        return "custom", client, final_model
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if requested == "auto":
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
+								        # Vision auto-detection order:
-												feat(auxiliary): default 'auto' routing to main model for all users (#11900)

Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
											
										
										
											2026-04-17 19:13:23 -07:00
+								        #   1. User's main provider + main model (including aggregators).
 								        #      _PROVIDER_VISION_MODELS provides per-provider vision model
 								        #      overrides when the provider has a dedicated multimodal model
 								        #      that differs from the chat model (e.g. xiaomi → mimo-v2-omni,
-												fix(vision): route Nous main-provider vision through tier-aware backend

											
										
										
											2026-04-21 15:19:00 -06:00
+								        #      zai → glm-5v-turbo). Nous is the exception: it has a dedicated
 								        #      strict vision backend with tier-aware defaults, so it must not
 								        #      fall through to the user's text chat model here.
-												feat(auxiliary): default 'auto' routing to main model for all users (#11900)

Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
											
										
										
											2026-04-17 19:13:23 -07:00
+								        #   2. OpenRouter  (vision-capable aggregator fallback)
 								        #   3. Nous Portal (vision-capable aggregator fallback)
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
+								        #   4. Stop
 								        main_provider = _read_main_provider()
 								        main_model = _read_main_model()
 								        if main_provider and main_provider not in ("auto", ""):
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								            vision_model = _PROVIDER_VISION_MODELS.get(main_provider, main_model)
-												fix(vision): route Nous main-provider vision through tier-aware backend

											
										
										
											2026-04-21 15:19:00 -06:00
+								            if main_provider == "nous":
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                sync_client, default_model = _resolve_strict_vision_backend(
 								                    main_provider, vision_model
 								                )
-												fix(vision): route Nous main-provider vision through tier-aware backend

											
										
										
											2026-04-21 15:19:00 -06:00
+								                if sync_client is not None:
 								                    logger.info(
 								                        "Vision auto-detect: using main provider %s (%s)",
 								                        main_provider, default_model or resolved_model or main_model,
 								                    )
 								                    return _finalize(main_provider, sync_client, default_model)
 								            else:
 								                rpc_client, rpc_model = resolve_provider_client(
 								                    main_provider, vision_model,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                    api_mode=resolved_api_mode,
 								                    is_vision=True)
-												fix(vision): route Nous main-provider vision through tier-aware backend

											
										
										
											2026-04-21 15:19:00 -06:00
+								                if rpc_client is not None:
 								                    logger.info(
 								                        "Vision auto-detect: using main provider %s (%s)",
 								                        main_provider, rpc_model or vision_model,
 								                    )
 								                    return _finalize(
 								                        main_provider, rpc_client, rpc_model or vision_model)
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
-												feat(auxiliary): default 'auto' routing to main model for all users (#11900)

Before: aggregator users (OpenRouter / Nous Portal) running 'auto'
routing for auxiliary tasks — compression, vision, web extraction,
session search, etc. — got routed to a cheap provider-side default
model (Gemini Flash).  Non-aggregator users already got their main
model.  Behavior was inconsistent and surprising — users picked
Claude / GPT / their preferred model, but side tasks ran on
Gemini Flash.

After: 'auto' means "use my main chat model" for every user,
regardless of provider type.  Only when the main provider has no
working client does the fallback chain run (OpenRouter → Nous →
custom → Codex → API-key providers).  Explicit per-task overrides
in config.yaml (auxiliary.<task>.provider / .model) still win —
they are a hard constraint, not subject to the auto policy.

Vision auto-detection follows the same policy: try main provider +
main model first (with _PROVIDER_VISION_MODELS overrides preserved
for providers like xiaomi and zai that ship a dedicated multimodal
model distinct from their chat model).  Aggregator strict vision
backends are fallbacks, not the primary path.

Changes:
  - agent/auxiliary_client.py: _resolve_auto() drops the
    `_AGGREGATOR_PROVIDERS` guard.  resolve_vision_provider_client()
    auto branch unifies aggregator and exotic-provider paths —
    everyone goes through resolve_provider_client() with main_model.
    Dead _AGGREGATOR_PROVIDERS constant removed (was only used by
    the guard we just removed).
  - hermes_cli/main.py: aux config menu copy updated to reflect
    the new semantics ("'auto' means 'use my main model'").
  - tests/agent/test_auxiliary_main_first.py: 12 regression tests
    covering OpenRouter/Nous/DeepSeek main paths, runtime-override
    wins, explicit-config wins, vision override preservation for
    exotic providers, and fallback-chain activation when the main
    provider has no working client.

Co-authored-by: teknium1 <teknium@nousresearch.com>
											
										
										
											2026-04-17 19:13:23 -07:00
+								        # Fall back through aggregators (uses their dedicated vision model,
 								        # not the user's main model) when main provider has no client.
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								        for candidate in _VISION_AUTO_PROVIDER_ORDER:
 								            if candidate == main_provider:
 								                continue  # already tried above
 								            sync_client, default_model = _resolve_strict_vision_backend(candidate)
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
+								            if sync_client is not None:
-												fix: CI test failures — metadata key, cli console, docker env, vision order (#6294)

Fixes 9 test failures on current main, incorporating ideas from PR stack
#6219-#6222 by xinbenlv with corrections:

- model_metadata: sync HF context length key casing
  (minimaxai/minimax-m2.5 → MiniMaxAI/MiniMax-M2.5)

- cli.py: route quick command error output through self.console
  instead of creating a new ChatConsole() instance

- docker.py: explicit docker_forward_env entries now bypass the
  Hermes secret blocklist (intentional opt-in wins over generic filter)

- auxiliary_client: revert _read_main_provider() to simple
  provider.strip().lower() — the _normalize_aux_provider() call
  introduced in 5c03f2e7 stripped the custom: prefix, breaking
  named custom provider resolution

- auxiliary_client: flip vision auto-detection order to
  active provider → OpenRouter → Nous → stop (was OR → Nous → active)

- test: update vision priority test to match new order

Based on PR #6219-#6222 by xinbenlv.
											
										
										
											2026-04-08 16:37:05 -07:00
+								                return _finalize(candidate, sync_client, default_model)
-												fix(vision): simplify vision auto-detection to openrouter → nous → active provider

Simplify the vision auto-detection chain from 5 backends (openrouter,
nous, codex, anthropic, custom) down to 3:

  1. OpenRouter  (known vision-capable default model)
  2. Nous Portal (known vision-capable default model)
  3. Active provider + model (whatever the user is running)
  4. Stop

This is simpler and more predictable. The active provider step uses
resolve_provider_client() which handles all provider types including
named custom providers (from #5978).

Removed the complex preferred-provider promotion logic and API-level
fallback — the chain is short enough that it doesn't need them.

Based on PR #5376 by Mibay. Closes #5366.

											
										
										
											2026-04-07 22:24:36 -07:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        logger.debug("Auxiliary vision client: none available")
 								        return None, None, None
 								    if requested in _VISION_AUTO_PROVIDER_ORDER:
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        sync_client, default_model = _resolve_strict_vision_backend(
 								            requested, resolved_model
 								        )
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        return _finalize(requested, sync_client, default_model)
-												fix(agent): propagate api_mode to vision provider resolution

resolve_vision_provider_client() computed resolved_api_mode from config
but never passed it to downstream resolve_provider_client() or
_get_cached_client() calls, causing custom providers with
api_mode: anthropic_messages to crash when used for vision tasks.

Also remove the for_vision special case in _normalize_aux_provider()
that incorrectly discarded named custom provider identifiers.

Fixes #8857

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

											
										
										
											2026-04-13 16:08:19 +08:00
+								    client, final_model = _get_cached_client(requested, resolved_model, async_mode,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                                             api_mode=resolved_api_mode,
 								                                             is_vision=True)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if client is None:
 								        return requested, None, None
 								    return requested, client, final_model
-												refactor: integrate Nous Portal support in auxiliary client

- Added functionality to include product attribution tags for Nous Portal in auxiliary API calls.
- Introduced a mechanism to determine if the auxiliary client is backed by Nous Portal, affecting the extra body of requests.
- Updated various tools to utilize the new extra body configuration for enhanced tracking in API calls.

											
										
										
											2026-02-25 18:39:36 -08:00
+								def get_auxiliary_extra_body() -> dict:
 								    """Return extra_body kwargs for auxiliary API calls.
 								    Includes Nous Portal product tags when the auxiliary client is backed
 								    by Nous Portal. Returns empty dict otherwise.
 								    """
 								    return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}
-												feat(api): implement dynamic max tokens handling for various providers

- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.

											
										
										
											2026-02-26 20:23:56 -08:00
 								def auxiliary_max_tokens_param(value: int) -> dict:
 								    """Return the correct max tokens kwarg for the auxiliary client's provider.
 								    OpenRouter and local models use 'max_tokens'. Direct OpenAI with newer
 								    models (gpt-4o, o-series, gpt-5+) requires 'max_completion_tokens'.
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								    The Codex adapter translates max_tokens internally, so we use max_tokens
 								    for it as well.
-												feat(api): implement dynamic max tokens handling for various providers

- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.

											
										
										
											2026-02-26 20:23:56 -08:00
+								    """
-												fix: restore config-saved custom endpoint resolution

											
										
										
											2026-03-14 20:58:12 -07:00
+								    custom_base = _current_custom_base_url()
-												feat(api): implement dynamic max tokens handling for various providers

- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.

											
										
										
											2026-02-26 20:23:56 -08:00
+								    or_key = os.getenv("OPENROUTER_API_KEY")
-												refactor(cli): Finalize OpenAI Codex Integration with OAuth

- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.

											
										
										
											2026-02-28 21:47:51 -08:00
+								    # Only use max_completion_tokens for direct OpenAI custom endpoints
-												feat(api): implement dynamic max tokens handling for various providers

- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.

											
										
										
											2026-02-26 20:23:56 -08:00
+								    if (not or_key
 								            and _read_nous_auth() is None
-												fix: extend hostname-match provider detection across remaining call sites

Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.

											
										
										
											2026-04-20 20:58:01 -07:00
+								            and base_url_hostname(custom_base) == "api.openai.com"):
-												feat(api): implement dynamic max tokens handling for various providers

- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.

											
										
										
											2026-02-26 20:23:56 -08:00
+								        return {"max_completion_tokens": value}
 								    return {"max_tokens": value}
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
 								# ── Centralized LLM Call API ────────────────────────────────────────────────
 								#
 								# call_llm() and async_call_llm() own the full request lifecycle:
 								#   1. Resolve provider + model from task config (or explicit args)
 								#   2. Get or create a cached client for that provider
 								#   3. Format request args for the provider + model (max_tokens handling, etc.)
 								#   4. Make the API call
 								#   5. Return the response
 								#
 								# Every auxiliary LLM consumer should use these instead of manually
 								# constructing clients and calling .chat.completions.create().
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								# Client cache: (provider, async_mode, base_url, api_key, api_mode, runtime_key) -> (client, default_model, loop)
 								# NOTE: loop identity is NOT part of the key.  On async cache hits we check
 								# whether the cached loop is the *current* loop; if not, the stale entry is
 								# replaced in-place.  This bounds cache growth to one entry per unique
 								# provider config rather than one per (config × event-loop), which previously
 								# caused unbounded fd accumulation in long-running gateway processes (#10200).
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								_client_cache: Dict[tuple, tuple] = {}
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								_client_cache_lock = threading.Lock()
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								_CLIENT_CACHE_MAX_SIZE = 64  # safety belt — evict oldest when exceeded
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								def _client_cache_key(
 								    provider: str,
 								    *,
 								    async_mode: bool,
 								    base_url: Optional[str] = None,
 								    api_key: Optional[str] = None,
 								    api_mode: Optional[str] = None,
 								    main_runtime: Optional[Dict[str, Any]] = None,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    is_vision: bool = False,
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								) -> tuple:
 								    runtime = _normalize_main_runtime(main_runtime)
 								    runtime_key = tuple(runtime.get(field, "") for field in _MAIN_RUNTIME_FIELDS) if provider == "auto" else ()
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key, is_vision)
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
 								def _store_cached_client(cache_key: tuple, client: Any, default_model: Optional[str], *, bound_loop: Any = None) -> None:
 								    with _client_cache_lock:
 								        old_entry = _client_cache.get(cache_key)
 								        if old_entry is not None and old_entry[0] is not client:
 								            _force_close_async_httpx(old_entry[0])
 								            try:
 								                close_fn = getattr(old_entry[0], "close", None)
 								                if callable(close_fn):
 								                    close_fn()
 								            except Exception:
 								                pass
 								        _client_cache[cache_key] = (client, default_model, bound_loop)
 								def _refresh_nous_auxiliary_client(
 								    *,
 								    cache_provider: str,
 								    model: Optional[str],
 								    async_mode: bool,
 								    base_url: Optional[str] = None,
 								    api_key: Optional[str] = None,
 								    api_mode: Optional[str] = None,
 								    main_runtime: Optional[Dict[str, Any]] = None,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    is_vision: bool = False,
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								) -> Tuple[Optional[Any], Optional[str]]:
 								    """Refresh Nous runtime creds, rebuild the client, and replace the cache entry."""
 								    runtime = _resolve_nous_runtime_api(force_refresh=True)
 								    if runtime is None:
 								        return None, model
 								    fresh_key, fresh_base_url = runtime
 								    sync_client = OpenAI(api_key=fresh_key, base_url=fresh_base_url)
 								    final_model = model
 								    current_loop = None
 								    if async_mode:
 								        try:
 								            import asyncio as _aio
 								            current_loop = _aio.get_event_loop()
 								        except RuntimeError:
 								            pass
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        client, final_model = _to_async_client(sync_client, final_model or "", is_vision=is_vision)
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    else:
 								        client = sync_client
 								    cache_key = _client_cache_key(
 								        cache_provider,
 								        async_mode=async_mode,
 								        base_url=base_url,
 								        api_key=api_key,
 								        api_mode=api_mode,
 								        main_runtime=main_runtime,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        is_vision=is_vision,
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    )
 								    _store_cached_client(cache_key, client, final_model, bound_loop=current_loop)
 								    return client, final_model
-												fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398)

The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task().  When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.

Three-layer fix:

1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
   startup before any AsyncOpenAI clients are created.  Safe because
   cached clients are explicitly cleaned via _force_close_async_httpx,
   and uncached clients' TCP connections are cleaned by the OS on exit.

2. Custom asyncio exception handler: Installed on prompt_toolkit's event
   loop to silently suppress 'Event loop is closed' RuntimeError.
   Defense-in-depth for SDK upgrades that might change the class name.

3. cleanup_stale_async_clients(): Called after each agent turn (when the
   agent thread joins) to proactively evict cache entries whose event
   loop is closed, preventing stale clients from accumulating.
											
										
										
											2026-03-27 09:45:25 -07:00
+								def neuter_async_httpx_del() -> None:
 								    """Monkey-patch ``AsyncHttpxClientWrapper.__del__`` to be a no-op.
 								    The OpenAI SDK's ``AsyncHttpxClientWrapper.__del__`` schedules
 								    ``self.aclose()`` via ``asyncio.get_running_loop().create_task()``.
 								    When an ``AsyncOpenAI`` client is garbage-collected while
 								    prompt_toolkit's event loop is running (the common CLI idle state),
 								    the ``aclose()`` task runs on prompt_toolkit's loop but the
 								    underlying TCP transport is bound to a *different* loop (the worker
 								    thread's loop that the client was originally created on).  If that
 								    loop is closed or its thread is dead, the transport's
 								    ``self._loop.call_soon()`` raises ``RuntimeError("Event loop is
 								    closed")``, which prompt_toolkit surfaces as "Unhandled exception
 								    in event loop ... Press ENTER to continue...".
 								    Neutering ``__del__`` is safe because:
 								    - Cached clients are explicitly cleaned via ``_force_close_async_httpx``
 								      on stale-loop detection and ``shutdown_cached_clients`` on exit.
 								    - Uncached clients' TCP connections are cleaned up by the OS when the
 								      process exits.
 								    - The OpenAI SDK itself marks this as a TODO (``# TODO(someday):
 								      support non asyncio runtimes here``).
 								    Call this once at CLI startup, before any ``AsyncOpenAI`` clients are
 								    created.
 								    """
 								    try:
 								        from openai._base_client import AsyncHttpxClientWrapper
 								        AsyncHttpxClientWrapper.__del__ = lambda self: None  # type: ignore[assignment]
 								    except (ImportError, AttributeError):
 								        pass  # Graceful degradation if the SDK changes its internals
-												fix(cli): prevent 'Press ENTER to continue...' on exit

When AsyncOpenAI clients are garbage-collected after the event loop
closes, their AsyncHttpxClientWrapper.__del__ tries to schedule
aclose() on the dead loop, causing RuntimeError: Event loop is closed.
prompt_toolkit catches this as an unhandled exception and shows
'Press ENTER to continue...' which blocks CLI exit.

Fix: Add shutdown_cached_clients() to auxiliary_client.py that marks
all cached async clients' underlying httpx transport as CLOSED before
GC runs. This prevents __del__ from attempting the aclose() call.

- _force_close_async_httpx(): sets httpx AsyncClient._state to CLOSED
- shutdown_cached_clients(): iterates _client_cache, closes sync clients
  normally and marks async clients as closed
- Also fix stale client eviction in _get_cached_client to mark evicted
  async clients as closed (was just del-ing them, triggering __del__)
- Call shutdown_cached_clients() from _run_cleanup() in cli.py

											
										
										
											2026-03-22 15:31:54 -07:00
+								def _force_close_async_httpx(client: Any) -> None:
 								    """Mark the httpx AsyncClient inside an AsyncOpenAI client as closed.
 								    This prevents ``AsyncHttpxClientWrapper.__del__`` from scheduling
 								    ``aclose()`` on a (potentially closed) event loop, which causes
 								    ``RuntimeError: Event loop is closed`` → prompt_toolkit's
 								    "Press ENTER to continue..." handler.
 								    We intentionally do NOT run the full async close path — the
 								    connections will be dropped by the OS when the process exits.
 								    """
 								    try:
 								        from httpx._client import ClientState
 								        inner = getattr(client, "_client", None)
 								        if inner is not None and not getattr(inner, "is_closed", True):
 								            inner._state = ClientState.CLOSED
 								    except Exception:
 								        pass
 								def shutdown_cached_clients() -> None:
 								    """Close all cached clients (sync and async) to prevent event-loop errors.
 								    Call this during CLI shutdown, *before* the event loop is closed, to
 								    avoid ``AsyncHttpxClientWrapper.__del__`` raising on a dead loop.
 								    """
 								    import inspect
 								    with _client_cache_lock:
 								        for key, entry in list(_client_cache.items()):
 								            client = entry[0]
 								            if client is None:
 								                continue
 								            # Mark any async httpx transport as closed first (prevents __del__
 								            # from scheduling aclose() on a dead event loop).
 								            _force_close_async_httpx(client)
 								            # Sync clients: close the httpx connection pool cleanly.
 								            # Async clients: skip — we already neutered __del__ above.
 								            try:
 								                close_fn = getattr(client, "close", None)
 								                if close_fn and not inspect.iscoroutinefunction(close_fn):
 								                    close_fn()
 								            except Exception:
 								                pass
 								        _client_cache.clear()
-												fix: eliminate 'Event loop is closed' / 'Press ENTER to continue' during idle (#3398)

The OpenAI SDK's AsyncHttpxClientWrapper.__del__ schedules aclose() via
asyncio.get_running_loop().create_task().  When an AsyncOpenAI client is
garbage-collected while prompt_toolkit's event loop is running (the common
CLI idle state), the aclose() task runs on prompt_toolkit's loop but the
underlying TCP transport is bound to a different (dead) worker loop.
The transport's self._loop.call_soon() then raises RuntimeError('Event
loop is closed'), which prompt_toolkit surfaces as the disruptive
'Unhandled exception in event loop ... Press ENTER to continue...' error.

Three-layer fix:

1. neuter_async_httpx_del(): Monkey-patches __del__ to a no-op at CLI
   startup before any AsyncOpenAI clients are created.  Safe because
   cached clients are explicitly cleaned via _force_close_async_httpx,
   and uncached clients' TCP connections are cleaned by the OS on exit.

2. Custom asyncio exception handler: Installed on prompt_toolkit's event
   loop to silently suppress 'Event loop is closed' RuntimeError.
   Defense-in-depth for SDK upgrades that might change the class name.

3. cleanup_stale_async_clients(): Called after each agent turn (when the
   agent thread joins) to proactively evict cache entries whose event
   loop is closed, preventing stale clients from accumulating.
											
										
										
											2026-03-27 09:45:25 -07:00
+								def cleanup_stale_async_clients() -> None:
 								    """Force-close cached async clients whose event loop is closed.
 								    Call this after each agent turn to proactively clean up stale clients
 								    before GC can trigger ``AsyncHttpxClientWrapper.__del__`` on them.
 								    This is defense-in-depth — the primary fix is ``neuter_async_httpx_del``
 								    which disables ``__del__`` entirely.
 								    """
 								    with _client_cache_lock:
 								        stale_keys = []
 								        for key, entry in _client_cache.items():
 								            client, _default, cached_loop = entry
 								            if cached_loop is not None and cached_loop.is_closed():
 								                _force_close_async_httpx(client)
 								                stale_keys.append(key)
 								        for key in stale_keys:
 								            del _client_cache[key]
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								def _is_openrouter_client(client: Any) -> bool:
 								    for obj in (client, getattr(client, "_client", None), getattr(client, "client", None)):
-												fix: sweep remaining provider-URL substring checks across codebase

Completes the hostname-hardening sweep — every substring check against a
provider host in live-routing code is now hostname-based. This closes the
same false-positive class for OpenRouter, GitHub Copilot, Kimi, Qwen,
ChatGPT/Codex, Bedrock, GitHub Models, Vercel AI Gateway, Nous, Z.AI,
Moonshot, Arcee, and MiniMax that the original PR closed for OpenAI, xAI,
and Anthropic.

New helper:
- utils.base_url_host_matches(base_url, domain) — safe counterpart to
  'domain in base_url'. Accepts hostname equality and subdomain matches;
  rejects path segments, host suffixes, and prefix collisions.

Call sites converted (real-code only; tests, optional-skills, red-teaming
scripts untouched):

run_agent.py (10 sites):
- AIAgent.__init__ Bedrock branch, ChatGPT/Codex branch (also path check)
- header cascade for openrouter / copilot / kimi / qwen / chatgpt
- interleaved-thinking trigger (openrouter + claude)
- _is_openrouter_url(), _is_qwen_portal()
- is_native_anthropic check
- github-models-vs-copilot detection (3 sites)
- reasoning-capable route gate (nousresearch, vercel, github)
- codex-backend detection in API kwargs build
- fallback api_mode Bedrock detection

agent/auxiliary_client.py (7 sites):
- extra-headers cascades in 4 distinct client-construction paths
  (resolve custom, resolve auto, OpenRouter-fallback-to-custom,
  _async_client_from_sync, resolve_provider_client explicit-custom,
  resolve_auto_with_codex)
- _is_openrouter_client() base_url sniff

agent/usage_pricing.py:
- resolve_billing_route openrouter branch

agent/model_metadata.py:
- _is_openrouter_base_url(), Bedrock context-length lookup

hermes_cli/providers.py:
- determine_api_mode Bedrock heuristic

hermes_cli/runtime_provider.py:
- _is_openrouter_url flag for API-key preference (issues #420, #560)

hermes_cli/doctor.py:
- Kimi User-Agent header for /models probes

tools/delegate_tool.py:
- subagent Codex endpoint detection

trajectory_compressor.py:
- _detect_provider() cascade (8 providers: openrouter, nous, codex, zai,
  kimi-coding, arcee, minimax-cn, minimax)

cli.py, gateway/run.py:
- /model-switch cache-enabled hint (openrouter + claude)

Bedrock detection tightened from 'bedrock-runtime in url' to
'hostname starts with bedrock-runtime. AND host is under amazonaws.com'.
ChatGPT/Codex detection tightened from 'chatgpt.com/backend-api/codex in
url' to 'hostname is chatgpt.com AND path contains /backend-api/codex'.

Tests:
- tests/test_base_url_hostname.py extended with a base_url_host_matches
  suite (exact match, subdomain, path-segment rejection, host-suffix
  rejection, host-prefix rejection, empty-input, case-insensitivity,
  trailing dot).

Validation: 651 targeted tests pass (runtime_provider, minimax, bedrock,
gemini, auxiliary, codex_cloudflare, usage_pricing, compressor_fallback,
fallback_model, openai_client_lifecycle, provider_parity, cli_provider_resolution,
delegate, credential_pool, context_compressor, plus the 4 hostname test
modules). 26-assertion E2E call-site verification across 6 modules passes.

											
										
										
											2026-04-20 21:17:28 -07:00
+								        if obj and base_url_host_matches(str(getattr(obj, "base_url", "") or ""), "openrouter.ai"):
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								            return True
 								    return False
-												feat(providers): add GMI Cloud as a first-class API-key provider (#11955)

Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>

											
										
										
											2026-04-17 11:19:56 -07:00
+								def _cached_client_accepts_slash_models(client: Any, cached_default: Optional[str]) -> bool:
 								    """Best-effort check for cached clients that accept ``vendor/model`` IDs."""
 								    if _is_openrouter_client(client):
 								        return True
 								    return bool(cached_default and "/" in cached_default)
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								def _compat_model(client: Any, model: Optional[str], cached_default: Optional[str]) -> Optional[str]:
-												feat(providers): add GMI Cloud as a first-class API-key provider (#11955)

Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>

											
										
										
											2026-04-17 11:19:56 -07:00
+								    """Keep slash-bearing model IDs only for cached clients that support them.
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
 								    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
 								    """
-												feat(providers): add GMI Cloud as a first-class API-key provider (#11955)

Add GMI Cloud (api.gmi-serving.com) as a full first-class API-key provider
with built-in auth, aliases, model catalog, CLI entry points, auxiliary client
routing, context length resolution, doctor checks, env var tracking, and docs.

- auth.py: ProviderConfig for 'gmi' (api_key, GMI_API_KEY / GMI_BASE_URL)
- providers.py: HermesOverlay with extra_env_vars for models.dev detection
- models.py: curated slash-form model catalog; live /v1/models fetch
- main.py: 'gmi' in _named_custom_provider_map and --provider choices
- model_metadata.py: _URL_TO_PROVIDER, _PROVIDER_PREFIXES, dedicated
  context-length probe block (GMI's /models has authoritative data)
- auxiliary_client.py: alias entries; _compat_model fix for slash-form
  models on cached aggregator-style clients; gmi aux default model
- doctor.py: GMI in provider connectivity checks
- config.py: GMI_API_KEY / GMI_BASE_URL in OPTIONAL_ENV_VARS
- conftest.py: explicit GMI_BASE_URL clearing (not caught by _API_KEY suffix)
- docs: providers.md, environment-variables.md, fallback-providers.md,
  configuration.md, quickstart.md (expands provider table)

Co-authored-by: Isaac Huang <isaachuang@Isaacs-MacBook-Pro.local>

											
										
										
											2026-04-17 11:19:56 -07:00
+								    if model and "/" in model and not _cached_client_accepts_slash_models(client, cached_default):
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								        return cached_default
 								    return model or cached_default
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								def _get_cached_client(
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    provider: str,
 								    model: str = None,
 								    async_mode: bool = False,
 								    base_url: str = None,
 								    api_key: str = None,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    api_mode: str = None,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    main_runtime: Optional[Dict[str, Any]] = None,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								    is_vision: bool = False,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								) -> Tuple[Optional[Any], Optional[str]]:
-												fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)

In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
											
										
										
											2026-03-25 17:31:56 -07:00
+								    """Get or create a cached client for the given provider.
 								    Async clients (AsyncOpenAI) use httpx.AsyncClient internally, which
 								    binds to the event loop that was current when the client was created.
 								    Using such a client on a *different* loop causes deadlocks or
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								    RuntimeError.  To prevent cross-loop issues, the cache validates on
 								    every async hit that the cached loop is the *current, open* loop.
 								    If the loop changed (e.g. a new gateway worker-thread loop), the stale
 								    entry is replaced in-place rather than creating an additional entry.
 								    This keeps cache size bounded to one entry per unique provider config,
 								    preventing the fd-exhaustion that previously occurred in long-running
 								    gateways where recycled worker threads created unbounded entries (#10200).
-												fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)

In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
											
										
										
											2026-03-25 17:31:56 -07:00
+								    """
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								    # Resolve the current event loop for async clients so we can validate
 								    # cached entries.  Loop identity is NOT in the cache key — instead we
 								    # check at hit time whether the cached loop is still current and open.
 								    # This prevents unbounded cache growth from recycled worker-thread loops
 								    # while still guaranteeing we never reuse a client on the wrong loop
 								    # (which causes deadlocks, see #2681).
-												fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)

In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
											
										
										
											2026-03-25 17:31:56 -07:00
+								    current_loop = None
 								    if async_mode:
 								        try:
 								            import asyncio as _aio
 								            current_loop = _aio.get_event_loop()
 								        except RuntimeError:
 								            pass
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    runtime = _normalize_main_runtime(main_runtime)
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    cache_key = _client_cache_key(
 								        provider,
 								        async_mode=async_mode,
 								        base_url=base_url,
 								        api_key=api_key,
 								        api_mode=api_mode,
 								        main_runtime=main_runtime,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        is_vision=is_vision,
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								    )
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								    with _client_cache_lock:
 								        if cache_key in _client_cache:
-												fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)

Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
											
										
										
											2026-03-20 09:44:50 -07:00
+								            cached_client, cached_default, cached_loop = _client_cache[cache_key]
 								            if async_mode:
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								                # Validate: the cached client must be bound to the CURRENT,
 								                # OPEN loop.  If the loop changed or was closed, the httpx
 								                # transport inside is dead — force-close and replace.
 								                loop_ok = (
 								                    cached_loop is not None
 								                    and cached_loop is current_loop
 								                    and not cached_loop.is_closed()
 								                )
 								                if loop_ok:
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								                    effective = _compat_model(cached_client, model, cached_default)
 								                    return cached_client, effective
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								                # Stale — evict and fall through to create a new client.
 								                _force_close_async_httpx(cached_client)
 								                del _client_cache[cache_key]
-												fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)

Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
											
										
										
											2026-03-20 09:44:50 -07:00
+								            else:
-												fix: drop incompatible model slugs on auxiliary client cache hit

`resolve_provider_client()` already drops OpenRouter-format model slugs
(containing "/") when the resolved provider is not OpenRouter (line 1097).
However, `_get_cached_client()` returns `model or cached_default` directly
on cache hits, bypassing this check entirely.

When the main provider is openai-codex, the auto-detection chain (Step 1
of `_resolve_auto`) caches a CodexAuxiliaryClient. Subsequent auxiliary
calls for different tasks (e.g. compression with `summary_model:
google/gemini-3-flash-preview`) hit the cache and pass the OpenRouter-
format model slug straight to the Codex Responses API, which does not
understand it and returns an empty `response.output`.

This causes two user-visible failures:
- "Invalid API response shape" (empty output after 3 retries)
- "Context length exceeded, cannot compress further" (compression itself
  fails through the same path)

Add `_compat_model()` helper that mirrors the "/" check from
`resolve_provider_client()` and call it on the cache-hit return path.

											
										
										
											2026-04-07 18:07:08 +08:00
+								                effective = _compat_model(cached_client, model, cached_default)
 								                return cached_client, effective
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								    # Build outside the lock
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    client, default_model = resolve_provider_client(
 								        provider,
 								        model,
 								        async_mode,
 								        explicit_base_url=base_url,
 								        explicit_api_key=api_key,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        api_mode=api_mode,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								        main_runtime=runtime,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								        is_vision=is_vision,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    )
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    if client is not None:
-												fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)

Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
											
										
										
											2026-03-20 09:44:50 -07:00
+								        # For async clients, remember which loop they were created on so we
 								        # can detect stale entries later.
-												fix(agent): prevent AsyncOpenAI/httpx cross-loop deadlock in gateway mode (#2701)

In gateway mode, async tools (vision_analyze, web_extract, session_search)
deadlock because _run_async() spawns a thread with asyncio.run(), creating
a new event loop, but _get_cached_client() returns an AsyncOpenAI client
bound to a different loop. httpx.AsyncClient cannot work across event loop
boundaries, causing await client.chat.completions.create() to hang forever.

Fix: include the event loop identity in the async client cache key so each
loop gets its own AsyncOpenAI instance. Also fix session_search_tool.py
which had its own broken asyncio.run()-in-thread pattern — now uses the
centralized _run_async() bridge.
											
										
										
											2026-03-25 17:31:56 -07:00
+								        bound_loop = current_loop
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								        with _client_cache_lock:
 								            if cache_key not in _client_cache:
-												fix: bound auxiliary client cache to prevent fd exhaustion in long-running gateways (#10200) (#10470)

The _client_cache used event loop id() as part of the cache key, so
every new worker-thread event loop created a new entry for the same
provider config.  In long-running gateways where threads are recycled
frequently, this caused unbounded cache growth — each stale entry
held an unclosed AsyncOpenAI client with its httpx connection pool,
eventually exhausting file descriptors.

Fix: remove loop_id from the cache key and instead validate on each
async cache hit that the cached loop is the current, open loop.  If
the loop changed or was closed, the stale entry is replaced in-place
rather than creating an additional entry.  This bounds cache growth
to at most one entry per unique provider config.

Also adds a _CLIENT_CACHE_MAX_SIZE (64) safety belt with FIFO
eviction as defense-in-depth against any remaining unbounded growth.

Cross-loop safety is preserved: different event loops still get
different client instances (validated by existing test suite).

Closes #10200
											
										
										
											2026-04-15 13:16:28 -07:00
+								                # Safety belt: if the cache has grown beyond the max, evict
 								                # the oldest entries (FIFO — dict preserves insertion order).
 								                while len(_client_cache) >= _CLIENT_CACHE_MAX_SIZE:
 								                    evict_key, evict_entry = next(iter(_client_cache.items()))
 								                    _force_close_async_httpx(evict_entry[0])
 								                    del _client_cache[evict_key]
-												fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)

Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
											
										
										
											2026-03-20 09:44:50 -07:00
+								                _client_cache[cache_key] = (client, default_model, bound_loop)
-												fix: thread safety for concurrent subagent delegation (#1672)

* fix: thread safety for concurrent subagent delegation

Four thread-safety fixes that prevent crashes and data races when
running multiple subagents concurrently via delegate_task:

1. Remove redirect_stdout/stderr from delegate_tool — mutating global
   sys.stdout races with the spinner thread when multiple children start
   concurrently, causing segfaults. Children already run with
   quiet_mode=True so the redirect was redundant.

2. Split _run_single_child into _build_child_agent (main thread) +
   _run_single_child (worker thread). AIAgent construction creates
   httpx/SSL clients which are not thread-safe to initialize
   concurrently.

3. Add threading.Lock to SessionDB — subagents share the parent's
   SessionDB and call create_session/append_message from worker threads
   with no synchronization.

4. Add _active_children_lock to AIAgent — interrupt() iterates
   _active_children while worker threads append/remove children.

5. Add _client_cache_lock to auxiliary_client — multiple subagent
   threads may resolve clients concurrently via call_llm().

Based on PR #1471 by peteromallet.

* feat: Honcho base_url override via config.yaml + quick command alias type

Two features salvaged from PR #1576:

1. Honcho base_url override: allows pointing Hermes at a remote
   self-hosted Honcho deployment via config.yaml:

     honcho:
       base_url: "http://192.168.x.x:8000"

   When set, this overrides the Honcho SDK's environment mapping
   (production/local), enabling LAN/VPN Honcho deployments without
   requiring the server to live on localhost. Uses config.yaml instead
   of env var (HONCHO_URL) per project convention.

2. Quick command alias type: adds a new 'alias' quick command type
   that rewrites to another slash command before normal dispatch:

     quick_commands:
       sc:
         type: alias
         target: /context

   Supports both CLI and gateway. Arguments are forwarded to the
   target command.

Based on PR #1576 by redhelix.

---------

Co-authored-by: peteromallet <peteromallet@users.noreply.github.com>
Co-authored-by: redhelix <redhelix@users.noreply.github.com>
											
										
										
											2026-03-17 02:53:33 -07:00
+								            else:
-												fix: persistent event loop in _run_async prevents 'Event loop is closed' (#2190)

Cherry-picked from PR #2146 by @crazywriter1. Fixes #2104.

asyncio.run() creates and closes a fresh event loop each call. Cached
httpx/AsyncOpenAI clients bound to the dead loop crash on GC with
'Event loop is closed'. This hit vision_analyze on first use in CLI.

Two-layer fix:
- model_tools._run_async(): replace asyncio.run() with persistent
  loop via _get_tool_loop() + run_until_complete()
- auxiliary_client._get_cached_client(): track which loop created
  each async client, discard stale entries if loop is closed

6 regression tests covering loop lifecycle, reuse, and full vision
dispatch chain.

Co-authored-by: Test <test@test.com>
											
										
										
											2026-03-20 09:44:50 -07:00
+								                client, default_model, _ = _client_cache[cache_key]
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    return client, model or default_model
 								def _resolve_task_provider_model(
 								    task: str = None,
 								    provider: str = None,
 								    model: str = None,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    base_url: str = None,
 								    api_key: str = None,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								) -> Tuple[str, Optional[str], Optional[str], Optional[str], Optional[str]]:
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    """Determine provider + model for a call.
 								    Priority:
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+. Explicit provider/model/base_url/api_key args (always win)
-												fix: remove legacy compression.summary_* config and env var fallbacks (#8992)

Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.

What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
  DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
  from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
  functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
  auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths

Compression model/provider is now configured exclusively via:
  auxiliary.compression.provider / auxiliary.compression.model

Closes #8923
											
										
										
											2026-04-13 04:59:26 -07:00
+. Config file (auxiliary.{task}.provider/model/base_url)
 . "auto" (full auto-detection chain)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    Returns (provider, model, base_url, api_key, api_mode) where model may
 								    be None (use provider default). When base_url is set, provider is forced
 								    to "custom" and the task uses that direct endpoint. api_mode is one of
 								    "chat_completions", "codex_responses", or None (auto-detect).
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    """
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    cfg_provider = None
 								    cfg_model = None
 								    cfg_base_url = None
 								    cfg_api_key = None
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    cfg_api_mode = None
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
 								    if task:
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								        task_config = _get_auxiliary_task_config(task)
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        cfg_provider = str(task_config.get("provider", "")).strip() or None
 								        cfg_model = str(task_config.get("model", "")).strip() or None
 								        cfg_base_url = str(task_config.get("base_url", "")).strip() or None
 								        cfg_api_key = str(task_config.get("api_key", "")).strip() or None
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        cfg_api_mode = str(task_config.get("api_mode", "")).strip() or None
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix: remove legacy compression.summary_* config and env var fallbacks (#8992)

Remove the backward-compat code paths that read compression provider/model
settings from legacy config keys and env vars, which caused silent failures
when auto-detection resolved to incompatible backends.

What changed:
- Remove compression.summary_model, summary_provider, summary_base_url from
  DEFAULT_CONFIG and cli.py defaults
- Remove backward-compat block in _resolve_task_provider_model() that read
  from the legacy compression section
- Remove _get_auxiliary_provider() and _get_auxiliary_env_override() helper
  functions (AUXILIARY_*/CONTEXT_* env var readers)
- Remove env var fallback chain for per-task overrides
- Update hermes config show to read from auxiliary.compression
- Add config migration (v16→17) that moves non-empty legacy values to
  auxiliary.compression and strips the old keys
- Update example config and openclaw migration script
- Remove/update tests for deleted code paths

Compression model/provider is now configured exclusively via:
  auxiliary.compression.provider / auxiliary.compression.model

Closes #8923
											
										
										
											2026-04-13 04:59:26 -07:00
+								    resolved_model = model or cfg_model
 								    resolved_api_mode = cfg_api_mode
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
 								    if base_url:
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        return "custom", resolved_model, base_url, api_key, resolved_api_mode
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    if provider:
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        return provider, resolved_model, base_url, api_key, resolved_api_mode
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    if task:
-												refactor(auxiliary): config.yaml takes priority over env vars for aux task settings (#7889)

The auxiliary client previously checked env vars (AUXILIARY_{TASK}_PROVIDER,
AUXILIARY_{TASK}_MODEL, etc.) before config.yaml's auxiliary.{task}.* section.
This violated the project's '.env is for secrets only' policy — these are
behavioral settings, not API keys.

Flipped the resolution order in _resolve_task_provider_model():
  1. Explicit args (always win)
  2. config.yaml auxiliary.{task}.* (PRIMARY)
  3. Env var overrides (backward-compat fallback only)
  4. 'auto' (full auto-detection chain)

Env var reading code is kept for backward compatibility but config.yaml
now takes precedence. Updated module docstring and function docstring.

Also removed AUXILIARY_VISION_MODEL from _EXTRA_ENV_KEYS in config.py.
											
										
										
											2026-04-11 11:21:59 -07:00
+								        # Config.yaml is the primary source for per-task overrides.
 								        if cfg_base_url:
 								            return "custom", resolved_model, cfg_base_url, cfg_api_key, resolved_api_mode
 								        if cfg_provider and cfg_provider != "auto":
 								            return cfg_provider, resolved_model, None, None, resolved_api_mode
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								        return "auto", resolved_model, None, None, resolved_api_mode
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    return "auto", resolved_model, None, None, resolved_api_mode
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								_DEFAULT_AUX_TIMEOUT = 30.0
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								def _get_auxiliary_task_config(task: str) -> Dict[str, Any]:
 								    """Return the config dict for auxiliary.<task>, or {} when unavailable."""
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    if not task:
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								        return {}
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    try:
 								        from hermes_cli.config import load_config
 								        config = load_config()
 								    except ImportError:
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								        return {}
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    aux = config.get("auxiliary", {}) if isinstance(config, dict) else {}
 								    task_config = aux.get(task, {}) if isinstance(aux, dict) else {}
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								    return task_config if isinstance(task_config, dict) else {}
 								def _get_task_timeout(task: str, default: float = _DEFAULT_AUX_TIMEOUT) -> float:
 								    """Read timeout from auxiliary.{task}.timeout in config, falling back to *default*."""
 								    if not task:
 								        return default
 								    task_config = _get_auxiliary_task_config(task)
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    raw = task_config.get("timeout")
 								    if raw is not None:
 								        try:
 								            return float(raw)
 								        except (ValueError, TypeError):
 								            pass
 								    return default
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								def _get_task_extra_body(task: str) -> Dict[str, Any]:
 								    """Read auxiliary.<task>.extra_body and return a shallow copy when valid."""
 								    task_config = _get_auxiliary_task_config(task)
 								    raw = task_config.get("extra_body")
 								    if isinstance(raw, dict):
 								        return dict(raw)
 								    return {}
-												fix: three provider-related bugs (#8161, #8181, #8147) (#8243)

- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
  so context-length lookups use models.dev data instead of 128k fallback.
  Fixes #8161.

- Set api_mode from custom_providers entry when switching via hermes model,
  and clear stale api_mode when the entry has none. Also extract api_mode
  in _named_custom_provider_map(). Fixes #8181.

- Convert OpenAI image_url content blocks to Anthropic image blocks when
  the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
  containing /anthropic). Fixes #8147.
											
										
										
											2026-04-12 01:44:18 -07:00
+								# ---------------------------------------------------------------------------
 								# Anthropic-compatible endpoint detection + image block conversion
 								# ---------------------------------------------------------------------------
 								# Providers that use Anthropic-compatible endpoints (via OpenAI SDK wrapper).
 								# Their image content blocks must use Anthropic format, not OpenAI format.
 								_ANTHROPIC_COMPAT_PROVIDERS = frozenset({"minimax", "minimax-cn"})
 								def _is_anthropic_compat_endpoint(provider: str, base_url: str) -> bool:
 								    """Detect if an endpoint expects Anthropic-format content blocks.
 								    Returns True for known Anthropic-compatible providers (MiniMax) and
 								    any endpoint whose URL contains ``/anthropic`` in the path.
 								    """
 								    if provider in _ANTHROPIC_COMPAT_PROVIDERS:
 								        return True
 								    url_lower = (base_url or "").lower()
 								    return "/anthropic" in url_lower
 								def _convert_openai_images_to_anthropic(messages: list) -> list:
 								    """Convert OpenAI ``image_url`` content blocks to Anthropic ``image`` blocks.
 								    Only touches messages that have list-type content with ``image_url`` blocks;
 								    plain text messages pass through unchanged.
 								    """
 								    converted = []
 								    for msg in messages:
 								        content = msg.get("content")
 								        if not isinstance(content, list):
 								            converted.append(msg)
 								            continue
 								        new_content = []
 								        changed = False
 								        for block in content:
 								            if block.get("type") == "image_url":
 								                image_url_val = (block.get("image_url") or {}).get("url", "")
 								                if image_url_val.startswith("data:"):
 								                    # Parse data URI: data:<media_type>;base64,<data>
 								                    header, _, b64data = image_url_val.partition(",")
 								                    media_type = "image/png"
 								                    if ":" in header and ";" in header:
 								                        media_type = header.split(":", 1)[1].split(";", 1)[0]
 								                    new_content.append({
 								                        "type": "image",
 								                        "source": {
 								                            "type": "base64",
 								                            "media_type": media_type,
 								                            "data": b64data,
 								                        },
 								                    })
 								                else:
 								                    # URL-based image
 								                    new_content.append({
 								                        "type": "image",
 								                        "source": {
 								                            "type": "url",
 								                            "url": image_url_val,
 								                        },
 								                    })
 								                changed = True
 								            else:
 								                new_content.append(block)
 								        converted.append({**msg, "content": new_content} if changed else msg)
 								    return converted
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								def _build_call_kwargs(
 								    provider: str,
 								    model: str,
 								    messages: list,
 								    temperature: Optional[float] = None,
 								    max_tokens: Optional[int] = None,
 								    tools: Optional[list] = None,
 								    timeout: float = 30.0,
 								    extra_body: Optional[dict] = None,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    base_url: Optional[str] = None,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								) -> dict:
 								    """Build kwargs for .chat.completions.create() with model/provider adjustments."""
 								    kwargs: Dict[str, Any] = {
 								        "model": model,
 								        "messages": messages,
 								        "timeout": timeout,
 								    }
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								    fixed_temperature = _fixed_temperature_for_model(model, base_url)
-												fix(kimi): omit temperature entirely for Kimi/Moonshot models (#13157)

Kimi's gateway selects the correct temperature server-side based on the
active mode (thinking -> 1.0, non-thinking -> 0.6).  Sending any
temperature value — even the previously "correct" one — conflicts with
gateway-managed defaults.

Replaces the old approach of forcing specific temperature values (0.6
for non-thinking, 1.0 for thinking) with an OMIT_TEMPERATURE sentinel
that tells all call sites to strip the temperature key from API kwargs
entirely.

Changes:
- agent/auxiliary_client.py: OMIT_TEMPERATURE sentinel, _is_kimi_model()
  prefix check (covers all kimi-* models), _fixed_temperature_for_model()
  returns sentinel for kimi models.  _build_call_kwargs() strips temp.
- run_agent.py: _build_api_kwargs, flush_memories, and summary generation
  paths all handle the sentinel by popping/omitting temperature.
- trajectory_compressor.py: _effective_temperature_for_model returns None
  for kimi (sentinel mapped), direct client calls use kwargs dict to
  conditionally include temperature.
- mini_swe_runner.py: same sentinel handling via wrapper function.
- 6 test files updated: all 'forces temperature X' assertions replaced
  with 'temperature not in kwargs' assertions.

Net: -76 lines (171 added, 247 removed).
Inspired by PR #13137 (@kshitijk4poor).
											
										
										
											2026-04-20 12:23:05 -07:00
+								    if fixed_temperature is OMIT_TEMPERATURE:
 								        temperature = None  # strip — let server choose
 								    elif fixed_temperature is not None:
-												fix(kimi): force kimi-for-coding temperature to 0.6

											
										
										
											2026-04-17 16:17:15 -06:00
+								        temperature = fixed_temperature
-												fix(agent): complete Claude Opus 4.7 API migration

Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide

Fixes NousResearch/hermes-agent#11137

Breaking-change coverage:

1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
   _supports_adaptive_thinking() (extends previous 4.6-only gate).

2. Sampling parameter stripping — 4.7 returns 400 for any non-default
   temperature / top_p / top_k. build_anthropic_kwargs drops them as a
   safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
   and AnthropicCompletionsAdapter.create() both early-exit before
   setting temperature for 4.7+ models. This keeps flush_memories and
   structured-JSON aux paths that hardcode temperature from 400ing
   when the aux model is flipped to 4.7.

3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
   which silently hides reasoning text from Hermes's CLI activity feed
   during long tool runs. Restoring "summarized" preserves 4.6 UX.

4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
   silently over-efforted every coding/agentic request). max is now a
   distinct ceiling per Anthropic's 5-level effort model.

5. New stop_reason values — refusal and model_context_window_exceeded
   were silently collapsed to "stop" (end_turn) by the adapter's
   stop_reason_map. Now mapped to "content_filter" and "length"
   respectively, matching upstream finish-reason handling already in
   bedrock_adapter.

6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
   list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
   catalog (recommended), claude-opus-4-7 added to model_metadata
   DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).

7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
   that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
   prefill (400).

8. Tests — 4 new tests in test_anthropic_adapter covering display
   default, xhigh preservation, max on 4.7, refusal / context-overflow
   stop_reason mapping, plus the sampling-param predicate. test_model_metadata
   accepts 4.7 at 1M context.

Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.

											
										
										
											2026-04-16 12:35:43 -05:00
+								    # Opus 4.7+ rejects any non-default temperature/top_p/top_k — silently
-												refactor(memory): remove flush_memories entirely (#15696)

The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
											
										
										
											2026-04-25 08:21:14 -07:00
+								    # drop here so auxiliary callers that hardcode temperature (e.g. 0 on
 								    # structured-JSON extraction) don't 400 the moment
-												fix(agent): complete Claude Opus 4.7 API migration

Claude Opus 4.7 introduced several breaking API changes that the current
codebase partially handled but not completely. This patch finishes the
migration per the official migration guide at
https://platform.claude.com/docs/en/about-claude/models/migration-guide

Fixes NousResearch/hermes-agent#11137

Breaking-change coverage:

1. Adaptive thinking + output_config.effort — 4.7 is now recognized by
   _supports_adaptive_thinking() (extends previous 4.6-only gate).

2. Sampling parameter stripping — 4.7 returns 400 for any non-default
   temperature / top_p / top_k. build_anthropic_kwargs drops them as a
   safety net; the OpenAI-protocol auxiliary path (_build_call_kwargs)
   and AnthropicCompletionsAdapter.create() both early-exit before
   setting temperature for 4.7+ models. This keeps flush_memories and
   structured-JSON aux paths that hardcode temperature from 400ing
   when the aux model is flipped to 4.7.

3. thinking.display = "summarized" — 4.7 defaults display to "omitted",
   which silently hides reasoning text from Hermes's CLI activity feed
   during long tool runs. Restoring "summarized" preserves 4.6 UX.

4. Effort level mapping — xhigh now maps to xhigh (was xhigh→max, which
   silently over-efforted every coding/agentic request). max is now a
   distinct ceiling per Anthropic's 5-level effort model.

5. New stop_reason values — refusal and model_context_window_exceeded
   were silently collapsed to "stop" (end_turn) by the adapter's
   stop_reason_map. Now mapped to "content_filter" and "length"
   respectively, matching upstream finish-reason handling already in
   bedrock_adapter.

6. Model catalogs — claude-opus-4-7 added to the Anthropic provider
   list, anthropic/claude-opus-4.7 added at top of OpenRouter fallback
   catalog (recommended), claude-opus-4-7 added to model_metadata
   DEFAULT_CONTEXT_LENGTHS (1M, matching 4.6 per migration guide).

7. Prefill docstrings — run_agent.AIAgent and BatchRunner now document
   that Anthropic Sonnet/Opus 4.6+ reject a trailing assistant-role
   prefill (400).

8. Tests — 4 new tests in test_anthropic_adapter covering display
   default, xhigh preservation, max on 4.7, refusal / context-overflow
   stop_reason mapping, plus the sampling-param predicate. test_model_metadata
   accepts 4.7 at 1M context.

Tested on macOS 15.5 (darwin). 119 tests pass in
tests/agent/test_anthropic_adapter.py, 1320 pass in tests/agent/.

											
										
										
											2026-04-16 12:35:43 -05:00
+								    # the aux model is flipped to 4.7.
 								    if temperature is not None:
 								        from agent.anthropic_adapter import _forbids_sampling_params
 								        if _forbids_sampling_params(model):
 								            temperature = None
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    if temperature is not None:
 								        kwargs["temperature"] = temperature
 								    if max_tokens is not None:
 								        # Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
 								        # Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
 								        if provider == "custom":
-												Merge origin/main into hermes/hermes-dd253d81

											
										
										
											2026-03-14 21:16:29 -07:00
+								            custom_base = base_url or _current_custom_base_url()
-												fix: extend hostname-match provider detection across remaining call sites

Aslaaen's fix in the original PR covered _detect_api_mode_for_url and the
two openai/xai sites in run_agent.py. This finishes the sweep: the same
substring-match false-positive class (e.g. https://api.openai.com.evil/v1,
https://proxy/api.openai.com/v1, https://api.anthropic.com.example/v1)
existed in eight more call sites, and the hostname helper was duplicated
in two modules.

- utils: add shared base_url_hostname() (single source of truth).
- hermes_cli/runtime_provider, run_agent: drop local duplicates, import
  from utils. Reuse the cached AIAgent._base_url_hostname attribute
  everywhere it's already populated.
- agent/auxiliary_client: switch codex-wrap auto-detect, max_completion_tokens
  gate (auxiliary_max_tokens_param), and custom-endpoint max_tokens kwarg
  selection to hostname equality.
- run_agent: native-anthropic check in the Claude-style model branch
  and in the AIAgent init provider-auto-detect branch.
- agent/model_metadata: Anthropic /v1/models context-length lookup.
- hermes_cli/providers.determine_api_mode: anthropic / openai URL
  heuristics for custom/unknown providers (the /anthropic path-suffix
  convention for third-party gateways is preserved).
- tools/delegate_tool: anthropic detection for delegated subagent
  runtimes.
- hermes_cli/setup, hermes_cli/tools_config: setup-wizard vision-endpoint
  native-OpenAI detection (paired with deduping the repeated check into
  a single is_native_openai boolean per branch).

Tests:
- tests/test_base_url_hostname.py covers the helper directly
  (path-containing-host, host-suffix, trailing dot, port, case).
- tests/hermes_cli/test_determine_api_mode_hostname.py adds the same
  regression class for determine_api_mode, plus a test that the
  /anthropic third-party gateway convention still wins.

Also: add asslaenn5@gmail.com → Aslaaen to scripts/release.py AUTHOR_MAP.

											
										
										
											2026-04-20 20:58:01 -07:00
+								            if base_url_hostname(custom_base) == "api.openai.com":
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								                kwargs["max_completion_tokens"] = max_tokens
 								            else:
 								                kwargs["max_tokens"] = max_tokens
 								        else:
 								            kwargs["max_tokens"] = max_tokens
 								    if tools:
 								        kwargs["tools"] = tools
 								    # Provider-specific extra_body
 								    merged_extra = dict(extra_body or {})
 								    if provider == "nous" or auxiliary_is_nous:
 								        merged_extra.setdefault("tags", []).extend(["product=hermes-agent"])
 								    if merged_extra:
 								        kwargs["extra_body"] = merged_extra
 								    return kwargs
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								def _validate_llm_response(response: Any, task: str = None) -> Any:
 								    """Validate that an LLM response has the expected .choices[0].message shape.
 								    Fails fast with a clear error instead of letting malformed payloads
 								    propagate to downstream consumers where they crash with misleading
 								    AttributeError (e.g. "'str' object has no attribute 'choices'").
 								    See #7264.
 								    """
 								    if response is None:
 								        raise RuntimeError(
 								            f"Auxiliary {task or 'call'}: LLM returned None response"
 								        )
 								    # Allow SimpleNamespace responses from adapters (CodexAuxiliaryClient,
 								    # AnthropicAuxiliaryClient) — they have .choices[0].message.
 								    try:
 								        choices = response.choices
 								        if not choices or not hasattr(choices[0], "message"):
 								            raise AttributeError("missing choices[0].message")
 								    except (AttributeError, TypeError, IndexError) as exc:
 								        response_type = type(response).__name__
 								        response_preview = str(response)[:120]
 								        raise RuntimeError(
 								            f"Auxiliary {task or 'call'}: LLM returned invalid response "
 								            f"(type={response_type}): {response_preview!r}. "
 								            f"Expected object with .choices[0].message — check provider "
 								            f"adapter or custom endpoint compatibility."
 								        ) from exc
 								    return response
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								def call_llm(
 								    task: str = None,
 								    *,
 								    provider: str = None,
 								    model: str = None,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    base_url: str = None,
 								    api_key: str = None,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								    main_runtime: Optional[Dict[str, Any]] = None,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    messages: list,
 								    temperature: float = None,
 								    max_tokens: int = None,
 								    tools: list = None,
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    timeout: float = None,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    extra_body: dict = None,
 								) -> Any:
 								    """Centralized synchronous LLM call.
 								    Resolves provider + model (from task config, explicit args, or auto-detect),
 								    handles auth, request formatting, and model-specific arg adjustments.
 								    Args:
 								        task: Auxiliary task name ("compression", "vision", "web_extract",
-												refactor(memory): remove flush_memories entirely (#15696)

The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
											
										
										
											2026-04-25 08:21:14 -07:00
+								              "session_search", "skills_hub", "mcp", "title_generation").
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								              Reads provider:model from config/env. Ignored if provider is set.
 								        provider: Explicit provider override.
 								        model: Explicit model override.
 								        messages: Chat messages list.
 								        temperature: Sampling temperature (None = provider default).
 								        max_tokens: Max output tokens (handles max_tokens vs max_completion_tokens).
 								        tools: Tool definitions (for function calling).
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								        timeout: Request timeout in seconds (None = read from auxiliary.{task}.timeout config).
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								        extra_body: Additional request body fields.
 								    Returns:
 								        Response object with .choices[0].message.content
 								    Raises:
 								        RuntimeError: If no provider is configured.
 								    """
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        task, provider, model, base_url, api_key)
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								    effective_extra_body = _get_task_extra_body(task)
 								    effective_extra_body.update(extra_body or {})
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if task == "vision":
 								        effective_provider, client, final_model = resolve_vision_provider_client(
-												fix: pass resolved args to resolve_vision_provider_client()

resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.

Cherry-picked from #10901 by @lrawnsley.

											
										
										
											2026-04-16 19:24:21 +05:30
+								            provider=resolved_provider if resolved_provider != "auto" else provider,
 								            model=resolved_model or model,
 								            base_url=resolved_base_url or base_url,
 								            api_key=resolved_api_key or api_key,
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								            async_mode=False,
 								        )
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        if client is None and resolved_provider != "auto" and not resolved_base_url:
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								            logger.warning(
 								                "Vision provider %s unavailable, falling back to auto vision backends",
 								                resolved_provider,
 								            )
 								            effective_provider, client, final_model = resolve_vision_provider_client(
 								                provider="auto",
 								                model=resolved_model,
 								                async_mode=False,
 								            )
 								        if client is None:
 								            raise RuntimeError(
 								                f"No LLM provider configured for task={task} provider={resolved_provider}. "
 								                f"Run: hermes setup"
 								            )
 								        resolved_provider = effective_provider or resolved_provider
 								    else:
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        client, final_model = _get_cached_client(
 								            resolved_provider,
 								            resolved_model,
 								            base_url=resolved_base_url,
 								            api_key=resolved_api_key,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								            api_mode=resolved_api_mode,
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								            main_runtime=main_runtime,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        )
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        if client is None:
-												fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445)

When a non-OpenRouter provider (e.g. minimax, anthropic) is set in
config.yaml but its API key is missing, Hermes silently fell back to
OpenRouter, causing confusing 404 errors.

Now checks if the user explicitly configured a provider before falling
back. Explicit providers raise RuntimeError with a clear message naming
the missing env var. Auto/openrouter/custom providers still fall through
to OpenRouter as before.

Three code paths fixed:
- run_agent.py AIAgent.__init__ — main client initialization
- auxiliary_client.py call_llm — sync auxiliary calls
- auxiliary_client.py call_llm_streaming — async auxiliary calls

Based on PR #2272 by @StefanIsMe. Applied manually to fix a
pconfig NameError in the original and extend to call_llm_streaming.

Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>
											
										
										
											2026-03-22 03:59:29 -07:00
+								            # When the user explicitly chose a non-OpenRouter provider but no
 								            # credentials were found, fail fast instead of silently routing
 								            # through OpenRouter (which causes confusing 404s).
 								            _explicit = (resolved_provider or "").strip().lower()
 								            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
 								                raise RuntimeError(
 								                    f"Provider '{_explicit}' is set in config.yaml but no API key "
 								                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
 								                    f"variable, or switch to a different provider with `hermes model`."
 								                )
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            # For auto/custom with no credentials, try the full auto chain
 								            # rather than hardcoding OpenRouter (which may be depleted).
 								            # Pass model=None so each provider uses its own default —
 								            # resolved_model may be an OpenRouter-format slug that doesn't
 								            # work on other providers.
-												fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445)

When a non-OpenRouter provider (e.g. minimax, anthropic) is set in
config.yaml but its API key is missing, Hermes silently fell back to
OpenRouter, causing confusing 404 errors.

Now checks if the user explicitly configured a provider before falling
back. Explicit providers raise RuntimeError with a clear message naming
the missing env var. Auto/openrouter/custom providers still fall through
to OpenRouter as before.

Three code paths fixed:
- run_agent.py AIAgent.__init__ — main client initialization
- auxiliary_client.py call_llm — sync auxiliary calls
- auxiliary_client.py call_llm_streaming — async auxiliary calls

Based on PR #2272 by @StefanIsMe. Applied manually to fix a
pconfig NameError in the original and extend to call_llm_streaming.

Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>
											
										
										
											2026-03-22 03:59:29 -07:00
+								            if not resolved_base_url:
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								                            task or "call", resolved_provider)
-												fix(agent): route compression aux through live session runtime

											
										
										
											2026-04-12 00:10:19 -04:00
+								                client, final_model = _get_cached_client("auto", main_runtime=main_runtime)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        if client is None:
 								            raise RuntimeError(
 								                f"No LLM provider configured for task={task} provider={resolved_provider}. "
 								                f"Run: hermes setup")
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
-												fix: add INFO-level logging for auxiliary provider resolution (#3866)

The auxiliary client's auto-detection chain was a black box — when
compression, summarization, or memory flush failed, the only clue was
a generic 'Request timed out' with no indication of which provider was
tried or why it was skipped.

Now logs at INFO level:
- 'Auxiliary auto-detect: using local/custom (qwen3.5-9b) — skipped:
  openrouter, nous' when auto-detection picks a provider
- 'Auxiliary compression: using auto (qwen3.5-9b) at http://localhost:11434/v1'
  before each auxiliary call
- 'Auxiliary compression: provider custom unavailable, falling back to
  openrouter' on fallback
- Clear warning with actionable guidance when NO provider is available:
  'Set OPENROUTER_API_KEY or configure a local model in config.yaml'
											
										
										
											2026-03-29 21:29:00 -07:00
+								    # Log what we're about to do — makes auxiliary operations visible
 								    _base_info = str(getattr(client, "base_url", resolved_base_url) or "")
 								    if task:
 								        logger.info("Auxiliary %s: using %s (%s)%s",
 								                     task, resolved_provider or "auto", final_model or "default",
 								                     f" at {_base_info}" if _base_info and "openrouter" not in _base_info else "")
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								    # Pass the client's actual base_url (not just resolved_base_url) so
 								    # endpoint-specific temperature overrides can distinguish
 								    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    kwargs = _build_call_kwargs(
 								        resolved_provider, final_model, messages,
 								        temperature=temperature, max_tokens=max_tokens,
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								        base_url=_base_info or resolved_base_url)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix: three provider-related bugs (#8161, #8181, #8147) (#8243)

- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
  so context-length lookups use models.dev data instead of 128k fallback.
  Fixes #8161.

- Set api_mode from custom_providers entry when switching via hermes model,
  and clear stale api_mode when the entry has none. Also extract api_mode
  in _named_custom_provider_map(). Fixes #8181.

- Convert OpenAI image_url content blocks to Anthropic image blocks when
  the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
  containing /anthropic). Fixes #8147.
											
										
										
											2026-04-12 01:44:18 -07:00
+								    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
 								    _client_base = str(getattr(client, "base_url", "") or "")
 								    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
 								        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								    # Handle unsupported temperature, max_tokens vs max_completion_tokens retry,
 								    # then payment fallback.
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    try:
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								        return _validate_llm_response(
 								            client.chat.completions.create(**kwargs), task)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    except Exception as first_err:
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
 								            retry_kwargs = dict(kwargs)
 								            retry_kwargs.pop("temperature", None)
 								            logger.info(
 								                "Auxiliary %s: provider rejected temperature; retrying once without it",
 								                task or "call",
 								            )
 								            try:
 								                return _validate_llm_response(
 								                    client.chat.completions.create(**retry_kwargs), task)
 								            except Exception as retry_err:
 								                retry_err_str = str(retry_err)
 								                # If retry still fails, fall through to the max_tokens /
 								                # payment / auth chains below using the temperature-stripped
 								                # kwargs.  Re-raise only if the retry hit something those
 								                # chains won't handle.
 								                if not (
 								                    _is_payment_error(retry_err)
 								                    or _is_connection_error(retry_err)
 								                    or _is_auth_error(retry_err)
 								                    or "max_tokens" in retry_err_str
 								                    or "unsupported_parameter" in retry_err_str
 								                ):
 								                    raise
 								                first_err = retry_err
 								                kwargs = retry_kwargs
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								        err_str = str(first_err)
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								        if max_tokens is not None and (
 								            "max_tokens" in err_str
 								            or "unsupported_parameter" in err_str
 								            or _is_unsupported_parameter_error(first_err, "max_tokens")
 								        ):
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								            kwargs.pop("max_tokens", None)
 								            kwargs["max_completion_tokens"] = max_tokens
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            try:
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								                return _validate_llm_response(
 								                    client.chat.completions.create(**kwargs), task)
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            except Exception as retry_err:
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                # If the max_tokens retry also hits a payment or connection
 								                # error, fall through to the fallback chain below.
 								                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								                    raise
 								                first_err = retry_err
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								        # ── Nous auth refresh parity with main agent ──────────────────
 								        client_is_nous = (
 								            resolved_provider == "nous"
 								            or base_url_host_matches(_base_info, "inference-api.nousresearch.com")
 								        )
 								        if _is_auth_error(first_err) and client_is_nous:
 								            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
 								                cache_provider=resolved_provider or "nous",
 								                model=final_model,
 								                async_mode=False,
 								                base_url=resolved_base_url,
 								                api_key=resolved_api_key,
 								                api_mode=resolved_api_mode,
 								                main_runtime=main_runtime,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                is_vision=(task == "vision"),
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								            )
 								            if refreshed_client is not None:
 								                logger.info("Auxiliary %s: refreshed Nous runtime credentials after 401, retrying",
 								                            task or "call")
 								                if refreshed_model and refreshed_model != kwargs.get("model"):
 								                    kwargs["model"] = refreshed_model
 								                return _validate_llm_response(
 								                    refreshed_client.chat.completions.create(**kwargs), task)
-												fix(aux): refresh cached auth after 401

											
										
										
											2026-04-23 19:49:46 +09:00
+								        # ── Auth refresh retry ───────────────────────────────────────
 								        if (_is_auth_error(first_err)
 								                and resolved_provider not in ("auto", "", None)
 								                and not client_is_nous):
 								            if _refresh_provider_credentials(resolved_provider):
 								                logger.info(
 								                    "Auxiliary %s: refreshed %s credentials after auth error, retrying",
 								                    task or "call", resolved_provider,
 								                )
 								                retry_client, retry_model = (
 								                    resolve_vision_provider_client(
 								                        provider=resolved_provider,
 								                        model=final_model,
 								                        async_mode=False,
 								                    )[1:]
 								                    if task == "vision"
 								                    else _get_cached_client(
 								                        resolved_provider,
 								                        resolved_model,
 								                        base_url=resolved_base_url,
 								                        api_key=resolved_api_key,
 								                        api_mode=resolved_api_mode,
 								                        main_runtime=main_runtime,
 								                    )
 								                )
 								                if retry_client is not None:
 								                    retry_kwargs = _build_call_kwargs(
 								                        resolved_provider,
 								                        retry_model or final_model,
 								                        messages,
 								                        temperature=temperature,
 								                        max_tokens=max_tokens,
 								                        tools=tools,
 								                        timeout=effective_timeout,
 								                        extra_body=effective_extra_body,
 								                        base_url=resolved_base_url,
 								                    )
 								                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
 								                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
 								                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
 								                    return _validate_llm_response(
 								                        retry_client.chat.completions.create(**retry_kwargs), task)
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								        # ── Payment / credit exhaustion fallback ──────────────────────
 								        # When the resolved provider returns 402 or a credit-related error,
 								        # try alternative providers instead of giving up.  This handles the
 								        # common case where a user runs out of OpenRouter credits but has
 								        # Codex OAuth or another provider available.
-												fix: model fallback — stale model on Nous login + connection error fallback (#6554)

Two bugs in the model fallback system:

1. Nous login leaves stale model in config (provider=nous, model=opus
   from previous OpenRouter setup). Fixed by deferring the config.yaml
   provider write until AFTER model selection completes, and passing the
   selected model atomically via _update_config_for_provider's
   default_model parameter. Previously, _update_config_for_provider was
   called before model selection — if selection failed (free tier, no
   models, exception), config stayed as nous+opus permanently.

2. Codex/stale providers in auxiliary fallback can't connect but block
   the auto-detection chain. Added _is_connection_error() detection
   (APIConnectionError, APITimeoutError, DNS failures, connection
   refused) alongside the existing _is_payment_error() check in
   call_llm(). When a provider endpoint is unreachable, the system now
   falls back to the next available provider instead of crashing.
											
										
										
											2026-04-09 10:38:53 -07:00
+								        #
 								        # ── Connection error fallback ────────────────────────────────
 								        # When a provider endpoint is unreachable (DNS failure, connection
 								        # refused, timeout), try alternative providers.  This handles stale
 								        # Codex/OAuth tokens that authenticate but whose endpoint is down,
 								        # and providers the user never configured that got picked up by
 								        # the auto-detection chain.
 								        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								        # Only try alternative providers when the user didn't explicitly
 								        # configure this task's provider.  Explicit provider = hard constraint;
 								        # auto (the default) = best-effort fallback chain.  (#7559)
 								        is_auto = resolved_provider in ("auto", "", None)
 								        if should_fallback and is_auto:
-												fix: model fallback — stale model on Nous login + connection error fallback (#6554)

Two bugs in the model fallback system:

1. Nous login leaves stale model in config (provider=nous, model=opus
   from previous OpenRouter setup). Fixed by deferring the config.yaml
   provider write until AFTER model selection completes, and passing the
   selected model atomically via _update_config_for_provider's
   default_model parameter. Previously, _update_config_for_provider was
   called before model selection — if selection failed (free tier, no
   models, exception), config stayed as nous+opus permanently.

2. Codex/stale providers in auxiliary fallback can't connect but block
   the auto-detection chain. Added _is_connection_error() detection
   (APIConnectionError, APITimeoutError, DNS failures, connection
   refused) alongside the existing _is_payment_error() check in
   call_llm(). When a provider endpoint is unreachable, the system now
   falls back to the next available provider instead of crashing.
											
										
										
											2026-04-09 10:38:53 -07:00
+								            reason = "payment error" if _is_payment_error(first_err) else "connection error"
 								            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
 								                        task or "call", reason, resolved_provider, first_err)
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            fb_client, fb_model, fb_label = _try_payment_fallback(
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                resolved_provider, task, reason=reason)
-												fix: auxiliary client payment fallback — retry with next provider on 402 (#5599)

When a user runs out of OpenRouter credits and switches to Codex (or any
other provider), auxiliary tasks (compression, vision, web_extract) would
still try OpenRouter first and fail with 402.  Two fixes:

1. Payment fallback in call_llm(): When a resolved provider returns HTTP 402
   or a credit-related error, automatically retry with the next available
   provider in the auto-detection chain.  Skips the depleted provider and
   tries Nous → Custom → Codex → API-key providers.

2. Remove hardcoded OpenRouter fallback: The old code fell back specifically
   to OpenRouter when auto/custom resolution returned no client.  Now falls
   back to the full auto-detection chain, which handles any available
   provider — not just OpenRouter.

Also extracts _get_provider_chain() as a shared function (replaces inline
tuple in _resolve_auto and the new fallback), built at call time so test
patches on _try_* functions remain visible.

Adds 16 tests covering _is_payment_error(), _get_provider_chain(),
_try_payment_fallback(), and call_llm() integration with 402 retry.
											
										
										
											2026-04-06 12:41:40 -07:00
+								            if fb_client is not None:
 								                fb_kwargs = _build_call_kwargs(
 								                    fb_label, fb_model, messages,
 								                    temperature=temperature, max_tokens=max_tokens,
 								                    tools=tools, timeout=effective_timeout,
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								                    extra_body=effective_extra_body,
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								                    base_url=str(getattr(fb_client, "base_url", "") or ""))
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								                return _validate_llm_response(
 								                    fb_client.chat.completions.create(**fb_kwargs), task)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								        raise
-												fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)

Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.

All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes #3389.
											
										
										
											2026-03-27 15:28:19 -07:00
+								def extract_content_or_reasoning(response) -> str:
 								    """Extract content from an LLM response, falling back to reasoning fields.
 								    Mirrors the main agent loop's behavior when a reasoning model (DeepSeek-R1,
 								    Qwen-QwQ, etc.) returns ``content=None`` with reasoning in structured fields.
 								    Resolution order:
 . ``message.content`` — strip inline think/reasoning blocks, check for
 								         remaining non-whitespace text.
 . ``message.reasoning`` / ``message.reasoning_content`` — direct
 								         structured reasoning fields (DeepSeek, Moonshot, Novita, etc.).
 . ``message.reasoning_details`` — OpenRouter unified array format.
 								    Returns the best available text, or ``""`` if nothing found.
 								    """
 								    import re
 								    msg = response.choices[0].message
 								    content = (msg.content or "").strip()
 								    if content:
 								        # Strip inline think/reasoning blocks (mirrors _strip_think_blocks)
 								        cleaned = re.sub(
-												fix: add <thought> stripping to auxiliary_client + tests

auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.

											
										
										
											2026-04-12 12:38:24 -07:00
+								            r"<(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>"
-												fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)

Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.

All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes #3389.
											
										
										
											2026-03-27 15:28:19 -07:00
+								            r".*?"
-												fix: add <thought> stripping to auxiliary_client + tests

auxiliary_client.py had its own regex mirroring _strip_think_blocks
but was missing the <thought> variant. Also adds test coverage for
<thought> paired and orphaned tags.

											
										
										
											2026-04-12 12:38:24 -07:00
+								            r"</(?:think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>",
-												fix: guard aux LLM calls against None content + reasoning fallback + retry (salvage #3389) (#3449)

Salvage of #3389 by @binhnt92 with reasoning fallback and retry logic added on top.

All 7 auxiliary LLM call sites now use extract_content_or_reasoning() which mirrors the main agent loop's behavior: extract content, strip think blocks, fall back to structured reasoning fields, retry on empty.

Closes #3389.
											
										
										
											2026-03-27 15:28:19 -07:00
+								            "", content, flags=re.DOTALL | re.IGNORECASE,
 								        ).strip()
 								        if cleaned:
 								            return cleaned
 								    # Content is empty or reasoning-only — try structured reasoning fields
 								    reasoning_parts: list[str] = []
 								    for field in ("reasoning", "reasoning_content"):
 								        val = getattr(msg, field, None)
 								        if val and isinstance(val, str) and val.strip() and val not in reasoning_parts:
 								            reasoning_parts.append(val.strip())
 								    details = getattr(msg, "reasoning_details", None)
 								    if details and isinstance(details, list):
 								        for detail in details:
 								            if isinstance(detail, dict):
 								                summary = (
 								                    detail.get("summary")
 								                    or detail.get("content")
 								                    or detail.get("text")
 								                )
 								                if summary and summary not in reasoning_parts:
 								                    reasoning_parts.append(summary.strip() if isinstance(summary, str) else str(summary))
 								    if reasoning_parts:
 								        return "\n\n".join(reasoning_parts)
 								    return ""
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								async def async_call_llm(
 								    task: str = None,
 								    *,
 								    provider: str = None,
 								    model: str = None,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								    base_url: str = None,
 								    api_key: str = None,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    messages: list,
 								    temperature: float = None,
 								    max_tokens: int = None,
 								    tools: list = None,
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    timeout: float = None,
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    extra_body: dict = None,
 								) -> Any:
 								    """Centralized asynchronous LLM call.
 								    Same as call_llm() but async. See call_llm() for full documentation.
 								    """
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								    resolved_provider, resolved_model, resolved_base_url, resolved_api_key, resolved_api_mode = _resolve_task_provider_model(
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        task, provider, model, base_url, api_key)
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								    effective_extra_body = _get_task_extra_body(task)
 								    effective_extra_body.update(extra_body or {})
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								    if task == "vision":
 								        effective_provider, client, final_model = resolve_vision_provider_client(
-												fix: pass resolved args to resolve_vision_provider_client()

resolve_vision_provider_client() was receiving the raw call_llm
parameters instead of the resolved provider/model/key/url from
_resolve_task_provider_model(). This caused config overrides
(auxiliary.vision.provider, etc.) to be silently discarded.

Cherry-picked from #10901 by @lrawnsley.

											
										
										
											2026-04-16 19:24:21 +05:30
+								            provider=resolved_provider if resolved_provider != "auto" else provider,
 								            model=resolved_model or model,
 								            base_url=resolved_base_url or base_url,
 								            api_key=resolved_api_key or api_key,
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								            async_mode=True,
 								        )
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        if client is None and resolved_provider != "auto" and not resolved_base_url:
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								            logger.warning(
 								                "Vision provider %s unavailable, falling back to auto vision backends",
 								                resolved_provider,
 								            )
 								            effective_provider, client, final_model = resolve_vision_provider_client(
 								                provider="auto",
 								                model=resolved_model,
 								                async_mode=True,
 								            )
 								        if client is None:
 								            raise RuntimeError(
 								                f"No LLM provider configured for task={task} provider={resolved_provider}. "
 								                f"Run: hermes setup"
 								            )
 								        resolved_provider = effective_provider or resolved_provider
 								    else:
 								        client, final_model = _get_cached_client(
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								            resolved_provider,
 								            resolved_model,
 								            async_mode=True,
 								            base_url=resolved_base_url,
 								            api_key=resolved_api_key,
-												fix(auxiliary): honor api_mode in auxiliary client (#6800)

The auxiliary client always calls client.chat.completions.create(),
ignoring the api_mode config flag. This breaks codex-family models
(e.g. gpt-5.3-codex) on direct OpenAI API keys, which need the
/v1/responses endpoint.

Changes:
- Expand _resolve_task_provider_model to return api_mode (5-tuple)
- Read api_mode from auxiliary.{task}.api_mode config and env vars
  (AUXILIARY_{TASK}_API_MODE)
- Pass api_mode through _get_cached_client to resolve_provider_client
- Add _needs_codex_wrap/_wrap_if_needed helpers that wrap plain OpenAI
  clients in CodexAuxiliaryClient when api_mode=codex_responses or
  when auto-detection finds api.openai.com + codex model pattern
- Apply wrapping at all custom endpoint, named custom provider, and
  API-key provider return paths
- Update test mocks for the new 5-tuple return format

Users can now set:
  auxiliary:
    compression:
      model: gpt-5.3-codex
      base_url: https://api.openai.com/v1
      api_mode: codex_responses

Closes #6800

											
										
										
											2026-04-11 13:50:43 +05:30
+								            api_mode=resolved_api_mode,
-												feat: add direct endpoint overrides for auxiliary and delegation

Add base_url/api_key overrides for auxiliary tasks and delegation so users can
route those flows straight to a custom OpenAI-compatible endpoint without
having to rely on provider=main or named custom providers.

Also clear gateway session env vars in test isolation so the full suite stays
deterministic when run from a messaging-backed agent session.

											
										
										
											2026-03-14 20:48:29 -07:00
+								        )
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        if client is None:
-												fix: fail fast when explicit provider has no API key instead of silent OpenRouter fallback (#2445)

When a non-OpenRouter provider (e.g. minimax, anthropic) is set in
config.yaml but its API key is missing, Hermes silently fell back to
OpenRouter, causing confusing 404 errors.

Now checks if the user explicitly configured a provider before falling
back. Explicit providers raise RuntimeError with a clear message naming
the missing env var. Auto/openrouter/custom providers still fall through
to OpenRouter as before.

Three code paths fixed:
- run_agent.py AIAgent.__init__ — main client initialization
- auxiliary_client.py call_llm — sync auxiliary calls
- auxiliary_client.py call_llm_streaming — async auxiliary calls

Based on PR #2272 by @StefanIsMe. Applied manually to fix a
pconfig NameError in the original and extend to call_llm_streaming.

Co-authored-by: StefanIsMe <StefanIsMe@users.noreply.github.com>
											
										
										
											2026-03-22 03:59:29 -07:00
+								            _explicit = (resolved_provider or "").strip().lower()
 								            if _explicit and _explicit not in ("auto", "openrouter", "custom"):
 								                raise RuntimeError(
 								                    f"Provider '{_explicit}' is set in config.yaml but no API key "
 								                    f"was found. Set the {_explicit.upper()}_API_KEY environment "
 								                    f"variable, or switch to a different provider with `hermes model`."
 								                )
 								            if not resolved_base_url:
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
 								                            task or "call", resolved_provider)
 								                client, final_model = _get_cached_client("auto", async_mode=True)
-												refactor: unify vision backend gating

											
										
										
											2026-03-14 20:22:13 -07:00
+								        if client is None:
 								            raise RuntimeError(
 								                f"No LLM provider configured for task={task} provider={resolved_provider}. "
 								                f"Run: hermes setup")
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												feat(agent): configurable timeouts for auxiliary LLM calls via config.yaml (#3597)

Add per-task timeout settings under auxiliary.{task}.timeout in config.yaml
instead of hardcoded values. Users with slow local models (Ollama, llama.cpp)
can now increase timeouts for compression, vision, session search, etc.

Defaults:
  - auxiliary.compression.timeout: 120s (was hardcoded 45s)
  - auxiliary.vision.timeout: 30s (unchanged)
  - all other aux tasks: 30s (was hardcoded 30s)
  - title_generator: 30s (was hardcoded 15s)

call_llm/async_call_llm now auto-resolve timeout from config when not
explicitly passed. Callers can still override with an explicit timeout arg.

Based on PR #3406 by alanfwilliams. Converted from env vars to config.yaml
per project conventions.

Co-authored-by: alanfwilliams <alanfwilliams@users.noreply.github.com>
											
										
										
											2026-03-28 14:35:28 -07:00
+								    effective_timeout = timeout if timeout is not None else _get_task_timeout(task)
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								    # Pass the client's actual base_url (not just resolved_base_url) so
 								    # endpoint-specific temperature overrides can distinguish
 								    # api.moonshot.ai vs api.kimi.com/coding even on auto-detected routes.
 								    _client_base = str(getattr(client, "base_url", "") or "")
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    kwargs = _build_call_kwargs(
 								        resolved_provider, final_model, messages,
 								        temperature=temperature, max_tokens=max_tokens,
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								        tools=tools, timeout=effective_timeout, extra_body=effective_extra_body,
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								        base_url=_client_base or resolved_base_url)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
-												fix: three provider-related bugs (#8161, #8181, #8147) (#8243)

- Add openai/openai-codex -> openai mapping to PROVIDER_TO_MODELS_DEV
  so context-length lookups use models.dev data instead of 128k fallback.
  Fixes #8161.

- Set api_mode from custom_providers entry when switching via hermes model,
  and clear stale api_mode when the entry has none. Also extract api_mode
  in _named_custom_provider_map(). Fixes #8181.

- Convert OpenAI image_url content blocks to Anthropic image blocks when
  the endpoint is Anthropic-compatible (MiniMax, MiniMax-CN, or any URL
  containing /anthropic). Fixes #8147.
											
										
										
											2026-04-12 01:44:18 -07:00
+								    # Convert image blocks for Anthropic-compatible endpoints (e.g. MiniMax)
 								    if _is_anthropic_compat_endpoint(resolved_provider, _client_base):
 								        kwargs["messages"] = _convert_openai_images_to_anthropic(kwargs["messages"])
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    try:
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								        return _validate_llm_response(
 								            await client.chat.completions.create(**kwargs), task)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								    except Exception as first_err:
-												fix(auxiliary): retry without temperature when any provider rejects it

Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>

											
										
										
											2026-04-25 05:25:09 -07:00
+								        if "temperature" in kwargs and _is_unsupported_temperature_error(first_err):
 								            retry_kwargs = dict(kwargs)
 								            retry_kwargs.pop("temperature", None)
 								            logger.info(
 								                "Auxiliary %s (async): provider rejected temperature; retrying once without it",
 								                task or "call",
 								            )
 								            try:
 								                return _validate_llm_response(
 								                    await client.chat.completions.create(**retry_kwargs), task)
 								            except Exception as retry_err:
 								                retry_err_str = str(retry_err)
 								                if not (
 								                    _is_payment_error(retry_err)
 								                    or _is_connection_error(retry_err)
 								                    or _is_auth_error(retry_err)
 								                    or "max_tokens" in retry_err_str
 								                    or "unsupported_parameter" in retry_err_str
 								                ):
 								                    raise
 								                first_err = retry_err
 								                kwargs = retry_kwargs
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								        err_str = str(first_err)
-												fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)

Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
											
										
										
											2026-04-25 05:50:34 -07:00
+								        if max_tokens is not None and (
 								            "max_tokens" in err_str
 								            or "unsupported_parameter" in err_str
 								            or _is_unsupported_parameter_error(first_err, "max_tokens")
 								        ):
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								            kwargs.pop("max_tokens", None)
 								            kwargs["max_completion_tokens"] = max_tokens
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								            try:
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								                return _validate_llm_response(
 								                    await client.chat.completions.create(**kwargs), task)
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								            except Exception as retry_err:
 								                # If the max_tokens retry also hits a payment or connection
 								                # error, fall through to the fallback chain below.
 								                if not (_is_payment_error(retry_err) or _is_connection_error(retry_err)):
 								                    raise
 								                first_err = retry_err
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								        # ── Nous auth refresh parity with main agent ──────────────────
 								        client_is_nous = (
 								            resolved_provider == "nous"
 								            or base_url_host_matches(_client_base, "inference-api.nousresearch.com")
 								        )
 								        if _is_auth_error(first_err) and client_is_nous:
 								            refreshed_client, refreshed_model = _refresh_nous_auxiliary_client(
 								                cache_provider=resolved_provider or "nous",
 								                model=final_model,
 								                async_mode=True,
 								                base_url=resolved_base_url,
 								                api_key=resolved_api_key,
 								                api_mode=resolved_api_mode,
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                is_vision=(task == "vision"),
-												fix(auxiliary): refresh Nous runtime credentials after aux 401s

											
										
										
											2026-04-21 14:45:13 -06:00
+								            )
 								            if refreshed_client is not None:
 								                logger.info("Auxiliary %s (async): refreshed Nous runtime credentials after 401, retrying",
 								                            task or "call")
 								                if refreshed_model and refreshed_model != kwargs.get("model"):
 								                    kwargs["model"] = refreshed_model
 								                return _validate_llm_response(
 								                    await refreshed_client.chat.completions.create(**kwargs), task)
-												fix(aux): refresh cached auth after 401

											
										
										
											2026-04-23 19:49:46 +09:00
+								        # ── Auth refresh retry (mirrors sync call_llm) ───────────────
 								        if (_is_auth_error(first_err)
 								                and resolved_provider not in ("auto", "", None)
 								                and not client_is_nous):
 								            if _refresh_provider_credentials(resolved_provider):
 								                logger.info(
 								                    "Auxiliary %s (async): refreshed %s credentials after auth error, retrying",
 								                    task or "call", resolved_provider,
 								                )
 								                if task == "vision":
 								                    _, retry_client, retry_model = resolve_vision_provider_client(
 								                        provider=resolved_provider,
 								                        model=final_model,
 								                        async_mode=True,
 								                    )
 								                else:
 								                    retry_client, retry_model = _get_cached_client(
 								                        resolved_provider,
 								                        resolved_model,
 								                        async_mode=True,
 								                        base_url=resolved_base_url,
 								                        api_key=resolved_api_key,
 								                        api_mode=resolved_api_mode,
 								                    )
 								                if retry_client is not None:
 								                    retry_kwargs = _build_call_kwargs(
 								                        resolved_provider,
 								                        retry_model or final_model,
 								                        messages,
 								                        temperature=temperature,
 								                        max_tokens=max_tokens,
 								                        tools=tools,
 								                        timeout=effective_timeout,
 								                        extra_body=effective_extra_body,
 								                        base_url=resolved_base_url,
 								                    )
 								                    _retry_base = str(getattr(retry_client, "base_url", "") or "")
 								                    if _is_anthropic_compat_endpoint(resolved_provider, _retry_base):
 								                        retry_kwargs["messages"] = _convert_openai_images_to_anthropic(retry_kwargs["messages"])
 								                    return _validate_llm_response(
 								                        await retry_client.chat.completions.create(**retry_kwargs), task)
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								        # ── Payment / connection fallback (mirrors sync call_llm) ─────
 								        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
 								        is_auto = resolved_provider in ("auto", "", None)
 								        if should_fallback and is_auto:
 								            reason = "payment error" if _is_payment_error(first_err) else "connection error"
 								            logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",
 								                        task or "call", reason, resolved_provider, first_err)
 								            fb_client, fb_model, fb_label = _try_payment_fallback(
 								                resolved_provider, task, reason=reason)
 								            if fb_client is not None:
 								                fb_kwargs = _build_call_kwargs(
 								                    fb_label, fb_model, messages,
 								                    temperature=temperature, max_tokens=max_tokens,
 								                    tools=tools, timeout=effective_timeout,
-												fix(aux): add session_search extra_body and concurrency controls

Adds auxiliary.<task>.extra_body config passthrough so reasoning-heavy
OpenAI-compatible providers can receive provider-specific request fields
(e.g. enable_thinking: false on GLM) on auxiliary calls, and bounds
session_search summary fan-out with auxiliary.session_search.max_concurrency
(default 3, clamped 1-5) to avoid 429 bursts on small providers.

- agent/auxiliary_client.py: extract _get_auxiliary_task_config helper,
  add _get_task_extra_body, merge config+explicit extra_body with explicit winning
- hermes_cli/config.py: extra_body defaults on all aux tasks +
  session_search.max_concurrency; _config_version 19 -> 20
- tools/session_search_tool.py: semaphore around _summarize_all gather
- tests: coverage in test_auxiliary_client, test_session_search, test_aux_config
- docs: user-guide/configuration.md + fallback-providers.md

Co-authored-by: Teknium <teknium@nousresearch.com>

											
										
										
											2026-04-20 00:44:32 -07:00
+								                    extra_body=effective_extra_body,
-												fix(kimi): route temperature override by base_url — kimi-k2.5 needs 1.0 on api.moonshot.ai

Follow-up to #12144.  That PR standardized the kimi-k2.* temperature lock
against the Coding Plan endpoint (api.kimi.com/coding/v1) docs, where
non-thinking models require 0.6.  Verified empirically against Moonshot
(April 2026) that the public chat endpoint (api.moonshot.ai/v1) has a
different contract for kimi-k2.5: it only accepts temperature=1, and rejects
0.6 with:

    HTTP 400 "invalid temperature: only 1 is allowed for this model"

Users hit the public endpoint when KIMI_API_KEY is a legacy sk-* key (the
sk-kimi-* prefix routes to Coding Plan — see hermes_cli/auth.py).  So for
Coding Plan subscribers the fix from #12144 is correct, but for public-API
users it reintroduces the exact 400 reported in #9125.

Reproduction on api.moonshot.ai/v1 + kimi-k2.5:
  temperature=1.0 → 200 OK
  temperature=0.6 → 400 "only 1 is allowed"     ← #12144 default
  temperature=None → 200 OK

Other kimi-k2.* models are unaffected empirically — turbo-preview accepts
0.6 and thinking-turbo accepts 1.0 on both endpoints — so only kimi-k2.5
diverges.

Fix: thread the client's actual base_url through _build_call_kwargs (the
parameter already existed but callers passed config-level resolved_base_url;
for auto-detected routes that was often empty).  _fixed_temperature_for_model
now checks api.moonshot.ai first via an explicit _KIMI_PUBLIC_API_OVERRIDES
map, then falls back to the Coding Plan defaults.  Tests parametrize over
endpoint + model to lock both contracts.

Closes #9125.

											
										
										
											2026-04-20 04:18:49 +09:00
+								                    base_url=str(getattr(fb_client, "base_url", "") or ""))
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                # Convert sync fallback client to async
-												fix(copilot): send vision header for Copilot vision requests

Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route.

Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision.

Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>

											
										
										
											2026-04-26 21:29:55 +00:00
+								                async_fb, async_fb_model = _to_async_client(
 								                    fb_client, fb_model or "", is_vision=(task == "vision")
 								                )
-												fix(auxiliary): harden fallback behavior for non-OpenRouter users

Four fixes to auxiliary_client.py:

1. Respect explicit provider as hard constraint (#7559)
   When auxiliary.{task}.provider is explicitly set (not 'auto'),
   connection/payment errors no longer silently fallback to cloud
   providers. Local-only users (Ollama, vLLM) will no longer get
   unexpected OpenRouter billing from auxiliary tasks.

2. Eliminate model='default' sentinel (#7512)
   _resolve_api_key_provider() no longer sends literal 'default' as
   model name to APIs. Providers without a known aux model in
   _API_KEY_PROVIDER_AUX_MODELS are skipped instead of producing
   model_not_supported errors.

3. Add payment/connection fallback to async_call_llm (#7512)
   async_call_llm now mirrors sync call_llm's fallback logic for
   payment (402) and connection errors. Previously, async consumers
   (session_search, web_tools, vision) got hard failures with no
   recovery. Also fixes hardcoded 'openrouter' fallback to use the
   full auto-detection chain.

4. Use accurate error reason in fallback logs (#7512)
   _try_payment_fallback() now accepts a reason parameter and uses
   it in log messages. Connection timeouts are no longer misleadingly
   logged as 'payment error'.

Closes #7559
Closes #7512

											
										
										
											2026-04-11 12:37:53 +05:30
+								                if async_fb_model and async_fb_model != fb_kwargs.get("model"):
 								                    fb_kwargs["model"] = async_fb_model
-												fix(auxiliary): validate response shape in call_llm/async_call_llm (#7264)

async_call_llm (and call_llm) can return non-OpenAI objects from
custom providers or adapter shims, crashing downstream consumers
with misleading AttributeError ('str' has no attribute 'choices').

Add _validate_llm_response() that checks the response has the
expected .choices[0].message shape before returning. Wraps all
return paths in call_llm, async_call_llm, and fallback paths.
Fails fast with a clear RuntimeError identifying the task, response
type, and a preview of the malformed payload.

Closes #7264

											
										
										
											2026-04-11 13:53:00 +05:30
+								                return _validate_llm_response(
 								                    await async_fb.chat.completions.create(**fb_kwargs), task)
-												feat: call_llm/async_call_llm + config slots + migrate all consumers

Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).

											
										
										
											2026-03-11 20:52:19 -07:00
+								        raise