
---
sidebar_position: 4
title: Provider Runtime Resolution
description: How Hermes resolves providers, credentials, API modes, and auxiliary models at runtime
---

# Provider Runtime Resolution

Hermes has a shared provider runtime resolver used across:

- the CLI
- the gateway
- cron jobs
- ACP
- auxiliary model calls

Primary implementation:

- `hermes_cli/runtime_provider.py` — credential resolution, `_resolve_custom_runtime()`
- `hermes_cli/auth.py` — provider registry, `resolve_provider()`
- `hermes_cli/model_switch.py` — shared `/model` switch pipeline (CLI + gateway)
- `agent/auxiliary_client.py` — auxiliary model routing

If you are trying to add a new first-class inference provider, read Adding Providers alongside this page.

## Resolution precedence

At a high level, provider resolution uses:

1. explicit CLI/runtime request
2. `config.yaml` model/provider config
3. environment variables
4. provider-specific defaults or auto resolution

That ordering matters because Hermes treats the saved model/provider choice as the source of truth for normal runs. This prevents a stale shell export from silently overriding the endpoint a user last selected in `hermes model`.
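As a rough sketch of that ordering (the function shape and the `HERMES_PROVIDER` variable name are illustrative, not the real resolver API):

```python
import os

def resolve_provider_name(cli_request=None, config=None, env=None):
    """Illustrative precedence: explicit request > saved config > env > default.

    A simplified model of the ordering above, not the actual
    hermes_cli/runtime_provider.py logic.
    """
    env = os.environ if env is None else env
    config = config or {}
    if cli_request:  # 1. explicit CLI/runtime request wins
        return cli_request, "cli"
    saved = config.get("model", {}).get("provider")
    if saved:  # 2. config.yaml is the source of truth for normal runs
        return saved, "config"
    if env.get("HERMES_PROVIDER"):  # 3. environment variable (name hypothetical)
        return env["HERMES_PROVIDER"], "env"
    return "openrouter", "default"  # 4. provider default / auto resolution
```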

## Providers

Current provider families include:

- AI Gateway (Vercel)
- OpenRouter
- Nous Portal
- OpenAI Codex
- Copilot / Copilot ACP
- Anthropic (native)
- Google / Gemini
- Alibaba / DashScope
- DeepSeek
- Z.AI
- Kimi / Moonshot
- MiniMax
- MiniMax China
- Kilo Code
- Hugging Face
- OpenCode Zen / OpenCode Go
- Custom (`provider: custom`) — first-class provider for any OpenAI-compatible endpoint
- Named custom providers (`custom_providers` list in `config.yaml`)

## Output of runtime resolution

The runtime resolver returns data such as:

- `provider`
- `api_mode`
- `base_url`
- `api_key`
- `source`
- provider-specific metadata like expiry/refresh info
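The shape of that result can be pictured roughly as a dataclass like the following. The field set comes from the list above; the class itself is an illustrative container, not the resolver's actual return type:

```python
from dataclasses import dataclass, field

@dataclass
class ResolvedRuntime:
    """Illustrative container for the resolver output fields listed above."""
    provider: str
    api_mode: str        # e.g. chat_completions / anthropic_messages / codex_responses
    base_url: str
    api_key: str
    source: str          # where the credential was found (config, env, auth store, ...)
    metadata: dict = field(default_factory=dict)  # e.g. token expiry/refresh info

# Example value (endpoint and key are placeholders):
rt = ResolvedRuntime(
    provider="openrouter",
    api_mode="chat_completions",
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-...",
    source="env",
)
```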

## Why this matters

This resolver is the main reason Hermes can share auth/runtime logic between:

- `hermes chat`
- gateway message handling
- cron jobs running in fresh sessions
- ACP editor sessions
- auxiliary model tasks

## AI Gateway

Set `AI_GATEWAY_API_KEY` in `~/.hermes/.env` and run with `--provider ai-gateway`. Hermes fetches available models from the gateway's `/models` endpoint, filtering to language models with tool-use support.

## OpenRouter, AI Gateway, and custom OpenAI-compatible base URLs

Hermes contains logic to avoid leaking the wrong API key to a custom endpoint when multiple provider keys exist (e.g. `OPENROUTER_API_KEY`, `AI_GATEWAY_API_KEY`, and `OPENAI_API_KEY`).

Each provider's API key is scoped to its own base URL:

- `OPENROUTER_API_KEY` is only sent to openrouter.ai endpoints
- `AI_GATEWAY_API_KEY` is only sent to ai-gateway.vercel.sh endpoints
- `OPENAI_API_KEY` is used for custom endpoints and as a fallback
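That scoping rule can be sketched like this (a simplified stand-in for the real logic; `keys` is a plain dict, e.g. a snapshot of `os.environ`):

```python
from urllib.parse import urlparse

def key_for_base_url(base_url, keys):
    """Pick the API key scoped to an endpoint's host.

    Illustrative version of the scoping rules above, not the actual
    Hermes implementation.
    """
    host = urlparse(base_url).hostname or ""
    if host == "openrouter.ai" or host.endswith(".openrouter.ai"):
        return keys.get("OPENROUTER_API_KEY")
    if host == "ai-gateway.vercel.sh":
        return keys.get("AI_GATEWAY_API_KEY")
    # Custom/local endpoints fall back to the generic OpenAI-style key.
    return keys.get("OPENAI_API_KEY")
```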

Hermes also distinguishes between:

- a real custom endpoint selected by the user
- the OpenRouter fallback path used when no custom endpoint is configured

That distinction is especially important for:

- local model servers
- non-OpenRouter/non-AI Gateway OpenAI-compatible APIs
- switching providers without re-running setup
- config-saved custom endpoints that should keep working even when `OPENAI_BASE_URL` is not exported in the current shell

## Native Anthropic path

Anthropic is not just "via OpenRouter" anymore.

When provider resolution selects `anthropic`, Hermes uses:

- `api_mode = anthropic_messages`
- the native Anthropic Messages API
- `agent/anthropic_adapter.py` for translation

Credential resolution for native Anthropic now prefers refreshable Claude Code credentials over copied env tokens when both are present. In practice that means:

- Claude Code credential files are treated as the preferred source when they include refreshable auth
- manual `ANTHROPIC_TOKEN` / `CLAUDE_CODE_OAUTH_TOKEN` values still work as explicit overrides
- Hermes preflights Anthropic credential refresh before native Messages API calls
- Hermes still retries once on a 401 after rebuilding the Anthropic client, as a fallback path
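The preference ordering above can be summarized in a small sketch (all names here are made up for illustration; the real logic lives in the runtime resolver):

```python
def pick_anthropic_credential(claude_code_cred=None, env_token=None,
                              explicit_override=None):
    """Illustrative ordering of the credential preferences described above."""
    if explicit_override:  # manual ANTHROPIC_TOKEN-style override always wins
        return explicit_override, "override"
    if claude_code_cred and claude_code_cred.get("refreshable"):
        # Refreshable Claude Code credentials beat copied env tokens
        return claude_code_cred["token"], "claude-code"
    if env_token:
        return env_token, "env"
    if claude_code_cred:  # non-refreshable Claude Code file as last resort
        return claude_code_cred["token"], "claude-code"
    return None, "none"
```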

## OpenAI Codex path

Codex uses a separate Responses API path:

- `api_mode = codex_responses`
- dedicated credential resolution and auth store support

## Auxiliary model routing

Auxiliary tasks such as:

- vision
- web extraction summarization
- context compression summaries
- session search summarization
- skills hub operations
- MCP helper operations
- memory flushes

can use their own provider/model routing rather than the main conversational model.

When an auxiliary task is configured with provider `main`, Hermes resolves it through the same shared runtime path as normal chat. In practice that means:

- env-driven custom endpoints still work
- custom endpoints saved via `hermes model` / `config.yaml` also work
- auxiliary routing can tell the difference between a real saved custom endpoint and the OpenRouter fallback
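Schematically, the routing decision looks something like this (function names and the config dict shape are illustrative; the real routing lives in `agent/auxiliary_client.py`):

```python
def resolve_auxiliary(task_cfg, main_runtime, resolve_independent):
    """If an auxiliary task is pinned to provider 'main', reuse the shared
    runtime resolution; otherwise resolve its own provider independently.
    """
    if task_cfg.get("provider", "main") == "main":
        # Same path as normal chat, saved custom endpoints included
        return main_runtime
    return resolve_independent(task_cfg["provider"], task_cfg.get("model"))
```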

## Fallback models

Hermes supports a configured fallback model/provider pair, allowing runtime failover when the primary model encounters errors.

### How it works internally

1. **Storage**: `AIAgent.__init__` stores the `fallback_model` dict and sets `_fallback_activated = False`.
2. **Trigger points**: `_try_activate_fallback()` is called from three places in the main retry loop in `run_agent.py`:
   - After max retries on invalid API responses (`None` choices, missing content)
   - On non-retryable client errors (HTTP 401, 403, 404)
   - After max retries on transient errors (HTTP 429, 500, 502, 503)
3. **Activation flow** (`_try_activate_fallback`):
   - Returns `False` immediately if already activated or not configured
   - Calls `resolve_provider_client()` from `auxiliary_client.py` to build a new client with proper auth
   - Determines `api_mode`: `codex_responses` for `openai-codex`, `anthropic_messages` for `anthropic`, `chat_completions` for everything else
   - Swaps in-place: `self.model`, `self.provider`, `self.base_url`, `self.api_mode`, `self.client`, `self._client_kwargs`
   - For an `anthropic` fallback, builds a native Anthropic client instead of an OpenAI-compatible one
   - Re-evaluates prompt caching (enabled for Claude models on OpenRouter)
   - Sets `_fallback_activated = True`, preventing it from firing again
   - Resets the retry count to 0 and continues the loop
4. **Config flow**:
   - CLI: `cli.py` reads `CLI_CONFIG["fallback_model"]` and passes it to `AIAgent(fallback_model=...)`
   - Gateway: `gateway/run.py._load_fallback_model()` reads `config.yaml` and passes it to `AIAgent`
   - Validation: both `provider` and `model` keys must be non-empty, or fallback is disabled
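The activation flow above can be condensed into a schematic like this (a sketch of the one-shot swap, not the real `_try_activate_fallback`; client rebuilding and prompt-cache re-evaluation are omitted):

```python
def try_activate_fallback(agent):
    """One-shot fallback swap mirroring the flow described above.

    `agent` is any object carrying the attributes used below.
    """
    if agent.fallback_activated or not agent.fallback_model:
        return False
    fb = agent.fallback_model
    # api_mode follows the fallback provider, as in step 3 above
    if fb["provider"] == "openai-codex":
        agent.api_mode = "codex_responses"
    elif fb["provider"] == "anthropic":
        agent.api_mode = "anthropic_messages"
    else:
        agent.api_mode = "chat_completions"
    agent.provider = fb["provider"]
    agent.model = fb["model"]
    agent.fallback_activated = True  # never fires a second time
    agent.retry_count = 0            # retry loop continues with a fresh budget
    return True
```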

### What does NOT support fallback

- **Subagent delegation** (`tools/delegate_tool.py`): subagents inherit the parent's provider but not the fallback config
- **Auxiliary tasks**: use their own independent provider auto-detection chain (see Auxiliary model routing above)

Cron jobs do support fallback: `run_job()` reads `fallback_providers` (or the legacy `fallback_model`) from `config.yaml` and passes it to `AIAgent(fallback_model=...)`, matching the gateway's `_load_fallback_model()` pattern. See Cron Internals.
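As a hedged config sketch: the `fallback_model` / `fallback_providers` keys and the `provider`/`model` fields are the ones named on this page, but the exact nesting and the model string below are illustrative placeholders, not a verified schema.

```yaml
# Illustrative only — check your config schema before copying.
fallback_model:
  provider: openrouter
  model: some-org/some-model
```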

## Test coverage

See `tests/test_fallback_model.py` for comprehensive tests covering all supported providers, one-shot semantics, and edge cases.