Files
hermes-agent/website/docs/user-guide/features/fallback-providers.md
Teknium 22ff6ca32b docs: two-week gap sweep — platforms, CLI, config, TUI, hooks, providers (#17727)
Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior
without docs coverage. No functional code changes; docs + static manifest
regeneration only.

Highlights:

Stale / incorrect:
- configuration.md: auxiliary auto-routing line was wrong since #11900;
  now correctly states auto routes to the main model, with a note on the
  cost trade-off and per-task override pattern.
- integrations/providers.md + configuration.md compression intro:
  removed stale 'Gemini Flash via OpenRouter' claim.
- website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py
  so the live manifest picks up tencent/hy3-preview (and remains in sync
  for future model-catalog PRs).

Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610
#10283 #10246 #11564 #13178):
- Signal: native formatting (bodyRanges), reply quotes, reactions.
- Telegram: table rendering (bullets + code-block fallback),
  disable_link_previews, group_allowed_chats.
- Slack: strict_mention config.
- Discord: slash_commands disable, send_animation GIF, send_message
  native media attachments.
- DingTalk: require_mention + allowed_users.

CLI (#16052 #16539 #16566 #15841 #14798 #10043):
- New 'hermes fallback' interactive manager.
- New 'hermes update --check', '--backup' flag, and pre-update pairing
  snapshot behavior.
- 'hermes gateway start/restart --all' multi-profile flag.
- cron.md: 'hermes tools' as a platform, per-job enabled_toolsets,
  wakeAgent gate, context_from chaining.

Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227
#14166 #14730 #17008):
- terminal.docker_run_as_host_user, display.runtime_metadata_footer,
  compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT,
  skills.guard_agent_created, TAVILY_BASE_URL,
  security.allow_private_urls, agent.api_max_retries,
  gateway hot-reload of compression/context_length config edits.

TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934
#14810 #14045 #17286 #17126):
- HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator
  styles, ctrl-x queued-message delete, git branch in status bar, per-
  prompt elapsed stopwatch, external-editor keybind, markdown stripping,
  TUI voice-mode parity, /agents overlay, /reload + /mouse.

Gateway features (#16506 #15027 #13428 #12116):
- Native multimodal image routing based on vision capability.
- /usage account-limits section.
- /steer slash command (added to reference + explanation in CLI).

Plugins / hooks (#12929 #12972 #10763 #16364):
- transform_tool_result, transform_terminal_output plugin hooks.
- PluginContext.dispatch_tool() documented with slash-command example.
- google_meet bundled plugin entry under built-in-plugins.md.

Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231
#14232 #14307 #13683 #12373 #11891 #11291 #10066):
- hermes backup exclusions (WAL/SHM/journal + checkpoints/).
- security.md hardline blocklist (floor below --yolo).
- FHS install layout for root installs.
- openssh-client + docker-cli baked into the Docker image.
- MEDIA: tag supported extensions table (docs/office/archives/pdf).
- Remote-to-host file sync on SSH/Modal/Daytona teardown.
- 'hermes model' -> Configure Auxiliary Models interactive picker.
- Podman support via HERMES_DOCKER_BINARY.

Providers / STT / one-shot (#15045 #14473 #15704):
- alibaba-coding-plan first-class provider entry.
- xAI Grok STT as a 6th transcription option.
- 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL.

Build: 'docusaurus build' succeeds. No new broken links/anchors;
pre-existing warnings unchanged.
2026-04-29 20:32:37 -07:00

15 KiB

title, description, sidebar_label, sidebar_position
title description sidebar_label sidebar_position
Fallback Providers Configure automatic failover to backup LLM providers when your primary model is unavailable. Fallback Providers 8

Fallback Providers

Hermes Agent has three layers of resilience that keep your sessions running when providers hit issues:

  1. Credential pools — rotate across multiple API keys for the same provider (tried first)
  2. Primary model fallback — automatically switches to a different provider:model when your main model fails
  3. Auxiliary task fallback — independent provider resolution for side tasks like vision, compression, and web extraction

Credential pools handle same-provider rotation (e.g., multiple OpenRouter keys). This page covers cross-provider fallback. Both are optional and work independently.

Primary Model Fallback

When your main LLM provider encounters errors — rate limits, server overload, auth failures, connection drops — Hermes can automatically switch to a backup provider:model pair mid-session without losing your conversation.

Configuration

The easiest path is the interactive manager:

hermes fallback

hermes fallback reuses the provider picker from hermes model — same provider list, same credential prompts, same validation. Press a to add a fallback, / to reorder, d to remove, q to save and exit. Changes persist under model.fallback_providers in config.yaml.

If you'd rather edit the YAML directly, add a fallback_model section to ~/.hermes/config.yaml:

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Both provider and model are required. If either is missing, the fallback is disabled.

:::note fallback_model vs fallback_providers fallback_model (singular) is the legacy single-fallback key — Hermes still honors it for back-compat. fallback_providers (plural, list) supports multiple fallbacks tried in order; hermes fallback writes to this key. When both are set, Hermes merges them with fallback_providers taking priority. :::

Supported Providers

Provider Value Requirements
AI Gateway ai-gateway AI_GATEWAY_API_KEY
OpenRouter openrouter OPENROUTER_API_KEY
Nous Portal nous hermes auth (OAuth)
OpenAI Codex openai-codex hermes model (ChatGPT OAuth)
GitHub Copilot copilot COPILOT_GITHUB_TOKEN, GH_TOKEN, or GITHUB_TOKEN
GitHub Copilot ACP copilot-acp External process (editor integration)
Anthropic anthropic ANTHROPIC_API_KEY or Claude Code credentials
z.ai / GLM zai GLM_API_KEY
Kimi / Moonshot kimi-coding KIMI_API_KEY
MiniMax minimax MINIMAX_API_KEY
MiniMax (China) minimax-cn MINIMAX_CN_API_KEY
DeepSeek deepseek DEEPSEEK_API_KEY
NVIDIA NIM nvidia NVIDIA_API_KEY (optional: NVIDIA_BASE_URL)
Ollama Cloud ollama-cloud OLLAMA_API_KEY
Google Gemini (OAuth) google-gemini-cli hermes model (Google OAuth; optional: HERMES_GEMINI_PROJECT_ID)
Google AI Studio gemini GOOGLE_API_KEY (alias: GEMINI_API_KEY)
xAI (Grok) xai (alias grok) XAI_API_KEY (optional: XAI_BASE_URL)
AWS Bedrock bedrock Standard boto3 auth (AWS_REGION + AWS_PROFILE or AWS_ACCESS_KEY_ID)
Qwen Portal (OAuth) qwen-oauth hermes model (Qwen Portal OAuth; optional: HERMES_QWEN_BASE_URL)
MiniMax (OAuth) minimax-oauth hermes model (MiniMax portal OAuth)
OpenCode Zen opencode-zen OPENCODE_ZEN_API_KEY
OpenCode Go opencode-go OPENCODE_GO_API_KEY
Kilo Code kilocode KILOCODE_API_KEY
Xiaomi MiMo xiaomi XIAOMI_API_KEY
Arcee AI arcee ARCEEAI_API_KEY
GMI Cloud gmi GMI_API_KEY
Alibaba / DashScope alibaba DASHSCOPE_API_KEY
Hugging Face huggingface HF_TOKEN
Custom endpoint custom base_url + key_env (see below)

Custom Endpoint Fallback

For a custom OpenAI-compatible endpoint, add base_url and optionally key_env:

fallback_model:
  provider: custom
  model: my-local-model
  base_url: http://localhost:8000/v1
  key_env: MY_LOCAL_KEY              # env var name containing the API key

When Fallback Triggers

The fallback activates automatically when the primary model fails with:

  • Rate limits (HTTP 429) — after exhausting retry attempts
  • Server errors (HTTP 500, 502, 503) — after exhausting retry attempts
  • Auth failures (HTTP 401, 403) — immediately (no point retrying)
  • Not found (HTTP 404) — immediately
  • Invalid responses — when the API returns malformed or empty responses repeatedly

When triggered, Hermes:

  1. Resolves credentials for the fallback provider
  2. Builds a new API client
  3. Swaps the model, provider, and client in-place
  4. Resets the retry counter and continues the conversation

The switch is seamless — your conversation history, tool calls, and context are preserved. The agent continues from exactly where it left off, just using a different model.

:::info Per-Turn, Not Per-Session Fallback is turn-scoped: each new user message starts with the primary model restored. If the primary fails mid-turn, fallback activates for that turn only. On the next message, Hermes tries the primary again. Within a single turn, fallback activates at most once — if the fallback also fails, normal error handling takes over (retries, then error message). This prevents cascading failover loops within a turn while giving the primary model a fresh chance every turn. :::

Examples

OpenRouter as fallback for Anthropic native:

model:
  provider: anthropic
  default: claude-sonnet-4-6

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4

Nous Portal as fallback for OpenRouter:

model:
  provider: openrouter
  default: anthropic/claude-opus-4

fallback_model:
  provider: nous
  model: nous-hermes-3

Local model as fallback for cloud:

fallback_model:
  provider: custom
  model: llama-3.1-70b
  base_url: http://localhost:8000/v1
  key_env: LOCAL_API_KEY

Codex OAuth as fallback:

fallback_model:
  provider: openai-codex
  model: gpt-5.3-codex

Where Fallback Works

Context Fallback Supported
CLI sessions
Messaging gateway (Telegram, Discord, etc.)
Subagent delegation ✘ (subagents do not inherit fallback config)
Cron jobs ✘ (run with a fixed provider)
Auxiliary tasks (vision, compression) ✘ (use their own provider chain — see below)

:::tip There are no environment variables for fallback_model — it is configured exclusively through config.yaml. This is intentional: fallback configuration is a deliberate choice, not something a stale shell export should override. :::


Auxiliary Task Fallback

Hermes uses separate lightweight models for side tasks. Each task has its own provider resolution chain that acts as a built-in fallback system.

Tasks with Independent Provider Resolution

Task What It Does Config Key
Vision Image analysis, browser screenshots auxiliary.vision
Web Extract Web page summarization auxiliary.web_extract
Compression Context compression summaries auxiliary.compression
Session Search Past session summarization auxiliary.session_search
Skills Hub Skill search and discovery auxiliary.skills_hub
MCP MCP helper operations auxiliary.mcp
Approval Smart command-approval classification auxiliary.approval
Title Generation Session title summaries auxiliary.title_generation

Auto-Detection Chain

When a task's provider is set to "auto" (the default), Hermes tries providers in order until one works:

For text tasks (compression, web extract, etc.):

OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
API-key providers (z.ai, Kimi, MiniMax, Xiaomi MiMo, Hugging Face, Anthropic) → give up

For vision tasks:

Main provider (if vision-capable) → OpenRouter → Nous Portal →
Codex OAuth → Anthropic → Custom endpoint → give up

If the resolved provider fails at call time, Hermes also has an internal retry: if the provider is not OpenRouter and no explicit base_url is set, it tries OpenRouter as a last-resort fallback.

Configuring Auxiliary Providers

Each task can be configured independently in config.yaml:

auxiliary:
  vision:
    provider: "auto"              # auto | openrouter | nous | codex | main | anthropic
    model: ""                     # e.g. "openai/gpt-4o"
    base_url: ""                  # direct endpoint (takes precedence over provider)
    api_key: ""                   # API key for base_url

  web_extract:
    provider: "auto"
    model: ""

  compression:
    provider: "auto"
    model: ""

  session_search:
    provider: "auto"
    model: ""
    timeout: 30
    max_concurrency: 3
    extra_body: {}

  skills_hub:
    provider: "auto"
    model: ""

  mcp:
    provider: "auto"
    model: ""

Every task above follows the same provider / model / base_url pattern. Context compression is configured under auxiliary.compression:

auxiliary:
  compression:
    provider: main                                    # Same provider options as other auxiliary tasks
    model: google/gemini-3-flash-preview
    base_url: null                                    # Custom OpenAI-compatible endpoint

And the fallback model uses:

fallback_model:
  provider: openrouter
  model: anthropic/claude-sonnet-4
  # base_url: http://localhost:8000/v1               # Optional custom endpoint

For auxiliary.session_search, Hermes also supports:

  • max_concurrency to limit how many session summaries run at once
  • extra_body to pass provider-specific OpenAI-compatible request fields through on the summarization calls

Example:

auxiliary:
  session_search:
    provider: main
    model: glm-4.5-air
    max_concurrency: 2
    extra_body:
      enable_thinking: false

If your provider does not support a native OpenAI-compatible reasoning-control field, extra_body will not help for that part; in that case max_concurrency is still useful for reducing request-burst 429s.

All three — auxiliary, compression, fallback — work the same way: set provider to pick who handles the request, model to pick which model, and base_url to point at a custom endpoint (overrides provider).

Provider Options for Auxiliary Tasks

These options apply to auxiliary:, compression:, and fallback_model: configs only — "main" is not a valid value for your top-level model.provider. For custom endpoints, use provider: custom in your model: section (see AI Providers).

Provider Description Requirements
"auto" Try providers in order until one works (default) At least one provider configured
"openrouter" Force OpenRouter OPENROUTER_API_KEY
"nous" Force Nous Portal hermes auth
"codex" Force Codex OAuth hermes model → Codex
"main" Use whatever provider the main agent uses (auxiliary tasks only) Active main provider configured
"anthropic" Force Anthropic native ANTHROPIC_API_KEY or Claude Code credentials

Direct Endpoint Override

For any auxiliary task, setting base_url bypasses provider resolution entirely and sends requests directly to that endpoint:

auxiliary:
  vision:
    base_url: "http://localhost:1234/v1"
    api_key: "local-key"
    model: "qwen2.5-vl"

base_url takes precedence over provider. Hermes uses the configured api_key for authentication, falling back to OPENAI_API_KEY if not set. It does not reuse OPENROUTER_API_KEY for custom endpoints.


Context Compression Fallback

Context compression uses the auxiliary.compression config block to control which model and provider handles summarization:

auxiliary:
  compression:
    provider: "auto"                              # auto | openrouter | nous | main
    model: "google/gemini-3-flash-preview"

:::info Legacy migration Older configs with compression.summary_model / compression.summary_provider / compression.summary_base_url are automatically migrated to auxiliary.compression.* on first load (config version 17). :::

If no provider is available for compression, Hermes drops middle conversation turns without generating a summary rather than failing the session.


Delegation Provider Override

Subagents spawned by delegate_task do not use the primary fallback model. However, they can be routed to a different provider:model pair for cost optimization:

delegation:
  provider: "openrouter"                      # override provider for all subagents
  model: "google/gemini-3-flash-preview"      # override model
  # base_url: "http://localhost:1234/v1"      # or use a direct endpoint
  # api_key: "local-key"

See Subagent Delegation for full configuration details.


Cron Job Providers

Cron jobs run with whatever provider is configured at execution time. They do not support a fallback model. To use a different provider for cron jobs, configure provider and model overrides on the cron job itself:

cronjob(
    action="create",
    schedule="every 2h",
    prompt="Check server status",
    provider="openrouter",
    model="google/gemini-3-flash-preview"
)

See Scheduled Tasks (Cron) for full configuration details.


Summary

Feature Fallback Mechanism Config Location
Main agent model fallback_model in config.yaml — per-turn failover on errors (primary restored each turn) fallback_model: (top-level)
Vision Auto-detection chain + internal OpenRouter retry auxiliary.vision
Web extraction Auto-detection chain + internal OpenRouter retry auxiliary.web_extract
Context compression Auto-detection chain, degrades to no-summary if unavailable auxiliary.compression
Session search Auto-detection chain auxiliary.session_search
Skills hub Auto-detection chain auxiliary.skills_hub
MCP helpers Auto-detection chain auxiliary.mcp
Approval classification Auto-detection chain auxiliary.approval
Title generation Auto-detection chain auxiliary.title_generation
Delegation Provider override only (no automatic fallback) delegation.provider / delegation.model
Cron jobs Per-job provider override only (no automatic fallback) Per-job provider / model