Compare commits

...

2 Commits

Author SHA1 Message Date
teknium1
11c926cc9d docs: add Hugging Face provider to all documentation pages
- quickstart.md: add to provider table
- configuration.md: add to provider table, add dedicated section with
  usage examples, config.yaml snippet, routing suffixes, and token info;
  also fix pre-existing duplicate Alibaba Cloud entry
- environment-variables.md: add HF_TOKEN + HF_BASE_URL, add huggingface
  to HERMES_INFERENCE_PROVIDER values
- fallback-providers.md: add to supported providers table and
  auto-detection chain
2026-03-17 05:40:02 -07:00
Daniel van Strien
154fd88b76 feat: add Hugging Face as a first-class inference provider
Register Hugging Face Inference Providers (router.huggingface.co/v1)
as a named provider alongside existing ones. Users can now:
- hermes chat --provider huggingface
- Use hf:model-name syntax (e.g. hf:Qwen/Qwen3-235B-A22B-Thinking-2507)
- Set HF_TOKEN in ~/.hermes/.env
- Select from 18 curated models via hermes model picker

The endpoint is OpenAI-compatible, with automatic failover across backend
providers (Groq, Together, SambaNova, etc.); a free tier is included.

Files changed:
- hermes_cli/auth.py: ProviderConfig + aliases (hf, hugging-face, huggingface-hub)
- hermes_cli/models.py: _PROVIDER_MODELS, _PROVIDER_LABELS, _PROVIDER_ALIASES, _PROVIDER_ORDER
- hermes_cli/main.py: provider_labels, providers list, --provider choices, dispatch
- hermes_cli/setup.py: provider_choices, setup flow with token prompt
- hermes_cli/config.py: HF_TOKEN + HF_BASE_URL in OPTIONAL_ENV_VARS
- agent/model_metadata.py: context window entries for all curated HF models
- .env.example: HF_TOKEN documentation

Based on PR #1171 by @davanstrien. Salvaged onto current main with
additional completeness: setup.py flow, config.py env vars, auth.py
aliases, model_metadata context windows, .env.example.
2026-03-17 05:21:17 -07:00
11 changed files with 167 additions and 8 deletions


@@ -59,6 +59,15 @@ OPENCODE_ZEN_API_KEY=
# OpenCode Go provides access to open models (GLM-5, Kimi K2.5, MiniMax M2.5)
# $10/month subscription. Get your key at: https://opencode.ai/auth
OPENCODE_GO_API_KEY=
# =============================================================================
# LLM PROVIDER (Hugging Face Inference Providers)
# =============================================================================
# Hugging Face routes to 20+ open models via a unified OpenAI-compatible endpoint.
# Free tier included ($0.10/month credit); no markup on provider rates.
# Get your token at: https://huggingface.co/settings/tokens
# Required permission: "Make calls to Inference Providers"
HF_TOKEN=
# OPENCODE_GO_BASE_URL=https://opencode.ai/zen/go/v1 # Override default base URL
# =============================================================================


@@ -119,6 +119,25 @@ DEFAULT_CONTEXT_LENGTHS = {
"qwen-plus-latest": 131072,
"qwen3.5-flash": 131072,
"qwen-vl-max": 32768,
# Hugging Face Inference Providers — model IDs use org/name format
"Qwen/Qwen3.5-397B-A17B": 131072,
"Qwen/Qwen3-235B-A22B-Thinking-2507": 131072,
"Qwen/Qwen3-Coder-480B-A35B-Instruct": 131072,
"Qwen/Qwen3-Coder-Next": 131072,
"Qwen/Qwen3-Next-80B-A3B-Instruct": 131072,
"Qwen/Qwen3-Next-80B-A3B-Thinking": 131072,
"deepseek-ai/DeepSeek-R1-0528": 65536,
"deepseek-ai/DeepSeek-V3.2": 65536,
"moonshotai/Kimi-K2-Instruct": 262144,
"moonshotai/Kimi-K2-Instruct-0905": 262144,
"moonshotai/Kimi-K2.5": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
"MiniMaxAI/MiniMax-M2.5": 204800,
"MiniMaxAI/MiniMax-M2.1": 204800,
"XiaomiMiMo/MiMo-V2-Flash": 32768,
"zai-org/GLM-5": 202752,
"zai-org/GLM-4.7": 202752,
"zai-org/GLM-4.7-Flash": 202752,
}
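A context-length table like the one above is typically consulted with a fallback default for unknown models. The helper below is a minimal sketch under that assumption — the function name, the 32768 default, and the suffix-stripping step are illustrative, not taken from the diff:

```python
DEFAULT_CONTEXT_LENGTHS = {
    "Qwen/Qwen3-235B-A22B-Thinking-2507": 131072,
    "deepseek-ai/DeepSeek-V3.2": 65536,
    "moonshotai/Kimi-K2.5": 262144,
}

def context_length(model_id: str, fallback: int = 32768) -> int:
    # Strip an optional routing suffix like ':cheapest' before lookup
    # (assumption: suffixes never appear in the table keys themselves).
    base = model_id.rsplit(":", 1)[0] if ":" in model_id else model_id
    return DEFAULT_CONTEXT_LENGTHS.get(base, fallback)

print(context_length("moonshotai/Kimi-K2.5:cheapest"))
```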


@@ -195,6 +195,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
api_key_env_vars=("KILOCODE_API_KEY",),
base_url_env_var="KILOCODE_BASE_URL",
),
"huggingface": ProviderConfig(
id="huggingface",
name="Hugging Face",
auth_type="api_key",
inference_base_url="https://router.huggingface.co/v1",
api_key_env_vars=("HF_TOKEN",),
base_url_env_var="HF_BASE_URL",
),
}
@@ -574,6 +582,7 @@ def resolve_provider(
"claude": "anthropic", "claude-code": "anthropic",
"aigateway": "ai-gateway", "vercel": "ai-gateway", "vercel-ai-gateway": "ai-gateway",
"opencode": "opencode-zen", "zen": "opencode-zen",
"hf": "huggingface", "hugging-face": "huggingface", "huggingface-hub": "huggingface",
"go": "opencode-go", "opencode-go-sub": "opencode-go",
"kilo": "kilocode", "kilo-code": "kilocode", "kilo-gateway": "kilocode",
}
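Alias maps like the one above are usually consulted after normalizing the input, with unknown names passing through unchanged. A minimal sketch under that assumption (the helper name is hypothetical):

```python
PROVIDER_ALIASES = {
    "hf": "huggingface",
    "hugging-face": "huggingface",
    "huggingface-hub": "huggingface",
    "claude": "anthropic",
}

def canonical_provider(name: str) -> str:
    # Lowercase and trim, map known aliases to their canonical id,
    # and let unrecognized names fall through untouched.
    key = name.strip().lower()
    return PROVIDER_ALIASES.get(key, key)

print(canonical_provider("HF"))
```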


@@ -549,6 +549,21 @@ OPTIONAL_ENV_VARS = {
"category": "provider",
"advanced": True,
},
"HF_TOKEN": {
"description": "Hugging Face token for Inference Providers (20+ open models via router.huggingface.co)",
"prompt": "Hugging Face Token",
"url": "https://huggingface.co/settings/tokens",
"password": True,
"category": "provider",
},
"HF_BASE_URL": {
"description": "Hugging Face Inference Providers base URL override",
"prompt": "HF base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
# ── Tool API keys ──
"PARALLEL_API_KEY": {


@@ -785,6 +785,7 @@ def cmd_model(args):
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"huggingface": "Hugging Face",
"custom": "Custom endpoint",
}
active_label = provider_labels.get(active, active)
@@ -809,6 +810,7 @@ def cmd_model(args):
("opencode-go", "OpenCode Go (open models, $10/month subscription)"),
("ai-gateway", "AI Gateway (Vercel — 200+ models, pay-per-use)"),
("alibaba", "Alibaba Cloud / DashScope (Qwen models, Anthropic-compatible)"),
("huggingface", "Hugging Face Inference Providers (20+ open models)"),
]
# Add user-defined custom providers from config.yaml
@@ -877,7 +879,7 @@ def cmd_model(args):
_model_flow_anthropic(config, current_model)
elif selected_provider == "kimi-coding":
_model_flow_kimi(config, current_model)
- elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba"):
+ elif selected_provider in ("zai", "minimax", "minimax-cn", "kilocode", "opencode-zen", "opencode-go", "ai-gateway", "alibaba", "huggingface"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -1444,6 +1446,27 @@ _PROVIDER_MODELS = {
"google/gemini-3-pro-preview",
"google/gemini-3-flash-preview",
],
# Curated model list sourced from https://models.dev (huggingface provider)
"huggingface": [
"Qwen/Qwen3.5-397B-A17B",
"Qwen/Qwen3-235B-A22B-Thinking-2507",
"Qwen/Qwen3-Coder-480B-A35B-Instruct",
"Qwen/Qwen3-Coder-Next",
"Qwen/Qwen3-Next-80B-A3B-Instruct",
"Qwen/Qwen3-Next-80B-A3B-Thinking",
"deepseek-ai/DeepSeek-R1-0528",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2-Instruct",
"moonshotai/Kimi-K2-Instruct-0905",
"moonshotai/Kimi-K2.5",
"moonshotai/Kimi-K2-Thinking",
"MiniMaxAI/MiniMax-M2.5",
"MiniMaxAI/MiniMax-M2.1",
"XiaomiMiMo/MiMo-V2-Flash",
"zai-org/GLM-5",
"zai-org/GLM-4.7",
"zai-org/GLM-4.7-Flash",
],
}
@@ -2642,7 +2665,7 @@ For more help on a command:
)
chat_parser.add_argument(
"--provider",
- choices=["auto", "openrouter", "nous", "openai-codex", "anthropic", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
+ choices=["auto", "openrouter", "nous", "openai-codex", "anthropic", "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode"],
default=None,
help="Inference provider (default: auto)"
)


@@ -155,6 +155,28 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"qwen3.5-flash",
"qwen-vl-max",
],
# Curated model list for Hugging Face Inference Providers
# sourced from https://models.dev (huggingface provider)
"huggingface": [
"Qwen/Qwen3.5-397B-A17B",
"Qwen/Qwen3-235B-A22B-Thinking-2507",
"Qwen/Qwen3-Coder-480B-A35B-Instruct",
"Qwen/Qwen3-Coder-Next",
"Qwen/Qwen3-Next-80B-A3B-Instruct",
"Qwen/Qwen3-Next-80B-A3B-Thinking",
"deepseek-ai/DeepSeek-R1-0528",
"deepseek-ai/DeepSeek-V3.2",
"moonshotai/Kimi-K2-Instruct",
"moonshotai/Kimi-K2-Instruct-0905",
"moonshotai/Kimi-K2.5",
"moonshotai/Kimi-K2-Thinking",
"MiniMaxAI/MiniMax-M2.5",
"MiniMaxAI/MiniMax-M2.1",
"XiaomiMiMo/MiMo-V2-Flash",
"zai-org/GLM-5",
"zai-org/GLM-4.7",
"zai-org/GLM-4.7-Flash",
],
}
_PROVIDER_LABELS = {
@@ -172,6 +194,7 @@ _PROVIDER_LABELS = {
"ai-gateway": "AI Gateway",
"kilocode": "Kilo Code",
"alibaba": "Alibaba Cloud (DashScope)",
"huggingface": "Hugging Face",
"custom": "Custom endpoint",
}
@@ -201,6 +224,9 @@ _PROVIDER_ALIASES = {
"aliyun": "alibaba",
"qwen": "alibaba",
"alibaba-cloud": "alibaba",
"hf": "huggingface",
"hugging-face": "huggingface",
"huggingface-hub": "huggingface",
}
@@ -234,7 +260,7 @@ def list_available_providers() -> list[dict[str, str]]:
# Canonical providers in display order
_PROVIDER_ORDER = [
"openrouter", "nous", "openai-codex",
- "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
+ "huggingface", "zai", "kimi-coding", "minimax", "minimax-cn", "kilocode", "anthropic", "alibaba",
"opencode-zen", "opencode-go",
"ai-gateway", "deepseek", "custom",
]


@@ -61,6 +61,11 @@ _DEFAULT_PROVIDER_MODELS = {
"minimax-cn": ["MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
"ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
"kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
"huggingface": [
"Qwen/Qwen3.5-397B-A17B", "Qwen/Qwen3-235B-A22B-Thinking-2507",
"Qwen/Qwen3-Coder-480B-A35B-Instruct", "deepseek-ai/DeepSeek-R1-0528",
"deepseek-ai/DeepSeek-V3.2", "moonshotai/Kimi-K2.5",
],
}
@@ -741,6 +746,7 @@ def setup_model_provider(config: dict):
"Alibaba Cloud / DashScope (Qwen models via Anthropic-compatible API)",
"OpenCode Zen (35+ curated models, pay-as-you-go)",
"OpenCode Go (open models, $10/month subscription)",
"Hugging Face Inference Providers (20+ open models)",
]
if keep_label:
provider_choices.append(keep_label)
@@ -1412,7 +1418,29 @@ def setup_model_provider(config: dict):
_set_model_provider(config, "opencode-go", pconfig.inference_base_url)
selected_base_url = pconfig.inference_base_url
# else: provider_idx == 14 (Keep current) — only shown when a provider already exists
elif provider_idx == 14: # Hugging Face Inference Providers
selected_provider = "huggingface"
print()
print_header("Hugging Face API Token")
pconfig = PROVIDER_REGISTRY["huggingface"]
print_info(f"Provider: {pconfig.name}")
print_info("Get your token at: https://huggingface.co/settings/tokens")
print_info("Required permission: 'Make calls to Inference Providers'")
print()
api_key = prompt_fn(
" HF_TOKEN: ",
is_password=True,
).strip()
if api_key:
save_env_value("HF_TOKEN", api_key)
# Clear OpenRouter env vars to prevent routing confusion
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_set_model_provider(config, "huggingface", pconfig.inference_base_url)
selected_base_url = pconfig.inference_base_url
# else: provider_idx == 15 (Keep current) — only shown when a provider already exists
# Normalize "keep current" to an explicit provider so downstream logic
# doesn't fall back to the generic OpenRouter/static-model path.
if selected_provider is None:


@@ -50,6 +50,7 @@ hermes setup # Or configure everything at once
| **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` |
| **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
| **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
| **Hugging Face** | 20+ open models via unified router (Qwen, DeepSeek, Kimi, etc.) | Set `HF_TOKEN` |
| **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
| **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
| **Custom Endpoint** | VLLM, SGLang, or any OpenAI-compatible API | Set base URL + API key |


@@ -30,6 +30,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
| `MINIMAX_CN_BASE_URL` | Override MiniMax China base URL (default: `https://api.minimaxi.com/v1`) |
| `KILOCODE_API_KEY` | Kilo Code API key ([kilo.ai](https://kilo.ai)) |
| `KILOCODE_BASE_URL` | Override Kilo Code base URL (default: `https://api.kilo.ai/api/gateway`) |
| `HF_TOKEN` | Hugging Face token for Inference Providers ([huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
| `HF_BASE_URL` | Override Hugging Face base URL (default: `https://router.huggingface.co/v1`) |
| `ANTHROPIC_API_KEY` | Anthropic Console API key ([console.anthropic.com](https://console.anthropic.com/)) |
| `ANTHROPIC_TOKEN` | Manual or legacy Anthropic OAuth/setup-token override |
| `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) |
@@ -48,7 +50,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
| Variable | Description |
|----------|-------------|
- | `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
+ | `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
| `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
| `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
| `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |


@@ -72,7 +72,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
- | **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`) |
+ | **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
:::info Codex Note
@@ -152,6 +152,32 @@ model:
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
### Hugging Face Inference Providers
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes requests for 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Each request is routed to the fastest available backend (Groq, Together, SambaNova, etc.), with automatic failover.
```bash
# Use any available model
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
# Requires: HF_TOKEN in ~/.hermes/.env
# Short alias
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
```
Or set it permanently in `config.yaml`:
```yaml
model:
provider: "huggingface"
default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
```
Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and make sure to enable the "Make calls to Inference Providers" permission. A free tier is included ($0.10/month credit), with no markup on provider rates.
You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
The base URL can be overridden with `HF_BASE_URL`.
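The suffix scheme can be sketched as a small helper that applies the default `:fastest` strategy when the caller hasn't chosen one. This is a hypothetical illustration of the suffix convention, not Hermes code (the router applies the default server-side):

```python
def with_routing(model_id: str, strategy: str = "fastest") -> str:
    """Append a routing suffix (':fastest', ':cheapest', or a provider
    name) unless the model id already carries one."""
    # Any suffix lives after the model name, i.e. after the last '/'.
    name = model_id.rsplit("/", 1)[-1]
    if ":" in name:
        return model_id  # caller already chose a routing strategy
    return f"{model_id}:{strategy}"

print(with_routing("zai-org/GLM-4.7"))
print(with_routing("zai-org/GLM-4.7:cheapest"))
```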
## Custom & Self-Hosted LLM Providers
Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
@@ -443,7 +469,7 @@ fallback_model:
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
- Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
+ Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
:::tip
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).


@@ -44,6 +44,7 @@ Both `provider` and `model` are **required**. If either is missing, the fallback
| MiniMax | `minimax` | `MINIMAX_API_KEY` |
| MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
| Kilo Code | `kilocode` | `KILOCODE_API_KEY` |
| Hugging Face | `huggingface` | `HF_TOKEN` |
| Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |
### Custom Endpoint Fallback
@@ -161,7 +162,7 @@ When a task's provider is set to `"auto"` (the default), Hermes tries providers
```text
OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
- API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
+ API-key providers (z.ai, Kimi, MiniMax, Hugging Face, Anthropic) → give up
```
**For vision tasks:**