diff --git a/website/docs/getting-started/quickstart.md b/website/docs/getting-started/quickstart.md
index 0418d473c3c..2bff29c64df 100644
--- a/website/docs/getting-started/quickstart.md
+++ b/website/docs/getting-started/quickstart.md
@@ -50,6 +50,7 @@ hermes setup # Or configure everything at once
 | **MiniMax** | International MiniMax endpoint | Set `MINIMAX_API_KEY` |
 | **MiniMax China** | China-region MiniMax endpoint | Set `MINIMAX_CN_API_KEY` |
 | **Alibaba Cloud** | Qwen models via DashScope | Set `DASHSCOPE_API_KEY` |
+| **Hugging Face** | 20+ open models via unified router (Qwen, DeepSeek, Kimi, etc.) | Set `HF_TOKEN` |
 | **Kilo Code** | KiloCode-hosted models | Set `KILOCODE_API_KEY` |
 | **Vercel AI Gateway** | Vercel AI Gateway routing | Set `AI_GATEWAY_API_KEY` |
 | **Custom Endpoint** | VLLM, SGLang, or any OpenAI-compatible API | Set base URL + API key |
diff --git a/website/docs/reference/environment-variables.md b/website/docs/reference/environment-variables.md
index b4d6c3f4f74..7da6802857a 100644
--- a/website/docs/reference/environment-variables.md
+++ b/website/docs/reference/environment-variables.md
@@ -30,6 +30,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `MINIMAX_CN_BASE_URL` | Override MiniMax China base URL (default: `https://api.minimaxi.com/v1`) |
 | `KILOCODE_API_KEY` | Kilo Code API key ([kilo.ai](https://kilo.ai)) |
 | `KILOCODE_BASE_URL` | Override Kilo Code base URL (default: `https://api.kilo.ai/api/gateway`) |
+| `HF_TOKEN` | Hugging Face token for Inference Providers ([huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)) |
+| `HF_BASE_URL` | Override Hugging Face base URL (default: `https://router.huggingface.co/v1`) |
 | `ANTHROPIC_API_KEY` | Anthropic Console API key ([console.anthropic.com](https://console.anthropic.com/)) |
 | `ANTHROPIC_TOKEN` | Manual or legacy Anthropic OAuth/setup-token override |
 | `DASHSCOPE_API_KEY` | Alibaba Cloud DashScope API key for Qwen models ([modelstudio.console.alibabacloud.com](https://modelstudio.console.alibabacloud.com/)) |
@@ -48,7 +50,7 @@ For native Anthropic auth, Hermes prefers Claude Code's own credential files whe
 
 | Variable | Description |
 |----------|-------------|
-| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
+| `HERMES_INFERENCE_PROVIDER` | Override provider selection: `auto`, `openrouter`, `nous`, `openai-codex`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `kilocode`, `alibaba` (default: `auto`) |
 | `HERMES_PORTAL_BASE_URL` | Override Nous Portal URL (for development/testing) |
 | `NOUS_INFERENCE_BASE_URL` | Override Nous inference API URL |
 | `HERMES_NOUS_MIN_KEY_TTL_SECONDS` | Min agent key TTL before re-mint (default: 1800 = 30min) |
diff --git a/website/docs/user-guide/configuration.md b/website/docs/user-guide/configuration.md
index 8ee4d309516..2b4da44ddac 100644
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -72,7 +72,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
 | **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
 | **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
 | **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
-| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`) |
+| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
 | **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
 
 :::info Codex Note
@@ -152,6 +152,32 @@ model:
 
 Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
 
+### Hugging Face Inference Providers
+
+[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) gives access to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are routed to the fastest available backend (Groq, Together, SambaNova, etc.), with automatic failover.
+
+```bash
+# Use any available model
+hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
+# Requires: HF_TOKEN in ~/.hermes/.env
+
+# Short alias
+hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
+```
+
+Or set it permanently in `config.yaml`:
+```yaml
+model:
+  provider: "huggingface"
+  default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
+```
+
+Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and make sure the "Make calls to Inference Providers" permission is enabled. A free tier is included ($0.10/month credit, no markup on provider rates).
+
+You can append a routing suffix to the model name: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
+
+The base URL can be overridden with `HF_BASE_URL`.
+
 ## Custom & Self-Hosted LLM Providers
 
 Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
@@ -443,7 +469,7 @@ fallback_model:
 
 When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
 
-Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
+Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
 
 :::tip
 Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
diff --git a/website/docs/user-guide/features/fallback-providers.md b/website/docs/user-guide/features/fallback-providers.md
index 63e9337e4de..c149eee9461 100644
--- a/website/docs/user-guide/features/fallback-providers.md
+++ b/website/docs/user-guide/features/fallback-providers.md
@@ -44,6 +44,7 @@ Both `provider` and `model` are **required**. If either is missing, the fallback
 | MiniMax | `minimax` | `MINIMAX_API_KEY` |
 | MiniMax (China) | `minimax-cn` | `MINIMAX_CN_API_KEY` |
 | Kilo Code | `kilocode` | `KILOCODE_API_KEY` |
+| Hugging Face | `huggingface` | `HF_TOKEN` |
 | Custom endpoint | `custom` | `base_url` + `api_key_env` (see below) |
 
 ### Custom Endpoint Fallback
@@ -161,7 +162,7 @@ When a task's provider is set to `"auto"` (the default), Hermes tries providers
 
 ```text
 OpenRouter → Nous Portal → Custom endpoint → Codex OAuth →
-API-key providers (z.ai, Kimi, MiniMax, Anthropic) → give up
+API-key providers (z.ai, Kimi, MiniMax, Hugging Face, Anthropic) → give up
 ```
 
 **For vision tasks:**
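The docs changed above introduce the `huggingface` provider id, the `HF_TOKEN` variable, and list the provider as a valid fallback target, but no single example wires them together. A minimal sketch, assuming only the `fallback_model` schema these pages document (`provider` and `model` as the required keys) and reusing a model name that already appears in this diff:

```yaml
# Sketch only: a fallback entry in config.yaml using the new provider id.
# Assumes the documented fallback_model schema; the model name is illustrative.
# The huggingface provider needs HF_TOKEN in ~/.hermes/.env.
fallback_model:
  provider: "huggingface"
  model: "Qwen/Qwen3-235B-A22B-Thinking-2507"
```

The same `huggingface` id should work anywhere the docs accept a provider name, for example `HERMES_INFERENCE_PROVIDER=huggingface` in the environment-variables table above.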