mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-04-29 15:31:38 +08:00
docs: add Hugging Face provider to all documentation pages
- quickstart.md: add to provider table - configuration.md: add to provider table, add dedicated section with usage examples, config.yaml snippet, routing suffixes, and token info; also fix pre-existing duplicate Alibaba Cloud entry - environment-variables.md: add HF_TOKEN + HF_BASE_URL, add huggingface to HERMES_INFERENCE_PROVIDER values - fallback-providers.md: add to supported providers table and auto-detection chain
This commit is contained in:
@@ -72,7 +72,7 @@ You need at least one way to connect to an LLM. Use `hermes model` to switch pro
|
||||
| **MiniMax China** | `MINIMAX_CN_API_KEY` in `~/.hermes/.env` (provider: `minimax-cn`) |
|
||||
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`, aliases: `dashscope`, `qwen`) |
|
||||
| **Kilo Code** | `KILOCODE_API_KEY` in `~/.hermes/.env` (provider: `kilocode`) |
|
||||
| **Alibaba Cloud** | `DASHSCOPE_API_KEY` in `~/.hermes/.env` (provider: `alibaba`) |
|
||||
| **Hugging Face** | `HF_TOKEN` in `~/.hermes/.env` (provider: `huggingface`, aliases: `hf`) |
|
||||
| **Custom Endpoint** | `hermes model` (saved in `config.yaml`) or `OPENAI_BASE_URL` + `OPENAI_API_KEY` in `~/.hermes/.env` |
|
||||
|
||||
:::info Codex Note
|
||||
@@ -152,6 +152,32 @@ model:
|
||||
|
||||
Base URLs can be overridden with `GLM_BASE_URL`, `KIMI_BASE_URL`, `MINIMAX_BASE_URL`, `MINIMAX_CN_BASE_URL`, or `DASHSCOPE_BASE_URL` environment variables.
|
||||
|
||||
### Hugging Face Inference Providers
|
||||
|
||||
[Hugging Face Inference Providers](https://huggingface.co/docs/inference-providers) routes to 20+ open models through a unified OpenAI-compatible endpoint (`router.huggingface.co/v1`). Requests are automatically routed to the fastest available backend (Groq, Together, SambaNova, etc.) with automatic failover.
|
||||
|
||||
```bash
|
||||
# Use any available model
|
||||
hermes chat --provider huggingface --model Qwen/Qwen3-235B-A22B-Thinking-2507
|
||||
# Requires: HF_TOKEN in ~/.hermes/.env
|
||||
|
||||
# Short alias
|
||||
hermes chat --provider hf --model deepseek-ai/DeepSeek-V3.2
|
||||
```
|
||||
|
||||
Or set it permanently in `config.yaml`:
|
||||
```yaml
|
||||
model:
|
||||
provider: "huggingface"
|
||||
default: "Qwen/Qwen3-235B-A22B-Thinking-2507"
|
||||
```
|
||||
|
||||
Get your token at [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) — make sure to enable the "Make calls to Inference Providers" permission. Free tier included ($0.10/month credit, no markup on provider rates).
|
||||
|
||||
You can append routing suffixes to model names: `:fastest` (default), `:cheapest`, or `:provider_name` to force a specific backend.
|
||||
|
||||
The base URL can be overridden with `HF_BASE_URL`.
|
||||
|
||||
## Custom & Self-Hosted LLM Providers
|
||||
|
||||
Hermes Agent works with **any OpenAI-compatible API endpoint**. If a server implements `/v1/chat/completions`, you can point Hermes at it. This means you can use local models, GPU inference servers, multi-provider routers, or any third-party API.
|
||||
@@ -443,7 +469,7 @@ fallback_model:
|
||||
|
||||
When activated, the fallback swaps the model and provider mid-session without losing your conversation. It fires **at most once** per session.
|
||||
|
||||
Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
|
||||
Supported providers: `openrouter`, `nous`, `openai-codex`, `anthropic`, `huggingface`, `zai`, `kimi-coding`, `minimax`, `minimax-cn`, `custom`.
|
||||
|
||||
:::tip
|
||||
Fallback is configured exclusively through `config.yaml` — there are no environment variables for it. For full details on when it triggers, supported providers, and how it interacts with auxiliary tasks and delegation, see [Fallback Providers](/docs/user-guide/features/fallback-providers).
|
||||
|
||||
Reference in New Issue
Block a user