---
sidebar_position: 14
title: "API Server"
description: "Expose hermes-agent as an OpenAI-compatible API for any frontend"
---
# API Server
The API server exposes hermes-agent as an OpenAI-compatible HTTP endpoint. Any frontend that speaks the OpenAI format — Open WebUI, LobeChat, LibreChat, NextChat, ChatBox, and hundreds more — can connect to hermes-agent and use it as a backend.
Your agent handles requests with its full toolset (terminal, file operations, web search, memory, skills) and returns the final response. When streaming, tool progress indicators appear inline so frontends can show what the agent is doing.
## Quick Start
### 1. Enable the API server
Add to `~/.hermes/.env`:
```bash
API_SERVER_ENABLED=true
API_SERVER_KEY=change-me-local-dev
# Optional: only if a browser must call Hermes directly
# API_SERVER_CORS_ORIGINS=http://localhost:3000
```
### 2. Start the gateway
```bash
hermes gateway
```
You'll see:
```
[API Server] API server listening on http://127.0.0.1:8642
```
### 3. Connect a frontend
Point any OpenAI-compatible client at `http://localhost:8642/v1`:
```bash
# Test with curl
curl http://localhost:8642/v1/chat/completions \
-H "Authorization: Bearer change-me-local-dev" \
feat: OpenAI-compatible API server + WhatsApp configurable reply prefix (#1756) * feat: OpenAI-compatible API server platform adapter Salvaged from PR #956, updated for current main. Adds an HTTP API server as a gateway platform adapter that exposes hermes-agent via the OpenAI Chat Completions and Responses APIs. Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat, AnythingLLM, NextChat, ChatBox, etc.) can connect by pointing at http://localhost:8642/v1. Endpoints: - POST /v1/chat/completions — stateless Chat Completions API - POST /v1/responses — stateful Responses API with chaining - GET /v1/responses/{id} — retrieve stored response - DELETE /v1/responses/{id} — delete stored response - GET /v1/models — list hermes-agent as available model - GET /health — health check Features: - Real SSE streaming via stream_delta_callback (uses main's streaming) - In-memory LRU response store for Responses API conversation chaining - Named conversations via 'conversation' parameter - Bearer token auth (optional, via API_SERVER_KEY) - CORS support for browser-based frontends - System prompt layering (frontend system messages on top of core) - Real token usage tracking in responses Integration points: - Platform.API_SERVER in gateway/config.py - _create_adapter() branch in gateway/run.py - API_SERVER_* env vars in hermes_cli/config.py - Env var overrides in gateway/config.py _apply_env_overrides() Changes vs original PR #956: - Removed streaming infrastructure (already on main via stream_consumer.py) - Removed Telegram reply_to_mode (separate feature, not included) - Updated _resolve_model() -> _resolve_gateway_model() - Updated stream_callback -> stream_delta_callback - Updated connect()/disconnect() to use _mark_connected()/_mark_disconnected() - Adapted to current Platform enum (includes MATTERMOST, MATRIX, DINGTALK) Tests: 72 new tests, all passing Docs: API server guide, Open WebUI integration guide, env var reference * feat(whatsapp): make reply prefix configurable via config.yaml Reworked from PR #1764 (ifrederico) to use config.yaml instead of .env. The WhatsApp bridge prepends a header to every outgoing message. This was hardcoded to '⚕ *Hermes Agent*'. Users can now customize or disable it via config.yaml: whatsapp: reply_prefix: '' # disable header reply_prefix: '🤖 *My Bot*\n───\n' # custom prefix How it works: - load_gateway_config() reads whatsapp.reply_prefix from config.yaml and stores it in PlatformConfig.extra['reply_prefix'] - WhatsAppAdapter reads it from config.extra at init - When spawning bridge.js, the adapter passes it as WHATSAPP_REPLY_PREFIX in the subprocess environment - bridge.js handles undefined (default), empty (no header), or custom values with \\n escape support - Self-chat echo suppression uses the configured prefix Also fixes _config_version: was 9 but ENV_VARS_BY_VERSION had a key 10 (TAVILY_API_KEY), so existing users at v9 would never be prompted for Tavily. Bumped to 10 to close the gap. Added a regression test to prevent this from happening again. Credit: ifrederico (PR #1764) for the bridge.js implementation and the config version gap discovery. --------- Co-authored-by: Test <test@test.com>
2026-03-17 10:44:37 -07:00
-H "Content-Type: application/json" \
-d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Hello!"}]}'
```
Or connect Open WebUI, LobeChat, or any other frontend — see the [Open WebUI integration guide](/docs/user-guide/messaging/open-webui) for step-by-step instructions.
## Endpoints
### POST /v1/chat/completions
Standard OpenAI Chat Completions format. Stateless — the full conversation is included in each request via the `messages` array.
**Request:**
```json
{
  "model": "hermes-agent",
  "messages": [
    {"role": "system", "content": "You are a Python expert."},
    {"role": "user", "content": "Write a fibonacci function"}
  ],
  "stream": false
}
```
**Response:**
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "hermes-agent",
  "choices": [{
    "index": 0,
    "message": {"role": "assistant", "content": "Here's a fibonacci function..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 50, "completion_tokens": 200, "total_tokens": 250}
}
```
**Inline image input:** a user message's `content` may be an array of `text` and `image_url` parts. Both remote `http(s)` URLs and `data:image/...` URLs are supported:
```json
{
  "model": "hermes-agent",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/cat.png", "detail": "high"}}
      ]
    }
  ]
}
```
Uploaded files (`file` / `input_file` / `file_id`) and non-image `data:` URLs return `400 unsupported_content_type`.
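To send a local image from the shell, one option is to inline it as a data URL. This is a sketch that assumes `jq` and GNU coreutils `base64` are installed (on macOS, use `base64 -i cat.png` instead of `-w0`); the filename is illustrative:

```bash
# Inline a local PNG as a data URL (cat.png is a placeholder filename)
IMG="data:image/png;base64,$(base64 -w0 cat.png)"

curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "$IMG" '{
        model: "hermes-agent",
        messages: [{
          role: "user",
          content: [
            {type: "text", text: "What is in this image?"},
            {type: "image_url", image_url: {url: $url}}
          ]
        }]
      }')"
```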
**Streaming** (`"stream": true`): Returns Server-Sent Events (SSE) with token-by-token response chunks. For **Chat Completions**, the stream uses standard `chat.completion.chunk` events plus Hermes' custom `hermes.tool.progress` event for tool-start UX. For **Responses**, the stream uses OpenAI Responses event types such as `response.created`, `response.output_text.delta`, `response.output_item.added`, `response.output_item.done`, and `response.completed`.
**Tool progress in streams**:
- **Chat Completions**: Hermes emits `event: hermes.tool.progress` for tool-start visibility without polluting persisted assistant text.
- **Responses**: Hermes emits spec-native `function_call` and `function_call_output` output items during the SSE stream, so clients can render structured tool UI in real time.
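For example, to watch a Chat Completions stream from a terminal, set `"stream": true` and run curl with `-N` so SSE frames print as they arrive instead of being buffered:

```bash
curl -N http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "List my files"}], "stream": true}'
```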
### POST /v1/responses
OpenAI Responses API format. Supports server-side conversation state via `previous_response_id` — the server stores full conversation history (including tool calls and results) so multi-turn context is preserved without the client managing it.
**Request:**
```json
{
  "model": "hermes-agent",
  "input": "What files are in my project?",
  "instructions": "You are a helpful coding assistant.",
  "store": true
}
```
**Response:**
```json
{
  "id": "resp_abc123",
  "object": "response",
  "status": "completed",
  "model": "hermes-agent",
  "output": [
    {"type": "function_call", "name": "terminal", "arguments": "{\"command\": \"ls\"}", "call_id": "call_1"},
    {"type": "function_call_output", "call_id": "call_1", "output": "README.md src/ tests/"},
    {"type": "message", "role": "assistant", "content": [{"type": "output_text", "text": "Your project has..."}]}
  ],
  "usage": {"input_tokens": 50, "output_tokens": 200, "total_tokens": 250}
}
```
**Inline image input:** `input[].content` can contain `input_text` and `input_image` parts. Both remote URLs and `data:image/...` URLs are supported:
```json
{
  "model": "hermes-agent",
  "input": [
    {
      "role": "user",
      "content": [
        {"type": "input_text", "text": "Describe this screenshot."},
        {"type": "input_image", "image_url": "data:image/png;base64,iVBORw0K..."}
      ]
    }
  ]
}
```
Uploaded files (`input_file` / `file_id`) and non-image `data:` URLs return `400 unsupported_content_type`.
#### Multi-turn with previous_response_id
Chain responses to maintain full context (including tool calls) across turns:
```json
{
  "input": "Now show me the README",
  "previous_response_id": "resp_abc123"
}
```
The server reconstructs the full conversation from the stored response chain — all previous tool calls and results are preserved. Chained requests also share the same session, so multi-turn conversations appear as a single entry in the dashboard and session history.
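A minimal two-turn chain from the shell (assumes `jq` for extracting the response id):

```bash
# Turn 1: ask a question and capture the response id
RESP_ID=$(curl -s http://localhost:8642/v1/responses \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "input": "What files are in my project?"}' \
  | jq -r '.id')

# Turn 2: chain off the stored response; prior tool calls stay in context
curl -s http://localhost:8642/v1/responses \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d "{\"model\": \"hermes-agent\", \"input\": \"Now show me the README\", \"previous_response_id\": \"$RESP_ID\"}"
```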
#### Named conversations
Use the `conversation` parameter instead of tracking response IDs:
```json
{"input": "Hello", "conversation": "my-project"}
{"input": "What's in src/?", "conversation": "my-project"}
{"input": "Run the tests", "conversation": "my-project"}
```
The server automatically chains each request to the latest response in that conversation, much like the `/title` command does for gateway sessions.
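Each turn is an ordinary `/v1/responses` call with the extra `conversation` field, for example:

```bash
curl http://localhost:8642/v1/responses \
  -H "Authorization: Bearer change-me-local-dev" \
  -H "Content-Type: application/json" \
  -d '{"model": "hermes-agent", "input": "Run the tests", "conversation": "my-project"}'
```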
### GET /v1/responses/\{id\}
Retrieve a previously stored response by ID.
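For example:

```bash
curl http://localhost:8642/v1/responses/resp_abc123 \
  -H "Authorization: Bearer change-me-local-dev"
```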
### DELETE /v1/responses/\{id\}
Delete a stored response.
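For example:

```bash
curl -X DELETE http://localhost:8642/v1/responses/resp_abc123 \
  -H "Authorization: Bearer change-me-local-dev"
```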
### GET /v1/models
Lists the agent as an available model. The advertised model name defaults to the [profile](/docs/user-guide/profiles) name (or `hermes-agent` for the default profile). Required by most frontends for model discovery.
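For example (the `Authorization` header is only needed when `API_SERVER_KEY` is set):

```bash
curl http://localhost:8642/v1/models \
  -H "Authorization: Bearer change-me-local-dev"
```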
### GET /health
Health check. Returns `{"status": "ok"}`. Also available at **GET /v1/health** for OpenAI-compatible clients that expect the `/v1/` prefix.
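A quick check:

```bash
curl http://localhost:8642/health
# {"status": "ok"}
```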
### GET /health/detailed
Extended health check that also reports active sessions, running agents, and resource usage. Useful for monitoring/observability tooling.
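A quick smoke test with curl; if your deployment gates these routes behind the bearer key, add the `Authorization` header shown in the Authentication section:

```bash
# Liveness: returns {"status": "ok"}
curl http://localhost:8642/health

# Same check under the /v1/ prefix, plus the extended report
curl http://localhost:8642/v1/health
curl http://localhost:8642/health/detailed
```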
## Runs API (streaming-friendly alternative)
In addition to `/v1/chat/completions` and `/v1/responses`, the server exposes a **runs** API for long-form sessions where the client wants to subscribe to progress events instead of managing streaming itself.
### POST /v1/runs
Create a new agent run. Returns a `run_id` that can be used to subscribe to progress events.
### GET /v1/runs/\{run_id\}/events
Server-Sent Events stream of the run's tool-call progress, token deltas, and lifecycle events. Designed for dashboards and thick clients that want to attach/detach without losing state.
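A minimal sketch of the create-then-attach flow. The exact request schema for `POST /v1/runs` is not pinned down here, so the `input` field and the `run_id` extraction below are illustrative assumptions; check your server version's schema:

```bash
# Create a run (body shape is an assumption for illustration; requires jq)
RUN_ID=$(curl -s -X POST http://localhost:8642/v1/runs \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "Summarize the open GitHub issues"}' | jq -r '.run_id')

# Attach to the SSE event stream; -N disables buffering.
# Detach (Ctrl-C) and re-attach later without losing run state.
curl -N "http://localhost:8642/v1/runs/$RUN_ID/events" \
  -H "Authorization: Bearer $API_SERVER_KEY"
```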
## Jobs API (background scheduled work)
The server exposes a lightweight jobs CRUD surface for managing scheduled/background agent runs from a remote client. All endpoints are gated behind the same bearer auth.
### GET /api/jobs
List all scheduled jobs.
### POST /api/jobs
Create a new scheduled job. Body accepts the same shape as `hermes cron` — prompt, schedule, skills, provider override, delivery target.
### GET /api/jobs/\{job_id\}
Fetch a single job's definition and last-run state.
### PATCH /api/jobs/\{job_id\}
Update fields on an existing job (prompt, schedule, etc.). Partial updates are merged.
### DELETE /api/jobs/\{job_id\}
Remove a job. Also cancels any in-flight run.
### POST /api/jobs/\{job_id\}/pause
Pause a job without deleting it. Scheduling is suspended until the job is resumed.
### POST /api/jobs/\{job_id\}/resume
Resume a previously paused job.
### POST /api/jobs/\{job_id\}/run
Trigger the job to run immediately, out of schedule.
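The whole lifecycle is plain REST, as sketched below. The create body mirrors `hermes cron` (prompt, schedule, skills, provider override, delivery target), but treat the exact field names as assumptions:

```bash
BASE=http://localhost:8642
AUTH="Authorization: Bearer $API_SERVER_KEY"

# Create a daily 9am job (field names are illustrative)
curl -X POST $BASE/api/jobs -H "$AUTH" -H "Content-Type: application/json" \
  -d '{"prompt": "Summarize my RSS feeds", "schedule": "0 9 * * *"}'

# List, inspect, pause/resume, trigger, delete
# (replace JOB_ID with the id from the create response)
curl $BASE/api/jobs -H "$AUTH"
curl $BASE/api/jobs/JOB_ID -H "$AUTH"
curl -X POST $BASE/api/jobs/JOB_ID/pause -H "$AUTH"
curl -X POST $BASE/api/jobs/JOB_ID/resume -H "$AUTH"
curl -X POST $BASE/api/jobs/JOB_ID/run -H "$AUTH"
curl -X DELETE $BASE/api/jobs/JOB_ID -H "$AUTH"
```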
## System Prompt Handling
When a frontend sends a `system` message (Chat Completions) or `instructions` field (Responses API), hermes-agent **layers it on top** of its core system prompt. Your agent keeps all its tools, memory, and skills — the frontend's system prompt adds extra instructions.
This means you can customize behavior per-frontend without losing capabilities:
- Open WebUI system prompt: "You are a Python expert. Always include type hints."
- The agent still has terminal, file tools, web search, memory, etc.
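Concretely, a Chat Completions request that layers a frontend system prompt looks like this (standard OpenAI request shape):

```bash
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "messages": [
      {"role": "system", "content": "You are a Python expert. Always include type hints."},
      {"role": "user", "content": "Write a function that parses an ISO 8601 timestamp."}
    ]
  }'
```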
## Authentication
Bearer token auth via the `Authorization` header:
```
Authorization: Bearer ***
```
Configure the key via the `API_SERVER_KEY` env var. If you need a browser to call Hermes directly, also set `API_SERVER_CORS_ORIGINS` to an explicit allowlist.
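Authenticated requests carry the key on every call, for example:

```bash
# Use the same value configured server-side via API_SERVER_KEY
curl http://localhost:8642/v1/models \
  -H "Authorization: Bearer $API_SERVER_KEY"
```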
:::warning Security
The API server gives full access to hermes-agent's toolset, **including terminal commands**. When binding to a non-loopback address like `0.0.0.0`, `API_SERVER_KEY` is **required**. Also keep `API_SERVER_CORS_ORIGINS` narrow to control browser access.
The default bind address (`127.0.0.1`) is for local-only use. Browser access is disabled by default; enable it only for an explicit allowlist of trusted origins.
:::
## Configuration
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `API_SERVER_ENABLED` | `false` | Enable the API server |
| `API_SERVER_PORT` | `8642` | HTTP server port |
| `API_SERVER_HOST` | `127.0.0.1` | Bind address (localhost only by default) |
| `API_SERVER_KEY` | _(none)_ | Bearer token for auth |
| `API_SERVER_CORS_ORIGINS` | _(none)_ | Comma-separated allowed browser origins |
| `API_SERVER_MODEL_NAME` | _(profile name)_ | Model name advertised on `/v1/models`. Defaults to the profile name, or `hermes-agent` for the default profile. |
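For example, to enable the server with a randomly generated key, using the same `hermes config set` mechanism shown in the multi-user section below:

```bash
hermes config set API_SERVER_ENABLED true
hermes config set API_SERVER_HOST 127.0.0.1
hermes config set API_SERVER_PORT 8642
hermes config set API_SERVER_KEY "$(openssl rand -hex 32)"
```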
### config.yaml
```yaml
# Not yet supported — use environment variables.
# config.yaml support coming in a future release.
```
## Security Headers
All responses include security headers:
- `X-Content-Type-Options: nosniff` — prevents MIME type sniffing
- `Referrer-Policy: no-referrer` — prevents referrer leakage
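You can confirm them with a header dump:

```bash
curl -i http://localhost:8642/health
# Expect X-Content-Type-Options: nosniff and Referrer-Policy: no-referrer
# alongside the {"status": "ok"} body.
```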
## CORS
The API server does **not** enable browser CORS by default.
For direct browser access, set an explicit allowlist:
```bash
API_SERVER_CORS_ORIGINS=http://localhost:3000,http://127.0.0.1:3000
```
When CORS is enabled:
- **Preflight responses** include `Access-Control-Max-Age: 600` (10-minute cache)
- **SSE streaming responses** include CORS headers so browser EventSource clients work correctly
- **`Idempotency-Key`** is an allowed request header — clients can send it for deduplication (responses are cached by key for 5 minutes)
Most documented frontends such as Open WebUI connect server-to-server and do not need CORS at all.
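Clients that retry on flaky networks can send `Idempotency-Key` for safe deduplication: repeating the same key within the 5-minute window returns the cached response instead of re-running the agent. For example:

```bash
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: 9f1c6b2e-demo" \
  -d '{"model": "hermes-agent", "messages": [{"role": "user", "content": "Ping"}]}'
```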
## Compatible Frontends
Any frontend that supports the OpenAI API format works. Tested/documented integrations:
| Frontend | Stars | Connection |
|----------|-------|------------|
| [Open WebUI](/docs/user-guide/messaging/open-webui) | 126k | Full guide available |
| LobeChat | 73k | Custom provider endpoint |
| LibreChat | 34k | Custom endpoint in librechat.yaml |
| AnythingLLM | 56k | Generic OpenAI provider |
| NextChat | 87k | BASE_URL env var |
| ChatBox | 39k | API Host setting |
| Jan | 26k | Remote model config |
| HF Chat-UI | 8k | OPENAI_BASE_URL |
| big-AGI | 7k | Custom endpoint |
| OpenAI Python SDK | — | `OpenAI(base_url="http://localhost:8642/v1")` |
| curl | — | Direct HTTP requests |
## Multi-User Setup with Profiles
To give multiple users their own isolated Hermes instance (separate config, memory, skills), use [profiles](/docs/user-guide/profiles):
```bash
# Create a profile per user
hermes profile create alice
hermes profile create bob
# Configure each profile's API server on a different port
hermes -p alice config set API_SERVER_ENABLED true
hermes -p alice config set API_SERVER_PORT 8643
hermes -p alice config set API_SERVER_KEY alice-secret
hermes -p bob config set API_SERVER_ENABLED true
hermes -p bob config set API_SERVER_PORT 8644
hermes -p bob config set API_SERVER_KEY bob-secret
# Start each profile's gateway
hermes -p alice gateway &
hermes -p bob gateway &
```
Each profile's API server automatically advertises the profile name as the model ID:
- `http://localhost:8643/v1/models` → model `alice`
- `http://localhost:8644/v1/models` → model `bob`
In Open WebUI, add each as a separate connection. The model dropdown shows `alice` and `bob` as distinct models, each backed by a fully isolated Hermes instance. See the [Open WebUI guide](/docs/user-guide/messaging/open-webui#multi-user-setup-with-profiles) for details.
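To verify each instance, list its models; the returned ID should match the profile name (the response follows the standard OpenAI models-list shape):

```bash
curl http://localhost:8643/v1/models -H "Authorization: Bearer alice-secret"
# → {"object": "list", "data": [{"id": "alice", ...}]}
```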
## Limitations
- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
- **No file upload** — inline images are supported on both `/v1/chat/completions` and `/v1/responses` (see the example after this list), but uploaded files (`file`, `input_file`, `file_id`) and non-image document inputs are not supported through the API.
- **Model field is cosmetic** — the `model` field in requests is accepted but ignored; the LLM actually used is configured server-side in config.yaml.
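An inline image request uses the standard OpenAI content-parts shape; both remote `http(s)` URLs and `data:image/*` URLs are accepted:

```bash
curl http://localhost:8642/v1/chat/completions \
  -H "Authorization: Bearer $API_SERVER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "hermes-agent",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
      ]
    }]
  }'
```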
## Proxy Mode
The API server also serves as the backend for **gateway proxy mode**. When another Hermes gateway instance is configured with `GATEWAY_PROXY_URL` pointing at this API server, it forwards all messages here instead of running its own agent. This enables split deployments — for example, a Docker container handling Matrix E2EE that relays to a host-side agent.
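A minimal sketch of the relaying side, assuming the container can reach the host via Docker's `host.docker.internal` alias:

```bash
# Inside the container that should relay instead of running its own agent
export GATEWAY_PROXY_URL=http://host.docker.internal:8642
hermes gateway
```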
See [Matrix Proxy Mode](/docs/user-guide/messaging/matrix#proxy-mode-e2ee-on-macos) for the full setup guide.