Compare commits

...

5 Commits

Author SHA1 Message Date
Hermes Agent
5ad28a2dbe merge: resolve conflict with main (i18n refactor)
Main moved StatusPage constants/functions inside the component and added
i18n support. Resolved by keeping the i18n structure and adding the
runningRemote key to en.ts, zh.ts, and types.ts for remote gateway
display.
2026-04-14 22:30:50 +00:00
Hermes Agent
d5949d0d16 docs(docker): add dashboard section, expose API port, update Compose example
- Running in gateway mode: expose port 8642 for the API server and
  health endpoint, with a note on when it's needed.
- New 'Running the dashboard' section: docker run command with
  GATEWAY_HEALTH_URL and env var reference table.
- Docker Compose example: updated to include both gateway and dashboard
  services with internal network connectivity (hermes-net), so the
  dashboard probes the gateway via http://hermes:8642.
- Concurrent access warning: clarified that running a read-only
  dashboard alongside the gateway is safe.
2026-04-15 08:11:52 +10:00
Hermes Agent
88d590ce5e fix: override stale 'stopped' state when health probe confirms gateway alive
When the gateway responds to the health probe but the local
gateway_state.json has a stale 'stopped' state (common in cross-container
setups where the file was written before the gateway restarted), the
dashboard would show 'Running (remote)' but with a 'Stopped' badge.

Now if the HTTP probe succeeded (remote_health_body is not None) and
gateway_state is 'stopped' or None, override it to 'running'. Also
handles the no-shared-volume case where runtime is None entirely.
2026-04-14 22:01:02 +00:00
Hermes Agent
30c089a7e9 fix: normalise GATEWAY_HEALTH_URL to base URL before probing
The probe was appending '/detailed' to whatever URL was provided,
so GATEWAY_HEALTH_URL=http://host:8642 would try /8642/detailed
and /8642 — neither of which are valid routes.

Now strips any trailing /health or /health/detailed from the env var
and always probes {base}/health/detailed then {base}/health.
Accepts bare base URL, /health, or /health/detailed forms.
2026-04-14 06:29:59 +00:00
Hermes Agent
28c39fda3d feat(dashboard): add HTTP health probe for cross-container gateway detection
The dashboard's gateway status detection relied solely on local PID checks
(os.kill + /proc), which fails when the gateway runs in a separate container.

Changes:
- web_server.py: Add _probe_gateway_health() that queries the gateway's HTTP
  /health/detailed endpoint when the local PID check fails. Activated by
  setting the GATEWAY_HEALTH_URL env var (e.g. http://gateway:8642/health).
  Falls back to standard PID check when the env var is not set.
- api_server.py: Add GET /health/detailed endpoint that returns full gateway
  state (platforms, gateway_state, active_agents, pid, etc.) without auth.
  The existing GET /health remains unchanged for backwards compatibility.
- StatusPage.tsx: Handle the case where gateway_pid is null but the gateway
  is running remotely, displaying 'Running (remote)' instead of 'PID null'.

Environment variables:
- GATEWAY_HEALTH_URL: URL of the gateway health endpoint (e.g.
  http://gateway-container:8642/health). Unset = local PID check only.
- GATEWAY_HEALTH_TIMEOUT: Probe timeout in seconds (default: 3).
2026-04-14 05:17:17 +00:00
7 changed files with 163 additions and 5 deletions

View File

@@ -10,6 +10,7 @@ Exposes an HTTP server with endpoints:
- POST /v1/runs — start a run, returns run_id immediately (202)
- GET /v1/runs/{run_id}/events — SSE stream of structured lifecycle events
- GET /health — health check
- GET /health/detailed — rich status for cross-container dashboard probing
Any OpenAI-compatible frontend (Open WebUI, LobeChat, LibreChat,
AnythingLLM, NextChat, ChatBox, etc.) can connect to hermes-agent
@@ -565,6 +566,27 @@ class APIServerAdapter(BasePlatformAdapter):
"""GET /health — simple health check."""
return web.json_response({"status": "ok", "platform": "hermes-agent"})
async def _handle_health_detailed(self, request: "web.Request") -> "web.Response":
"""GET /health/detailed — rich status for cross-container dashboard probing.
Returns gateway state, connected platforms, PID, and uptime so the
dashboard can display full status without needing a shared PID file or
/proc access. No authentication required.
"""
from gateway.status import read_runtime_status
runtime = read_runtime_status() or {}
return web.json_response({
"status": "ok",
"platform": "hermes-agent",
"gateway_state": runtime.get("gateway_state"),
"platforms": runtime.get("platforms", {}),
"active_agents": runtime.get("active_agents", 0),
"exit_reason": runtime.get("exit_reason"),
"updated_at": runtime.get("updated_at"),
"pid": os.getpid(),
})
async def _handle_models(self, request: "web.Request") -> "web.Response":
"""GET /v1/models — return hermes-agent as an available model."""
auth_err = self._check_auth(request)
@@ -1783,6 +1805,7 @@ class APIServerAdapter(BasePlatformAdapter):
self._app = web.Application(middlewares=mws)
self._app["api_server_adapter"] = self
self._app.router.add_get("/health", self._handle_health)
self._app.router.add_get("/health/detailed", self._handle_health_detailed)
self._app.router.add_get("/v1/health", self._handle_health)
self._app.router.add_get("/v1/models", self._handle_models)
self._app.router.add_post("/v1/chat/completions", self._handle_chat_completions)

View File

@@ -13,6 +13,7 @@ import asyncio
import hmac
import json
import logging
import os
import secrets
import sys
import threading
@@ -319,12 +320,68 @@ class EnvVarReveal(BaseModel):
key: str
_GATEWAY_HEALTH_URL = os.getenv("GATEWAY_HEALTH_URL")
_GATEWAY_HEALTH_TIMEOUT = float(os.getenv("GATEWAY_HEALTH_TIMEOUT", "3"))
def _probe_gateway_health() -> tuple[bool, dict | None]:
"""Probe the gateway via its HTTP health endpoint (cross-container).
Uses ``/health/detailed`` first (returns full state), falling back to
the simpler ``/health`` endpoint. Returns ``(is_alive, body_dict)``.
Accepts any of these as ``GATEWAY_HEALTH_URL``:
- ``http://gateway:8642`` (base URL — recommended)
- ``http://gateway:8642/health`` (explicit health path)
- ``http://gateway:8642/health/detailed`` (explicit detailed path)
This is a **blocking** call — run via ``run_in_executor`` from async code.
"""
if not _GATEWAY_HEALTH_URL:
return False, None
# Normalise to base URL so we always probe the right paths regardless of
# whether the user included /health or /health/detailed in the env var.
base = _GATEWAY_HEALTH_URL.rstrip("/")
if base.endswith("/health/detailed"):
base = base[: -len("/health/detailed")]
elif base.endswith("/health"):
base = base[: -len("/health")]
for path in (f"{base}/health/detailed", f"{base}/health"):
try:
req = urllib.request.Request(path, method="GET")
with urllib.request.urlopen(req, timeout=_GATEWAY_HEALTH_TIMEOUT) as resp:
if resp.status == 200:
body = json.loads(resp.read())
return True, body
except Exception:
continue
return False, None
@app.get("/api/status")
async def get_status():
current_ver, latest_ver = check_config_version()
# --- Gateway liveness detection ---
# Try local PID check first (same-host). If that fails and a remote
# GATEWAY_HEALTH_URL is configured, probe the gateway over HTTP so the
# dashboard works when the gateway runs in a separate container.
gateway_pid = get_running_pid()
gateway_running = gateway_pid is not None
remote_health_body: dict | None = None
if not gateway_running and _GATEWAY_HEALTH_URL:
loop = asyncio.get_event_loop()
alive, remote_health_body = await loop.run_in_executor(
None, _probe_gateway_health
)
if alive:
gateway_running = True
# PID from the remote container (display only — not locally valid)
if remote_health_body:
gateway_pid = remote_health_body.get("pid")
gateway_state = None
gateway_platforms: dict = {}
@@ -341,7 +398,12 @@ async def get_status():
except Exception:
configured_gateway_platforms = None
# Prefer the detailed health endpoint response (has full state) when the
# local runtime status file is absent or stale (cross-container).
runtime = read_runtime_status()
if runtime is None and remote_health_body and remote_health_body.get("gateway_state"):
runtime = remote_health_body
if runtime:
gateway_state = runtime.get("gateway_state")
gateway_platforms = runtime.get("platforms") or {}
@@ -356,6 +418,17 @@ async def get_status():
if not gateway_running:
gateway_state = gateway_state if gateway_state in ("stopped", "startup_failed") else "stopped"
gateway_platforms = {}
elif gateway_running and remote_health_body is not None:
# The health probe confirmed the gateway is alive, but the local
# runtime status file may be stale (cross-container). Override
# stopped/None state so the dashboard shows the correct badge.
if gateway_state in (None, "stopped"):
gateway_state = "running"
# If there was no runtime info at all but the health probe confirmed alive,
# ensure we still report the gateway as running (no shared volume scenario).
if gateway_running and gateway_state is None and remote_health_body is not None:
gateway_state = "running"
active_sessions = 0
try:

View File

@@ -78,6 +78,7 @@ export const en: Translations = {
disconnected: "Disconnected",
error: "Error",
notRunning: "Not running",
runningRemote: "Running (remote)",
startFailed: "Start failed",
pid: "PID",
noneRunning: "None",

View File

@@ -81,6 +81,7 @@ export interface Translations {
disconnected: string;
error: string;
notRunning: string;
runningRemote: string;
startFailed: string;
pid: string;
noneRunning: string;

View File

@@ -78,6 +78,7 @@ export const zh: Translations = {
disconnected: "已断开",
error: "错误",
notRunning: "未运行",
runningRemote: "运行中(远程)",
startFailed: "启动失败",
pid: "进程",
noneRunning: "无",

View File

@@ -53,7 +53,8 @@ export default function StatusPage() {
};
function gatewayValue(): string {
if (status!.gateway_running) return `${t.status.pid} ${status!.gateway_pid}`;
if (status!.gateway_running && status!.gateway_pid) return `${t.status.pid} ${status!.gateway_pid}`;
if (status!.gateway_running) return t.status.runningRemote;
if (status!.gateway_state === "startup_failed") return t.status.startFailed;
return t.status.notRunning;
}

View File

@@ -35,9 +35,39 @@ docker run -d \
--name hermes \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 8642:8642 \
nousresearch/hermes-agent gateway run
```
Port 8642 exposes the gateway's [OpenAI-compatible API server](./api-server.md) and health endpoint. It's optional if you only use chat platforms (Telegram, Discord, etc.), but required if you want the dashboard or external tools to reach the gateway.
Opening any port on an internet facing machine is a security risk. You should not do it unless you understand the risks.
## Running the dashboard
The built-in web dashboard can run alongside the gateway as a separate container.
To run the dashboard as its own container, point it at the gateway's health endpoint so it can detect gateway status across containers:
```sh
docker run -d \
--name hermes-dashboard \
--restart unless-stopped \
-v ~/.hermes:/opt/data \
-p 9119:9119 \
-e GATEWAY_HEALTH_URL=http://$HOST_IP:8642 \
nousresearch/hermes-agent dashboard
```
Replace `$HOST_IP` with the IP address of the machine running the gateway container (e.g. `192.168.1.100`), or use a Docker network hostname if both containers share a network (see the [Compose example](#docker-compose-example) below).
| Environment variable | Description | Default |
|---------------------|-------------|---------|
| `GATEWAY_HEALTH_URL` | Base URL of the gateway's API server, e.g. `http://gateway:8642` | *(unset — local PID check only)* |
| `GATEWAY_HEALTH_TIMEOUT` | Health probe timeout in seconds | `3` |
Without `GATEWAY_HEALTH_URL`, the dashboard falls back to local process detection — which only works when the gateway runs in the same container or on the same host.
## Running interactively (CLI chat)
To open an interactive chat session against a running data directory:
@@ -66,7 +96,7 @@ The `/opt/data` volume is the single source of truth for all Hermes state. It ma
| `skins/` | Custom CLI skins |
:::warning
Never run two Hermes containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent access.
Never run two Hermes **gateway** containers against the same data directory simultaneously — session files and memory stores are not designed for concurrent write access. Running a dashboard container alongside the gateway is safe since the dashboard only reads data.
:::
## Environment variable forwarding
@@ -85,18 +115,21 @@ Direct `-e` flags override values from `.env`. This is useful for CI/CD or secre
## Docker Compose example
For persistent gateway deployment, a `docker-compose.yaml` is convenient:
For persistent deployment with both the gateway and dashboard, a `docker-compose.yaml` is convenient:
```yaml
version: "3.8"
services:
hermes:
image: nousresearch/hermes-agent:latest
container_name: hermes
restart: unless-stopped
command: gateway run
ports:
- "8642:8642"
volumes:
- ~/.hermes:/opt/data
networks:
- hermes-net
# Uncomment to forward specific env vars instead of using .env file:
# environment:
# - ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
@@ -107,9 +140,34 @@ services:
limits:
memory: 4G
cpus: "2.0"
dashboard:
image: nousresearch/hermes-agent:latest
container_name: hermes-dashboard
restart: unless-stopped
command: dashboard --host 0.0.0.0
ports:
- "9119:9119"
volumes:
- ~/.hermes:/opt/data
environment:
- GATEWAY_HEALTH_URL=http://hermes:8642
networks:
- hermes-net
depends_on:
- hermes
deploy:
resources:
limits:
memory: 512M
cpus: "0.5"
networks:
hermes-net:
driver: bridge
```
Start with `docker compose up -d` and view logs with `docker compose logs -f hermes`.
Start with `docker compose up -d` and view logs with `docker compose logs -f`.
## Resource limits