Files
hermes-agent/website/docs/user-guide/configuration.md
Teknium 22ff6ca32b docs: two-week gap sweep — platforms, CLI, config, TUI, hooks, providers (#17727)
Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior
without docs coverage. No functional code changes; docs + static manifest
regeneration only.

Highlights:

Stale / incorrect:
- configuration.md: auxiliary auto-routing line was wrong since #11900;
  now correctly states auto routes to the main model, with a note on the
  cost trade-off and per-task override pattern.
- integrations/providers.md + configuration.md compression intro:
  removed stale 'Gemini Flash via OpenRouter' claim.
- website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py
  so the live manifest picks up tencent/hy3-preview (and remains in sync
  for future model-catalog PRs).

Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610
#10283 #10246 #11564 #13178):
- Signal: native formatting (bodyRanges), reply quotes, reactions.
- Telegram: table rendering (bullets + code-block fallback),
  disable_link_previews, group_allowed_chats.
- Slack: strict_mention config.
- Discord: slash_commands disable, send_animation GIF, send_message
  native media attachments.
- DingTalk: require_mention + allowed_users.

CLI (#16052 #16539 #16566 #15841 #14798 #10043):
- New 'hermes fallback' interactive manager.
- New 'hermes update --check', '--backup' flag, and pre-update pairing
  snapshot behavior.
- 'hermes gateway start/restart --all' multi-profile flag.
- cron.md: 'hermes tools' as a platform, per-job enabled_toolsets,
  wakeAgent gate, context_from chaining.

Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227
#14166 #14730 #17008):
- terminal.docker_run_as_host_user, display.runtime_metadata_footer,
  compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT,
  skills.guard_agent_created, TAVILY_BASE_URL,
  security.allow_private_urls, agent.api_max_retries,
  gateway hot-reload of compression/context_length config edits.

TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934
#14810 #14045 #17286 #17126):
- HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator
  styles, ctrl-x queued-message delete, git branch in status bar, per-
  prompt elapsed stopwatch, external-editor keybind, markdown stripping,
  TUI voice-mode parity, /agents overlay, /reload + /mouse.

Gateway features (#16506 #15027 #13428 #12116):
- Native multimodal image routing based on vision capability.
- /usage account-limits section.
- /steer slash command (added to reference + explanation in CLI).

Plugins / hooks (#12929 #12972 #10763 #16364):
- transform_tool_result, transform_terminal_output plugin hooks.
- PluginContext.dispatch_tool() documented with slash-command example.
- google_meet bundled plugin entry under built-in-plugins.md.

Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231
#14232 #14307 #13683 #12373 #11891 #11291 #10066):
- hermes backup exclusions (WAL/SHM/journal + checkpoints/).
- security.md hardline blocklist (floor below --yolo).
- FHS install layout for root installs.
- openssh-client + docker-cli baked into the Docker image.
- MEDIA: tag supported extensions table (docs/office/archives/pdf).
- Remote-to-host file sync on SSH/Modal/Daytona teardown.
- 'hermes model' -> Configure Auxiliary Models interactive picker.
- Podman support via HERMES_DOCKER_BINARY.

Providers / STT / one-shot (#15045 #14473 #15704):
- alibaba-coding-plan first-class provider entry.
- xAI Grok STT as a 6th transcription option.
- 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL.

Build: 'docusaurus build' succeeds. No new broken links/anchors;
pre-existing warnings unchanged.
2026-04-29 20:32:37 -07:00


---
sidebar_position: 2
title: Configuration
description: "Configure Hermes Agent — config.yaml, providers, models, API keys, and more"
---

# Configuration

All settings are stored in the ~/.hermes/ directory for easy access.

## Directory Structure

```text
~/.hermes/
├── config.yaml     # Settings (model, terminal, TTS, compression, etc.)
├── .env            # API keys and secrets
├── auth.json       # OAuth provider credentials (Nous Portal, etc.)
├── SOUL.md         # Primary agent identity (slot #1 in system prompt)
├── memories/       # Persistent memory (MEMORY.md, USER.md)
├── skills/         # Agent-created skills (managed via skill_manage tool)
├── cron/           # Scheduled jobs
├── sessions/       # Gateway sessions
└── logs/           # Logs (errors.log, gateway.log — secrets auto-redacted)
```

## Managing Configuration

```bash
hermes config              # View current configuration
hermes config edit         # Open config.yaml in your editor
hermes config set KEY VAL  # Set a specific value
hermes config check        # Check for missing options (after updates)
hermes config migrate      # Interactively add missing options

# Examples:
hermes config set model anthropic/claude-opus-4
hermes config set terminal.backend docker
hermes config set OPENROUTER_API_KEY sk-or-...  # Saves to .env
```

:::tip
The hermes config set command automatically routes values to the right file — API keys are saved to .env, everything else to config.yaml.
:::

## Configuration Precedence

Settings are resolved in this order (highest priority first):

  1. CLI arguments — e.g., hermes chat --model anthropic/claude-sonnet-4 (per-invocation override)
  2. ~/.hermes/config.yaml — the primary config file for all non-secret settings
  3. ~/.hermes/.env — fallback for env vars; required for secrets (API keys, tokens, passwords)
  4. Built-in defaults — hardcoded safe defaults when nothing else is set

:::info Rule of Thumb
Secrets (API keys, bot tokens, passwords) go in .env. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in config.yaml. When both are set, config.yaml wins for non-secret settings.
:::

## Environment Variable Substitution

You can reference environment variables in config.yaml using ${VAR_NAME} syntax:

```yaml
auxiliary:
  vision:
    api_key: ${GOOGLE_API_KEY}
    base_url: ${CUSTOM_VISION_URL}

delegation:
  api_key: ${DELEGATION_KEY}
```

Multiple references in a single value work: url: "${HOST}:${PORT}". If a referenced variable is not set, the placeholder is kept verbatim (${UNDEFINED_VAR} stays as-is). Only the ${VAR} syntax is supported — bare $VAR is not expanded.
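The substitution rules above can be sketched in a few lines of Python (illustrative only — the real logic lives in Hermes' config loader, and the function name here is an assumption):

```python
import os
import re

# Matches ${VAR_NAME} only — bare $VAR is deliberately left alone.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def substitute(value: str) -> str:
    # Replace each ${VAR} with its environment value; if the variable
    # is unset, keep the placeholder verbatim.
    def repl(match: re.Match) -> str:
        return os.environ.get(match.group(1), match.group(0))
    return _PATTERN.sub(repl, value)

os.environ["HOST"] = "example.com"
os.environ.pop("PORT", None)              # ensure PORT is unset for the demo
print(substitute("${HOST}:${PORT}"))      # "example.com:${PORT}"
print(substitute("$HOST"))                # "$HOST" — bare $VAR is not expanded
```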

For AI provider setup (OpenRouter, Anthropic, Copilot, custom endpoints, self-hosted LLMs, fallback models, etc.), see AI Providers.

## Provider Timeouts

You can set providers.<id>.request_timeout_seconds for a provider-wide request timeout, plus providers.<id>.models.<model>.timeout_seconds for a model-specific override. Applies to the primary turn client on every transport (OpenAI-wire, native Anthropic, Anthropic-compatible), the fallback chain, rebuilds after credential rotation, and (for OpenAI-wire) the per-request timeout kwarg — so the configured value wins over the legacy HERMES_API_TIMEOUT env var.

You can also set providers.<id>.stale_timeout_seconds for the non-streaming stale-call detector, plus providers.<id>.models.<model>.stale_timeout_seconds for a model-specific override. This wins over the legacy HERMES_API_CALL_STALE_TIMEOUT env var.

Leaving these unset keeps the legacy defaults (HERMES_API_TIMEOUT=1800s, HERMES_API_CALL_STALE_TIMEOUT=300s, native Anthropic 900s). Not currently wired for AWS Bedrock (both bedrock_converse and AnthropicBedrock SDK paths use boto3 with its own timeout configuration). See the commented example in cli-config.yaml.example.
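A hedged example of where these keys sit in config.yaml (the provider id and model name below are placeholders, not recommendations):

```yaml
providers:
  openrouter:                      # example provider id
    request_timeout_seconds: 600   # provider-wide request timeout
    stale_timeout_seconds: 120     # non-streaming stale-call detector
    models:
      some/slow-model:             # placeholder model id
        timeout_seconds: 1200      # model-specific request override
        stale_timeout_seconds: 300 # model-specific stale override
```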

## Terminal Backend Configuration

Hermes supports seven terminal backends. Each determines where the agent's shell commands actually execute — your local machine, a Docker container, a remote server via SSH, a Modal cloud sandbox, a Daytona workspace, a Vercel Sandbox, or a Singularity/Apptainer container.

```yaml
terminal:
  backend: local    # local | docker | ssh | modal | daytona | vercel_sandbox | singularity
  cwd: "."          # Working directory ("." = current dir for local, "/root" for containers)
  timeout: 180      # Per-command timeout in seconds
  env_passthrough: []  # Env var names to forward to sandboxed execution (terminal + execute_code)
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"  # Container image for Singularity backend
  modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"                 # Container image for Modal backend
  daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"               # Container image for Daytona backend
```

For cloud sandboxes such as Modal, Daytona, and Vercel Sandbox, container_persistent: true means Hermes will try to preserve filesystem state across sandbox recreation. It does not promise that the same live sandbox, PID space, or background processes will still be running later.

### Backend Overview

| Backend | Where commands run | Isolation | Best for |
|---|---|---|---|
| `local` | Your machine directly | None | Development, personal use |
| `docker` | Docker container | Full (namespaces, cap-drop) | Safe sandboxing, CI/CD |
| `ssh` | Remote server via SSH | Network boundary | Remote dev, powerful hardware |
| `modal` | Modal cloud sandbox | Full (cloud VM) | Ephemeral cloud compute, evals |
| `daytona` | Daytona workspace | Full (cloud container) | Managed cloud dev environments |
| `vercel_sandbox` | Vercel Sandbox | Full (cloud microVM) | Cloud execution with snapshot-backed filesystem persistence |
| `singularity` | Singularity/Apptainer container | Namespaces (`--containall`) | HPC clusters, shared machines |

### Local Backend

The default. Commands run directly on your machine with no isolation. No special setup required.

```yaml
terminal:
  backend: local
```

:::warning
The agent has the same filesystem access as your user account. Use hermes tools to disable tools you don't want, or switch to Docker for sandboxing.
:::

### Docker Backend

Runs commands inside a Docker container with security hardening (all capabilities dropped, no privilege escalation, PID limits).

```yaml
terminal:
  backend: docker
  docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
  docker_mount_cwd_to_workspace: false  # Mount launch dir into /workspace
  docker_run_as_host_user: false   # See "Running the Container as Your Host User" below
  docker_forward_env:              # Env vars to forward into container
    - "GITHUB_TOKEN"
  docker_volumes:                  # Host directory mounts
    - "/home/user/projects:/workspace/projects"
    - "/home/user/data:/data:ro"   # :ro for read-only

  # Resource limits
  container_cpu: 1                 # CPU cores (0 = unlimited)
  container_memory: 5120           # MB (0 = unlimited)
  container_disk: 51200            # MB (requires overlay2 on XFS+pquota)
  container_persistent: true       # Persist /workspace and /root across sessions
```

Requirements: Docker Desktop or Docker Engine installed and running. Hermes probes $PATH plus common macOS install locations (/usr/local/bin/docker, /opt/homebrew/bin/docker, Docker Desktop app bundle). Podman is supported out of the box: set HERMES_DOCKER_BINARY=podman (or the full path) to force it when both are installed.

Container lifecycle: Hermes reuses a single long-lived container (docker run -d ... sleep 2h) for every terminal and file-tool call, across sessions, /new, /reset, and delegate_task subagents, for the lifetime of the Hermes process. Commands run via docker exec with a login shell, so working-directory changes, installed packages, and files in /workspace all persist from one tool call to the next. The container is stopped and removed on Hermes shutdown (or when the idle-sweep reclaims it).

Parallel subagents spawned via delegate_task(tasks=[...]) share this one container — concurrent cd, env mutations, and writes to the same path will collide. If a subagent needs an isolated sandbox, it must register a per-task image override via register_task_env_overrides(), which RL and benchmark environments (TerminalBench2, HermesSweEnv, etc.) do automatically for their per-task Docker images.

Security hardening:

- `--cap-drop ALL` with only DAC_OVERRIDE, CHOWN, FOWNER added back
- `--security-opt no-new-privileges`
- `--pids-limit 256`
- Size-limited tmpfs for /tmp (512MB), /var/tmp (256MB), /run (64MB)

Credential forwarding: Env vars listed in docker_forward_env are resolved from your shell environment first, then ~/.hermes/.env. Skills can also declare required_environment_variables which are merged automatically.
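The resolution order amounts to a two-level lookup, sketched below (the function name and dict-based signature are illustrative, not Hermes' actual API):

```python
def resolve_forwarded(names, shell_env, dotenv):
    """Resolve docker_forward_env entries: the shell environment wins,
    ~/.hermes/.env is the fallback; variables found in neither are skipped."""
    resolved = {}
    for name in names:
        if name in shell_env:
            resolved[name] = shell_env[name]
        elif name in dotenv:
            resolved[name] = dotenv[name]
    return resolved

env = resolve_forwarded(
    ["GITHUB_TOKEN", "NPM_TOKEN"],
    shell_env={"GITHUB_TOKEN": "gh_from_shell"},
    dotenv={"GITHUB_TOKEN": "gh_from_dotenv", "NPM_TOKEN": "npm_123"},
)
print(env)  # {'GITHUB_TOKEN': 'gh_from_shell', 'NPM_TOKEN': 'npm_123'}
```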

### SSH Backend

Runs commands on a remote server over SSH. Uses ControlMaster for connection reuse (5-minute idle keepalive). Persistent shell is enabled by default — state (cwd, env vars) survives across commands.

```yaml
terminal:
  backend: ssh
  persistent_shell: true           # Keep a long-lived bash session (default: true)
```

Required environment variables:

```bash
TERMINAL_SSH_HOST=my-server.example.com
TERMINAL_SSH_USER=ubuntu
```

Optional:

| Variable | Default | Description |
|---|---|---|
| `TERMINAL_SSH_PORT` | 22 | SSH port |
| `TERMINAL_SSH_KEY` | (system default) | Path to SSH private key |
| `TERMINAL_SSH_PERSISTENT` | true | Enable persistent shell |

How it works: Connects at init time with BatchMode=yes and StrictHostKeyChecking=accept-new. Persistent shell keeps a single bash -l process alive on the remote host, communicating via temporary files. Commands that need stdin_data or sudo automatically fall back to one-shot mode.

### Modal Backend

Runs commands in a Modal cloud sandbox. Each task gets an isolated VM with configurable CPU, memory, and disk. The filesystem can be snapshotted and restored across sessions.

```yaml
terminal:
  backend: modal
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB (5GB)
  container_disk: 51200            # MB (50GB)
  container_persistent: true       # Snapshot/restore filesystem
```

Required: Either MODAL_TOKEN_ID + MODAL_TOKEN_SECRET environment variables, or a ~/.modal.toml config file.

Persistence: When enabled, the sandbox filesystem is snapshotted on cleanup and restored on next session. Snapshots are tracked in ~/.hermes/modal_snapshots.json. This preserves filesystem state, not live processes, PID space, or background jobs.

Credential files: Automatically mounted from ~/.hermes/ (OAuth tokens, etc.) and synced before each command.

### Daytona Backend

Runs commands in a Daytona managed workspace. Supports stop/resume for persistence.

```yaml
terminal:
  backend: daytona
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB → converted to GiB
  container_disk: 10240            # MB → converted to GiB (max 10 GiB)
  container_persistent: true       # Stop/resume instead of delete
```

Required: DAYTONA_API_KEY environment variable.

Persistence: When enabled, sandboxes are stopped (not deleted) on cleanup and resumed on next session. Sandbox names follow the pattern hermes-{task_id}.

Disk limit: Daytona enforces a 10 GiB maximum. Requests above this are capped with a warning.
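The MB-to-GiB conversion plus cap can be sketched as follows — only the 10 GiB ceiling is documented; the exact rounding behavior here is an assumption:

```python
def daytona_disk_gib(requested_mb: int) -> int:
    # container_disk is configured in MB, but Daytona sizes disks in GiB.
    gib = max(1, round(requested_mb / 1024))
    if gib > 10:
        # Daytona enforces a 10 GiB maximum; larger requests are capped
        # (Hermes logs a warning in this case).
        gib = 10
    return gib

print(daytona_disk_gib(10240))  # 10 — exactly at the ceiling
print(daytona_disk_gib(51200))  # 10 — 50 GiB request capped
```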

### Vercel Sandbox Backend

Runs commands in a Vercel Sandbox cloud microVM. Hermes uses the normal terminal and file tool surfaces; there are no Vercel-specific model-facing tools.

```yaml
terminal:
  backend: vercel_sandbox
  vercel_runtime: node24          # node24 | node22 | python3.13
  cwd: /vercel/sandbox            # default workspace root
  container_persistent: true      # Snapshot/restore filesystem
  container_disk: 51200           # Shared default only; custom disk is unsupported
```

Required install: Install the optional SDK extra:

```bash
pip install 'hermes-agent[vercel]'
```

Required authentication: Configure access-token auth with all three of VERCEL_TOKEN, VERCEL_PROJECT_ID, and VERCEL_TEAM_ID. This is the supported setup for deployments and normal long-running Hermes processes on Render, Railway, Docker, and similar hosts.

For one-off local development, Hermes also accepts short-lived Vercel OIDC tokens:

```bash
VERCEL_OIDC_TOKEN="$(vc project token <project-name>)" hermes chat
```

From a linked Vercel project directory, you can omit the project name:

```bash
VERCEL_OIDC_TOKEN="$(vc project token)" hermes chat
```

OIDC tokens are short-lived and should not be used as the documented deployment path.

Runtime: terminal.vercel_runtime supports node24, node22, and python3.13. If unset, Hermes defaults to node24.

Persistence: When container_persistent: true, Hermes snapshots the sandbox filesystem during cleanup and restores a later sandbox for the same task from that snapshot. Snapshot contents can include Hermes-synced credentials, skills, and cache files that were copied into the sandbox. This preserves filesystem state only; it does not preserve live sandbox identity, PID space, shell state, or running background processes.

Background commands: terminal(background=true) uses Hermes' generic non-local background process flow. You can spawn, poll, wait, view logs, and kill processes through the normal process tool while the sandbox is alive. Hermes does not provide native Vercel detached-process recovery after cleanup or restart.

Disk sizing: Vercel Sandbox does not currently support Hermes' container_disk resource knob. Leave container_disk unset or at the shared default 51200; non-default values fail diagnostics and backend creation instead of being silently ignored.

### Singularity/Apptainer Backend

Runs commands in a Singularity/Apptainer container. Designed for HPC clusters and shared machines where Docker isn't available.

```yaml
terminal:
  backend: singularity
  singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
  container_cpu: 1                 # CPU cores
  container_memory: 5120           # MB
  container_persistent: true       # Writable overlay persists across sessions
```

Requirements: apptainer or singularity binary in $PATH.

Image handling: Docker URLs (docker://...) are automatically converted to SIF files and cached. Existing .sif files are used directly.

Scratch directory: Resolved in order: TERMINAL_SCRATCH_DIR → TERMINAL_SANDBOX_DIR/singularity → /scratch/$USER/hermes-agent (HPC convention) → ~/.hermes/sandboxes/singularity.

Isolation: Uses --containall --no-home for full namespace isolation without mounting the host home directory.

### Common Terminal Backend Issues

If terminal commands fail immediately or the terminal tool is reported as disabled:

- Local — No special requirements. The safest default when getting started.
- Docker — Run docker version to verify Docker is working. If it fails, fix Docker or hermes config set terminal.backend local.
- SSH — Both TERMINAL_SSH_HOST and TERMINAL_SSH_USER must be set. Hermes logs a clear error if either is missing.
- Modal — Needs MODAL_TOKEN_ID env var or ~/.modal.toml. Run hermes doctor to check.
- Daytona — Needs DAYTONA_API_KEY. The Daytona SDK handles server URL configuration.
- Singularity — Needs apptainer or singularity in $PATH. Common on HPC clusters.

When in doubt, set terminal.backend back to local and verify that commands run there first.

## Remote-to-Host File Sync on Teardown

For the SSH, Modal, and Daytona backends (anywhere the agent's working tree lives on a different machine than the host running Hermes), Hermes tracks files the agent touched inside the remote sandbox and, on session teardown / sandbox cleanup, syncs the modified files back to the host under ~/.hermes/cache/remote-syncs/<session-id>/.

- Triggers on: session close, /new, /reset, gateway message timeout, delegate_task subagent completion when the child used a remote backend.
- Covers the whole tree the agent modified, not just files it explicitly opened. Additions, edits, and deletions are all captured.
- The remote sandbox may have been torn down by the time you go looking; the local ~/.hermes/cache/remote-syncs/… copy is the authoritative record of what the agent changed.
- Large binary outputs (model checkpoints, raw datasets) are capped by size — the sync skips files over file_sync_max_mb (default 100). Bump that if you expect bigger artifacts to come back.

```yaml
terminal:
  file_sync_max_mb: 100     # default — sync files up to 100 MB each
  file_sync_enabled: true   # default — set false to skip the sync entirely
```

This is how you recover results from ephemeral cloud sandboxes that get destroyed after the session ends, without having to tell the agent to explicitly scp or modal volume put every artifact.

## Docker Volume Mounts

When using the Docker backend, docker_volumes lets you share host directories with the container. Each entry uses standard Docker -v syntax: host_path:container_path[:options].

```yaml
terminal:
  backend: docker
  docker_volumes:
    - "/home/user/projects:/workspace/projects"   # Read-write (default)
    - "/home/user/datasets:/data:ro"              # Read-only
    - "/home/user/.hermes/cache/documents:/output" # Gateway-visible exports
```

This is useful for:

- Providing files to the agent (datasets, configs, reference code)
- Receiving files from the agent (generated code, reports, exports)
- Shared workspaces where both you and the agent access the same files

If you use a messaging gateway and want the agent to send generated files via MEDIA:/..., prefer a dedicated host-visible export mount such as /home/user/.hermes/cache/documents:/output.

- Write files inside Docker to /output/...
- Emit the host path in MEDIA:, for example: MEDIA:/home/user/.hermes/cache/documents/report.txt
- Do not emit /workspace/... or /output/... unless that exact path also exists for the gateway process on the host

:::warning
YAML duplicate keys silently override earlier ones. If you already have a docker_volumes: block, merge new mounts into the same list instead of adding another docker_volumes: key later in the file.
:::

Can also be set via environment variable: TERMINAL_DOCKER_VOLUMES='["/host:/container"]' (JSON array).

## Docker Credential Forwarding

By default, Docker terminal sessions do not inherit arbitrary host credentials. If you need a specific token inside the container, add it to terminal.docker_forward_env.

```yaml
terminal:
  backend: docker
  docker_forward_env:
    - "GITHUB_TOKEN"
    - "NPM_TOKEN"
```

Hermes resolves each listed variable from your current shell first, then falls back to ~/.hermes/.env if it was saved with hermes config set.

:::warning
Anything listed in docker_forward_env becomes visible to commands run inside the container. Only forward credentials you are comfortable exposing to the terminal session.
:::

## Running the Container as Your Host User

By default Docker containers run as root (UID 0). Files created inside /workspace or other bind-mounts end up owned by root on the host, so after a session you have to sudo chown them before you can edit them from your host editor. The terminal.docker_run_as_host_user flag fixes this:

```yaml
terminal:
  backend: docker
  docker_run_as_host_user: true   # default: false
```

When enabled, Hermes appends --user $(id -u):$(id -g) to the docker run command so files written into bind-mounted directories (/workspace, /root, anything in docker_volumes) are owned by your host user, not root. The trade-off: the container can no longer apt install or write to root-owned paths like /root/.npm — use a base image whose HOME is owned by a non-root user (or add your required tooling at image build time) if you need both.

Leave this false (the default) for backwards-compatible behavior. Turn it on when your workflow is mostly "edit mounted host files" and you're tired of sudo chown -R.

## Optional: Mount the Launch Directory into /workspace

Docker sandboxes stay isolated by default. Hermes does not pass your current host working directory into the container unless you explicitly opt in.

Enable it in config.yaml:

```yaml
terminal:
  backend: docker
  docker_mount_cwd_to_workspace: true
```

When enabled:

- if you launch Hermes from ~/projects/my-app, that host directory is bind-mounted to /workspace
- the Docker backend starts in /workspace
- file tools and terminal commands both see the same mounted project

When disabled, /workspace stays sandbox-owned unless you explicitly mount something via docker_volumes.

Security tradeoff:

- false preserves the sandbox boundary
- true gives the sandbox direct access to the directory you launched Hermes from

Use the opt-in only when you intentionally want the container to work on live host files.

## Persistent Shell

By default, each terminal command runs in its own subprocess — working directory, environment variables, and shell variables reset between commands. When persistent shell is enabled, a single long-lived bash process is kept alive across execute() calls so that state survives between commands.

This is most useful for the SSH backend, where it also eliminates per-command connection overhead. Persistent shell is enabled by default for SSH and disabled for the local backend.

```yaml
terminal:
  persistent_shell: true   # default — enables persistent shell for SSH
```

To disable:

```bash
hermes config set terminal.persistent_shell false
```

What persists across commands:

- Working directory (cd /tmp sticks for the next command)
- Exported environment variables (export FOO=bar)
- Shell variables (MY_VAR=hello)

Precedence:

| Level | Variable | Default |
|---|---|---|
| Config | `terminal.persistent_shell` | true |
| SSH override | `TERMINAL_SSH_PERSISTENT` | follows config |
| Local override | `TERMINAL_LOCAL_PERSISTENT` | false |

Per-backend environment variables take highest precedence. If you want persistent shell on the local backend too:

```bash
export TERMINAL_LOCAL_PERSISTENT=true
```

:::note
Commands that require stdin_data or sudo automatically fall back to one-shot mode, since the persistent shell's stdin is already occupied by the IPC protocol.
:::

See Code Execution and the Terminal section of the README for details on each backend.

## Skill Settings

Skills can declare their own configuration settings via their SKILL.md frontmatter. These are non-secret values (paths, preferences, domain settings) stored under the skills.config namespace in config.yaml.

```yaml
skills:
  config:
    myplugin:
      path: ~/myplugin-data   # Example — each skill defines its own keys
```

How skill settings work:

- hermes config migrate scans all enabled skills, finds unconfigured settings, and offers to prompt you
- hermes config show displays all skill settings under "Skill Settings" with the skill they belong to
- When a skill loads, its resolved config values are injected into the skill context automatically

Setting values manually:

```bash
hermes config set skills.config.myplugin.path ~/myplugin-data
```

For details on declaring config settings in your own skills, see Creating Skills — Config Settings.

### Guard on agent-created skill writes

When the agent uses skill_manage to create, edit, patch, or delete a skill, Hermes can optionally scan the new/updated content for dangerous keyword patterns (credential harvesting, obvious prompt injection, exfil instructions). The scanner is off by default — real agent workflows that legitimately touch ~/.ssh/ or mention $OPENAI_API_KEY were tripping the heuristic too often. Turn it back on if you want the scanner to prompt you before the agent's skill writes land:

```yaml
skills:
  guard_agent_created: true   # default: false
```

When on, any flagged skill_manage write surfaces as an approval prompt with the scanner's rationale. Accepted writes land; denied writes return an explanatory error to the agent.

## Memory Configuration

```yaml
memory:
  memory_enabled: true
  user_profile_enabled: true
  memory_char_limit: 2200   # ~800 tokens
  user_char_limit: 1375     # ~500 tokens
```

## File Read Safety

Controls how much content a single read_file call can return. Reads that exceed the limit are rejected with an error telling the agent to use offset and limit for a smaller range. This prevents a single read of a minified JS bundle or large data file from flooding the context window.

```yaml
file_read_max_chars: 100000  # default — ~25-35K tokens
```

Raise it if you're on a model with a large context window and frequently read big files. Lower it for small-context models to keep reads efficient:

```yaml
# Large context model (200K+)
file_read_max_chars: 200000
```

```yaml
# Small local model (16K context)
file_read_max_chars: 30000
```

The agent also deduplicates file reads automatically — if the same file region is read twice and the file hasn't changed, a lightweight stub is returned instead of re-sending the content. This resets on context compression so the agent can re-read files after their content is summarized away.
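A minimal sketch of the deduplication idea — the cache key, digesting, and stub text here are assumptions for illustration, not Hermes internals:

```python
import hashlib

class ReadDeduper:
    """Return a stub when the exact same file region is re-read unchanged."""

    def __init__(self):
        self._seen = {}

    def read(self, path, content, offset=0, limit=None):
        digest = hashlib.sha256(content.encode()).hexdigest()
        key = (path, offset, limit)
        if self._seen.get(key) == digest:
            # Same region, same content → lightweight stub instead of re-send.
            return f"[unchanged since last read: {path}]"
        self._seen[key] = digest
        return content

    def reset(self):
        # Called on context compression so files can be re-read afterwards.
        self._seen.clear()

d = ReadDeduper()
print(d.read("a.py", "x = 1"))   # full content on first read
print(d.read("a.py", "x = 1"))   # stub on an identical re-read
d.reset()
print(d.read("a.py", "x = 1"))   # full content again after compression
```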

## Tool Output Truncation Limits

Three related caps control how much raw output a tool can return before Hermes truncates it:

```yaml
tool_output:
  max_bytes: 50000        # terminal output cap (chars)
  max_lines: 2000         # read_file pagination cap
  max_line_length: 2000   # per-line cap in read_file's line-numbered view
```

- max_bytes — When a terminal command produces more than this many characters of combined stdout/stderr, Hermes keeps the first 40% and last 60% of the budget and inserts a [OUTPUT TRUNCATED] notice between them. Default 50000 (≈12-15K tokens across typical tokenisers).
- max_lines — Upper bound on the limit parameter of a single read_file call. Requests above this are clamped so a single read can't flood the context window. Default 2000.
- max_line_length — Per-line cap applied when read_file emits the line-numbered view. Lines longer than this are truncated to this many chars followed by ... [truncated]. Default 2000.
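The head/tail split behind max_bytes can be sketched like this — the 40/60 split and the notice text come from the description above; the exact marker formatting is an assumption:

```python
def truncate_output(text: str, max_bytes: int = 50000) -> str:
    # Within the max_bytes budget, keep the first 40% and last 60%
    # with a truncation notice in between.
    if len(text) <= max_bytes:
        return text
    head = int(max_bytes * 0.4)
    tail = int(max_bytes * 0.6)
    return text[:head] + "\n[OUTPUT TRUNCATED]\n" + text[-tail:]

out = truncate_output("a" * 100, max_bytes=50)
print("[OUTPUT TRUNCATED]" in out)  # True — 20-char head, 30-char tail
print(truncate_output("short"))     # short — under the limit, untouched
```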

Raise the limits on models with large context windows that can afford more raw output per call. Lower them for small-context models to keep tool results compact:

```yaml
# Large context model (200K+)
tool_output:
  max_bytes: 150000
  max_lines: 5000
```

```yaml
# Small local model (16K context)
tool_output:
  max_bytes: 20000
  max_lines: 500
```

## Global Toolset Disable

To suppress specific toolsets across the CLI and every gateway platform in one place, list their names under agent.disabled_toolsets:

```yaml
agent:
  disabled_toolsets:
    - memory       # hide memory tools + MEMORY_GUIDANCE injection
    - web          # no web_search / web_extract anywhere
```

This applies after per-platform tool config (platform_toolsets written by hermes tools), so a toolset listed here is always removed — even if a platform's saved config still lists it. Use this when you want a single switch for "turn X off everywhere" rather than editing 15+ platform rows in the hermes tools UI.

Leaving the list empty, or omitting the key, is a no-op.
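Because the global list applies last, the effective toolset is a simple set-difference — a sketch with illustrative names, not Hermes' actual code:

```python
def effective_toolsets(platform_toolsets, disabled_toolsets):
    # agent.disabled_toolsets is applied after per-platform tool config,
    # so a toolset listed there is removed even if the platform enables it.
    disabled = set(disabled_toolsets)
    return [t for t in platform_toolsets if t not in disabled]

print(effective_toolsets(["terminal", "web", "memory"], ["memory", "web"]))
# ['terminal']
```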

## Git Worktree Isolation

Enable isolated git worktrees for running multiple agents in parallel on the same repo:

```yaml
worktree: true    # Always create a worktree (same as hermes -w)
# worktree: false # Default — only when -w flag is passed
```

When enabled, each CLI session creates a fresh worktree under .worktrees/ with its own branch. Agents can edit files, commit, push, and create PRs without interfering with each other. Clean worktrees are removed on exit; dirty ones are kept for manual recovery.

You can also list gitignored files to copy into worktrees via .worktreeinclude in your repo root:

```text
# .worktreeinclude
.env
.venv/
node_modules/
```

## Context Compression

Hermes automatically compresses long conversations to stay within your model's context window. The compression summarizer is a separate LLM call — you can point it at any provider or endpoint.

All compression settings live in config.yaml (no environment variables).

### Full reference

```yaml
compression:
  enabled: true                                     # Toggle compression on/off
  threshold: 0.50                                   # Compress at this fraction of the context limit
  target_ratio: 0.20                                # Fraction of threshold to preserve as recent tail
  protect_last_n: 20                                # Min recent messages to keep uncompressed
  hygiene_hard_message_limit: 400                   # Gateway safety valve — see below

# The summarization model/provider is configured under auxiliary:
auxiliary:
  compression:
    model: "google/gemini-3-flash-preview"          # Model for summarization
    provider: "auto"                                # Provider: "auto", "openrouter", "nous", "codex", "main", etc.
    base_url: null                                  # Custom OpenAI-compatible endpoint (overrides provider)
```

:::info Legacy config migration
Older configs with `compression.summary_model`, `compression.summary_provider`, and `compression.summary_base_url` are automatically migrated to `auxiliary.compression.*` on first load (config version 17). No manual action needed.
:::

`hygiene_hard_message_limit` is a gateway-only pre-compression safety valve. Runaway sessions with thousands of messages can hit model context limits before the normal percent-of-context threshold fires; when the message count crosses this ceiling, Hermes forces compression regardless of token usage. The default is 400 — raise it for platforms where very long sessions are normal, or lower it to force more aggressive compression. Editing this value on a running gateway takes effect on the next message (see below).
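The interplay between the token threshold and the message-count ceiling can be sketched as follows (illustrative only; the function and parameter names are hypothetical, not Hermes internals):

```python
def should_compress(tokens_used, context_limit, message_count,
                    threshold=0.50, hygiene_hard_message_limit=400):
    """Decide whether compression should fire (illustrative sketch)."""
    # Normal path: fire once token usage crosses the percent threshold.
    if tokens_used >= threshold * context_limit:
        return True
    # Gateway safety valve: runaway sessions with thousands of short
    # messages can blow past context limits before the token check
    # fires, so a raw message-count ceiling forces compression too.
    return message_count > hygiene_hard_message_limit
```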

:::tip Gateway hot-reload of compression and context length
As of recent releases, editing `model.context_length` or any `compression.*` key in `config.yaml` on a running gateway takes effect on the next message — no gateway restart, no `/reset`, no session rotation required. The cached-agent signature includes these keys, so the gateway transparently rebuilds the agent when it sees a change. API keys and tool/skill config still require the usual reload paths.
:::

### Common setups

**Default (auto-detect)** — no configuration needed:

```yaml
compression:
  enabled: true
  threshold: 0.50
```

Uses your main provider and main model. Override per-task (e.g. `auxiliary.compression.provider: openrouter` plus `model: google/gemini-2.5-flash`) if you want compression on a cheaper model than your main chat model.
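Spelled out in `config.yaml`, that per-task override looks like this (using the example model names above):

```yaml
auxiliary:
  compression:
    provider: openrouter
    model: google/gemini-2.5-flash
```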

**Force a specific provider** (OAuth or API-key based):

```yaml
auxiliary:
  compression:
    provider: nous
    model: gemini-3-flash
```

Works with any provider: `nous`, `openrouter`, `codex`, `anthropic`, `main`, etc.

**Custom endpoint** (self-hosted, Ollama, zai, DeepSeek, etc.):

```yaml
auxiliary:
  compression:
    model: glm-4.7
    base_url: https://api.z.ai/api/coding/paas/v4
```

Points at a custom OpenAI-compatible endpoint. Uses `OPENAI_API_KEY` for auth.

### How the three knobs interact

| `auxiliary.compression.provider` | `auxiliary.compression.base_url` | Result |
|---|---|---|
| `auto` (default) | not set | Auto-detect best available provider |
| `nous` / `openrouter` / etc. | not set | Force that provider, use its auth |
| any | set | Use the custom endpoint directly (provider ignored) |
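That precedence can be sketched as a small resolution function (hypothetical helper, not Hermes source):

```python
def resolve_compression_endpoint(provider="auto", base_url=None):
    """Decide where the summarization call goes (illustrative sketch).

    base_url always wins; otherwise an explicit provider is forced;
    otherwise 'auto' falls through to auto-detection.
    """
    if base_url:
        return ("custom-endpoint", base_url)    # provider ignored
    if provider != "auto":
        return ("forced-provider", provider)
    return ("auto-detect", None)
```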

:::warning Summary model context length requirement
The summary model must have a context window at least as large as your main agent model's. The compressor sends the full middle section of the conversation to the summary model — if that model's context window is smaller than the main model's, the summarization call will fail with a context length error. When this happens, the middle turns are dropped without a summary, losing conversation context silently. If you override the model, verify its context length meets or exceeds your main model's.
:::

## Context Engine

The context engine controls how conversations are managed when approaching the model's token limit. The built-in `compressor` engine uses lossy summarization (see Context Compression). Plugin engines can replace it with alternative strategies.

```yaml
context:
  engine: "compressor"    # default — built-in lossy summarization
```

To use a plugin engine (e.g., LCM for lossless context management):

```yaml
context:
  engine: "lcm"          # must match the plugin's name
```

Plugin engines are never auto-activated — you must explicitly set `context.engine` to the plugin name. Available engines can be browsed and selected via `hermes plugins` → **Provider Plugins** → **Context Engine**.

See Memory Providers for the analogous single-select system for memory plugins.

## Iteration Budget Pressure

When the agent is working on a complex task with many tool calls, it can burn through its iteration budget (default: 90 turns) without realizing it's running low. Budget pressure automatically warns the model as it approaches the limit:

| Threshold | Level | What the model sees |
|---|---|---|
| 70% | Caution | `[BUDGET: 63/90. 27 iterations left. Start consolidating.]` |
| 90% | Warning | `[BUDGET WARNING: 81/90. Only 9 left. Respond NOW.]` |

Warnings are injected into the last tool result's JSON (as a `_budget_warning` field) rather than as separate messages — this preserves prompt caching and doesn't disrupt the conversation structure.
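A sketch of that injection (thresholds and message shapes follow the table above; the helper name and exact JSON wiring are assumptions, not Hermes internals):

```python
import json

def inject_budget_warning(tool_result_json, used, max_turns):
    """Attach a _budget_warning field to a tool result (sketch only)."""
    left = max_turns - used
    frac = used / max_turns
    if frac >= 0.90:
        warning = f"[BUDGET WARNING: {used}/{max_turns}. Only {left} left. Respond NOW.]"
    elif frac >= 0.70:
        warning = f"[BUDGET: {used}/{max_turns}. {left} iterations left. Start consolidating.]"
    else:
        return tool_result_json      # under 70%: nothing injected
    result = json.loads(tool_result_json)
    result["_budget_warning"] = warning
    return json.dumps(result)
```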

```yaml
agent:
  max_turns: 90                # Max iterations per conversation turn (default: 90)
  api_max_retries: 2           # Retries per provider before fallback engages (default: 2)
```

Budget pressure is enabled by default. The agent sees warnings naturally as part of tool results, encouraging it to consolidate its work and deliver a response before running out of iterations.

When the iteration budget is fully exhausted, the CLI shows a notification to the user: `⚠ Iteration budget reached (90/90) — response may be incomplete`. If the budget runs out during active work, the agent generates a summary of what was accomplished before stopping.

`agent.api_max_retries` controls how many times Hermes retries a provider API call on transient errors (rate limits, connection drops, 5xx) before fallback-provider switching engages. The default is 2 — three attempts total, matching the OpenAI SDK default. If you have fallback providers configured and want to fail over faster, drop this to 0 so the first transient error on your primary immediately hands off to the fallback instead of churning retries against the flaky endpoint.
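The retry-then-fallback flow can be sketched like this (an illustrative harness, not Hermes source; providers are modeled as zero-argument callables in priority order):

```python
def call_with_fallback(providers, api_max_retries=2):
    """Try each provider up to 1 + api_max_retries times, then fall back.

    With the default of 2, each provider gets three attempts before the
    next fallback provider engages. Illustrative sketch only.
    """
    last_error = None
    for call in providers:                    # primary first, then fallbacks
        for _attempt in range(1 + api_max_retries):
            try:
                return call()
            except Exception as err:          # transient: rate limit, 5xx, drop
                last_error = err
    raise last_error                          # every provider exhausted
```

Setting `api_max_retries=0` makes the inner loop run once per provider, so the first failure hands off immediately.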

## API Timeouts

Hermes layers several timeouts on streaming calls and adds stale detectors for both streaming and non-streaming calls. The stale detectors auto-adjust for local providers, but only when you leave them at their implicit defaults.

| Timeout | Default | Local providers | Config / env |
|---|---|---|---|
| Socket read timeout | 120s | Auto-raised to 1800s | `HERMES_STREAM_READ_TIMEOUT` |
| Stale stream detection | 180s | Auto-disabled | `HERMES_STREAM_STALE_TIMEOUT` |
| Stale non-stream detection | 300s | Auto-disabled when left implicit | `providers.<id>.stale_timeout_seconds` or `HERMES_API_CALL_STALE_TIMEOUT` |
| API call (non-streaming) | 1800s | Unchanged | `providers.<id>.request_timeout_seconds` / `timeout_seconds` or `HERMES_API_TIMEOUT` |

The **socket read timeout** controls how long `httpx` waits for the next chunk of data from the provider. Local LLMs can take minutes for prefill on large contexts before producing the first token, so Hermes raises this to 30 minutes when it detects a local endpoint. If you explicitly set `HERMES_STREAM_READ_TIMEOUT`, that value is always used regardless of endpoint detection.

The **stale stream detection** kills connections that receive SSE keep-alive pings but no actual content. This is disabled entirely for local providers, since they don't send keep-alive pings during prefill.

The **stale non-stream detection** kills non-streaming calls that produce no response for too long. By default, Hermes disables this on local endpoints to avoid false positives during long prefills. If you explicitly set `providers.<id>.stale_timeout_seconds`, `providers.<id>.models.<model>.stale_timeout_seconds`, or `HERMES_API_CALL_STALE_TIMEOUT`, that explicit value is honored even on local endpoints.
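The "explicit beats auto-adjust" rule for the stale detectors can be sketched as a small resolver (hypothetical helper; Hermes' real resolution lives in the provider layer):

```python
def effective_stale_timeout(explicit_seconds=None, is_local=False,
                            default_seconds=300):
    """Resolve the non-stream stale timeout (illustrative sketch).

    An explicitly configured value is honored even on local endpoints;
    otherwise local providers disable the detector (None) and remote
    providers use the default.
    """
    if explicit_seconds is not None:
        return explicit_seconds
    return None if is_local else default_seconds
```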

## Context Pressure Warnings

Separate from iteration budget pressure, context pressure tracks how close the conversation is to the compaction threshold — the point where context compression fires to summarize older messages. This helps both you and the agent understand when the conversation is getting long.

| Progress | Level | What happens |
|---|---|---|
| ≥ 60% to threshold | Info | CLI shows a cyan progress bar; gateway sends an informational notice |
| ≥ 85% to threshold | Warning | CLI shows a bold yellow bar; gateway warns compaction is imminent |

In the CLI, context pressure appears as a progress bar in the tool output feed:

```
◐ context ████████████░░░░░░░░ 62% to compaction  48k threshold (50%) · approaching compaction
```

On messaging platforms, a plain-text notification is sent:

```
◐ Context: ████████████░░░░░░░░ 62% to compaction (threshold: 50% of window).
```

If auto-compression is disabled, the warning tells you context may be truncated instead.

Context pressure is automatic — no configuration needed. It fires purely as a user-facing notification and does not modify the message stream or inject anything into the model's context.
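The bar arithmetic is simple: progress is measured against the compaction threshold, not the full window. A rendering sketch (the 20-cell width and glyphs are inferred from the sample output above, and the helper name is hypothetical):

```python
def render_context_bar(tokens_used, context_window, threshold=0.50, width=20):
    """Render a '62% to compaction'-style bar (illustrative sketch).

    62% here means 62% of the way to the point where compression fires,
    i.e. tokens_used relative to threshold * context_window.
    """
    budget = threshold * context_window
    pct = min(100, int(100 * tokens_used / budget))
    filled = pct * width // 100
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar} {pct}% to compaction"
```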

## Credential Pool Strategies

When you have multiple API keys or OAuth tokens for the same provider, configure the rotation strategy:

```yaml
credential_pool_strategies:
  openrouter: round_robin    # cycle through keys evenly
  anthropic: least_used      # always pick the least-used key
```

Options: `fill_first` (default), `round_robin`, `least_used`, `random`. See Credential Pools for full documentation.
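The strategies differ only in how the next key index is chosen per request. A hypothetical sketch, not Hermes internals (`fill_first` is modeled here as simply preferring the first key):

```python
import random

def pick_credential(keys, usage_counts, strategy="fill_first", cursor=0):
    """Pick the index of the key to use for one request (sketch only).

    keys         -- credentials in config order
    usage_counts -- per-key request counts, same length as keys
    cursor       -- index used for the previous request (round_robin)
    """
    if strategy == "fill_first":
        return 0                                    # stick with the first key
    if strategy == "round_robin":
        return (cursor + 1) % len(keys)             # cycle through keys evenly
    if strategy == "least_used":
        return usage_counts.index(min(usage_counts))
    return random.randrange(len(keys))              # random
```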

## Auxiliary Models

Hermes uses "auxiliary" models for side tasks like image analysis, web page summarization, browser screenshot analysis, session-title generation, and context compression. By default (`auxiliary.*.provider: "auto"`), Hermes routes every auxiliary task to your main chat model — the same provider/model you picked in `hermes model`. You don't need to configure anything to get started, but be aware that on expensive reasoning models (Opus, MiniMax M2.7, etc.) auxiliary tasks add meaningful cost. If you want cheap-and-fast side tasks regardless of your main model, set `auxiliary.<task>.provider` and `auxiliary.<task>.model` explicitly (for example, Gemini Flash on OpenRouter for vision and web extraction).

:::note Why "auto" uses your main model
Earlier builds split aggregator users (OpenRouter, Nous Portal) onto a cheap provider-side default. That was surprising — users who paid for an aggregator subscription would see a different model handling their auxiliary traffic. `auto` now uses the main model for everyone, and per-task overrides in `config.yaml` still win (see the full auxiliary config reference below).
:::

### Configuring auxiliary models interactively

Instead of hand-editing YAML, run `hermes model` and pick **Configure auxiliary models** from the menu. You'll get an interactive per-task picker:

```
$ hermes model
→ Configure auxiliary models

[ ] vision               currently: auto / main model
[ ] web_extract          currently: auto / main model
[ ] session_search       currently: openrouter / google/gemini-2.5-flash
[ ] title_generation     currently: openrouter / google/gemini-3-flash-preview
[ ] compression          currently: auto / main model
[ ] approval             currently: auto / main model
```

Select a task, pick a provider (OAuth flows open a browser; API-key providers prompt), then pick a model. The change persists to `auxiliary.<task>.*` in `config.yaml`. Same machinery as the main-model picker — no extra syntax to learn.
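The picker writes the same keys you'd set by hand. For example, the `session_search` row shown in the picker corresponds to this block in `config.yaml`:

```yaml
auxiliary:
  session_search:
    provider: openrouter
    model: google/gemini-2.5-flash
```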

## Video Tutorial