changes from feedback

docs(security): rewrite policy around OS-level isolation as the boundary
Restate the trust model from first principles: the OS is the only load-bearing boundary against an adversarial LLM. Distinguish terminal-backend isolation (sandboxes the shell tool) from whole-process wrapping (sandboxes the agent itself, reference deployment NVIDIA OpenShell). Name in-process components (approval gate, output redaction, Skills Guard) as heuristics, and the class of reports that defeat them as out of scope under this policy — while explicitly welcoming them as regular issues or PRs. Introduce 'agent-loaded content' as the narrow, honest commitment: attacker-influenced input must not chain into a write the agent later loads on its own initiative. Strip implementation-detail enumerations (backend names, adapter names, config keys, env vars, internal symbols) so the doc stays evergreen as code evolves.
2026-05-07 19:26:56 +08:00 · 2026-05-05 22:45:12 -04:00 · 2026-05-05 12:46:51 -04:00
1 changed files with 302 additions and 55 deletions
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -1,84 +1,331 @@
 # Hermes Agent Security Policy

-This document outlines the security protocols, trust model, and deployment hardening guidelines for the **Hermes Agent** project.
+This document describes Hermes Agent's trust model, names the one
+security boundary the project treats as load-bearing, and defines the
+scope for vulnerability reports.

-## 1. Vulnerability Reporting
+## 1. Reporting a Vulnerability

-Hermes Agent does **not** operate a bug bounty program. Security issues should be reported via [GitHub Security Advisories (GHSA)](https://github.com/NousResearch/hermes-agent/security/advisories/new) or by emailing **security@nousresearch.com**. Do not open public issues for security vulnerabilities.
+Report privately via [GitHub Security Advisories](https://github.com/NousResearch/hermes-agent/security/advisories/new)
+or **security@nousresearch.com**. Do not open public issues for
+security vulnerabilities. **Hermes Agent does not operate a bug
+bounty program.**

-### Required Submission Details
- **Title & Severity:** Concise description and CVSS score/rating.
- **Affected Component:** Exact file path and line range (e.g., `tools/approval.py:120-145`).
- **Environment:** Output of `hermes version`, commit SHA, OS, and Python version.
- **Reproduction:** Step-by-step Proof-of-Concept (PoC) against `main` or the latest release.
- **Impact:** Explanation of what trust boundary was crossed.
+A useful report includes:
+
+- A concise description and severity assessment.
+- The affected component, identified by file path and line range
+  (e.g. `path/to/file.py:120-145`).
+- Environment details (`hermes version`, commit SHA, OS, Python
+  version).
+- A reproduction against `main` or the latest release.
+- A statement of which trust boundary in §2 is crossed.
+
+Please read §2 and §3 before submitting. Reports that demonstrate
+limits of an in-process heuristic this policy does not treat as a
+boundary will be closed as out-of-scope under §3 — but see §3.2:
+they are still welcome as regular issues or pull requests, just not
+through the private security channel.

 ---

 ## 2. Trust Model

-The core assumption is that Hermes is a **personal agent** with one trusted operator.
+Hermes Agent is a single-tenant personal agent. Its posture is
+layered, and the layers are not equally load-bearing. Reporters and
+operators should reason about them in the same terms.

-### Operator & Session Trust
- **Single Tenant:** The system protects the operator from LLM actions, not from malicious co-tenants. Multi-user isolation must happen at the OS/host level.
- **Gateway Security:** Authorized callers (Telegram, Discord, Slack, etc.) receive equal trust. Session keys are used for routing, not as authorization boundaries.
- **Execution:** Defaults to `terminal.backend: local` (direct host execution). Container isolation (Docker, Modal, Daytona) is opt-in for sandboxing.
+### 2.1 Definitions

-### Dangerous Command Approval
-The approval system (`tools/approval.py`) is a core security boundary. Terminal commands, file operations, and other potentially destructive actions are gated behind explicit user confirmation before execution. The approval mode is configurable via `approvals.mode` in `config.yaml`:
- `"on"` (default) — prompts the user to approve dangerous commands.
- `"auto"` — auto-approves after a configurable delay.
- `"off"` — disables the gate entirely (break-glass; see Section 3).
+- **Agent process.** The Python interpreter running Hermes Agent,
+  including any Python modules it has loaded (skills, plugins,
+  hook handlers).
+- **Terminal backend.** A pluggable execution target for the
+  `terminal()` tool. The default runs commands directly on the host.
+  Other backends run commands inside a container, cloud sandbox, or
+  remote host.
+- **Input surface.** Any channel through which content enters the
+  agent's context: operator input, web fetches, email, gateway
+  messages, file reads, MCP server responses, tool results.
+- **Trust envelope.** The set of resources an operator has implicitly
+  granted Hermes Agent access to by running it — typically, whatever
+  the operator's own user account can reach on the host.
+- **Stance.** An explicit statement in Hermes Agent's documentation
+  or code about how a consuming layer (adapter, UI, file writer,
+  shell) should treat agent output — e.g. "the dashboard renders
+  agent output as inert HTML."

-### Output Redaction
-`agent/redact.py` strips secret-like patterns (API keys, tokens, credentials) from all display output before it reaches the terminal or gateway platform. This prevents accidental credential leakage in chat logs, tool previews, and response text. Redaction operates on the display layer only — underlying values remain intact for internal agent operations.
+### 2.2 The Boundary: OS-Level Isolation

-### Skills vs. MCP Servers
- **Installed Skills:** High trust. Equivalent to local host code; skills can read environment variables and run arbitrary commands.
- **MCP Servers:** Lower trust. MCP subprocesses receive a filtered environment (`_build_safe_env()` in `tools/mcp_tool.py`) — only safe baseline variables (`PATH`, `HOME`, `XDG_*`) plus variables explicitly declared in the server's `env` config block are passed through. Host credentials are stripped by default. Additionally, packages invoked via `npx`/`uvx` are checked against the OSV malware database before spawning.
+**The only security boundary against an adversarial LLM is the
+operating system.** Nothing inside the agent process constitutes
+containment — not the approval gate, not output redaction, not any
+pattern scanner, not any tool allowlist. Any in-process component
+that screens LLM output is a heuristic operating on an
+attacker-influenced string, and this policy treats it as such.

-### Code Execution Sandbox
-The `execute_code` tool (`tools/code_execution_tool.py`) runs LLM-generated Python scripts in a child process with API keys and tokens stripped from the environment to prevent credential exfiltration. Only environment variables explicitly declared by loaded skills (via `env_passthrough`) or by the user in `config.yaml` (`terminal.env_passthrough`) are passed through. The child accesses Hermes tools via RPC, not direct API calls.
+Hermes Agent supports two OS-level isolation postures. They address
+different threats and an operator should choose deliberately.

-### Subagents
- **No recursive delegation:** The `delegate_task` tool is disabled for child agents.
- **Depth limit:** `MAX_DEPTH = 2` — parent (depth 0) can spawn a child (depth 1); grandchildren are rejected.
- **Memory isolation:** Subagents run with `skip_memory=True` and do not have access to the parent's persistent memory provider. The parent receives only the task prompt and final response as an observation.
+#### Terminal-backend isolation
+
+A non-default terminal backend runs LLM-emitted shell commands
+inside a container, remote host, or cloud sandbox. The file tools
+(`read_file`, `write_file`, `patch`) also run through this backend,
+since they are implemented on top of the shell contract — they
+cannot reach paths the backend doesn't expose.
+
+What this confines: anything the agent does by issuing shell or
+file operations. What this does **not** confine: everything the
+agent does in its own Python process. That includes the
+code-execution tool (spawned as a host subprocess), MCP subprocesses
+(spawned from the agent's environment), plugin loading, hook
+dispatch, and skill loading (all imported into the agent
+interpreter).
+
+Terminal-backend isolation is the right posture when the concern is
+LLM-emitted destructive shell or unwanted file-tool writes, and the
+operator is otherwise trusted.
+
+#### Whole-process wrapping
+
+Whole-process wrapping runs the entire agent process tree inside a
+sandbox. Every code path — shell, code-execution, MCP, file tools,
+plugins, hooks, skill loading — is subject to the same filesystem,
+network, process, and (where applicable) inference policy.
+
+Hermes Agent supports this in two ways:
+
+- **Hermes Agent's own Docker image and Compose setup.** Lighter-
+  weight; the agent runs in a standard container with operator-
+  configured mounts and network policy.
+- **[NVIDIA OpenShell](https://github.com/NVIDIA/OpenShell)**.
+  OpenShell provides per-session sandboxes with declarative policy
+  across filesystem, network (L7 egress), process/syscall, and
+  inference-routing layers. Network and inference policies are
+  hot-reloadable. Credentials are injected from a Provider store
+  and never touch the sandbox filesystem.
+
+Under a whole-process wrapper, Hermes Agent's in-process heuristics
+(§2.4) function as accident-prevention layered on top of a real
+boundary. This is the supported posture when the agent ingests
+content from surfaces the operator does not control — the open web,
+inbound email, multi-user channels, untrusted MCP servers — and for
+production or shared deployments.
+
+Operators running the default local backend with untrusted input
+surfaces, or running a terminal-backend sandbox and expecting it to
+contain code paths that don't go through the shell, are operating
+outside the supported security posture.
+
+### 2.3 Credential Scoping
+
+Hermes Agent filters the environment it passes to its lower-trust
+in-process components: shell subprocesses, MCP subprocesses, and
+the code-execution child. Credentials like provider API keys and
+gateway tokens are stripped by default; variables explicitly
+declared by the operator or by a loaded skill are passed through.
+
+This reduces casual exfiltration. It is not containment. Any
+component running inside the agent process (skills, plugins, hook
+handlers) can read whatever the agent itself can read, including
+in-memory credentials. The mitigation against a compromised
+in-process component is operator review before install (§2.4,
+§2.5), not environment scrubbing.
+
+### 2.4 In-Process Heuristics
+
+The following components screen or warn about LLM behavior. They
+are useful. They are not boundaries.
+
+- The **approval gate** detects common destructive shell patterns
+  and prompts the operator before execution. Shell is Turing-
+  complete; a denylist over shell strings is structurally
+  incomplete. The gate catches cooperative-mode mistakes, not
+  adversarial output.
+- **Output redaction** strips secret-like patterns from display.
+  A motivated output producer will defeat it.
+- **Skills Guard** scans installable skill content for injection
+  patterns. It is a review aid; the boundary for third-party skills
+  is operator review before install. Reviewing a skill means
+  reading its Python code and scripts, not just its SKILL.md
+  description — skills execute arbitrary Python at import time.
+
+### 2.5 Plugin Trust Model
+
+Plugins load into the agent process and run with full agent
+privileges: they can read the same credentials, call the same
+tools, register the same hooks, and import the same modules as
+anything shipped in-tree. The boundary for third-party plugins is
+operator review before install — the same rule as skills (§2.4),
+called out separately because plugins are architecturally heavier
+and often ship their own background services, network listeners,
+and dependencies.
+
+A malicious or buggy plugin is not a vulnerability in Hermes Agent
+itself. Bugs in Hermes Agent's plugin-install or plugin-discovery
+path that prevent the operator from seeing what they're installing
+are in scope under §3.1.
+
+### 2.6 External Surfaces
+
+An **external surface** is any channel outside the local agent
+process through which a caller can dispatch agent work, resolve
+approvals, or receive agent output. Each surface has its own
+authorization model, but the rules below apply uniformly.
+
+**Surfaces in Hermes Agent:**
+
+- **Gateway platform adapters.** Messaging integrations in
+  `gateway/platforms/` (Telegram, Discord, Slack, email, SMS, etc.)
+  and analogous adapters shipped as plugins.
+- **Network-exposed HTTP surfaces.** The API server adapter, the
+  dashboard plugin, the kanban plugin's HTTP endpoints, and any
+  other plugin that binds a listening socket.
+- **Editor / IDE adapters.** The ACP adapter (`acp_adapter/`) and
+  equivalent integrations that accept requests from a local client
+  process.
+- **The TUI gateway (`tui_gateway/`).** JSON-RPC backend for the
+  Ink terminal UI, reached over local IPC.
+
+**Uniform rules:**
+
+1. **Authorization is required at every surface that crosses a
+   trust boundary.** For messaging and network HTTP surfaces, the
+   boundary is the network: authorization means an operator-
+   configured caller allowlist. For editor and local-IPC surfaces
+   (ACP, TUI gateway), the boundary is the host's user account:
+   authorization means relying on OS-level access control (file
+   permissions, loopback-only binds) and not exposing the surface
+   beyond the local user without an explicit network auth layer.
+2. **An allowlist is required for every enabled network-exposed
+   adapter.** Adapters must refuse to dispatch agent work, resolve
+   approvals, or relay output until an allowlist is set. Code paths
+   that fail open when no allowlist is configured are code bugs in
+   scope under §3.1.
+3. **Session identifiers are routing handles, not authorization
+   boundaries.** Knowing another caller's session ID does not grant
+   access to their approvals or output; authorization is always
+   re-checked against the allowlist (or OS-level equivalent).
+4. **Within the authorized set, all callers are equally trusted.**
+   Hermes Agent does not model per-caller capabilities inside a
+   single adapter. Operators who need capability separation should
+   run separate agent instances with separate allowlists.
+5. **Binding a local-only surface to a non-loopback interface is a
+   break-glass operator decision (§3.2).** The dashboard and other
+   plugin HTTP servers default to loopback; exposing them via
+   `--host 0.0.0.0` or equivalent makes public-exposure hardening
+   (§4) the operator's responsibility.

 ---

-## 3. Out of Scope (Non-Vulnerabilities)
+## 3. Scope

-The following scenarios are **not** considered security breaches:
- **Prompt Injection:** Unless it results in a concrete bypass of the approval system, toolset restrictions, or container sandbox.
- **Public Exposure:** Deploying the gateway to the public internet without external authentication or network protection.
- **Trusted State Access:** Reports that require pre-existing write access to `~/.hermes/`, `.env`, or `config.yaml` (these are operator-owned files).
- **Default Behavior:** Host-level command execution when `terminal.backend` is set to `local` — this is the documented default, not a vulnerability.
- **Configuration Trade-offs:** Intentional break-glass settings such as `approvals.mode: "off"` or `terminal.backend: local` in production.
- **Tool-level read/access restrictions:** The agent has unrestricted shell access via the `terminal` tool by design. Reports that a specific tool (e.g., `read_file`) can access a resource are not vulnerabilities if the same access is available through `terminal`. Tool-level deny lists only constitute a meaningful security boundary when paired with equivalent restrictions on the terminal side (as with write operations, where `WRITE_DENIED_PATHS` is paired with the dangerous command approval system).
+### 3.1 In Scope
+
+- Escape from a declared OS-level isolation posture (§2.2): an
+  attacker-controlled code path reaching state that the posture
+  claimed to confine.
+- Unauthorized external-surface access: a caller outside the
+  configured authorization set (allowlist, or OS-level equivalent
+  for local-IPC surfaces) dispatching work, receiving output, or
+  resolving approvals (§2.6).
+- Credential exfiltration: leakage of operator credentials or
+  session authorization material to a destination outside the
+  trust envelope, via a mechanism that should have prevented it
+  (environment scrubbing bug, adapter logging, transport error
+  that flushes credentials to an upstream, etc.).
+- Trust-model documentation violations: code behaving contrary to
+  what this policy, Hermes Agent's own documentation, or reasonable
+  operator expectations would predict — including cases where
+  Hermes Agent has documented a stance about how its output should
+  be rendered by a consuming layer (dashboard, gateway adapter,
+  file writer, shell) and a code path breaks that stance.
+
+### 3.2 Out of Scope
+
+"Out of scope" here means "not a security vulnerability under this
+policy." It does not mean "not worth reporting." Improvements to the
+in-process heuristics, hardening ideas, and UX fixes are welcome as
+regular issues or pull requests — the approval gate can always catch
+more patterns, redaction can always get smarter, adapter behavior
+can always be tightened. These items just don't go through the
+private-disclosure channel and don't receive advisories.
+
+- **Bypasses of in-process heuristics (§2.4)** — approval-gate regex
+  bypasses, redaction bypasses, Skills Guard pattern bypasses, and
+  analogous reports against future heuristics. These components are
+  not boundaries; defeating them is not a vulnerability under this
+  policy.
+- **Prompt injection per se.** Getting the LLM to emit unusual
+  output — via injected content, hallucination, training artifacts,
+  or any other cause — is not itself a vulnerability. "I achieved
+  prompt injection" without a chained §3.1 outcome is not an
+  actionable report under this policy.
+- **Consequences of a chosen isolation posture.** Reports that a
+  code path operating within its posture's scope can do what that
+  posture permits are not vulnerabilities. Examples: shell or file
+  tools reaching host state under the local backend; code-execution
+  or MCP subprocesses reaching host state under terminal-backend
+  isolation that only sandboxes shell; reports whose preconditions
+  require pre-existing write access to operator-owned configuration
+  or credential files (those are already inside the trust envelope).
+- **Documented break-glass settings.** Operator-selected trade-offs
+  that explicitly disable protections: `--insecure` and equivalent
+  flags on the dashboard or other components, disabled approvals,
+  local backend in production, development profiles that bypass
+  hermes-home security, and similar. Reports against those
+  configurations are not vulnerabilities — that's the flag's job.
+- **Community-contributed skills and plugins.** Third-party skills
+  (including the community skills repository) and third-party
+  plugins are in the operator's review surface, not Hermes Agent's
+  trust surface (§2.4, §2.5). A skill or plugin doing something
+  malicious is the expected failure mode of one that wasn't
+  reviewed, not a vulnerability in Hermes Agent. Bugs in Hermes
+  Agent's skill-install or plugin-install path that prevent the
+  operator from seeing what they're installing are in scope under
+  §3.1.
+- **Public exposure without external controls.** Exposing the
+  gateway or API to the public internet without authentication,
+  VPN, or firewall.
+- **Tool-level read/write restrictions on a posture where shell is
+  permitted.** If a path is reachable via the terminal tool, reports
+  that other file tools can reach it add nothing.

 ---

-## 4. Deployment Hardening & Best Practices
+## 4. Deployment Hardening

-### Filesystem & Network
- **Production sandboxing:** Use container backends (`docker`, `modal`, `daytona`) instead of `local` for untrusted workloads.
- **File permissions:** Run as non-root (the Docker image uses UID 10000); protect credentials with `chmod 600 ~/.hermes/.env` on local installs.
- **Network exposure:** Do not expose the gateway or API server to the public internet without VPN, Tailscale, or firewall protection. SSRF protection is enabled by default across all gateway platform adapters (Telegram, Discord, Slack, Matrix, Mattermost, etc.) with redirect validation. Note: the local terminal backend does not apply SSRF filtering, as it operates within the trusted operator's environment.
+The single most important hardening decision is matching isolation
+(§2.2) to the trust of the content the agent will ingest. Beyond
+that:

-### Skills & Supply Chain
- **Skill installation:** Review Skills Guard reports (`tools/skills_guard.py`) before installing third-party skills. The audit log at `~/.hermes/skills/.hub/audit.log` tracks every install and removal.
- **MCP safety:** OSV malware checking runs automatically for `npx`/`uvx` packages before MCP server processes are spawned.
- **CI/CD:** GitHub Actions are pinned to full commit SHAs. The `supply-chain-audit.yml` workflow blocks PRs containing `.pth` files or suspicious `base64`+`exec` patterns.
-
-### Credential Storage
- API keys and tokens belong exclusively in `~/.hermes/.env` — never in `config.yaml` or checked into version control.
- The credential pool system (`agent/credential_pool.py`) handles key rotation and fallback. Credentials are resolved from environment variables, not stored in plaintext databases.
+- Run the agent as a non-root user. The supplied container image
+  does this by default.
+- Keep credentials in the operator credential file with tight
+  permissions, never in the main config, never in version control.
+  Under OpenShell, use the Provider store rather than an on-disk
+  credential file.
+- Do not expose the gateway or API to the public internet without
+  VPN, Tailscale, or firewall protection. Under OpenShell, use the
+  network policy layer to restrict egress.
+- Configure a caller allowlist for every network-exposed adapter
+  you enable (§2.6).
+- Review third-party skills and plugins before install (§2.4,
+  §2.5). For skills, this means reading the Python and scripts,
+  not just SKILL.md. Skills Guard reports and the install audit
+  log are the review surface.
+- Hermes Agent includes supply-chain guards for MCP server
+  launches and for dependency / bundled-package changes in CI; see
+  `CONTRIBUTING.md` for specifics.

 ---

-## 5. Disclosure Process
+## 5. Disclosure

- **Coordinated Disclosure:** 90-day window or until a fix is released, whichever comes first.
- **Communication:** All updates occur via the GHSA thread or email correspondence with security@nousresearch.com.
- **Credits:** Reporters are credited in release notes unless anonymity is requested.
+- **Coordinated disclosure window:** 90 days from report, or until a
+  fix is released, whichever comes first.
+- **Channel:** the GHSA thread or email correspondence with
+  security@nousresearch.com.
+- **Credit:** reporters are credited in release notes unless
+  anonymity is requested.