Compare commits

..

2 Commits

Author SHA1 Message Date
Brooklyn Nicholson
3d8be3e84d feat(lint): observability for the LSP bridge with steady-state silence
Adds per-call structured logging on the dedicated ``hermes.lint.lsp``
logger so an opt-in user can answer "did LSP fire on that edit?" with
``rg 'lsp\['`` against ``~/.hermes/logs/agent.log``. Levels are tuned
so a 1000-write session emits exactly ONE INFO line at the default
threshold, not 1000.

Level model
-----------
* ``DEBUG`` (invisible at the default INFO threshold) for every per-call
  steady-state event: ``clean``, ``feature off``, ``extension not
  mapped``, ``backend not local``, repeated ``no project root`` for an
  already-announced file, repeated ``server unavailable`` for an
  already-announced binary.
* ``INFO`` for state transitions worth surfacing once: ``active for
  <root>`` the first time a (language, project_root) client starts,
  ``no project root for <path>`` the first time we see that orphan
  file. Plus every ``N diags`` event — diagnostics are inherently rare
  per-edit and are exactly the failure signal users want to grep for.
* ``WARNING`` for action-required failures the first time per
  (language, binary): ``server unavailable`` (binary not on PATH),
  ``no server configured``. Per-call ``WARNING`` for timeouts, server
  errors, and unexpected bridge exceptions — these are inherently
  novel events, not steady state, and each one is its own signal.

Dedup is in-process module-level sets guarded by a lock. Sets grow at
most by the number of distinct (language, project_root) and (language,
binary) pairs touched in one Python process — a few hundred entries in
the most aggressive monorepo session, which is bytes of memory. A
bounded LRU was rejected because evicting an entry would risk
re-firing the WARNING/INFO line we explicitly want to suppress.

Why this matters
----------------
The previous draft logged every per-call event at INFO. ``agent.log``
caps at 5 MB × 3 backups (= 20 MB) via ``RotatingFileHandler``, so
nothing would crash, but a normal coding session would dwarf the actual
signal under hundreds of ``lsp[typescript] clean (...)`` lines. The
new model preserves the verification answer ("LSP active for <root>")
and the action-required signals while keeping clean steady state out
of the user's face.

Tests
-----
* ``TestLogLevelsSteadyState`` — feature off, unmapped extension, non-
  local backend, and repeated clean writes all stay at DEBUG. Exactly
  one INFO ("active for ...") survives across N calls.
* ``TestLogLevelsNovelEvents`` — diagnostics are INFO per call;
  ``active for`` fires once per (language, root).
* ``TestLogLevelsActionRequired`` — server unavailable warns once per
  binary; orphan files INFO once per path; timeouts WARN every time.
2026-05-12 00:28:30 -04:00
Brooklyn Nicholson
9a546d6b08 feat(lint): opt-in LSP-backed lint path in _check_lint
The post-edit lint hook in `tools/file_operations.py::_check_lint` has
historically resolved `.ts`/`.tsx` to `npx tsc --noEmit <single-file>`,
which has no view of the surrounding project and floods the agent with
phantom "Cannot find module" errors that disappear the moment
`tsconfig.json` is in scope. Same shape for `.go` (`go vet` orphan) and
`.rs` (`rustfmt --check` is style, not types).

This change adds an opt-in LSP path that runs *before* the in-process /
shell linter table:

- `tools/lsp_client.py` — stdio JSON-RPC client with Content-Length
  framing, per-(language, project_root, server_cmd) cache, idle reaper.
  Sync API (`diagnostics(path, content)`) returns one snapshot via
  `textDocument/didOpen` + `publishDiagnostics` with a configurable
  settle window for servers that emit "indexing" diagnostics first.
  Hand-rolled rather than pulled from `multilspy` / `sansio-lsp-client`
  / `pygls` — see the module docstring for the comparison.
- `tools/lsp_lint.py` — bridge between `_check_lint` and the client.
  Returns `None` (caller falls through to the existing shell linter)
  whenever LSP cannot or should not handle the file: feature off,
  language unmapped, env not local, no project root marker, server
  binary missing, request timed out. Never raises.
- `tools/file_operations.py::_check_lint` — calls the bridge first,
  falls through to the in-process and shell linters unchanged. With
  the flag off (the default) this is one extra `import` on the lint
  path and zero behaviour change.
- `hermes_cli/config.py` — new `lint.lsp.{enabled,servers,...}` block,
  off by default. New top-level key, so the deep-merge handles
  upgrades and no `_config_version` bump is required.

Out of scope for this PR (explicitly noted in the module docstrings):

- Container / SSH / Modal / Daytona backends. The server has to run
  inside the sandbox, which means baking it into the image or
  installing on first use; that's the gnarly half and gets its own PR.
- didChange / live editing. We re-open with the freshly written bytes
  on every call.
- Pull diagnostics (LSP 3.17). Push works on every server we care
  about today.

Tests:

- `test_lsp_client.py` — pure helpers (project root walk, URI, framing,
  Diagnostic conversion) + end-to-end coverage against an in-tree fake
  LSP server (Python script over stdio) for clean / dirty / settle /
  timeout / registry caching / shutdown.
- `test_lsp_lint_bridge.py` — feature flag, backend gating, project
  root gating, language-map gating, error containment.
- `test_check_lint_lsp_routing.py` — regression guard that the default
  (LSP disabled) path still hits in-process / shell linters, plus
  proof that an enabled bridge short-circuits the shell linter.
2026-05-12 00:13:12 -04:00
2391 changed files with 60027 additions and 368917 deletions

View File

@@ -8,10 +8,6 @@ node_modules
**/node_modules
.venv
**/.venv
.notebooklm-cli-venv/
.notebooklm-playwright/
.pip-cache/
.uv-cache/
# Built artifacts that are regenerated inside the image. Excluded so local
# rebuilds on the developer's machine don't invalidate the npm-install layer
@@ -29,8 +25,6 @@ ui-tui/packages/hermes-ink/dist/
# Runtime data (bind-mounted at /opt/data; must not leak into build context)
data/
.hermes-docker/
.notebooklm-home/
# Compose/profile runtime state (bind-mounted; avoid ownership/secret issues)
hermes-config/

View File

@@ -14,14 +14,6 @@
# LLM_MODEL is no longer read from .env — this line is kept for reference only.
# LLM_MODEL=anthropic/claude-opus-4.6
# =============================================================================
# LLM PROVIDER (NovitaAI)
# =============================================================================
# NovitaAI — 90+ models, pay-per-use
# Get your key at: https://novita.ai/settings/key-management
# NOVITA_API_KEY=
# NOVITA_BASE_URL=https://api.novita.ai/openai/v1 # Override default base URL
# =============================================================================
# LLM PROVIDER (Google AI Studio / Gemini)
# =============================================================================
@@ -281,27 +273,6 @@ BROWSER_SESSION_TIMEOUT=300
# Browser sessions are automatically closed after this period of no activity
BROWSER_INACTIVITY_TIMEOUT=120
# Extra Chromium launch flags passed to agent-browser, comma- or newline-separated.
# Hermes auto-injects "--no-sandbox,--disable-dev-shm-usage" when it detects root
# or AppArmor-restricted unprivileged user namespaces (Ubuntu 23.10+, DGX Spark,
# many container images), so leave this unset unless you need extra flags.
# Setting this disables the auto-injection.
# AGENT_BROWSER_ARGS=--no-sandbox
# Camofox local anti-detection browser (Camoufox-based Firefox).
# Set CAMOFOX_URL to route the browser tools through a local Camofox server
# instead of agent-browser/Browserbase. See docs/user-guide/features/browser.md.
# CAMOFOX_URL=http://localhost:9377
# Externally managed Camofox sessions — when another app owns the visible
# Camofox browser, set these so Hermes shares the same userId/profile instead
# of creating its own isolated session.
# CAMOFOX_USER_ID=
# CAMOFOX_SESSION_KEY=
# Set to true to reuse an already-open Camofox tab for this identity before
# creating a new one (useful for gateway restarts).
# CAMOFOX_ADOPT_EXISTING_TAB=false
# =============================================================================
# SESSION LOGGING
# =============================================================================
@@ -339,7 +310,6 @@ BROWSER_INACTIVITY_TIMEOUT=120
# TELEGRAM_ALLOWED_USERS= # Comma-separated user IDs
# TELEGRAM_HOME_CHANNEL= # Default chat for cron delivery
# TELEGRAM_HOME_CHANNEL_NAME= # Display name for home channel
# TELEGRAM_CRON_THREAD_ID= # Forum topic ID for cron deliveries; overrides TELEGRAM_HOME_CHANNEL_THREAD_ID for cron so replies work in topic mode
# Webhook mode (optional — for cloud deployments like Fly.io/Railway)
# Default is long polling. Setting TELEGRAM_WEBHOOK_URL switches to webhook mode.
@@ -395,6 +365,24 @@ IMAGE_TOOLS_DEBUG=false
# CONTEXT_COMPRESSION_THRESHOLD=0.85 # Compress at 85% of context limit
# Model is set via compression.summary_model in config.yaml (default: google/gemini-3-flash-preview)
# =============================================================================
# RL TRAINING (Tinker + Atropos)
# =============================================================================
# Run reinforcement learning training on language models using the Tinker API.
# Requires the rl-server to be running (from tinker-atropos package).
# Tinker API Key - RL training service
# Get at: https://tinker-console.thinkingmachines.ai/keys
# TINKER_API_KEY=
# Weights & Biases API Key - Experiment tracking and metrics
# Get at: https://wandb.ai/authorize
# WANDB_API_KEY=
# RL API Server URL (default: http://localhost:8080)
# Change if running the rl-server on a different host/port
# RL_API_URL=http://localhost:8080
# =============================================================================
# SKILLS HUB (GitHub integration for skill search/install/publish)
# =============================================================================

View File

@@ -29,13 +29,9 @@ runs:
- name: hermes --help
shell: bash
run: |
# Use the image's real ENTRYPOINT (/init + main-wrapper.sh) so
# this exercises the actual production startup path. PR #30136
# review caught that an --entrypoint override here had been
# silently neutered by the s6-overlay migration — stage2-hook
# ignores its CMD args, so the smoke test was a no-op.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
--entrypoint /opt/hermes/docker/entrypoint.sh \
"${{ inputs.image }}" --help
- name: hermes dashboard --help
@@ -47,4 +43,5 @@ runs:
# installed package.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
--entrypoint /opt/hermes/docker/entrypoint.sh \
"${{ inputs.image }}" dashboard --help

View File

@@ -16,7 +16,7 @@ jobs:
check-attribution:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 0 # Full history needed for git log

View File

@@ -22,12 +22,7 @@ concurrency:
jobs:
deploy-vercel:
# Triggered automatically on release publish (production cuts) and
# manually via `gh workflow run deploy-site.yml` when an out-of-band
# main commit needs to ship live before the next release tag — e.g.
# a skills-index PR that doesn't touch website/** paths and so
# doesn't auto-deploy via the deploy-docs path.
if: github.event_name == 'release' || github.event_name == 'workflow_dispatch'
if: github.event_name == 'release'
runs-on: ubuntu-latest
steps:
- name: Trigger Vercel Deploy
@@ -40,7 +35,7 @@ jobs:
name: github-pages
url: ${{ steps.deploy.outputs.page_url }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
@@ -48,30 +43,27 @@ jobs:
cache: npm
cache-dependency-path: website/package-lock.json
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: '3.11'
- name: Install PyYAML for skill extraction
run: pip install pyyaml==6.0.2 httpx==0.28.1
- name: Build skills index (unified multi-source catalog)
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
# Always rebuild — the file isn't committed (gitignored), so a
# fresh checkout starts without it and we want the freshest crawl
# in every deploy. Failure is non-fatal: extract-skills.py will
# fall back to the legacy snapshot cache and the Skills Hub page
# still renders, just without the latest community catalog.
python3 scripts/build_skills_index.py || echo "Skills index build failed (non-fatal)"
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Regenerate per-skill docs pages + catalogs
run: python3 website/scripts/generate-skill-docs.py
- name: Build skills index (if not already present)
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
if [ ! -f website/static/api/skills-index.json ]; then
python3 scripts/build_skills_index.py || echo "Skills index build failed (non-fatal)"
fi
- name: Install dependencies
run: npm ci
working-directory: website

View File

@@ -1,68 +0,0 @@
name: Docker / shell lint
# Lints the container build inputs: Dockerfile (via hadolint) and any shell
# scripts under docker/ (via shellcheck). These catch the class of regression
# the behavioral docker-publish smoke test can't — unquoted variable
# expansions, silently-failing RUN commands, etc.
#
# Rules and ignores are documented in .hadolint.yaml at the repo root.
# shellcheck severity is pinned to `error` so SC1091-style "can't follow
# sourced script" info-level warnings don't fail the job — the .venv
# activate script doesn't exist at lint time.
on:
push:
branches: [main]
paths:
- Dockerfile
- docker/**
- .hadolint.yaml
- .github/workflows/docker-lint.yml
pull_request:
branches: [main]
paths:
- Dockerfile
- docker/**
- .hadolint.yaml
- .github/workflows/docker-lint.yml
permissions:
contents: read
concurrency:
group: docker-lint-${{ github.ref }}
cancel-in-progress: true
jobs:
hadolint:
name: Lint Dockerfile (hadolint)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: hadolint
uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf # v3.1.0
with:
dockerfile: Dockerfile
config: .hadolint.yaml
failure-threshold: warning
shellcheck:
name: Lint docker/ shell scripts (shellcheck)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: shellcheck
uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38 # v2.0.0
env:
# Severity = error: SC1091 (can't follow sourced script) is info-
# level and would otherwise fail when the venv activate script
# doesn't exist at lint time.
SHELLCHECK_OPTS: --severity=error
with:
scandir: ./docker

View File

@@ -27,10 +27,10 @@ on:
permissions:
contents: read
# Concurrency: push/release runs are NEVER cancelled so every merge gets
# its own image. PR runs reuse a PR-scoped group with
# cancel-in-progress: true so rapid pushes to the same PR collapse to the
# latest commit.
# Concurrency: push/release runs are NEVER cancelled so every merge gets its
# own SHA-tagged image; :latest is guarded separately by the move-latest job.
# PR runs reuse a PR-scoped group with cancel-in-progress: true so rapid
# pushes to the same PR collapse to the latest commit.
concurrency:
group: docker-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
@@ -53,7 +53,7 @@ jobs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
submodules: recursive
@@ -64,15 +64,13 @@ jobs:
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (amd64, smoke test)
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
context: .
file: Dockerfile
load: true
platforms: linux/amd64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -81,59 +79,9 @@ jobs:
with:
image: ${{ env.IMAGE_NAME }}:test
# ---------------------------------------------------------------------
# Run the docker-integration test suite against the freshly-built
# image already loaded into the local daemon (`:test`). These tests
# are excluded from the sharded `tests.yml :: test` matrix on purpose
# (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
# shard would otherwise reach the session-scoped ``built_image``
# fixture in ``tests/docker/conftest.py`` and start a 3-7min
# ``docker build`` under a 180s pytest-timeout cap — guaranteed to
# die in fixture setup.
#
# Piggybacking here avoids a second image build: the smoke test
# already proved the image loads + runs, so the daemon has it under
# `${IMAGE_NAME}:test` and we just point ``HERMES_TEST_IMAGE`` at
# that. The fixture's ``HERMES_TEST_IMAGE`` branch (see
# tests/docker/conftest.py:62-63) short-circuits the rebuild.
#
# Why this job and not a standalone one: the image is 5GB+; passing
# it between jobs via ``docker save``/``upload-artifact`` is slower
# than the build itself. Reusing the existing daemon state is the
# cheapest path to coverage on every PR that touches docker code.
# ---------------------------------------------------------------------
- name: Install uv (for docker tests)
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
- name: Set up Python 3.11 (for docker tests)
run: uv python install 3.11
- name: Install Python dependencies (for docker tests)
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
# ``dev`` extra pulls in pytest, pytest-asyncio, pytest-timeout —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.
uv pip install -e ".[dev]"
- name: Run docker integration tests
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
# Match the policy in tests.yml :: test job — no accidental
# real-API calls from inside the harness.
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
source .venv/bin/activate
python -m pytest tests/docker/ -v --tb=short
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -141,18 +89,22 @@ jobs:
# Push amd64 by digest only (no tag). The merge job assembles the
# tagged manifest list. `push-by-digest=true` is docker's recommended
# pattern for multi-runner multi-platform builds.
#
# We apply the OCI revision label here (and again on arm64) because
# the move-latest job reads it off the linux/amd64 sub-manifest config
# of `:latest` to decide whether it's safe to advance. The label must
# be on each per-arch image — manifest lists themselves don't carry
# image config labels.
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
context: .
file: Dockerfile
platforms: linux/amd64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
@@ -189,42 +141,24 @@ jobs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
submodules: recursive
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for smoke testing. PR arm64
# builds deliberately avoid the gha cache: cold-cache arm64 builds can
# outlive GitHub's short-lived Azure cache SAS token, then fail while
# reading or writing cache blobs before the smoke test can run.
- name: Build image (arm64, smoke test, uncached PR)
if: github.event_name == 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
# Build once, load into the local daemon for smoke testing. Cached
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (arm64, smoke test)
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
context: .
file: Dockerfile
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
# Main/release builds still use the per-arch gha cache so the digest
# push below can reuse layers from this smoke-test build.
- name: Build image (arm64, smoke test, cached publish)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
@@ -235,7 +169,7 @@ jobs:
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -243,15 +177,13 @@ jobs:
- name: Push arm64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
uses: docker/build-push-action@10e90e3645eae34f1e60eeb005ba3a3d33f178e8 # v6
with:
context: .
file: Dockerfile
platforms: linux/arm64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
@@ -275,16 +207,16 @@ jobs:
# ---------------------------------------------------------------------------
# Stitch both per-arch digests into a single tagged multi-arch manifest.
# This is a registry-side operation — no building, no layer re-push —
# so it runs in ~30 seconds.
#
# On main pushes: tags both :main and :latest.
# On releases: tags :<release_tag_name>.
# so it runs in ~30 seconds. On main pushes it produces :sha-<sha>.
# On releases it produces :<release_tag_name>.
# ---------------------------------------------------------------------------
merge:
if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
runs-on: ubuntu-latest
needs: [build-amd64, build-arm64]
timeout-minutes: 10
outputs:
pushed_sha_tag: ${{ steps.mark_pushed.outputs.pushed }}
steps:
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
@@ -297,39 +229,179 @@ jobs:
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Compute the tag for this run. Main pushes use sha-<sha> (so every
# commit gets its own immutable tag); releases use the release tag name.
- name: Compute tag
id: tag
run: |
if [ "${{ github.event_name }}" = "release" ]; then
echo "tag=${{ github.event.release.tag_name }}" >> "$GITHUB_OUTPUT"
else
echo "tag=sha-${{ github.sha }}" >> "$GITHUB_OUTPUT"
fi
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
set -euo pipefail
# Build the arg array from each digest file (filename = the digest
# hex, with no sha256: prefix; empty file content, only the name
# matters). Using an array avoids shellcheck SC2046 and keeps
# every digest a single argv token even under pathological names.
args=()
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
if [ "${{ github.event_name }}" = "release" ]; then
TAG="${{ github.event.release.tag_name }}"
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
else
docker buildx imagetools create \
-t "${IMAGE_NAME}:main" \
-t "${IMAGE_NAME}:latest" \
"${args[@]}"
fi
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
- name: Inspect image
run: |
if [ "${{ github.event_name }}" = "release" ]; then
docker buildx imagetools inspect "${IMAGE_NAME}:${{ github.event.release.tag_name }}"
else
docker buildx imagetools inspect "${IMAGE_NAME}:main"
fi
docker buildx imagetools inspect "${IMAGE_NAME}:${TAG}"
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
TAG: ${{ steps.tag.outputs.tag }}
# Signal to move-latest that the SHA tag is live. Only on main pushes;
# releases don't trigger move-latest (they use their own release tag).
- name: Mark SHA tag pushed
id: mark_pushed
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: echo "pushed=true" >> "$GITHUB_OUTPUT"
# ---------------------------------------------------------------------------
# Move :latest to point at the SHA tag the merge job pushed.
#
# The real serialization guarantee comes from the top-level concurrency
# group (`docker-${{ github.ref }}` with `cancel-in-progress: false`),
# which ensures at most one workflow run for this ref executes at a time.
# That means two move-latest steps for the same ref cannot overlap.
#
# This job has its own concurrency group as defense-in-depth: if the
# top-level group is ever loosened, queued move-latests will run serially
# in arrival order, each one running the ancestor check below and either
# advancing :latest or skipping. `cancel-in-progress: false` matches the
# top-level setting — we don't want rapid pushes to cancel a queued
# move-latest, because the ancestor check is the real safety mechanism
# and queueing is cheap (move-latest is a ~30s registry op).
#
# Combined with the ancestor check, this means :latest only ever moves
# forward in git history.
# ---------------------------------------------------------------------------
move-latest:
if: |
github.repository == 'NousResearch/hermes-agent'
&& github.event_name == 'push'
&& github.ref == 'refs/heads/main'
&& needs.merge.outputs.pushed_sha_tag == 'true'
needs: merge
runs-on: ubuntu-latest
timeout-minutes: 10
concurrency:
group: docker-move-latest-${{ github.ref }}
cancel-in-progress: false
steps:
- name: Checkout code
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 1000
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@c94ce9fb468520275223c153574b00df6fe4bcc9 # v3
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Read the git revision label off the current :latest manifest, then
# use `git merge-base --is-ancestor` to check whether our commit is a
# descendant of it. If :latest doesn't exist yet, or its label is
# missing, we treat that as "safe to publish". If another run already
# advanced :latest past us (or diverged), we skip and leave it alone.
- name: Decide whether to move :latest
id: latest_check
run: |
set -euo pipefail
image=nousresearch/hermes-agent
# Pull the JSON for the linux/amd64 sub-manifest's config and extract
# the OCI revision label with jq — Go template field access can't
# handle dots in map keys, so using json+jq is the robust route.
image_json=$(
docker buildx imagetools inspect "${image}:latest" \
--format '{{ json (index .Image "linux/amd64") }}' \
2>/dev/null || true
)
if [ -z "${image_json}" ]; then
echo "No existing :latest (or inspect failed) — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
current_sha=$(
printf '%s' "${image_json}" \
| jq -r '.config.Labels."org.opencontainers.image.revision" // ""'
)
if [ -z "${current_sha}" ]; then
echo "Registry :latest has no revision label — safe to publish."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
exit 0
fi
echo "Registry :latest is at ${current_sha}"
echo "This run is at ${GITHUB_SHA}"
if [ "${current_sha}" = "${GITHUB_SHA}" ]; then
echo ":latest already points at our SHA — nothing to do."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Make sure we have the :latest commit locally for merge-base.
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
git fetch --no-tags --prune origin \
"+refs/heads/main:refs/remotes/origin/main" \
|| true
fi
if ! git cat-file -e "${current_sha}^{commit}" 2>/dev/null; then
echo "Registry :latest points at an unknown commit (${current_sha}); refusing to overwrite."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Our SHA must be a descendant of the current :latest to be safe.
if git merge-base --is-ancestor "${current_sha}" "${GITHUB_SHA}"; then
echo "Our commit is a descendant of :latest — safe to advance."
echo "push_latest=true" >> "$GITHUB_OUTPUT"
else
echo "Another run advanced :latest past us (or diverged) — leaving it alone."
echo "push_latest=false" >> "$GITHUB_OUTPUT"
fi
# Retag the already-pushed SHA manifest as :latest. This is a registry-
# side operation — no rebuild, no layer re-push — so it's quick and
# atomic per-tag. The ancestor check above plus the cancel-in-progress
# concurrency on this job together guarantee we only ever move :latest
# forward in git history.
- name: Move :latest to this SHA
if: steps.latest_check.outputs.push_latest == 'true'
run: |
set -euo pipefail
image=nousresearch/hermes-agent
docker buildx imagetools create \
--tag "${image}:latest" \
"${image}:sha-${GITHUB_SHA}"

View File

@@ -14,7 +14,7 @@ jobs:
docs-site-checks:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
@@ -26,7 +26,7 @@ jobs:
run: npm ci
working-directory: website
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: '3.11'

View File

@@ -1,58 +0,0 @@
name: History Check
# Rejects PRs whose branch has no common ancestor with main.
#
# In May 2026 PR #25045 was merged from a branch that had been disconnected
# from main's history (likely an accidental `git checkout --orphan` or
# `.git/` re-init). GitHub's merge UI does not refuse merges of unrelated
# histories, so the PR landed cleanly with the intended one-file change —
# but its parent-less root commit (413990c94) got grafted into main as a
# second root, and ~1500 files' worth of `git blame` history collapsed
# onto that single commit.
#
# This check catches the failure mode by requiring `git merge-base` between
# the PR head and main to be non-empty.
on:
pull_request:
branches: [main]
permissions:
contents: read
jobs:
check-common-ancestor:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # full history both sides for merge-base
- name: Reject PRs with no common ancestor on main
run: |
# `git merge-base` exits non-zero AND prints nothing when the two
# commits share no ancestor. We check both conditions explicitly
# so the failure message is clear regardless of which signal fires
# first.
if ! BASE=$(git merge-base origin/main HEAD 2>/dev/null) || [ -z "$BASE" ]; then
echo ""
echo "::error::This PR has no common ancestor with main."
echo ""
echo "Your branch's history is disconnected from main. Common causes:"
echo " - the branch was created with 'git checkout --orphan'"
echo " - '.git/' was re-initialized at some point during the work"
echo " - the branch was force-pushed from an unrelated repository"
echo ""
echo "Merging an unrelated-history PR grafts a parent-less root commit"
echo "into main and collapses git blame for every file in that snapshot."
echo "Reference: PR #25045 caused this and re-rooted blame on ~1500"
echo "files to a single orphan commit."
echo ""
echo "To fix, rebase your changes onto current main:"
echo " git fetch origin main"
echo " git checkout -b fix-branch origin/main"
echo " # re-apply your changes (cherry-pick, copy files, etc.)"
echo " git push -f origin fix-branch"
exit 1
fi
echo "::notice::Common ancestor with main: $BASE"

View File

@@ -37,7 +37,7 @@ jobs:
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 0 # need full history for merge-base + worktree
@@ -167,7 +167,7 @@ jobs:
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -191,10 +191,10 @@ jobs:
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v5
uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5
with:
python-version: "3.11"

View File

@@ -56,7 +56,7 @@ jobs:
app-id: ${{ secrets.APP_ID }}
private-key: ${{ secrets.APP_PRIVATE_KEY }}
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
ref: main
token: ${{ steps.app-token.outputs.token }}
@@ -194,7 +194,7 @@ jobs:
Triggered by @${{ github.actor }} — [workflow run](${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}).
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
repository: ${{ steps.resolve.outputs.owner }}/${{ steps.resolve.outputs.repo }}
ref: ${{ steps.resolve.outputs.ref }}

View File

@@ -21,7 +21,7 @@ jobs:
runs-on: ${{ matrix.os }}
timeout-minutes: 30
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: ./.github/actions/nix-setup
with:
cachix-auth-token: ${{ secrets.CACHIX_AUTH_TOKEN }}
@@ -37,16 +37,23 @@ jobs:
- name: Check flake
id: flake
if: runner.os == 'Linux'
continue-on-error: true
run: nix flake check --print-build-logs
# When the flake check fails, run a targeted diagnostic to see if
- name: Build package
id: build
if: runner.os == 'Linux'
continue-on-error: true
run: nix build --print-build-logs
# When the real Nix build fails, run a targeted diagnostic to see if
# the failure is specifically a stale npm lockfile hash in one of the
# known npm subpackages (tui / web). This avoids surfacing a generic
# "build failed" message when the fix is a single known command.
- name: Diagnose npm lockfile hashes
id: hash_check
if: steps.flake.outcome == 'failure' && runner.os == 'Linux'
if: (steps.flake.outcome == 'failure' || steps.build.outcome == 'failure') && runner.os == 'Linux'
continue-on-error: true
env:
LINK_SHA: ${{ steps.sha.outputs.full }}
@@ -81,25 +88,30 @@ jobs:
- Or [run the Nix Lockfile Fix workflow](${{ github.server_url }}/${{ github.repository }}/actions/workflows/nix-lockfile-fix.yml) manually (pass PR `#${{ github.event.pull_request.number }}`)
- Or locally: `nix run .#fix-lockfiles` and commit the diff
# Clear the sticky comment when either the flake check passed outright (no
# Clear the sticky comment when either the build passed outright (no
# hash check needed) or the hash check explicitly returned stale=false
# (check failed for a non-hash reason).
# (build failed for a non-hash reason).
- name: Clear sticky PR comment (resolved)
if: |
github.event_name == 'pull_request' &&
runner.os == 'Linux' &&
(steps.hash_check.outputs.stale == 'false' ||
steps.flake.outcome == 'success')
(steps.flake.outcome == 'success' && steps.build.outcome == 'success'))
uses: marocchino/sticky-pull-request-comment@52423e01640425a022ef5fd42c6fb5f633a02728 # v2.9.1
with:
header: nix-lockfile-check
delete: true
- name: Final fail if flake check failed
if: steps.flake.outcome == 'failure'
- name: Final fail if build or flake failed
if: steps.flake.outcome == 'failure' || steps.build.outcome == 'failure'
run: |
if [ "${{ steps.hash_check.outputs.stale }}" == "true" ]; then
echo "::error::Nix build failed due to stale npm lockfile hash. Run: nix run .#fix-lockfiles"
else
echo "::error::Nix flake check failed. See logs above."
echo "::error::Nix build/flake check failed. See logs above."
fi
exit 1
- name: Evaluate flake (macOS)
if: runner.os == 'macOS'
run: nix flake show --json > /dev/null

View File

@@ -56,7 +56,7 @@ permissions:
jobs:
scan:
name: Scan lockfiles
uses: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@9a498708959aeaef5ef730655706c5a1df1edbc2 # v2.3.8
uses: google/osv-scanner-action/.github/workflows/osv-scanner-reusable.yml@c51854704019a247608d928f370c98740469d4b5 # v2.3.5
with:
# Scan explicit lockfiles rather than recursing, so we only look at
# the three sources of truth and skip vendored / test / worktree dirs.

View File

@@ -1,149 +0,0 @@
name: Skills Index Freshness Check
# Belt-and-suspenders for the twice-daily build_skills_index pipeline.
# If the live /docs/api/skills-index.json ever goes more than 26 hours
# stale OR the file disappears entirely OR a major source has collapsed,
# this workflow opens a GitHub issue so we hear about it before users do.
#
# Triggered every 4 hours so we catch a stuck cron within one tick.
on:
schedule:
- cron: '0 */4 * * *'
workflow_dispatch:
permissions:
contents: read
issues: write
jobs:
check-freshness:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
steps:
- name: Probe live index
id: probe
run: |
set -e
URL="https://hermes-agent.nousresearch.com/docs/api/skills-index.json"
echo "Probing $URL"
# -L follows redirects; -f fails on HTTP errors; -s suppresses progress
if ! curl -fsSL -o /tmp/skills-index.json "$URL"; then
echo "status=fetch-failed" >> "$GITHUB_OUTPUT"
echo "detail=Could not download $URL" >> "$GITHUB_OUTPUT"
exit 0
fi
# Validate + extract generated_at and per-source counts
python3 <<'PY' >> "$GITHUB_OUTPUT"
import json, sys
from datetime import datetime, timezone
try:
with open("/tmp/skills-index.json") as f:
data = json.load(f)
except Exception as e:
print(f"status=parse-failed")
print(f"detail=JSON decode error: {e}")
sys.exit(0)
generated_at = data.get("generated_at", "")
total = data.get("skill_count", 0)
skills = data.get("skills", [])
if not isinstance(skills, list):
print("status=invalid-shape")
print(f"detail=skills field is not a list (got {type(skills).__name__})")
sys.exit(0)
# Per-source counts
from collections import Counter
by_src = Counter(s.get("source", "") for s in skills)
# Freshness
age_hours = None
try:
ts = datetime.fromisoformat(generated_at.replace("Z", "+00:00"))
age_hours = (datetime.now(timezone.utc) - ts).total_seconds() / 3600
except Exception:
pass
# Floors — same as build_skills_index.py EXPECTED_FLOORS.
floors = {
"skills.sh": 100,
"lobehub": 100,
"clawhub": 50,
"official": 50,
"github": 30,
"browse-sh": 50,
}
issues = []
if age_hours is not None and age_hours > 26:
issues.append(f"Index is {age_hours:.1f}h old (limit 26h)")
for src, floor in floors.items():
count = by_src.get(src, 0)
if src == "skills.sh":
count = by_src.get("skills.sh", 0) + by_src.get("skills-sh", 0)
if count < floor:
issues.append(f"{src}: {count} < {floor}")
if total < 1500:
issues.append(f"total skills: {total} < 1500")
if issues:
detail = "; ".join(issues)
print("status=degraded")
# GITHUB_OUTPUT doesn't allow newlines without explicit delimiter
print(f"detail={detail}")
else:
print("status=ok")
print(f"detail=Index OK — {total} skills, generated {generated_at}")
by_summary = ", ".join(f"{k}={v}" for k, v in by_src.most_common(8))
print(f"summary={by_summary}")
PY
- name: Report status
run: |
echo "Probe status: ${{ steps.probe.outputs.status }}"
echo "Detail: ${{ steps.probe.outputs.detail }}"
if [ -n "${{ steps.probe.outputs.summary }}" ]; then
echo "Summary: ${{ steps.probe.outputs.summary }}"
fi
- name: Open issue on degraded / failed probe
if: steps.probe.outputs.status != 'ok'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
STATUS: ${{ steps.probe.outputs.status }}
DETAIL: ${{ steps.probe.outputs.detail }}
run: |
# Find existing open issue by title prefix so we don't spam — we
# append a comment instead of opening a new one each tick.
TITLE_PREFIX="[skills-index-watchdog]"
existing=$(gh issue list \
--repo "${{ github.repository }}" \
--state open \
--search "in:title \"$TITLE_PREFIX\"" \
--json number,title \
--jq '.[] | select(.title | startswith("'"$TITLE_PREFIX"'")) | .number' \
| head -1)
BODY="Automated freshness probe failed.
**Status:** \`$STATUS\`
**Detail:** $DETAIL
The Skills Hub at /docs/skills depends on \`/docs/api/skills-index.json\`.
The unified index is rebuilt by \`.github/workflows/skills-index.yml\` (cron 6/18 UTC)
and \`.github/workflows/deploy-site.yml\` (on every push affecting website/skills).
If this issue keeps reopening, check the latest runs:
- https://github.com/${{ github.repository }}/actions/workflows/skills-index.yml
- https://github.com/${{ github.repository }}/actions/workflows/deploy-site.yml
This issue was opened by \`.github/workflows/skills-index-freshness.yml\`. Close it once the underlying problem is fixed; the next probe will reopen if it's still broken."
if [ -n "$existing" ]; then
echo "Appending to existing issue #$existing"
gh issue comment "$existing" --repo "${{ github.repository }}" --body "Probe still failing at $(date -u +%FT%TZ): \`$STATUS\` — $DETAIL"
else
echo "Opening new watchdog issue"
gh issue create --repo "${{ github.repository }}" \
--title "$TITLE_PREFIX Skills index is stale or degraded ($STATUS)" \
--body "$BODY"
fi

View File

@@ -13,7 +13,6 @@ on:
permissions:
contents: read
actions: write # to trigger deploy-site.yml on schedule
jobs:
build-index:
@@ -21,9 +20,9 @@ jobs:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: '3.11'
@@ -42,15 +41,61 @@ jobs:
path: website/static/api/skills-index.json
retention-days: 7
# Re-trigger the docs deploy so the refreshed index lands on the live site.
# The deploy itself is owned by deploy-site.yml (which crawls and deploys
# everything in one pipeline); we just kick it on a schedule.
trigger-deploy:
deploy-with-index:
needs: build-index
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
runs-on: ubuntu-latest
permissions:
pages: write
id-token: write
environment:
name: github-pages
url: ${{ steps.deploy.outputs.page_url }}
# Only deploy on schedule or manual trigger (not on every push to the script)
if: github.event_name == 'schedule' || github.event_name == 'workflow_dispatch'
steps:
- name: Trigger Deploy Site workflow
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: gh workflow run deploy-site.yml --repo ${{ github.repository }}
- uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: skills-index
path: website/static/api/
- uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
node-version: 20
cache: npm
cache-dependency-path: website/package-lock.json
- uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5
with:
python-version: '3.11'
- name: Install PyYAML for skill extraction
run: pip install pyyaml==6.0.2
- name: Extract skill metadata for dashboard
run: python3 website/scripts/extract-skills.py
- name: Install dependencies
run: npm ci
working-directory: website
- name: Build Docusaurus
run: npm run build
working-directory: website
- name: Stage deployment
run: |
mkdir -p _site/docs
cp -r landingpage/* _site/
cp -r website/build/* _site/docs/
echo "hermes-agent.nousresearch.com" > _site/CNAME
- name: Upload artifact
uses: actions/upload-pages-artifact@56afc609e74202658d3ffba0e8f6dda462b719fa # v3
with:
path: _site
- name: Deploy to GitHub Pages
id: deploy
uses: actions/deploy-pages@d6db90164ac5ed86f2b6aed7e0febac5b3c0c03e # v4

View File

@@ -11,7 +11,6 @@ on:
- '**/sitecustomize.py'
- '**/usercustomize.py'
- '**/__init__.pth'
- 'pyproject.toml'
permissions:
pull-requests: write
@@ -32,7 +31,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
with:
fetch-depth: 0
@@ -47,17 +46,14 @@ jobs:
HEAD="${{ github.event.pull_request.head.sha }}"
# Added lines only, excluding lockfiles.
# Three-dot diff (base...head) diffs from the merge base to HEAD,
# so only changes introduced by this PR are included — not changes
# that landed on main after the PR branched off.
DIFF=$(git diff "$BASE"..."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
FINDINGS=""
# --- .pth files (auto-execute on Python startup) ---
# The exact mechanism used in the litellm supply chain attack:
# https://github.com/BerriAI/litellm/issues/24512
PTH_FILES=$(git diff --name-only "$BASE"..."$HEAD" | grep '\.pth$' || true)
PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
if [ -n "$PTH_FILES" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: .pth file added or modified
@@ -100,12 +96,7 @@ jobs:
# --- Install-hook files (setup.py/sitecustomize/usercustomize/__init__.pth) ---
# These execute during pip install or interpreter startup.
# Anchored at repo root: only the top-level setup.py/setup.cfg run during
# `pip install`, and only top-level sitecustomize.py/usercustomize.py are
# auto-loaded by the interpreter via site.py. Any nested file with the
# same name (e.g. hermes_cli/setup.py — the CLI setup wizard) is unrelated
# and produced false positives that trained reviewers to ignore the scanner.
SETUP_HITS=$(git diff --name-only "$BASE"..."$HEAD" | grep -E '^(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(^|/)(setup\.py|setup\.cfg|sitecustomize\.py|usercustomize\.py|__init__\.pth)$' || true)
if [ -n "$SETUP_HITS" ]; then
FINDINGS="${FINDINGS}
### 🚨 CRITICAL: Install-hook file added or modified
@@ -146,68 +137,3 @@ jobs:
run: |
echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
exit 1
dep-bounds:
name: Check PyPI dependency upper bounds
runs-on: ubuntu-latest
if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Check for unbounded PyPI deps
id: bounds
run: |
set -euo pipefail
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
# Only check added lines in pyproject.toml
ADDED=$(git diff "$BASE"..."$HEAD" -- pyproject.toml | grep '^+' | grep -v '^+++' || true)
if [ -z "$ADDED" ]; then
echo "found=false" >> "$GITHUB_OUTPUT"
exit 0
fi
# Match PyPI dep specs that have >= but no < ceiling.
# Pattern: "package>=version" without a following ",<" bound.
# Excludes git+ URLs (which use commit SHAs) and comments.
UNBOUNDED=$(echo "$ADDED" | grep -oE '"[a-zA-Z0-9_-]+(\[[^\]]*\])?>=[ 0-9.]+"' | grep -v ',<' || true)
if [ -n "$UNBOUNDED" ]; then
echo "found=true" >> "$GITHUB_OUTPUT"
echo "$UNBOUNDED" > /tmp/unbounded.txt
else
echo "found=false" >> "$GITHUB_OUTPUT"
fi
- name: Post unbounded dep warning
if: steps.bounds.outputs.found == 'true'
env:
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
BODY="## ⚠️ Unbounded PyPI Dependency Detected
This PR adds PyPI dependencies without a \`<next_major\` upper bound. Per our [supply chain policy](../blob/main/CONTRIBUTING.md#dependency-pinning-policy-supply-chain-hardening), all PyPI deps must be pinned as \`>=floor,<next_major\`.
**Unbounded specs found:**
\`\`\`
$(cat /tmp/unbounded.txt)
\`\`\`
**Fix:** Add an upper bound, e.g. \`\"package>=1.2.0,<2\"\`
---
*See PR #2810 and CONTRIBUTING.md for the full policy rationale.*"
gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs)"
- name: Fail on unbounded deps
if: steps.bounds.outputs.found == 'true'
run: |
echo "::error::PyPI dependencies without upper bounds detected. Add <next_major ceiling per CONTRIBUTING.md policy."
exit 1

View File

@@ -23,35 +23,13 @@ concurrency:
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
slice: [1, 2, 3, 4, 5, 6]
timeout-minutes: 20
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Restore duration cache
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# Single stable key. main always overwrites, PRs always find it.
key: test-durations
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
RG_VERSION=15.1.0
RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
curl -sSfL -o "$RG_TARBALL" \
"https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
echo "${RG_SHA256} ${RG_TARBALL}" | sha256sum -c -
tar -xzf "$RG_TARBALL"
sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
rg --version
- name: Install system dependencies
run: sudo apt-get update && sudo apt-get install -y ripgrep
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -65,99 +43,22 @@ jobs:
source .venv/bin/activate
uv pip install -e ".[all,dev]"
- name: Run tests (slice ${{ matrix.slice }}/6)
# Per-file isolation via scripts/run_tests_parallel.py: discovers
# every test_*.py file under tests/ (excluding integration/ + e2e/),
# then runs `python -m pytest <file>` in a freshly-spawned subprocess
# with bounded parallelism. No xdist, no shared workers, no
# module-level state leakage between files.
#
# Why per-file (not per-test): per-test spawn cost (~250ms × 17k
# tests = 70min CPU minimum) blew the wall-clock budget. Per-file
# spawn (~250ms × ~850 files = ~3.5min) fits while still giving
# every file a fresh interpreter — the only isolation boundary
# that matters in practice (cross-file leakage was the original
# flake source; intra-file is the test author's responsibility).
#
# Why drop xdist entirely: xdist's persistent workers accumulate
# state across files, which is exactly the leakage we wanted to
# fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
# the job with cleaner semantics.
#
# Matrix slicing (--slice I/N): files are distributed across 6
# jobs by cached duration (LPT algorithm) so each job gets
# roughly equal wall time. Without a cache, files default to 2s
# estimate and get split roughly evenly by count — still correct,
# just not perfectly balanced.
- name: Run tests
run: |
source .venv/bin/activate
python scripts/run_tests_parallel.py --slice ${{ matrix.slice }}/6
python -m pytest tests/ -q --ignore=tests/integration --ignore=tests/e2e --tb=short -n auto
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
path: test_durations.json
retention-days: 1
# Merge per-slice duration data into a single cache, so future runs
# (including PRs) get balanced slicing.
save-durations:
needs: test
if: always() && github.ref == 'refs/heads/main'
runs-on: ubuntu-latest
steps:
- name: Download all slice durations
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
pattern: test-durations-slice-*
path: durations
merge-multiple: true
- name: Merge into single durations file
run: |
python3 -c "
import json, glob, os
merged = {}
for f in glob.glob('durations/*test_durations.json'):
with open(f) as fh:
merged.update(json.load(fh))
with open('test_durations.json', 'w') as fh:
json.dump(merged, fh, indent=2, sort_keys=True)
print(f'Merged {len(merged)} file durations')
"
- name: Save merged duration cache
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
key: test-durations
e2e:
runs-on: ubuntu-latest
timeout-minutes: 15
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
RG_VERSION=15.1.0
RG_SHA256=1c9297be4a084eea7ecaedf93eb03d058d6faae29bbc57ecdaf5063921491599
RG_TARBALL=ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl.tar.gz
curl -sSfL -o "$RG_TARBALL" \
"https://github.com/BurntSushi/ripgrep/releases/download/${RG_VERSION}/${RG_TARBALL}"
echo "${RG_SHA256} ${RG_TARBALL}" | sha256sum -c -
tar -xzf "$RG_TARBALL"
sudo mv "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl/rg" /usr/local/bin/rg
rm -rf "$RG_TARBALL" "ripgrep-${RG_VERSION}-x86_64-unknown-linux-musl"
rg --version
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
@@ -178,4 +79,4 @@ jobs:
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
NOUS_API_KEY: ""

View File

@@ -1,164 +0,0 @@
name: Publish to PyPI
# Triggered by CalVer tag pushes from scripts/release.py (e.g. v2026.5.15)
# Can also be triggered manually from the Actions tab as an escape hatch.
on:
push:
tags:
- 'v20*' # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
workflow_dispatch:
inputs:
confirm_tag:
description: 'Tag to publish (e.g. v2026.5.15). Must already exist.'
required: true
type: string
# Restrict default token to read-only; each job escalates as needed.
permissions:
contents: read
# Prevent overlapping publishes (e.g. two same-day tags pushed quickly).
concurrency:
group: pypi-publish
cancel-in-progress: false
jobs:
build:
name: Build distribution 📦
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
# On workflow_dispatch, check out the confirmed tag.
ref: ${{ inputs.confirm_tag || github.ref }}
fetch-tags: true
- name: Validate tag exists
if: github.event_name == 'workflow_dispatch'
run: |
if ! git tag -l "${{ inputs.confirm_tag }}" | grep -q .; then
echo "::error::Tag '${{ inputs.confirm_tag }}' does not exist in the repo"
exit 1
fi
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.13'
- name: Install uv
uses: astral-sh/setup-uv@d0cc045d04ccac9d8b7881df0226f9e82c39688e # v6
- name: Set up Node.js
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
node-version: '22'
- name: Build web dashboard
run: cd web && npm ci && npm run build
- name: Build TUI bundle
run: cd ui-tui && npm ci && npm run build
- name: Bundle TUI into hermes_cli
run: |
mkdir -p hermes_cli/tui_dist
cp ui-tui/dist/entry.js hermes_cli/tui_dist/entry.js
- name: Verify frontend assets exist
run: |
test -f hermes_cli/web_dist/index.html || { echo "ERROR: web_dist not built"; exit 1; }
test -f hermes_cli/tui_dist/entry.js || { echo "ERROR: tui_dist not built"; exit 1; }
- name: Bundle install scripts into wheel
run: |
mkdir -p hermes_cli/scripts
cp scripts/install.sh hermes_cli/scripts/install.sh
cp scripts/install.ps1 hermes_cli/scripts/install.ps1
- name: Build wheel and sdist
run: uv build --sdist --wheel
- name: Upload distribution artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: python-package-distributions
path: dist/
publish:
name: Publish to PyPI
needs: build
runs-on: ubuntu-latest
environment:
name: pypi
url: https://pypi.org/p/hermes-agent
permissions:
id-token: write # OIDC trusted publishing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
with:
skip-existing: true
sign:
name: Sign and attach to GitHub Release
# Only runs on tag pushes — release.py creates the GitHub Release,
# and workflow_dispatch won't have a matching release to attach to.
if: startsWith(github.ref, 'refs/tags/')
needs: publish
runs-on: ubuntu-latest
permissions:
contents: write # attach assets to the existing release
id-token: write # sigstore signing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
- name: Wait for GitHub Release to exist
env:
GITHUB_TOKEN: ${{ github.token }}
# release.py creates the GitHub Release after pushing the tag,
# but this workflow starts from the tag push — wait for it.
run: |
for i in $(seq 1 30); do
if gh release view "$GITHUB_REF_NAME" --repo "$GITHUB_REPOSITORY" >/dev/null 2>&1; then
echo "Release $GITHUB_REF_NAME found"
exit 0
fi
echo "Waiting for release... ($i/30)"
sleep 10
done
echo "::warning::Release $GITHUB_REF_NAME not found after 5 minutes — skipping signature upload"
echo "skip_sign=true" >> "$GITHUB_ENV"
- name: Sign with Sigstore
if: env.skip_sign != 'true'
uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc # v3.3.0
with:
inputs: >-
./dist/*.tar.gz
./dist/*.whl
- name: Attach signed artifacts to GitHub Release
if: env.skip_sign != 'true'
env:
GITHUB_TOKEN: ${{ github.token }}
# release.py already created the GitHub Release — just upload
# the Sigstore signatures alongside the existing assets.
run: >-
gh release upload
"$GITHUB_REF_NAME" dist/*.sigstore.json
--repo "$GITHUB_REPOSITORY"
--clobber

View File

@@ -71,7 +71,7 @@ jobs:
timeout-minutes: 5
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@34e114876b0b11c390a56381ad16ebd13914f8d5 # v4
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5

27
.gitignore vendored
View File

@@ -3,7 +3,6 @@
/_pycache/
*.pyc*
__pycache__/
build/
.venv/
.vscode/
.env
@@ -13,21 +12,12 @@ build/
.env.production.local
.env.development
.env.test
.hermes-docker/
.notebooklm-home/
.notebooklm-cli-venv/
.notebooklm-playwright/
.pip-cache/
.uv-cache/
compose.hermes.local.yml
export*
__pycache__/model_tools.cpython-310.pyc
__pycache__/web_tools.cpython-310.pyc
logs/
data/
.pytest_cache/
test_durations.json
.pytest-cache/
tmp/
temp_vision_images/
hermes-*/*
@@ -79,21 +69,4 @@ mini-swe-agent/
.nix-stamps/
result
website/static/api/skills-index.json
# skills.json + skills-meta.json are build artifacts emitted by
# website/scripts/extract-skills.py during prebuild — keep them out of
# git for the same reason as skills-index.json (large, generated, change
# every build).
website/static/api/skills.json
website/static/api/skills-meta.json
models-dev-upstream/
hermes_cli/tui_dist/*
hermes_cli/scripts/
docs/superpowers/*
# Working directory for the Hermes Agent's session state (~/.hermes/ at runtime;
# also created in-repo when an agent operates in this checkout). Plans, audit
# logs, and per-session caches are never artifacts of the codebase.
.hermes/
# Tool Search live-test harness output — non-deterministic model transcripts,
# regenerated by scripts/tool_search_livetest.py. Never an artifact of the repo.
scripts/out/

3
.gitmodules vendored Normal file
View File

@@ -0,0 +1,3 @@
[submodule "tinker-atropos"]
path = tinker-atropos
url = https://github.com/nousresearch/tinker-atropos

View File

@@ -1,36 +0,0 @@
# hadolint configuration for the Hermes Agent Dockerfile.
# See https://github.com/hadolint/hadolint#configure for rules.
#
# We want hadolint to surface NEW Dockerfile lint regressions, but we
# don't want to rewrite the existing image to silence rules that are
# either intentional or pragmatic tradeoffs for this project. Each
# ignore below has a one-line justification.
failure-threshold: warning
ignored:
# Pin versions in apt get install. We intentionally don't pin common
# tools (curl, git, openssh-client, etc.) — security updates flow in
# via the periodic base-image rebuild, and pinning would lock us to
# superseded patch releases. Same rationale as nearly every distro-
# base official image (python, node, debian).
- DL3008
# Use WORKDIR to switch to a directory. The image uses `(cd web && …)`
# / `(cd ../ui-tui && …)` inline subshells for one-off build steps
# because they don't affect later RUN commands; promoting them to
# full WORKDIR switches with restores would obscure intent.
- DL3003
# Multiple consecutive RUN instructions. The `touch README.md` + `uv
# sync` split is intentional — `touch` is cheap, `uv sync` is the
# expensive layer-cached step we want isolated, and merging them
# would invalidate the cache for trivial changes.
- DL3059
# Last USER should not be root. /init (s6-overlay) runs as root so the
# stage2 hook can usermod/groupmod and chown the data volume per
# HERMES_UID at runtime; each supervised service then drops to the
# hermes user via `s6-setuidgid`.
- DL3002
# Require explicit base-image pins (SHA256) — we already do this.
trustedRegistries:
- docker.io
- ghcr.io

173
AGENTS.md
View File

@@ -56,6 +56,7 @@ hermes-agent/
├── tui_gateway/ # Python JSON-RPC backend for the TUI
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains integration)
├── cron/ # Scheduler — jobs.py, scheduler.py
├── environments/ # RL training environments (Atropos)
├── scripts/ # run_tests.sh, release.py, auxiliary scripts
├── website/ # Docusaurus docs site
└── tests/ # Pytest suite (~17k tests across ~900 files as of May 2026)
@@ -308,29 +309,6 @@ The registry handles schema collection, dispatch, availability checking, and err
---
## Dependency Pinning Policy
All dependencies must have upper bounds to limit supply-chain attack surface.
This policy was established after the litellm compromise (PR #2796, #2810) and
reinforced after the Mini Shai-Hulud worm campaign (May 2026).
| Source type | Treatment | Example |
|---|---|---|
| PyPI package | `>=floor,<next_major` | `"httpx>=0.28.1,<1"` |
| Git URL | Commit SHA | `git+https://...@<40-char-sha>` |
| GitHub Actions | Commit SHA + comment | `uses: actions/checkout@<sha> # v4` |
| CI-only pip | `==exact` | `pyyaml==6.0.2` |
**When adding a new dependency to `pyproject.toml`:**
1. Pin to `>=current_version,<next_major` for post-1.0 (e.g. `>=1.5.0,<2`).
2. For pre-1.0 packages, use `<0.(current_minor + 2)` (e.g. `>=0.29,<0.32`).
3. Never commit a bare `>=X.Y.Z` without a ceiling — CI and reviewers will reject it.
4. Run `uv lock` to regenerate `uv.lock` with hashes.
Reference: #2810 (bounds pass), #9801 (SHA pinning + audit CI).
---
## Adding Configuration
### config.yaml options:
@@ -535,17 +513,6 @@ generic plugin surface (new hook, new ctx method) — never hardcode
plugin-specific logic into core. PR #5295 removed 95 lines of hardcoded
honcho argparse from `main.py` for exactly this reason.
**No new in-tree memory providers (policy, May 2026):** the set of
built-in memory providers under `plugins/memory/` is closed. New memory
backends must ship as **standalone plugin repos** that users install
into `~/.hermes/plugins/` (or via pip entry points) — they implement
the same `MemoryProvider` ABC, register through the same discovery
path, and integrate via `hermes memory setup` / `post_setup()` without
landing in this tree. PRs that add a new directory under
`plugins/memory/` will be closed with a pointer to publish the
provider as its own repo. Existing in-tree providers stay; bug fixes
to them are welcome.
### Model-provider plugins (`plugins/model-providers/<name>/`)
Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
@@ -613,86 +580,6 @@ during setup, injected at load time).
Top-level `tags:` and `category:` are also accepted and mirrored from
`metadata.hermes.*` by the loader.
### Skill authoring standards (HARDLINE)
Every new or modernized skill — bundled, optional, or contributed —
must meet these standards before merge. Reviewers reject PRs that
violate them.
1. **`description` ≤ 60 characters, one sentence, ends with a period.**
Long descriptions bloat skill listings and dilute the model's
attention when many skills are loaded. State the capability, not
the implementation. No marketing words ("powerful",
"comprehensive", "seamless", "advanced"). Don't repeat the skill
name. Verify with:
```python
import re, pathlib
m = re.search(r'^description: (.*)$',
pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
re.MULTILINE)
assert len(m.group(1)) <= 60, len(m.group(1))
```
2. **Tools referenced in SKILL.md prose must be native Hermes tools or
MCP servers the skill explicitly expects.** When the skill needs a
capability, point at the proper tool by name in backticks
(`` `terminal` ``, `` `web_extract` ``, `` `read_file` ``,
`` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``,
`` `browser_navigate` ``, `` `delegate_task` ``, etc.). Do NOT
name shell utilities the agent already has wrapped — `grep` →
`search_files`, `cat`/`head`/`tail` → `read_file`, `sed`/`awk` →
`patch`, `find`/`ls` → `search_files target='files'`. If the skill
depends on an MCP server, name the MCP server and document the
expected setup in `## Prerequisites`. Anything else (third-party
CLIs, shell pipelines, etc.) is fair game inside script files but
should not be the headline interaction surface in the prose.
3. **`platforms:` gating audited against actual script imports.**
Skills that use POSIX-only primitives (`fcntl`, `termios`,
`os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, `/tmp`
hardcoded, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`,
`systemctl`) must declare their supported platforms. Default
posture: try to fix it cross-platform first — `tempfile.gettempdir`,
`pathlib.Path`, `psutil.pid_exists`, Python-level filtering instead
of `grep`. Gate to a narrower set only when the dependency is
genuinely platform-bound.
4. **`author` credits the human contributor first.** For external
contributions, the contributor's real name + GitHub handle goes
first; "Hermes Agent" is the secondary collaborator. If the
contributor's commit shows "Hermes Agent" as author (because they
used Hermes to draft the skill), replace it with their actual name
— credit the human, not the tool.
5. **SKILL.md body uses the modern section order.** `# <Skill> Skill`
title, 2-3 sentence intro stating what it does and doesn't do,
`## When to Use`, `## Prerequisites`, `## How to Run`,
`## Quick Reference`, `## Procedure`, `## Pitfalls`,
`## Verification`. Target ~200 lines for a complex skill,
~100 lines for a simple one. Cut redundant intro fluff, marketing
prose, and re-explanations of env vars already in
`## Prerequisites`.
6. **Scripts go in `scripts/`, references in `references/`,
templates in `templates/`.** Don't expect the model to inline-write
parsers, XML walkers, or non-trivial logic every call — ship a
helper script. Reference it from SKILL.md by path relative to the
skill directory.
7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only
stdlib + pytest + `unittest.mock`. No live network calls. Run via
`scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`.
8. **`.env.example` additions are isolated to a clearly delimited
block.** Don't touch the surrounding file — contributor-supplied
`.env.example` versions are usually stale and edits outside the
skill's own block must be dropped during salvage.
The full salvage / modernization checklist for external skill PRs
lives in the `hermes-agent-dev` skill at
`references/new-skill-pr-salvage.md` — load it before polishing
contributor skill PRs.
---
## Toolsets
@@ -830,11 +717,10 @@ kanban task.
`unlink`, `comment`, `complete`, `block`, `unblock`, `archive`,
`tail`, plus less-commonly-used `watch`, `stats`, `runs`, `log`,
`assignees`, `heartbeat`, `notify-*`, `dispatch`, `daemon`, `gc`.
- **Worker/orchestrator toolset:** `tools/kanban_tools.py` exposes
`kanban_show`, `kanban_complete`, `kanban_block`, `kanban_heartbeat`,
`kanban_comment`, `kanban_create`, `kanban_link`; profiles that
explicitly enable the `kanban` toolset outside a dispatcher-spawned
task also get `kanban_list` and `kanban_unblock` for board routing.
- **Worker toolset:** `tools/kanban_tools.py` exposes `kanban_show`,
`kanban_complete`, `kanban_block`, `kanban_heartbeat`, `kanban_comment`,
`kanban_create`, `kanban_link` — gated by `HERMES_KANBAN_TASK` so
the schema only appears for processes actually running as a worker.
- **Dispatcher:** long-lived loop that (default every 60s) reclaims
stale claims, promotes ready tasks, atomically claims, and spawns
assigned profiles. Runs **inside the gateway** by default via
@@ -850,9 +736,8 @@ Isolation model:
- **Tenant** is a soft namespace *within* a board — one specialist
fleet can serve multiple businesses with workspace-path + memory-key
isolation.
- After `kanban.failure_limit` consecutive non-success attempts on the
same task (default: 2), the dispatcher auto-blocks it to prevent spin
loops.
- After ~5 consecutive spawn failures on the same task the dispatcher
auto-blocks it to prevent spin loops.
Full user-facing docs: `website/docs/user-guide/features/kanban.md`.
@@ -1013,39 +898,17 @@ def profile_env(tmp_path, monkeypatch):
**ALWAYS use `scripts/run_tests.sh`** — do not call `pytest` directly. The script enforces
hermetic environment parity with CI (unset credential vars, TZ=UTC, LANG=C.UTF-8,
`-n auto` xdist workers, in-tree subprocess-isolation plugin). Direct `pytest`
on a 16+ core developer machine with API keys set diverges from CI in ways
that have caused multiple "works locally, fails in CI" incidents (and the reverse).
4 xdist workers matching GHA ubuntu-latest). Direct `pytest` on a 16+ core
developer machine with API keys set diverges from CI in ways that have caused
multiple "works locally, fails in CI" incidents (and the reverse).
```bash
scripts/run_tests.sh # full suite, CI-parity
scripts/run_tests.sh tests/gateway/ # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
scripts/run_tests.sh --no-isolate tests/foo/ # disable subprocess isolation (faster, for debugging)
```
### Subprocess-per-test isolation
Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
ContextVars from one test cannot leak into the next — the historic
`_reset_module_state` autouse fixture is gone.
Implementation notes:
- The plugin uses `multiprocessing.get_context("spawn")`, which works on
Linux, macOS, and Windows alike (POSIX `fork` is not used).
- Per-test overhead is ~0.51.0s (Python startup + pytest collection). xdist
parallelism amortizes this across cores; on a 20-core box the full suite
finishes in roughly the same wall time as before, but flake-free.
- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
Hangs are killed and surfaced as a failure report.
- Pass `--no-isolate` to disable isolation — useful when debugging a single
test interactively, or when you specifically want to verify state leakage.
- The plugin disables itself in child processes (sentinel envvar
`HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
### Why the wrapper (and why the old "just call pytest" doesn't work)
Five real sources of local-vs-CI drift the script closes:
@@ -1056,7 +919,7 @@ Five real sources of local-vs-CI drift the script closes:
| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |
| xdist workers | `-n auto` = all cores (20+ on a workstation) | `-n 4` matching CI |
`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
invocation (including IDE integrations) gets hermetic behavior — but the wrapper
@@ -1064,21 +927,15 @@ is belt-and-suspenders.
### Running without the wrapper (only if you must)
If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
at minimum activate the venv. The isolation plugin loads automatically from
`addopts` in `pyproject.toml`, so you get the same per-test process isolation
either way.
If you can't use the wrapper (e.g. on Windows or inside an IDE that shells
pytest directly), at minimum activate the venv and pass `-n 4`:
```bash
source .venv/bin/activate # or: source venv/bin/activate
python -m pytest tests/ -q
python -m pytest tests/ -q -n 4
```
If you need to bypass isolation for fast feedback while debugging:
```bash
python -m pytest tests/agent/test_foo.py -q --no-isolate
```
Worker count above 4 will surface test-ordering flakes that CI never sees.
Always run the full suite before pushing changes.

View File

@@ -49,24 +49,6 @@ If your skill is specialized, community-contributed, or niche, it's better suite
---
## Memory Providers: Ship as a Standalone Plugin
**We are no longer accepting new memory providers into this repo.** The set of built-in providers under `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) is closed. If you want to add a new memory backend, publish it as a **standalone plugin repo** that users install into `~/.hermes/plugins/` (or via a pip entry point).
Standalone memory plugins:
- Implement the same `MemoryProvider` ABC (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown`, and optionally `post_setup(hermes_home, config)` for setup-wizard integration
- Use the same discovery system — `discover_memory_providers()` picks them up from user/project plugin directories and pip entry points
- Integrate with `hermes memory setup` via `post_setup()` — no need to touch core code
- Can register their own CLI subcommands via `register_cli(subparser)` in a `cli.py` file
- Get all the same lifecycle hooks and config plumbing as in-tree providers
PRs that add a new directory under `plugins/memory/` will be closed with a pointer to publish the provider as its own repo. Existing in-tree providers stay; bug fixes to them are welcome.
This isn't a quality bar — it's a coupling-and-maintenance decision. Memory providers are the most common plugin type and they shouldn't all live in this tree.
---
## Development Setup
### Prerequisites
@@ -91,6 +73,9 @@ export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
# Optional: RL training submodule
# git submodule update --init tinker-atropos && uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
```
@@ -172,7 +157,7 @@ hermes-agent/
│ ├── vision_tools.py # Image analysis via multimodal models
│ ├── delegate_tool.py # Subagent spawning and parallel task execution
│ ├── code_execution_tool.py # Sandboxed Python with RPC tool access
│ ├── session_search_tool.py # Search past conversations with FTS5 + anchored windows
│ ├── session_search_tool.py # Search past conversations with FTS5 + summarization
│ ├── cronjob_tools.py # Scheduled task management
│ ├── skill_tools.py # Skill search, load, manage
│ └── environments/ # Terminal execution backends
@@ -193,6 +178,7 @@ hermes-agent/
├── skills/ # Bundled skills (copied to ~/.hermes/skills/ on install)
├── optional-skills/ # Official optional skills (discoverable via hub, not activated by default)
├── environments/ # RL training environments (Atropos integration)
├── tests/ # Test suite
├── website/ # Documentation site (hermes-agent.nousresearch.com)
@@ -210,7 +196,7 @@ hermes-agent/
| `~/.hermes/skills/` | All active skills (bundled + hub-installed + agent-created) |
| `~/.hermes/memories/` | Persistent memory (MEMORY.md, USER.md) |
| `~/.hermes/state.db` | SQLite session database |
| `~/.hermes/sessions/` | Gateway routing index (`sessions.json`), request-dump breadcrumbs, gateway `*.jsonl` transcripts, and (optionally) per-session JSON snapshots when `sessions.write_json_snapshots: true` is set. The per-session snapshots are off by default; state.db is canonical. |
| `~/.hermes/sessions/` | JSON session logs |
| `~/.hermes/cron/` | Scheduled job data |
| `~/.hermes/whatsapp/session/` | WhatsApp bridge credentials |
@@ -239,7 +225,7 @@ User message → AIAgent._run_agent_loop()
- **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
- **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. Per-session JSON snapshots in `~/.hermes/sessions/` were superseded by the SQLite store and are off by default; opt back in with `sessions.write_json_snapshots: true` if you have external tooling that consumes the JSON files directly.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. JSON logs go to `~/.hermes/sessions/`.
- **Ephemeral injection**: System prompts and prefill messages are injected at API call time, never persisted to the database or logs.
- **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at init time (Nous Portal OAuth, OpenRouter API key, or custom endpoint).
- **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection (sort by throughput/latency/price, allow/ignore specific providers, data retention policies). These are injected as `extra_body.provider` in API requests.
@@ -475,58 +461,6 @@ Gateway and messaging sessions never collect secrets in-band; they instruct the
See `skills/gifs/gif-search/` and `skills/email/himalaya/` for examples.
### Skill authoring standards (HARDLINE)
Every new or modernized skill — bundled, optional, or contributed — must meet these standards before merge. Reviewers reject PRs that violate them.
1. **`description` ≤ 60 characters, one sentence, ends with a period.** Long descriptions bloat the skill listing UI and dilute the model's attention when many skills are loaded. State the capability, not the implementation. No marketing words ("powerful", "comprehensive", "seamless", "advanced"). Don't repeat the skill name. Verify with:
```python
import re, pathlib
m = re.search(r'^description: (.*)$',
pathlib.Path('skills/<cat>/<name>/SKILL.md').read_text(),
re.MULTILINE)
assert len(m.group(1)) <= 60, len(m.group(1))
```
Good: `Search arXiv papers by keyword, author, category, or ID.`
Bad: `A powerful and comprehensive skill that allows the agent to search arXiv for relevant academic papers using various criteria including keywords, authors, and categories.`
2. **Tools referenced in SKILL.md prose must be native Hermes tools or MCP servers the skill explicitly expects.** When the skill needs a capability, point at the proper tool by name in backticks: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, `` `patch` ``, `` `search_files` ``, `` `vision_analyze` ``, `` `browser_navigate` ``, `` `delegate_task` ``, `` `image_generate` ``, `` `text_to_speech` ``, `` `cronjob` ``, `` `memory` ``, `` `skill_view` ``, `` `todo` ``, `` `execute_code` ``.
Do NOT name shell utilities the agent already has wrapped:
| Don't say | Say |
|---|---|
| `grep`, `rg` | `search_files` |
| `cat`, `head`, `tail` | `read_file` |
| `sed`, `awk` | `patch` |
| `find`, `ls` | `search_files` (with `target='files'`) |
| `curl` for content extraction | `web_extract` |
| `echo > file`, `cat <<EOF` | `write_file` |
If the skill depends on an MCP server, name the MCP server and document its setup in `## Prerequisites`. Third-party CLIs (e.g. `ffmpeg`, `gh`, a specific SDK) are fine to invoke from inside script files, but the prose should frame the interaction as "invoke through the `terminal` tool", not as a manual shell session.
3. **`platforms:` gating audited against actual script imports.** Skills that use POSIX-only primitives (`fcntl`, `termios`, `os.setsid`, `os.kill(pid, 0)` for liveness, `/proc`, hardcoded `/tmp` paths, `signal.SIGKILL`, bash heredocs, `osascript`, `apt`, `systemctl`) must declare their supported platforms via the `platforms:` frontmatter. Default posture is to fix it cross-platform first — `tempfile.gettempdir()`, `pathlib.Path`, `psutil.pid_exists()`, Python-level filtering instead of `grep`. Gate to a narrower set only when the dependency is genuinely platform-bound (e.g. `osascript` is macOS-only, `/proc` is Linux-only).
4. **`author` credits the human contributor first.** For external contributions, the contributor's real name + GitHub handle goes first (`Jane Doe (jane-doe)`); "Hermes Agent" is the secondary collaborator. If the contributor's commit shows "Hermes Agent" as author because they used Hermes to draft the skill, replace it with their actual name — credit the human, not the tool.
5. **SKILL.md body uses the modern section order.** `# <Skill> Skill` title, 2-3 sentence intro stating what it does and what it doesn't do, then:
- `## When to Use` — trigger conditions
- `## Prerequisites` — env vars, install steps, MCP setup, API key sourcing
- `## How to Run` — canonical invocation through the `terminal` tool
- `## Quick Reference` — flat command/API reference
- `## Procedure` — numbered steps with copy-paste commands
- `## Pitfalls` — known limits, rate limits, things that look broken but aren't
- `## Verification` — single command that proves the skill works
Target ~200 lines for a complex skill, ~100 lines for a simple one. Cut redundant intro fluff, marketing prose, and re-explanations of env vars already documented in `## Prerequisites`.
6. **Scripts go in `scripts/`, references in `references/`, templates in `templates/`.** Don't expect the model to inline-write parsers, XML walkers, or non-trivial logic every call — ship a helper script. Reference scripts from SKILL.md by path relative to the skill directory.
7. **Tests live at `tests/skills/test_<skill>_skill.py`** and use only stdlib + pytest + `unittest.mock`. No live network calls. Run via `scripts/run_tests.sh tests/skills/test_<skill>_skill.py -q`. Must pass under the hermetic CI env (no API keys leaking through). Use `monkeypatch` and `tmp_path` for any env-var or filesystem dependencies.
8. **`.env.example` additions are isolated to a clearly delimited block.** Don't touch the surrounding file — contributor-supplied `.env.example` versions are usually stale, and edits outside the skill's own block will be dropped during salvage. Comment all values with `#` (it's documentation, not live config).
### Skill guidelines
- **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
@@ -800,47 +734,6 @@ Hermes has terminal access. Security matters.
If your PR affects security, note it explicitly in the description.
### Dependency pinning policy (supply chain hardening)
After the [litellm supply chain compromise](https://github.com/BerriAI/litellm/issues/24512) in March 2026 and the [Mini Shai-Hulud worm campaign](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) in May 2026, all dependencies must follow these rules:
| Source type | Required treatment | Rationale |
|---|---|---|
| **PyPI package** | `>=floor,<next_major` | PyPI versions are immutable once published, but new versions can be pushed into your range. A `<next_major` ceiling stops a 1.x install from upgrading to a malicious 2.0.0. |
| **Git URL** (atroposlib, tinker, yc-bench, Baileys) | Full commit SHA | Branches and tags are mutable refs; SHA is content-addressed. |
| **GitHub Actions** | Full commit SHA + version comment | Action tags are mutable refs (e.g. tj-actions/changed-files March 2025). Pin as `uses: owner/action@<sha> # vX.Y.Z` |
| **CI-only pip installs** | `==exact` | Hermetic CI builds; churn is acceptable. |
**Every new PyPI dependency in a PR must have a `<next_major` upper bound.** PRs adding unbounded `>=X.Y.Z` specs will be rejected by reviewers. The `supply-chain-audit.yml` CI workflow also flags dependency manifest changes for manual review.
**How to determine the ceiling:**
- If the package is at version `1.x.y`, use `<2`.
- If the package is at version `0.x.y` (pre-1.0), use `<0.(current_minor + 2)` — e.g. if current is `0.29.x`, use `<0.32`. This gives ~2 minor versions of headroom while keeping the window small enough that a hostile takeover version is unlikely to land inside it.
- Exception: packages with very stable APIs (e.g. `aiohttp-socks`) can use `<1` at reviewer discretion.
**Examples:**
```toml
# ✅ Correct — post-1.0
"openai>=2.21.0,<3"
"pydantic>=2.12.5,<3"
# ✅ Correct — pre-1.0 (tight minor window)
"asyncpg>=0.29,<0.32"
"aiosqlite>=0.20,<0.23"
"hindsight-client>=0.4.22,<0.5"
# ❌ Rejected — no upper bound
"some-package>=1.2.3"
# ❌ Rejected — too tight (blocks legitimate patches)
"some-package==1.2.3"
# ❌ Rejected — too loose for pre-1.0 (allows 80 minor versions)
"some-package>=0.20,<1"
```
**Reference PRs:** #2796 (litellm removal), #2810 (upper bounds pass), #9801 (SHA pinning + supply-chain-audit CI).
---
## Pull Request Process

View File

@@ -1,12 +1,5 @@
FROM ghcr.io/astral-sh/uv:0.11.6-python3.13-trixie@sha256:b3c543b6c4f23a5f2df22866bd7857e5d304b67a564f4feab6ac22044dde719b AS uv_source
# Node 22 LTS source stage. Debian trixie's bundled nodejs is pinned to 20.x
# which reached EOL in April 2026 — we copy node + npm + corepack from the
# upstream node:22 image instead so we can stay on a supported LTS without
# waiting for Debian 14 (forky, ~mid-2027). Bookworm-based slim image used
# so the produced binary links against glibc 2.36, which runs cleanly on
# our Debian 13 (trixie, glibc 2.41) runtime. Bumping to a new Node major
# is a one-line ARG change; see #4977.
FROM node:22-bookworm-slim@sha256:7af03b14a13c8cdd38e45058fd957bf00a72bbe17feac43b1c15a689c029c732 AS node_source
FROM tianon/gosu:1.19-trixie@sha256:3b176695959c71e123eb390d427efc665eeb561b1540e82679c15e992006b8b9 AS gosu_source
FROM debian:13.4
# Disable Python stdout buffering to ensure logs are printed immediately
@@ -16,82 +9,20 @@ ENV PYTHONUNBUFFERED=1
# install survives the /opt/data volume overlay at runtime.
ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# Install system dependencies in one layer, clear APT cache.
# tini was previously PID 1 to reap orphaned zombie processes (MCP stdio
# subprocesses, git, bun, etc.) that would otherwise accumulate when hermes
# ran as PID 1. See #15012. Phase 2 of the s6-overlay supervision plan
# replaces tini with s6-overlay's /init (PID 1 = s6-svscan), which reaps
# zombies non-blockingly on SIGCHLD and additionally supervises the main
# hermes process, the dashboard, and per-profile gateways.
# Install system dependencies in one layer, clear APT cache
# tini reaps orphaned zombie processes (MCP stdio subprocesses, git, bun, etc.)
# that would otherwise accumulate when hermes runs as PID 1. See #15012.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates curl iputils-ping python3 python-is-python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
build-essential curl nodejs npm python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli tini && \
rm -rf /var/lib/apt/lists/*
# ---------- s6-overlay install ----------
# s6-overlay provides supervision for the main hermes process, the dashboard,
# and per-profile gateways. /init becomes PID 1 below — see ENTRYPOINT.
#
# Multi-arch: BuildKit auto-populates TARGETARCH (amd64 / arm64). s6-overlay
# uses tarball names keyed on the kernel arch string (x86_64 / aarch64), so
# we map between them inline. The noarch + symlinks tarballs are
# architecture-independent and reused as-is.
#
# We use `curl` instead of `ADD` for the per-arch tarball because `ADD`
# evaluates its URL at parse time, before any ARG / TARGETARCH substitution
# — splitting one URL per arch into two ADDs would download both on every
# build and leave dead bytes in the cache. A single curl + arch-keyed URL
# is simpler and cache-friendlier.
#
# Supply-chain integrity: every tarball is checksum-verified against the
# upstream-published SHA256. To bump S6_OVERLAY_VERSION, fetch the four
# `.sha256` files from the corresponding release and update the ARGs. The
# checksum lookup happens during build, so a compromised release artifact
# fails the build loudly instead of silently producing a tampered image.
ARG TARGETARCH
ARG S6_OVERLAY_VERSION=3.2.3.0
ARG S6_OVERLAY_NOARCH_SHA256=b720f9d9340efc8bb07528b9743813c836e4b02f8693d90241f047998b4c53cf
ARG S6_OVERLAY_X86_64_SHA256=a93f02882c6ed46b21e7adb5c0add86154f01236c93cd82c7d682722e8840563
ARG S6_OVERLAY_AARCH64_SHA256=0952056ff913482163cc30e35b2e944b507ba1025d78f5becbb89367bf344581
ARG S6_OVERLAY_SYMLINKS_SHA256=a60dc5235de3ecbcf874b9c1f18d73263ab99b289b9329aa950e8729c4789f0e
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-noarch.tar.xz /tmp/
ADD https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-symlinks-noarch.tar.xz /tmp/
RUN set -eu; \
case "${TARGETARCH:-amd64}" in \
amd64) s6_arch="x86_64"; s6_arch_sha="${S6_OVERLAY_X86_64_SHA256}" ;; \
arm64) s6_arch="aarch64"; s6_arch_sha="${S6_OVERLAY_AARCH64_SHA256}" ;; \
*) echo "Unsupported TARGETARCH=${TARGETARCH} for s6-overlay" >&2; exit 1 ;; \
esac; \
curl -fsSL --retry 3 -o /tmp/s6-overlay-arch.tar.xz \
"https://github.com/just-containers/s6-overlay/releases/download/v${S6_OVERLAY_VERSION}/s6-overlay-${s6_arch}.tar.xz"; \
{ \
printf '%s %s\n' "${S6_OVERLAY_NOARCH_SHA256}" /tmp/s6-overlay-noarch.tar.xz; \
printf '%s %s\n' "${s6_arch_sha}" /tmp/s6-overlay-arch.tar.xz; \
printf '%s %s\n' "${S6_OVERLAY_SYMLINKS_SHA256}" /tmp/s6-overlay-symlinks-noarch.tar.xz; \
} > /tmp/s6-overlay.sha256; \
sha256sum -c /tmp/s6-overlay.sha256; \
tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-arch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-symlinks-noarch.tar.xz; \
rm /tmp/s6-overlay-*.tar.xz /tmp/s6-overlay.sha256
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
RUN useradd -u 10000 -m -d /opt/data hermes
COPY --chmod=0755 --from=gosu_source /gosu /usr/local/bin/
COPY --chmod=0755 --from=uv_source /usr/local/bin/uv /usr/local/bin/uvx /usr/local/bin/
# Node 22 LTS: copy the node binary plus the bundled npm + corepack JS
# installs from the upstream image. npm and npx are recreated as symlinks
# because they're symlinks in the source image (and need to live on PATH).
# See node_source stage at the top of the file for the version-bump
# rationale (#4977).
COPY --chmod=0755 --from=node_source /usr/local/bin/node /usr/local/bin/
COPY --from=node_source /usr/local/lib/node_modules/npm /usr/local/lib/node_modules/npm
COPY --from=node_source /usr/local/lib/node_modules/corepack /usr/local/lib/node_modules/corepack
RUN ln -sf /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm && \
ln -sf /usr/local/lib/node_modules/npm/bin/npx-cli.js /usr/local/bin/npx && \
ln -sf /usr/local/lib/node_modules/corepack/dist/corepack.js /usr/local/bin/corepack
WORKDIR /opt/hermes
# ---------- Layer-cached dependency install ----------
@@ -108,15 +39,14 @@ COPY ui-tui/package.json ui-tui/package-lock.json ui-tui/
COPY ui-tui/packages/hermes-ink/ ui-tui/packages/hermes-ink/
# `npm_config_install_links=false` forces npm to install `file:` deps as
# symlinks instead of copies. This is the default since npm 10+, which is
# what the image ships now (via the node:22 source stage). We set it
# explicitly anyway as defense-in-depth: the previous Debian-bundled npm
# 9.x defaulted to install-as-copy, which produced a hidden
# node_modules/.package-lock.json that permanently disagreed with the root
# lock on the @hermes/ink entry, tripped the TUI launcher's
# `_tui_need_npm_install()` check on every startup, and triggered a
# runtime `npm install` that then failed with EACCES. Keeping the env
# guards against a future regression if the source npm version changes.
# symlinks (the npm 10+ default) even on Debian's older bundled npm 9.x,
# which defaults to `install-links=true` and installs file deps as *copies*.
# The host-side package-lock.json is generated with a newer npm that uses
# symlinks, so an install-as-copy produces a hidden node_modules/.package-lock.json
# that permanently disagrees with the root lock on the @hermes/ink entry.
# That disagreement trips the TUI launcher's `_tui_need_npm_install()`
# check on every startup and triggers a runtime `npm install` that then
# fails with EACCES (node_modules/ is root-owned from build time).
ENV npm_config_install_links=false
RUN npm install --prefer-offline --no-audit && \
@@ -136,23 +66,17 @@ RUN npm install --prefer-offline --no-audit && \
# frontend stats the readme path during dep resolution, so we `touch` an
# empty placeholder — the real README is restored by `COPY . .` below.
#
# `uv sync --frozen --no-install-project --extra all --extra messaging`
# installs the deps reachable through the composite `[all]` extra
# (handpicked set intended for the production image), plus gateway
# messaging adapters that should work in the published image without a
# first-boot lazy install. We do NOT use `--all-extras`:
# `uv sync --frozen --no-install-project --extra all` installs only the
# deps reachable through the composite `[all]` extra (handpicked set
# intended for the production image). We do NOT use `--all-extras`:
# that would pull in `[rl]` (atroposlib + tinker + torch + wandb from
# git), `[yc-bench]` (another git dep), and `[termux-all]` (Android
# redundancy), none of which belong in the published container.
#
# Provider packages (anthropic, bedrock, azure-identity) are included
# so Docker users can use these providers without requiring runtime
# lazy-install access to PyPI (often blocked in containerized envs).
#
# The editable link is created after the source copy below.
COPY pyproject.toml uv.lock ./
RUN touch ./README.md
RUN uv sync --frozen --no-install-project --extra all --extra messaging --extra anthropic --extra bedrock --extra azure-identity
RUN uv sync --frozen --no-install-project --extra all
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
@@ -170,122 +94,20 @@ RUN cd web && npm run build && \
# hermes_cli/main.py succeeds (see #18800). /opt/hermes/web is build-time
# only (HERMES_WEB_DIST points at hermes_cli/web_dist) and is intentionally
# not chowned here.
# The .venv MUST remain hermes-writable so lazy_deps.py can install
# remaining optional platform packages and future pin bumps at first use.
# Without this, `uv pip install` fails with EACCES and adapters silently
# fail to load. See tools/lazy_deps.py.
USER root
RUN chmod -R a+rX /opt/hermes && \
chown -R hermes:hermes /opt/hermes/.venv /opt/hermes/ui-tui /opt/hermes/node_modules
# Start as root so the s6-overlay stage2 hook can usermod/groupmod and chown
# the data volume. Each supervised service then drops to the hermes user via
# `s6-setuidgid hermes` in its run script. If HERMES_UID is unset, services
# run as the default hermes user (UID 10000).
chown -R hermes:hermes /opt/hermes/ui-tui /opt/hermes/node_modules
# Start as root so the entrypoint can usermod/groupmod + gosu.
# If HERMES_UID is unset, the entrypoint drops to the default hermes user (10000).
# ---------- Link hermes-agent itself (editable) ----------
# Deps are already installed in the cached layer above; `--no-deps` makes
# this a fast (~1s) egg-link creation with no resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# ---------- Bake build-time git revision ----------
# .dockerignore excludes .git, so `git rev-parse HEAD` from inside the
# container always returns nothing — meaning `hermes dump` reports
# "(unknown)" and the startup banner drops its `· upstream <sha>` suffix.
# That makes support triage from container bug reports impossible:
# we can't tell which commit the user is actually running.
#
# Fix: write the commit SHA passed via the HERMES_GIT_SHA build-arg to
# /opt/hermes/.hermes_build_sha at build time, and have
# hermes_cli/build_info.py read it at runtime. Both `hermes dump` and
# banner.get_git_banner_state() try the baked SHA first, then fall back
# to live `git rev-parse` for source installs (unchanged behaviour).
#
# The arg is optional — local `docker build` without --build-arg simply
# omits the file, and the runtime falls back to live-git lookup. CI
# (.github/workflows/docker-publish.yml) passes ${{ github.sha }} so
# every published image has it.
ARG HERMES_GIT_SHA=
RUN if [ -n "${HERMES_GIT_SHA}" ]; then \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha && \
chown hermes:hermes /opt/hermes/.hermes_build_sha; \
fi
# ---------- s6-overlay service wiring ----------
# Static services declared at build time: main-hermes + dashboard.
# Per-profile gateway services are registered dynamically at runtime by
# the profile create/delete hooks (Phase 4); they live under
# /run/service/ (tmpfs) and are reconciled on container restart by
# /etc/cont-init.d/02-reconcile-profiles (Phase 4 Task 4.0).
COPY docker/s6-rc.d/ /etc/s6-overlay/s6-rc.d/
# stage2-hook handles UID/GID remap, volume chown, config seeding,
# skills sync — all the work the old entrypoint.sh did before
# `exec hermes`. Wired in as cont-init.d/01- so it
# runs before user services start.
#
# 02-reconcile-profiles re-creates per-profile gateway s6 service
# slots from $HERMES_HOME/profiles/<name>/ after a container restart
# (the /run/service/ scandir is tmpfs and wiped on restart). Phase 4.
RUN mkdir -p /etc/cont-init.d && \
printf '#!/command/with-contenv sh\nexec /opt/hermes/docker/stage2-hook.sh\n' \
> /etc/cont-init.d/01-hermes-setup && \
chmod +x /etc/cont-init.d/01-hermes-setup
COPY --chmod=0755 docker/cont-init.d/015-supervise-perms /etc/cont-init.d/015-supervise-perms
COPY --chmod=0755 docker/cont-init.d/02-reconcile-profiles /etc/cont-init.d/02-reconcile-profiles
# ---------- Runtime ----------
ENV HERMES_WEB_DIST=/opt/hermes/hermes_cli/web_dist
ENV HERMES_HOME=/opt/data
# `docker exec` privilege-drop shim. When operators run
# `docker exec <c> hermes ...` they default to root, and any file the
# command writes under $HERMES_HOME (auth.json, .env, config.yaml) ends
# up root-owned and unreadable to the supervised gateway (UID 10000).
# The shim lives at /opt/hermes/bin/hermes, sits earliest on PATH, and
# transparently re-exec's the real venv binary via `s6-setuidgid hermes`
# when invoked as root. Non-root callers (supervised processes,
# `--user hermes`, etc.) hit the short-circuit path with no overhead.
# Recursion is impossible because the shim exec's the venv binary by
# absolute path (/opt/hermes/.venv/bin/hermes). See the shim source for
# the opt-out env var (HERMES_DOCKER_EXEC_AS_ROOT=1).
COPY --chmod=0755 docker/hermes-exec-shim.sh /opt/hermes/bin/hermes
# Pre-s6 entrypoint.sh did `source .venv/bin/activate` which exported
# the venv bin onto PATH; Architecture B's main-wrapper.sh does the
# same for the container's main process, but `docker exec` and our
# cont-init.d scripts don't pass through the wrapper. Expose the venv
# bin globally so `docker exec <container> hermes ...` and any
# subprocess that doesn't activate the venv first still find hermes.
#
# /opt/hermes/bin is prepended ahead of the venv so the privilege-drop
# shim wins PATH resolution. The shim's last act is to exec the venv
# binary by absolute path, so this PATH ordering is transparent to
# every other consumer.
ENV PATH="/opt/hermes/bin:/opt/hermes/.venv/bin:/opt/data/.local/bin:${PATH}"
RUN mkdir -p /opt/data
ENV PATH="/opt/data/.local/bin:${PATH}"
VOLUME [ "/opt/data" ]
# s6-overlay's /init is PID 1. It sets up the supervision tree, runs
# /etc/cont-init.d/* (our stage2 hook), starts s6-rc services
# declared in /etc/s6-overlay/s6-rc.d/, then exec's its remaining
# argv as the container's "main program" with stdin/stdout/stderr
# inherited (this is what makes interactive --tui work). When the
# main program exits, /init begins stage 3 shutdown and the container
# exits with the program's exit code. Replaces tini — see Phase 2 of
# docs/plans/2026-05-07-s6-overlay-dynamic-subagent-gateways.md.
#
# We use the ENTRYPOINT+CMD split rather than CMD alone so the
# wrapper is prepended to user-supplied args automatically:
#
# docker run <image> → /init main-wrapper.sh (CMD default)
# docker run <image> chat -q "hi" → /init main-wrapper.sh chat -q hi
# docker run <image> sleep infinity → /init main-wrapper.sh sleep infinity
# docker run <image> --tui → /init main-wrapper.sh --tui
#
# main-wrapper.sh handles arg routing (bare-exec vs. hermes
# subcommand vs. no-args), drops to the hermes user via s6-setuidgid,
# and exec's the final program so its exit code becomes the container
# exit code. Without the wrapper-as-ENTRYPOINT, leading-dash args
# like `--version` would be intercepted by /init's POSIX shell.
ENTRYPOINT [ "/init", "/opt/hermes/docker/main-wrapper.sh" ]
CMD [ ]
ENTRYPOINT [ "/usr/bin/tini", "-g", "--", "/opt/hermes/docker/entrypoint.sh" ]

View File

@@ -1,9 +1,4 @@
graft skills
graft optional-skills
# Bundled plugin manifests (plugin.yaml / plugin.yml). Without these the
# PluginManager scan (hermes_cli/plugins.py) finds zero plugins on installs
# built from the sdist (e.g. Homebrew, downstream packagers). package-data
# below covers the wheel; this covers the sdist. See #34034 / #28149.
recursive-include plugins plugin.yaml plugin.yml
global-exclude __pycache__
global-exclude *.py[cod]

View File

@@ -14,7 +14,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -22,8 +22,8 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
<tr><td><b>A closed learning loop</b></td><td>Agent-curated memory with periodic nudges. Autonomous skill creation after complex tasks. Skills self-improve during use. FTS5 session search with LLM summarization for cross-session recall. <a href="https://github.com/plastic-labs/honcho">Honcho</a> dialectic user modeling. Compatible with the <a href="https://agentskills.io">agentskills.io</a> open standard.</td></tr>
<tr><td><b>Scheduled automations</b></td><td>Built-in cron scheduler with delivery to any platform. Daily reports, nightly backups, weekly audits — all in natural language, running unattended.</td></tr>
<tr><td><b>Delegates and parallelizes</b></td><td>Spawn isolated subagents for parallel workstreams. Write Python scripts that call tools via RPC, collapsing multi-step pipelines into zero-context-cost turns.</td></tr>
<tr><td><b>Runs anywhere, not just your laptop</b></td><td>Six terminal backends — local, Docker, SSH, Singularity, Modal, and Daytona. Daytona and Modal offer serverless persistence — your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, trajectory compression for training the next generation of tool-calling models.</td></tr>
<tr><td><b>Runs anywhere, not just your laptop</b></td><td>Seven terminal backends — local, Docker, SSH, Singularity, Modal, Daytona, and Vercel Sandbox. Daytona and Modal offer serverless persistence — your agent's environment hibernates when idle and wakes on demand, costing nearly nothing between sessions. Run it on a $5 VPS or a GPU cluster.</td></tr>
<tr><td><b>Research-ready</b></td><td>Batch trajectory generation, Atropos RL environments, trajectory compression for training the next generation of tool-calling models.</td></tr>
</table>
---
@@ -43,7 +43,7 @@ curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scri
Run this in PowerShell:
```powershell
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1 | iex
```
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
@@ -79,27 +79,6 @@ hermes doctor # Diagnose any issues
📖 **[Full documentation →](https://hermes-agent.nousresearch.com/docs/)**
---
## Skip the API-key collection — Nous Portal
Hermes works with whatever provider you want — that's not changing. But if you'd rather not collect five separate API keys for the model, web search, image generation, TTS, and a cloud browser, **[Nous Portal](https://portal.nousresearch.com)** covers all of them under one subscription:
- **300+ models** — pick any of them with `/model <name>`
- **Tool Gateway** — web search (Firecrawl), image generation (FAL), text-to-speech (OpenAI), cloud browser (Browser Use), all routed through your sub. No extra accounts.
One command from a fresh install:
```bash
hermes setup --portal
```
That logs you in via OAuth, sets Nous as your provider, and turns on the Tool Gateway. Check what's wired up any time with `hermes portal status`. Full details on the [Tool Gateway docs page](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
You can still bring your own keys per-tool whenever you want — the gateway is per-backend, not all-or-nothing.
---
## CLI vs Messaging Quick Reference
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
@@ -196,6 +175,8 @@ uv pip install -e ".[all,dev]"
scripts/run_tests.sh
```
> **RL Training (optional):** The RL/Atropos integration (`environments/`) — see [`CONTRIBUTING.md`](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#development-setup) for the full setup.
---
## Community
@@ -203,7 +184,6 @@ scripts/run_tests.sh
- 💬 [Discord](https://discord.gg/NousResearch)
- 📚 [Skills Hub](https://agentskills.io)
- 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — Linux desktop-control MCP server for Hermes and other MCP hosts, with AT-SPI accessibility trees, Wayland/X11 input, screenshots, and compositor window targeting.
- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Community WeChat bridge: Run Hermes Agent and OpenClaw on the same WeChat account.
---

View File

@@ -23,7 +23,7 @@
<tr><td><b>定时自动化</b></td><td>内置 cron 调度器,支持向任何平台投递。日报、夜间备份、周审计——全部用自然语言描述,无人值守运行。</td></tr>
<tr><td><b>委派与并行</b></td><td>生成隔离子代理处理并行工作流。编写 Python 脚本通过 RPC 调用工具,将多步管道压缩为零上下文开销的轮次。</td></tr>
<tr><td><b>随处运行</b></td><td>六种终端后端——本地、Docker、SSH、Daytona、Singularity 和 Modal。Daytona 和 Modal 提供 Serverless 持久化——代理环境空闲时休眠、按需唤醒,空闲期间几乎零成本。$5 VPS 或 GPU 集群都能跑。</td></tr>
<tr><td><b>研究就绪</b></td><td>批量轨迹生成、轨迹压缩——用于训练下一代工具调用模型。</td></tr>
<tr><td><b>研究就绪</b></td><td>批量轨迹生成、Atropos RL 环境、轨迹压缩——用于训练下一代工具调用模型。</td></tr>
</table>
---
@@ -65,27 +65,6 @@ hermes doctor # 诊断问题
📖 **[完整文档 →](https://hermes-agent.nousresearch.com/docs/)**
---
## 省去到处收集 API Key — Nous Portal
Hermes 始终允许你使用任意服务商这点不会改变。但如果你不想为模型、网页搜索、图像生成、TTS、云浏览器分别去申请五个不同的 API Key**[Nous Portal](https://portal.nousresearch.com)** 用一个订阅就能覆盖全部:
- **300+ 模型** — 用 `/model <name>` 随时切换
- **Tool Gateway** — 网页搜索Firecrawl、图像生成FAL、文本转语音OpenAI、云浏览器Browser Use全部通过订阅托管。无需额外注册任何账户。
全新安装时一条命令即可:
```bash
hermes setup --portal
```
它会通过 OAuth 登录、把 Nous 设为推理服务商,并启用 Tool Gateway。随时用 `hermes portal status` 查看路由状态。完整说明见 [Tool Gateway 文档](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway)。
你随时可以按工具单独切回自己的 API Key — Gateway 是按工具粒度生效的,不是一刀切。
---
## CLI 与消息平台 快速对照
Hermes 有两种入口:用 `hermes` 启动终端 UI或运行网关从 Telegram、Discord、Slack、WhatsApp、Signal 或 Email 与之对话。进入对话后,许多斜杠命令在两种界面中通用。
@@ -182,6 +161,12 @@ uv pip install -e ".[all,dev]"
python -m pytest tests/ -q
```
> **RL 训练(可选):** 如需参与 RL/Tinker-Atropos 集成开发:
> ```bash
> git submodule update --init tinker-atropos
> uv pip install -e "./tinker-atropos"
> ```
---
## 社区

View File

@@ -1,479 +0,0 @@
# Hermes Agent v0.14.0 (v2026.5.16)
**Release Date:** May 16, 2026
**Since v0.13.0:** 808 commits · 633 merged PRs · 1393 files changed · 165,061 insertions · 545 issues closed (12 P0, 50 P1) · 215 community contributors (including co-authors)
> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
---
## ✨ Highlights
- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and an SSH-to-tunnel docs page for when you're SSH'd into a remote box and need to complete the OAuth flow. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get a `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **Debloating wave — lighter installs, less you don't use** — A clean `pip install hermes-agent` used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification — the kind of plumbing that makes long agentic runs not fall over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There's still rough edges (this is the "early beta" stamp) — ~40 follow-up Windows-only fixes already landed in the window — but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
---
## 🪟 Windows — Native Support (Early Beta)
### Bootstrap & installer
- **Native Windows support (early beta)** — first-class native Windows path across CLI / gateway / TUI / tools ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **Recognise Shift+Enter as a newline key** + Windows docs (salvage #21545) ([#22130](https://github.com/NousResearch/hermes-agent/pull/22130))
- **Preserve Ctrl+C for Windows foreground runs** (@helix4u) ([#22752](https://github.com/NousResearch/hermes-agent/pull/22752))
- **Stop spamming cwd-missing + tirith-spawn warnings on every terminal call** ([#26618](https://github.com/NousResearch/hermes-agent/pull/26618))
- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515))
### Windows-specific fixes (40+ across cli / tools / gateway / curator / TUI)
A long tail of native-Windows fixes shipped alongside the beta — taskkill-based subprocess management, MinGit auto-install, Microsoft Store python stub detection, npm prefix handling, native PTY paths, signal handling differences, foreground process management, ANSI sequence handling, path normalization, file-locking semantics, and many more. Full list in commit log under `fix(windows)` / `feat(windows)` / `windows`.
---
## 🚀 Performance Wave
### Cold start
- **Cut ~19s from `hermes` cold start** — skills cache + lazy Feishu + no Nous HTTP at startup ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138))
- **Skip eager plugin discovery on known built-in subcommands** ([#22120](https://github.com/NousResearch/hermes-agent/pull/22120))
- **Cache Nous auth + .env loads** — `hermes tools` All Platforms from 14s to <1.5s ([#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Skip welcome banner on `chat -q` single-query mode** ([#22904](https://github.com/NousResearch/hermes-agent/pull/22904))
- **Defer heavy google-cloud imports in google_chat to first adapter use** ([#22681](https://github.com/NousResearch/hermes-agent/pull/22681))
- **Defer QQAdapter and YuanbaoAdapter imports via PEP 562** ([#22790](https://github.com/NousResearch/hermes-agent/pull/22790))
- **Defer httpx import in teams to first webhook call** ([#22831](https://github.com/NousResearch/hermes-agent/pull/22831))
- **Defer fal_client import to first generation request** ([#22859](https://github.com/NousResearch/hermes-agent/pull/22859))
- **models.dev cache-first lookup, skip network when disk cache is fresh** ([#22808](https://github.com/NousResearch/hermes-agent/pull/22808))
- **Parallelize API connectivity checks in `hermes doctor` and disable IMDS** ([#22766](https://github.com/NousResearch/hermes-agent/pull/22766))
### Runtime
- **180x faster `browser_console` evaluations** — route through supervisor's persistent CDP WebSocket ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Tune Telegram cadence + adaptive fast-path for short replies** (salvage of #10388) ([#23587](https://github.com/NousResearch/hermes-agent/pull/23587))
- **Accumulate length-continuation prefix via list+join** ([#26237](https://github.com/NousResearch/hermes-agent/pull/26237))
### Prompt caching
- **Cross-session 1h prefix cache for Claude on Anthropic / OpenRouter / Nous Portal** ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
- **Hit prefix cache in background review fork** (salvage #17276 + #25427) ([#25434](https://github.com/NousResearch/hermes-agent/pull/25434))
---
## 📦 Installation & Distribution
### PyPI + supply-chain
- **PyPI wheel packaging — `pip install hermes-agent && hermes`** (salvage of #26350) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **Supply-chain advisory checker + lazy-install framework + tiered install fallback** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **Use `--extra all` not `--all-extras`; drop lazy-covered extras from `[all]`** ([#24515](https://github.com/NousResearch/hermes-agent/pull/24515))
- **Skip browser download when system chromium exists** (@helix4u) ([#25317](https://github.com/NousResearch/hermes-agent/pull/25317))
### Nix
- **`extraDependencyGroups` for sealed venv extras** (@alt-glitch) ([#21817](https://github.com/NousResearch/hermes-agent/pull/21817))
- **Refresh npm lockfile hashes** — keeps Nix flake builds reproducible
### Docker
- **Bootstrap auth.json from env on first boot** ([#21880](https://github.com/NousResearch/hermes-agent/pull/21880))
- **Drop manual @hermes/ink build, rely on esbuild bundle** — slimmer image
### ACP / Zed
- **Zed ACP Registry integration** (salvage of #25908) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079))
- **Switch to uvx distribution, drop npm launcher** ([#26120](https://github.com/NousResearch/hermes-agent/pull/26120))
- **`hermes acp --setup-browser` bootstraps browser tools for registry installs** ([#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
---
## 🏗️ Core Agent & Architecture
### Sessions & handoff
- **`/handoff` actually transfers the session live** ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **Expose `HERMES_SESSION_ID` env var to agent tools** (@alt-glitch) ([#23847](https://github.com/NousResearch/hermes-agent/pull/23847))
### Goals (Ralph loop)
- **`/subgoal` — user-added criteria appended to active `/goal`** ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **`/goal` checklist + /subgoal user controls** ([#23456](https://github.com/NousResearch/hermes-agent/pull/23456)) — rolled back in window ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); /subgoal returned in simpler form via #25449
### Compression
- **Make `protect_first_n` configurable** ([#25447](https://github.com/NousResearch/hermes-agent/pull/25447))
### Verification
- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
### Stream retry
- **Log inner cause, upstream headers, bytes/elapsed on every drop** ([#23005](https://github.com/NousResearch/hermes-agent/pull/23005))
---
## 🤖 Models & Providers
### New providers
- **xAI Grok OAuth (SuperGrok Subscription) provider** ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
- **NovitaAI provider** (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **NVIDIA NIM billing origin header** (salvage #25211) ([#26585](https://github.com/NousResearch/hermes-agent/pull/26585))
### Provider work
- **OpenRouter Pareto Code router with `min_coding_score` knob** ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Optional codex app-server runtime for OpenAI/Codex models** ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182))
- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Codex-runtime: skip unavailable plugins during migration** ([#25437](https://github.com/NousResearch/hermes-agent/pull/25437))
- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking HERMES_HOME into config.toml** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260))
- **Pass `reasoning.effort` to xAI Responses API** ([#22807](https://github.com/NousResearch/hermes-agent/pull/22807))
- **Custom provider: prompt and persist explicit `api_mode`** ([#25068](https://github.com/NousResearch/hermes-agent/pull/25068))
- **Rename Alibaba Cloud → Qwen Cloud, reorder picker** ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Restore gpt-5.3-codex-spark for ChatGPT Pro** (salvage #18286 + #19530, fixes #16172) (@kshitijk4poor) ([#22991](https://github.com/NousResearch/hermes-agent/pull/22991))
- **Inject tool-use enforcement for GLM models** ([#24715](https://github.com/NousResearch/hermes-agent/pull/24715))
- **Use Nous Portal as model metadata authority** (@rob-maron) ([#24502](https://github.com/NousResearch/hermes-agent/pull/24502))
- **Unified `client=hermes-client-v<version>` tag on every Portal request** ([#24779](https://github.com/NousResearch/hermes-agent/pull/24779))
- **Prevent stale Ollama credentials after provider switch** (@kshitijk4poor) ([#21703](https://github.com/NousResearch/hermes-agent/pull/21703))
- **Auxiliary client: rotate pooled auth after quota failures** (salvage #22779) ([#22792](https://github.com/NousResearch/hermes-agent/pull/22792))
- **Auxiliary client: skip providers without credentials immediately** (#25395) ([#25487](https://github.com/NousResearch/hermes-agent/pull/25487))
- **Auth: send Nous refresh token via header** (@shannonsands) ([#21578](https://github.com/NousResearch/hermes-agent/pull/21578))
- **MiniMax: harden OAuth dashboard and runtime** ([#24165](https://github.com/NousResearch/hermes-agent/pull/24165))
### OpenAI-compatible proxy
- **Local OpenAI-compatible proxy for OAuth providers** — Codex / Aider / Cline can hit Claude Pro, ChatGPT Pro, SuperGrok ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
---
## 📱 Messaging Platforms (Gateway)
### New platforms
- **LINE Messaging API platform plugin** ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197))
- **SimpleX Chat platform plugin** (salvages #2558) ([#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
### Microsoft Graph foundation
- **msgraph: add auth and client foundation** (salvage of #21408) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922))
- **msgraph: add webhook listener platform** (salvage of #21409) ([#21969](https://github.com/NousResearch/hermes-agent/pull/21969))
- **teams-pipeline: add plugin runtime and operator cli** (salvage of #21410) ([#22007](https://github.com/NousResearch/hermes-agent/pull/22007))
- **teams: add pipeline outbound delivery via existing adapter** (salvage of #21411) ([#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
### Cross-platform
- **Per-platform admin/user split for slash commands** (salvage of #4443) ([#23373](https://github.com/NousResearch/hermes-agent/pull/23373))
- **Forensics on signal handling — non-blocking diag, per-phase timing, stale-unit warning** ([#23285](https://github.com/NousResearch/hermes-agent/pull/23285))
- **Keep gateway running when platforms fail; add per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600))
- **Wire `clarify` tool with inline keyboard buttons on Telegram** ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199))
- **Add `chat_id` to `hook_ctx` for message source tracking** ([#24710](https://github.com/NousResearch/hermes-agent/pull/24710))
### Telegram
- **Native draft streaming via `sendMessageDraft` (Bot API 9.5+)** (salvage of #3412) ([#23512](https://github.com/NousResearch/hermes-agent/pull/23512))
- **Stream Telegram edits safely** — salvage of #22264 (@kshitijk4poor) ([#22518](https://github.com/NousResearch/hermes-agent/pull/22518))
- **Telegram notification mode** (salvage #22772) ([#22793](https://github.com/NousResearch/hermes-agent/pull/22793))
- **Telegram guest mention mode** (@kshitijk4poor) ([#22759](https://github.com/NousResearch/hermes-agent/pull/22759))
- **Split-and-deliver oversized edits instead of silent truncation** (salvage of #19537) ([#23576](https://github.com/NousResearch/hermes-agent/pull/23576))
- **Preserve DM topic routing via reply fallback** (salvage #22053) (@kshitijk4poor) ([#22410](https://github.com/NousResearch/hermes-agent/pull/22410))
- **Pass `source.thread_id` explicitly on auto-reset notice** (carve-out of #7404) ([#23440](https://github.com/NousResearch/hermes-agent/pull/23440))
### Discord
- **Render clarify choices as buttons** ([#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Channel history backfill — default on, broadened scope** ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **`thread_require_mention` for multi-bot threads** (salvage #25313) ([#25445](https://github.com/NousResearch/hermes-agent/pull/25445))
### Slack
- **Support `!cmd` as alternate prefix for slash commands in threads** ([#25355](https://github.com/NousResearch/hermes-agent/pull/25355))
### WhatsApp
- **Surface quoted reply metadata from Baileys** (#25398) ([#25489](https://github.com/NousResearch/hermes-agent/pull/25489))
### Feishu / Google Chat / others
- **Feishu: native update prompt cards** (@kshitijk4poor) ([#22448](https://github.com/NousResearch/hermes-agent/pull/22448))
- **Google Chat: repair setup prompt imports** (@helix4u) ([#22038](https://github.com/NousResearch/hermes-agent/pull/22038))
- **Google Chat: honor relay-declared sender_type** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432))
- **LINE: use `build_source` instead of nonexistent `create_source`** ([#24717](https://github.com/NousResearch/hermes-agent/pull/24717))
- **Add `weixin, and more` to gateway docs** (salvage of #21063 by @wuwuzhijing)
---
## 🖥️ CLI & TUI
### CLI
- **Show YOLO mode warning in banner and status bar** ([#26238](https://github.com/NousResearch/hermes-agent/pull/26238))
- **Confirm prompt for destructive slash commands** (#4069) ([#22687](https://github.com/NousResearch/hermes-agent/pull/22687))
- **`docker_extra_args` + `display.timestamps`** ([#23599](https://github.com/NousResearch/hermes-agent/pull/23599))
- **Delegate tool: show user's actual concurrency / spawn-depth limits in description** ([#22694](https://github.com/NousResearch/hermes-agent/pull/22694))
### TUI
- **`/sessions` slash command for browsing and resuming previous sessions** (@austinpickett) ([#20805](https://github.com/NousResearch/hermes-agent/pull/20805))
- **Segment turns with rule above non-first user msgs; trim ticker dead space** (@OutThisLife) ([#21846](https://github.com/NousResearch/hermes-agent/pull/21846))
- **Support attaching to an existing gateway** (@OutThisLife) ([#21978](https://github.com/NousResearch/hermes-agent/pull/21978))
- **Resolve markdown links to readable page titles** (@OutThisLife) ([#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Width-aware markdown table rendering with vertical fallback** (@alt-glitch) ([#26195](https://github.com/NousResearch/hermes-agent/pull/26195))
- **Keep Ink displayCursor in sync with fast-echo writes so cursor stops drifting** (@OutThisLife) ([#26717](https://github.com/NousResearch/hermes-agent/pull/26717))
- **Allow transcript scroll + Esc during approval/clarify/confirm prompts** (@OutThisLife) ([#26414](https://github.com/NousResearch/hermes-agent/pull/26414))
- **Preserve session when switching personality** (@austinpickett) ([#20942](https://github.com/NousResearch/hermes-agent/pull/20942))
- **Skip native safety net on OSC52-capable terminals** (@benbarclay) ([#20954](https://github.com/NousResearch/hermes-agent/pull/20954))
### Dashboard / GUI
- **Route embedded TUI through dashboard gateway** (@OutThisLife) ([#21979](https://github.com/NousResearch/hermes-agent/pull/21979))
- **Hide token/cost analytics behind config flag (default off)** ([#25438](https://github.com/NousResearch/hermes-agent/pull/25438))
- **Fix Langfuse observability — trace I/O, tool outputs, placeholder credentials** (closes #22342, #22763) (@kshitijk4poor) ([#26320](https://github.com/NousResearch/hermes-agent/pull/26320))
- **MiniMax 'Login' button launched Claude OAuth** (salvage #22849) ([#24058](https://github.com/NousResearch/hermes-agent/pull/24058))
- **Update cron modals** (@austinpickett) ([#25985](https://github.com/NousResearch/hermes-agent/pull/25985))
- **Analytics: prevent silent token loss and add Claude 4.54.7 pricing** (@austinpickett) ([#21455](https://github.com/NousResearch/hermes-agent/pull/21455))
---
## 🔧 Tools & Capabilities
### Vision & video
- **`vision_analyze` returns pixels to vision-capable models** ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Unified `video_generate` with pluggable provider backends** ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **`image_gen`: actionable setup message when no FAL backend is reachable** ([#26222](https://github.com/NousResearch/hermes-agent/pull/26222))
### Computer use
- **`computer_use` cua-driver backend + focus-safe ops + non-Anthropic provider fix** (re-salvage #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967))
- **Refresh cua-driver on `hermes update` + add `install --upgrade`** ([#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
### LSP & write-time diagnostics
- **Semantic diagnostics from real language servers in `write_file`/`patch`** ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168))
- **Shift baseline diagnostics into post-edit coordinates** ([#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
### Search & web
- **Brave Search (free tier) and DDGS search providers** ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Bearer auth header for Tavily `/crawl` endpoint** ([#24658](https://github.com/NousResearch/hermes-agent/pull/24658))
### X (Twitter)
- **Gated `x_search` tool with OAuth-or-API-key auth** ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
### Browser
- **Route `browser_console` eval through supervisor's persistent CDP WS (180x faster)** ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Support externally managed Camofox sessions** ([#24499](https://github.com/NousResearch/hermes-agent/pull/24499))
### MCP
- **`supports_parallel_tool_calls` for MCP servers** (salvage of #9944) ([#26825](https://github.com/NousResearch/hermes-agent/pull/26825))
- **Codex preset for Codex CLI MCP server** (salvage #22663) ([#22679](https://github.com/NousResearch/hermes-agent/pull/22679))
- **Stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776))
### Google Workspace
- **Drive write ops + Docs/Sheets create/append** ([#21895](https://github.com/NousResearch/hermes-agent/pull/21895))
### Per-turn verifier
- **Per-turn file-mutation verifier footer** ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
---
## 🧩 Kanban (Multi-Agent)
- **`specify` — auxiliary LLM fleshes out triage tasks** ([#21435](https://github.com/NousResearch/hermes-agent/pull/21435))
- **Orchestrator board tools — `kanban_list` + `kanban_unblock`** (carve-out of #20568) ([#23012](https://github.com/NousResearch/hermes-agent/pull/23012))
- **`stranded_in_ready` diagnostic for unclaimed tasks** ([#23578](https://github.com/NousResearch/hermes-agent/pull/23578))
- **Dashboard batch QOL upgrade** (salvage of #23240) ([#23550](https://github.com/NousResearch/hermes-agent/pull/23550))
- **Tooltips and docs link across dashboard** ([#21541](https://github.com/NousResearch/hermes-agent/pull/21541))
- **Dedupe notifier delivery via atomic claim + rewind on failure** (salvage #22558) ([#23401](https://github.com/NousResearch/hermes-agent/pull/23401))
- **Keep notifier subscriptions alive across retry cycles** (salvage #21398) ([#23423](https://github.com/NousResearch/hermes-agent/pull/23423))
- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435))
- **Sanitize comment author rendering in `build_worker_context`** ([#22769](https://github.com/NousResearch/hermes-agent/pull/22769))
---
## 🧠 Plugins & Extension
### Plugin surface
- **Run any LLM call from inside a plugin via `ctx.llm`** ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194))
- **`tool_override` flag for replacing built-in tools** (closes #11049) ([#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **`standalone_sender_fn` for out-of-process cron delivery** (@kshitijk4poor) ([#22461](https://github.com/NousResearch/hermes-agent/pull/22461))
- **`HERMES_PLUGINS_DEBUG=1` surfaces plugin discovery logs** ([#22684](https://github.com/NousResearch/hermes-agent/pull/22684))
- **Hindsight-client as optional dependency** (@alt-glitch) ([#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
### Profile & distribution
- **Shareable profile distributions via git** ([#20831](https://github.com/NousResearch/hermes-agent/pull/20831))
---
## ⏰ Cron
- **Routing intent — `deliver=all` fans out to every connected channel** ([#21495](https://github.com/NousResearch/hermes-agent/pull/21495))
- **Support name-based lookup for job operations** ([#26231](https://github.com/NousResearch/hermes-agent/pull/26231))
- **Blank Cron dashboard tab + partial-record crashes** (salvage #21042 + #22330) (@kshitijk4poor) ([#22389](https://github.com/NousResearch/hermes-agent/pull/22389))
- **Do not seed `HERMES_SESSION_*` contextvars from cron origin** (salvage of #22356) (@kshitijk4poor) ([#22382](https://github.com/NousResearch/hermes-agent/pull/22382))
- **Scan assembled prompt including skill content for prompt injection** (#3968)
---
## 🧩 Skills Ecosystem
### Skills Hub
- **`hermes-skills/huggingface` as a trusted default tap** (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646))
- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905))
- **Refuse `skill_view` name collisions instead of guessing** (closes #6136 @polkn)
### Curator
- **Show rename map in user-visible summary** ([#22910](https://github.com/NousResearch/hermes-agent/pull/22910))
- **Hint at `hermes curator pin` in the rename block** ([#23212](https://github.com/NousResearch/hermes-agent/pull/23212))
### New optional skills
- **Hyperliquid** — perp/spot trading via SDK + REST (salvage of #1952) ([#23583](https://github.com/NousResearch/hermes-agent/pull/23583))
- **Yahoo Finance** market data ([#23590](https://github.com/NousResearch/hermes-agent/pull/23590))
- **api-testing** (REST/GraphQL debug, salvages #1800) ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582))
- **Unified EVM multi-chain skill** (salvages #25291 + #2010 + folds in base/) ([#25299](https://github.com/NousResearch/hermes-agent/pull/25299))
- **darwinian-evolver** ([#26760](https://github.com/NousResearch/hermes-agent/pull/26760))
- **osint-investigation** (closes #355) ([#26729](https://github.com/NousResearch/hermes-agent/pull/26729))
- **pinggy-tunnel** ([#26765](https://github.com/NousResearch/hermes-agent/pull/26765))
- **watchers** — RSS / HTTP JSON / GitHub polling via cron no-agent ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
- **Notion overhaul for the Developer Platform** (May 2026) ([#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
---
## 🔒 Security & Reliability
### Security hardening
- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS** (salvage of #22194 + #21128) (@kshitijk4poor) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
- **Drop caller-controlled author override in `kanban_comment`** (salvage of #22109) (@kshitijk4poor) ([#22435](https://github.com/NousResearch/hermes-agent/pull/22435))
- **Cover remaining SSRF fetch paths in skills-hub** (salvage #22804) ([#22843](https://github.com/NousResearch/hermes-agent/pull/22843))
- **Use credential_pool for custom endpoint model listing probes** (salvage #22810) ([#22842](https://github.com/NousResearch/hermes-agent/pull/22842))
- **Require dashboard auth for plugin API routes** (salvage #19541) ([#23220](https://github.com/NousResearch/hermes-agent/pull/23220))
- **Sanitize env and redact output in quick commands + remove write-only `_pending_messages`** ([#23584](https://github.com/NousResearch/hermes-agent/pull/23584))
- **Reduce unnecessary `shell=True` in subprocess calls** ([#25149](https://github.com/NousResearch/hermes-agent/pull/25149))
- **Sanitize Google Chat sender_type from relay** (salvage of #22107) (@kshitijk4poor) ([#22432](https://github.com/NousResearch/hermes-agent/pull/22432))
- **Supply-chain advisory checker** ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **Rewrite security policy around OS-level isolation as the boundary** (@jquesnelle) ([#20317](https://github.com/NousResearch/hermes-agent/pull/20317))
- **Remove public security advisory page** ([#24253](https://github.com/NousResearch/hermes-agent/pull/24253))
### Reliability — notable bug closures
- **SQLite: fall back to `journal_mode=DELETE` on NFS/SMB/FUSE** (fixes `/resume` on network mounts) (@kshitijk4poor) ([#22043](https://github.com/NousResearch/hermes-agent/pull/22043))
- **Codex-runtime: retire wedged sessions + post-tool watchdog + OAuth refresh classify** ([#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Codex-runtime: de-dup `[plugins.X]` tables and stop leaking HERMES_HOME** (#26250) (@kshitijk4poor) ([#26260](https://github.com/NousResearch/hermes-agent/pull/26260))
- **Daytona: migrate legacy-sandbox lookup to cursor-based `list()`** ([#24587](https://github.com/NousResearch/hermes-agent/pull/24587))
- **MCP: stop retrying initial MCP auth failures** (#25624) ([#25776](https://github.com/NousResearch/hermes-agent/pull/25776))
- **Gateway: enable text-intercept for multi-choice clarify fallback** (#25587) ([#25778](https://github.com/NousResearch/hermes-agent/pull/25778))
- **Gateway: keep running when platforms fail; per-platform circuit breaker + `/platform`** ([#26600](https://github.com/NousResearch/hermes-agent/pull/26600))
- **Delegate: salvage #21933 JSON-string batch + diagnostic logging** (@kshitijk4poor) ([#22436](https://github.com/NousResearch/hermes-agent/pull/22436))
- **Profiles+banner: exclude infrastructure from `--clone-all` + fix stale update-check repo resolution** (@kshitijk4poor) ([#22475](https://github.com/NousResearch/hermes-agent/pull/22475))
- **ACP: inline file attachment resources** (salvage #21400 + image support) ([#21407](https://github.com/NousResearch/hermes-agent/pull/21407))
- **CI: unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012), [#25957](https://github.com/NousResearch/hermes-agent/pull/25957))
### Notable reverts in window
- **`/goal` checklist + /subgoal feature stack** — rolled back ([#23813](https://github.com/NousResearch/hermes-agent/pull/23813)); `/subgoal` returned in simpler form via [#25449](https://github.com/NousResearch/hermes-agent/pull/25449)
- **Scrollback box width clamp** (#25975) rolled back to restore full-width borders ([#26163](https://github.com/NousResearch/hermes-agent/pull/26163))
- **`fix(cli): tolerate unreadable dirs when building systemd PATH`** rolled back
---
## 🌍 i18n
- **Localize all gateway commands + web dashboard, add 8 new locales (16 total)** ([#22914](https://github.com/NousResearch/hermes-agent/pull/22914))
---
## 📚 Documentation
- **Repair Voice & TTS provider table** (@nightcityblade, fixes #24101) ([#24138](https://github.com/NousResearch/hermes-agent/pull/24138))
- **Show per-skill pages in the left sidebar** ([#26646](https://github.com/NousResearch/hermes-agent/pull/26646))
- **Mention Weixin in gateway help and docstrings** (salvage of #21063 by @wuwuzhijing)
- **Richer info panels on the Skills Hub** ([#22905](https://github.com/NousResearch/hermes-agent/pull/22905))
- Many more doc updates across providers, platforms, skills, Windows install paths, and dashboard.
---
## 🧪 Testing & CI
- **Unblock shared PR checks** (@stephenschoettler) ([#21012](https://github.com/NousResearch/hermes-agent/pull/21012))
- **Stabilize shared test state after 21012** (@stephenschoettler) ([#25957](https://github.com/NousResearch/hermes-agent/pull/25957))
- A long tail of test additions for platforms, providers, plugins, and edge cases — 8 explicit `test:` PRs plus ~250 fix PRs that also added regression coverage.
---
## 👥 Contributors
### Core
- @teknium1 — release lead, architecture, ~406 PRs merged in window
### Top community contributors
- **@kshitijk4poor** — 38 PRs · Telegram cadence/streaming/topic routing, security hardening (sudo, SSRF, kanban_comment, dashboard auth), codex-runtime hygiene, NovitaAI provider, profile/banner fixes, Feishu update cards, gateway QOL across the board
- **@alt-glitch** — 13 PRs · Markdown-table TUI rendering, `HERMES_SESSION_ID` env var, hindsight-client optional dep, Nix `extraDependencyGroups`
- **@OutThisLife** (Brooklyn Nicholson) — 12 PRs · TUI turn segmentation, attach-to-gateway, markdown link titles, embedded TUI via dashboard gateway, Ink cursor sync, scroll/Esc during prompts
- **@austinpickett** — 8 PRs · `/sessions` slash command, personality switching preserves session, cron modals, dashboard analytics
- **@helix4u** — 5 PRs · Google Chat setup, browser install skip on system chromium, Windows Ctrl+C preservation
- **@rob-maron** — 4 PRs · Nous Portal as model metadata authority, provider polish
- **@stephenschoettler** — 3 PRs · CI stabilization
- **@ethernet8023** — 3 PRs · platform/gateway work
### All contributors (alphabetical)
@02356abc, @0xbyt4, @0xharryriddle, @1000Delta, @1RB, @29206394, @A-kamal, @aashizpoudel, @Abd0r,
@adybag14-cyber, @AgentArcLab, @ahmedbadr3, @AhmetArif0, @alblez, @Alex-yang00, @ALIYILD, @AllynSheep,
@alt-glitch, @am423, @amathxbt, @amethystani, @ArecaNon, @Arkmusn, @askclaw-vesper, @AsoTora, @austinpickett,
@aydnOktay, @ayushere, @baocin, @Bartok9, @benbarclay, @BennetYrWang, @Bihruze, @binhnt92, @briandevans,
@brooklynnicholson, @btorresgil, @buntingszn, @CalmProton, @chrisworksai, @CoinTheHat, @dandacompany, @Dangooy,
@DanielLSM, @David-0x221Eight, @ddupont808, @dhruv-saxena, @diablozzc, @dlkakbs, @dmahan93, @dmnkhorvath,
@domtriola, @donrhmexe, @Dusk1e, @eloklam, @emozilla, @ephron-ren, @erenkarakus, @EthanGuo-coder,
@ethernet8023, @evgyur, @explainanalyze, @fahdad, @fr33d3m0n, @Freeman-Consulting, @freqyfreqy, @Frowtek,
@fu576, @github-actions[bot], @gnanirahulnutakki, @GodsBoy, @guglielmofonda, @Gutslabs, @hanzckernel,
@heathley, @hekaru-agent, @helix4u, @HenkDz, @HiddenPuppy, @hllqkb, @hrygo, @HuangYuChuh, @Hugo-SEQUIER, @HxT9,
@iacker, @InB4DevOps, @isaachuangGMICLOUD, @iuyup, @Jaaneek, @jackey8616, @jackjin1997, @Jaggia, @jak983464779,
@jelrod27, @jethac, @JithendraNara, @johnisag, @Julientalbot, @Jwd-gity, @kallidean, @keyuyuan, @kfa-ai,
@kidonng, @KiraKatana, @kjames2001, @konsisumer, @Korkyzer, @kshitijk4poor, @KvnGz, @lars-hagen, @leehack,
@leepoweii, @LeonSGP43, @li0near, @libo1106, @liquidchen, @littlewwwhite, @liuhao1024, @liyoungc, @luandiasrj,
@luoyuctl, @luyao618, @magic524, @mbac, @McClean, @memosr, @Mibayy, @ming1523, @mizgyo, @mrshu, @ms-alan,
@MustafaKara7, @nederev, @nicoechaniz, @nidhi-singh02, @nightcityblade, @nik1t7n, @Ninso112, @NivOO5,
@novax635, @nv-kasikritc, @oferlaor, @oswaldb22, @outdoorsea, @oxngon, @PaTTeeL, @pearjelly, @pefontana,
@perng, @PhilipAD, @phuongvm, @polkn, @Prasanna28Devadiga, @princepal9120, @pty819, @purzbeats, @Quarkex,
@quocanh261997, @qWaitCrypto, @Qwinty, @rahimsais, @raymaylee, @ReqX, @rewbs, @RhombusMaximus, @rob-maron,
@Ruzzgar, @ryptotalent, @Sanjays2402, @shannonsands, @shaun0927, @SiliconID, @silv-mt-holdings, @simpolism,
@smwbev, @soichiyo, @sprmn24, @steezkelly, @stephenschoettler, @Sylw3ster, @szymonclawd, @teyrebaz33,
@Tianyu199509, @Tranquil-Flow, @TreyDong, @TurgutKural, @tw2818, @tymrtn, @uzunkuyruk, @v1b3coder,
@vanthinh6886, @VinceZcrikl, @vKongv, @vominh1919, @voteblake, @VTRiot, @wali-reheman, @wesleysimplicio,
@wilsen0, @WorldWriter, @worlldz, @wuli666, @wuwuzhijing, @Wysie, @XiaoXiao0221, @xieNniu, @xxxigm, @yehuosi,
@ygd58, @yifengingit, @yuga-hashimoto, @zccyman, @ZeterMordio, @Zhekinmaksim, @zhengyn0001
Also: @Nagatha (Claude Opus 4.7).
---
**Full Changelog**: [v2026.5.7...v2026.5.16](https://github.com/NousResearch/hermes-agent/compare/v2026.5.7...v2026.5.16)

View File

@@ -1,651 +0,0 @@
# Hermes Agent v0.15.0 (v2026.5.28)
**Release Date:** May 28, 2026
**Since v0.14.0:** 1,302 commits · 747 merged PRs · 1,746 files changed · 282,712 insertions · 36,699 deletions · 560+ issues closed (15 P0, 65 P1, 19 security-tagged) · 321 community contributors (including co-authors)
> **The Velocity Release.** Hermes gets dramatically faster — to start, to run, to ship work, and to grow. The 16,083-line `run_agent.py` collapses to 3,821 (-76%) across 14 cohesive `agent/*` modules. Kanban grew into a real multi-agent platform across 104 PRs — orchestrator auto-decomposition, swarm topology, scheduled tasks, worktree-per-task, per-task model overrides. The cold-start perf wave keeps going: another second shaved off launch, 47% fewer per-conversation function calls, `hermes --version` flipping the head-to-head benchmark against Codex CLI. `session_search` is 4,500× faster and free now. Promptware defense lands against Brainworm-class attacks. Bitwarden Secrets Manager replaces N per-provider API keys with one bootstrap token. Skill bundles let one slash command load a whole workflow. The Ink TUI gets a multi-session orchestrator. Two new image_gen providers (Krea 2 Medium + Large, FAL ported to plugin), the Nous-approved MCP catalog with an interactive picker, an OpenHands orchestration skill, ntfy as the 23rd messaging platform, and a deep xAI integration round (Web Search plugin, xai-oauth `hermes proxy` upstream, retired-May-15 model detection + `hermes migrate xai`, natural TTS speech-tag pauses, base_url leak guard, OpenAI-style execution guidance for Grok). 15 P0 + 65 P1 closures alongside.
---
## ✨ Highlights
- **The Big Refactor — `run_agent.py` is no longer 16,000 lines** — The file at the heart of Hermes — the agent conversation loop — has been reduced from 16,083 lines to 3,821 (-76%), with the extracted code redistributed across 14 cohesive modules under `agent/`. Behavior is unchanged: every extraction keeps a thin forwarder on `AIAgent`, every test patch path still works, every external caller is compatible. The reason you care: future Hermes development moves faster, plugin authors can finally grep the codebase, and the file that took 90 seconds to load in your editor opens in a blink. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
- **Kanban grew into a real multi-agent platform — 104 PRs end to end** — Triage auto-decomposes one task into a tree of sub-tasks. `hermes kanban swarm` creates a full Swarm v1 graph in one command — root, parallel workers, gated verifier, gated synthesizer, shared blackboard. Tasks support per-task model overrides (cheap models for boilerplate, expensive ones for hard sub-tasks), board-level default workdirs, per-task worktree paths and branches, scheduled start times, configurable claim TTL, retry fingerprinting, stale-task detection, respawn guards, and a drag-to-delete trash zone. Workers report through `/workers/active`, `/runs/{id}`, and `/inspect` endpoints. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572), [#28443](https://github.com/NousResearch/hermes-agent/pull/28443), [#28364](https://github.com/NousResearch/hermes-agent/pull/28364), [#28394](https://github.com/NousResearch/hermes-agent/pull/28394), [#28462](https://github.com/NousResearch/hermes-agent/pull/28462), [#28384](https://github.com/NousResearch/hermes-agent/pull/28384), [#28467](https://github.com/NousResearch/hermes-agent/pull/28467), [#28455](https://github.com/NousResearch/hermes-agent/pull/28455), [#28452](https://github.com/NousResearch/hermes-agent/pull/28452), [#28432](https://github.com/NousResearch/hermes-agent/pull/28432), [#28468](https://github.com/NousResearch/hermes-agent/pull/28468), [#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- **Cold-start perf wave keeps going — another second saved, 47% fewer per-turn function calls** — Three new optimization rounds: defer `openai._base_client` import (-240ms / -17MB on every CLI invocation), hot-path optimizations cut 47% of per-conversation function calls (399k → 213k for 31-turn chat), defer compression-feasibility check (-170 to -290ms on every agent construction), adaptive subprocess polling (-195ms per tool call, 1+ second per turn). Termux cold start drops from 2.9s to 0.8s. `hermes --version` cold drops 63% (701ms → 258ms), flipping the head-to-head benchmark against Codex CLI from 5/11 wins to 6/11. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864), [#28866](https://github.com/NousResearch/hermes-agent/pull/28866), [#28957](https://github.com/NousResearch/hermes-agent/pull/28957), [#29006](https://github.com/NousResearch/hermes-agent/pull/29006), [#29419](https://github.com/NousResearch/hermes-agent/pull/29419), [#30121](https://github.com/NousResearch/hermes-agent/pull/30121), [#30609](https://github.com/NousResearch/hermes-agent/pull/30609), [#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- **`session_search` rebuilt — no LLM, no cost, 4,500× faster** — The old `session_search` was an aux-LLM-powered tool that cost ~$0.30/call and took ~30 seconds to summarize three sessions, sometimes confabulating when the right session wasn't even in the FTS5 hit list. The new shape is one tool with three modes (discovery, scroll, browse) inferred from which args are set — no `mode` parameter, no aux-LLM, no config knob, no companion skill. Discovery is ~20ms instead of ~90s; scroll is ~1ms. Searching your past sessions for context is now free and instant. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- **Promptware defense — Brainworm-class attacks blocked at three chokepoints** — Inspired by recent Brainworm / Promptware Kill Chain research (Origin HQ, arxiv 2601.09625), Hermes now defends the context window against prompt-injection attacks that try to hijack the agent via tool output, recalled memory, or stored skills. Single source of truth (`tools/threat_patterns.py`) with ~15 new Brainworm/C2 patterns; recalled memory is scanned at load time; tool results get delimiter markers so a malicious file or remote service can't impersonate Hermes' own system content. Paired with a new `security-guidance` plugin that pattern-matches dangerous code writes. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269), [#33131](https://github.com/NousResearch/hermes-agent/pull/33131), [#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- **Bitwarden Secrets Manager — one bootstrap token replaces every per-provider API key** — Stop keeping plaintext API keys in `~/.hermes/.env`. Install Bitwarden Secrets Manager (`bws` auto-installs lazily on first use), point Hermes at it with one bootstrap token (`BWS_ACCESS_TOKEN`), and every credential you need comes from Bitwarden at startup. Rotate a key in the Bitwarden web app and the rotation actually takes effect — Bitwarden defaults to source-of-truth so its values overwrite matching env vars on startup. Flip `secrets.bitwarden.override_existing: false` to invert. EU Cloud and self-hosted Bitwarden server URLs supported. Detected credentials are now labeled with their source so you can see at a glance which keys came from Bitwarden vs. the local env. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035), [#31378](https://github.com/NousResearch/hermes-agent/pull/31378), [#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
- **ntfy as the 23rd messaging platform — push notifications without an account** — ntfy is the self-hostable push-notification service with no signup, no API key, just a topic URL. Hermes now adapts to it as a platform plugin (zero edits to core), so your agent can send you push notifications from any cron job, kanban task completion, or chat `send_message` — to your phone, your watch, your desktop, your homelab. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → originally [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Skill bundles — `/<name>` loads multiple skills at once** — A skill bundle is a named group of skills that loads them all together with one slash command. Set up your "writing day" bundle (humanizer + ideation + obsidian + youtube-content) and `/writing-day` activates all four for the session. Skills Hub now has health checks, a freshness badge, and a watchdog cron. Three new optional skills land: `code-wiki` (Karpathy's LLM-Wiki, persistent indexed dev wiki), `openhands` (delegate to OpenHands for parallel coding agents), and `web-pentest` (OWASP-style web pentest recipes). ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373), [#32345](https://github.com/NousResearch/hermes-agent/pull/32345), [#32240](https://github.com/NousResearch/hermes-agent/pull/32240), [#32261](https://github.com/NousResearch/hermes-agent/pull/32261), [#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- **TUI session orchestrator — multiple live sessions in one TUI window** — The Ink TUI gained an active-session switcher overlay. List, switch between, refresh, and close multiple live process-local sessions without leaving the TUI; dispatch a new session with a session-scoped model picker. Plus a wave of TUI polish — mouse-tracking DEC mode presets, scrollback preservation across branches and termux, slash-dropdown fixes, x.com link rendering, and CJK / IME input rendering improvements. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980), [#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- **Two new image_gen providers — Krea 2 Medium + Large, FAL ported to plugin** — Krea joins the image_gen lineup as a built-in plugin: `Krea 2 Medium` ($0.03) and `Krea 2 Large` ($0.06), auto-discovered, selectable via `hermes tools` → Image Generation → Krea. Available through both the native Krea plugin and the FAL.ai catalog. The FAL.ai backend got pulled out of the monolithic image-generation tool into `plugins/image_gen/fal/`, completing the four-way architectural parity already established by web, browser, and video_gen — new image providers are now one file, not a fork. ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236), [#30380](https://github.com/NousResearch/hermes-agent/pull/30380), [#33506](https://github.com/NousResearch/hermes-agent/pull/33506))
- **Nous-approved MCP catalog with interactive picker** — A curated catalog of Nous-vetted MCP servers, mirroring the optional-skills shape. Run `hermes mcp` and you get an interactive picker; install with one keystroke, credentials prompted at install time and written to `~/.hermes/.env`. Ships with the n8n manifest first. Closes the discovery gap that left users hunting GitHub for trusted MCP servers. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **OpenHands orchestration skill** — A new optional skill under `optional-skills/autonomous-ai-agents/openhands/` lets the agent delegate coding tasks to the OpenHands CLI alongside `claude-code`, `codex`, and `opencode`. OpenHands is the model-agnostic member of that family — any LiteLLM-supported provider works (OpenAI, Anthropic, OpenRouter, your own), so you can route a sub-task to the cheapest model that can finish it. Drop-in worker for kanban swarms and `/delegate` flows. (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- **Deep xAI integration round — Web Search plugin, OAuth proxy upstream, May 15 retirement detection, natural TTS, security hardening** — Six interlocking xAI improvements:
- **xAI Web Search** lands as a `plugins/web/xai/` provider, slots alongside Brave / Tavily / Exa / SearXNG / DDGS / Firecrawl — reuses your existing Grok OAuth or `XAI_API_KEY` credentials, no new env vars. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` gains an xAI upstream** — your local OpenAI-compatible endpoint can now be backed by SuperGrok OAuth, no PKCE-refresh code to write in your client. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection** — `grok-4`, `grok-4-fast{,-reasoning,-non-reasoning}`, `grok-3`, `grok-code-fast-1`, `grok-imagine-image-pro` etc. are detected in doctor and chat startup, with `hermes migrate xai` to one-shot config migration to the supported model. No more silent 404s after the retirement date. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for xAI TTS — inserts light `[pause]` tags between paragraphs and sentences for more natural-sounding voice replies. Default OFF. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **`xai-oauth` `base_url` pinned to `x.ai` origin** — closes a silent credential-leak vector where `XAI_BASE_URL` could repoint OAuth-authenticated inference to an attacker-controlled host. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance applied to Grok models** — Grok and xai-oauth now get the same family-specific execution discipline block GPT/Codex have, so the model stops claiming completion without tool calls and stops suggesting workarounds instead of using existing tools. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- Plus `x_search` degraded-results surfacing, tier-gated 403 with API-key fallback, PKCE `code_challenge` round-trip fix, dead-token quarantine on terminal refresh failure, MiniMax-style short-token refresh on per-request, and `WKE=unauthenticated` honor at both classifier sites. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484), [#28351](https://github.com/NousResearch/hermes-agent/pull/28351), [#27560](https://github.com/NousResearch/hermes-agent/pull/27560), [#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#30619](https://github.com/NousResearch/hermes-agent/pull/30619), [#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
---
## 🏗️ Core Agent & Architecture
### The Big Refactor — `run_agent.py` 16k → 3.8k
- `run_agent.py` from 16,083 → 3,821 lines (-76%), extracted into 14 cohesive `agent/*` modules. `run_conversation` alone was 3,877 lines before the refactor. Every extraction keeps a thin forwarder on `AIAgent`, every test-patch path is preserved, every external caller stays compatible. ([#27248](https://github.com/NousResearch/hermes-agent/pull/27248))
### Agent loop & conversation
- Auxiliary task layered fallback (primary → chain → main agent → graceful fail) on capacity errors (402/429/connection). (salvages [#26811](https://github.com/NousResearch/hermes-agent/pull/26811) + [#26998](https://github.com/NousResearch/hermes-agent/pull/26998)) ([#27625](https://github.com/NousResearch/hermes-agent/pull/27625))
- Buffer retry/fallback status; surface only on terminal failure (no more noisy "retrying..." spam in mid-run output). ([#33816](https://github.com/NousResearch/hermes-agent/pull/33816))
- Host contract for external context engines — condenses 5 prior PRs into one extension surface. ([#33750](https://github.com/NousResearch/hermes-agent/pull/33750))
- Fallback immediately on provider content-policy blocks. ([#33883](https://github.com/NousResearch/hermes-agent/pull/33883))
- Re-pad `reasoning_content` on cross-provider fallback to require-side providers. (salvage [#33784](https://github.com/NousResearch/hermes-agent/pull/33784)) ([#33795](https://github.com/NousResearch/hermes-agent/pull/33795))
- Per-turn tool-outcome verifier — patch tool gets indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- Single-knob native vision for custom-provider models. ([#29679](https://github.com/NousResearch/hermes-agent/pull/29679))
- Background review fork isolated from external memory plugins. ([#27190](https://github.com/NousResearch/hermes-agent/pull/27190))
- Background review inherits parent toolset config for `tools[]` cache parity. ([#29704](https://github.com/NousResearch/hermes-agent/pull/29704))
- Recover from providers returning list-type tool content. ([#30259](https://github.com/NousResearch/hermes-agent/pull/30259))
- Treat partial-stream stub responses as length truncation rather than clean stop. ([#30998](https://github.com/NousResearch/hermes-agent/pull/30998))
- OpenAI execution guidance applied to xAI Grok / xai-oauth. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- ContextVars propagate to concurrent tool worker threads.
- Preload `jiter` native parser. ([#33692](https://github.com/NousResearch/hermes-agent/pull/33692))
- Expose context engine tools with saved toolsets. (salvage of [#31194](https://github.com/NousResearch/hermes-agent/pull/31194)) ([#33719](https://github.com/NousResearch/hermes-agent/pull/33719))
### Sessions & memory
- `session_search` rebuilt — single-shape (discovery + scroll + browse), no aux-LLM, ~20ms vs. ~90s. ([#27590](https://github.com/NousResearch/hermes-agent/pull/27590))
- Salvage [#29182](https://github.com/NousResearch/hermes-agent/pull/29182) — opt-in JSON snapshot writer for sessions. ([#29278](https://github.com/NousResearch/hermes-agent/pull/29278))
- Persist `platform_message_id` for recall across gateway restarts. ([#29449](https://github.com/NousResearch/hermes-agent/pull/29449))
- Inline memory-context mentions stay visible in conversation. ([#28132](https://github.com/NousResearch/hermes-agent/pull/28132))
- Recalled memory labeled informational, not authoritative. ([#28583](https://github.com/NousResearch/hermes-agent/pull/28583))
- Memory + context-engine tool injection gated on `enabled_toolsets`. ([#30177](https://github.com/NousResearch/hermes-agent/pull/30177))
- Guard against external drift in `MEMORY.md` / `USER.md`. ([#30877](https://github.com/NousResearch/hermes-agent/pull/30877))
- Honcho runtime peer mapping — correctness follow-ups + setup wizard + docs. ([#30077](https://github.com/NousResearch/hermes-agent/pull/30077))
- Periodic memory logging for leak detection. (salvage of [#17667](https://github.com/NousResearch/hermes-agent/pull/17667)) ([#27102](https://github.com/NousResearch/hermes-agent/pull/27102))
### Codex / Responses-API maturation
- TTFB watchdog for stalled Codex Responses streams. ([#32042](https://github.com/NousResearch/hermes-agent/pull/32042))
- Actionable hint when stale-call detector fires on known silent-reject pattern. ([#32016](https://github.com/NousResearch/hermes-agent/pull/32016), [#33133](https://github.com/NousResearch/hermes-agent/pull/33133))
- Drop SDK `responses.stream()` helper; consume events directly. ([#33042](https://github.com/NousResearch/hermes-agent/pull/33042))
- Gracefully recover from `invalid_encrypted_content`. (salvage of [#10144](https://github.com/NousResearch/hermes-agent/pull/10144)) ([#33035](https://github.com/NousResearch/hermes-agent/pull/33035))
- Recover Codex Responses streams with null output. ([#32963](https://github.com/NousResearch/hermes-agent/pull/32963), [#33390](https://github.com/NousResearch/hermes-agent/pull/33390))
- Drop foreign-issuer reasoning and transient `rs_tmp` reasoning replay state. ([#33156](https://github.com/NousResearch/hermes-agent/pull/33156), [#33146](https://github.com/NousResearch/hermes-agent/pull/33146))
- Codex 429 quota classified as rate-limit, not missing credentials. ([#33168](https://github.com/NousResearch/hermes-agent/pull/33168))
- Codex chat path falls back to credential_pool when singleton is empty. ([#33189](https://github.com/NousResearch/hermes-agent/pull/33189))
- Codex re-auth syncs credential_pool. ([#33164](https://github.com/NousResearch/hermes-agent/pull/33164))
- Omit `tools` key when no tools registered. ([#33409](https://github.com/NousResearch/hermes-agent/pull/33409))
- Parse Codex image-generation SSE directly. ([#32933](https://github.com/NousResearch/hermes-agent/pull/32933))
---
## 🎛️ Kanban — Multi-Agent Maturation Wave
### Orchestration & dispatch
- Orchestrator-driven auto-decomposition on triage. ([#27572](https://github.com/NousResearch/hermes-agent/pull/27572))
- Kanban swarm topology helper — `hermes kanban swarm` creates a Swarm v1 graph (root + parallel workers + gated verifier + gated synthesizer + shared blackboard). (salvages [#26791](https://github.com/NousResearch/hermes-agent/pull/26791) by @Niraven) ([#28443](https://github.com/NousResearch/hermes-agent/pull/28443))
- Dispatcher wires review agents from the review column. ([#28449](https://github.com/NousResearch/hermes-agent/pull/28449))
- Stale-detection for running tasks in dispatcher. ([#28452](https://github.com/NousResearch/hermes-agent/pull/28452))
- Respawn guard blocks repeat worker storms. ([#28455](https://github.com/NousResearch/hermes-agent/pull/28455))
- Respawn guard defers `blocker_auth` instead of auto-blocking. ([#28683](https://github.com/NousResearch/hermes-agent/pull/28683))
- Cross-profile cron jobs surface in dashboard. ([#28457](https://github.com/NousResearch/hermes-agent/pull/28457))
- Worker visibility endpoints: `/workers/active`, `/runs/{id}`, `/inspect`. (salvages [#23761](https://github.com/NousResearch/hermes-agent/pull/23761) by @Interstellar-code) ([#28432](https://github.com/NousResearch/hermes-agent/pull/28432))
### Task configuration & scheduling
- Per-task model override. ([#28364](https://github.com/NousResearch/hermes-agent/pull/28364))
- Board-level default workdir. ([#28394](https://github.com/NousResearch/hermes-agent/pull/28394))
- Configurable worktree paths and branches. ([#28462](https://github.com/NousResearch/hermes-agent/pull/28462))
- Scheduled task start times. ([#28384](https://github.com/NousResearch/hermes-agent/pull/28384))
- Scheduled status for delayed follow-ups. ([#28467](https://github.com/NousResearch/hermes-agent/pull/28467))
- Trimmed task comments. ([#28399](https://github.com/NousResearch/hermes-agent/pull/28399))
- Initial-status for human-ops cards. ([#28414](https://github.com/NousResearch/hermes-agent/pull/28414))
- `max_in_progress` config to cap concurrent running tasks. ([#28420](https://github.com/NousResearch/hermes-agent/pull/28420))
- Filter tasks by workflow fields. ([#28454](https://github.com/NousResearch/hermes-agent/pull/28454))
- `--sort` for `hermes kanban list`. ([#28427](https://github.com/NousResearch/hermes-agent/pull/28427))
- Optional `board` parameter on all MCP tools. ([#28444](https://github.com/NousResearch/hermes-agent/pull/28444))
- Stamp originating ACP session_id on tasks. ([#28447](https://github.com/NousResearch/hermes-agent/pull/28447))
- `auto_promote_children` config toggle. ([#28344](https://github.com/NousResearch/hermes-agent/pull/28344))
- `archive --rm` to hard-delete archived tasks. ([#28355](https://github.com/NousResearch/hermes-agent/pull/28355))
- Promote dependents when parent is archived. ([#28372](https://github.com/NousResearch/hermes-agent/pull/28372))
- Promote blocked tasks when parent dependencies complete. ([#28377](https://github.com/NousResearch/hermes-agent/pull/28377))
- Demote ready children when parent is reopened. ([#28382](https://github.com/NousResearch/hermes-agent/pull/28382))
- `promote` verb for manual `todo→ready` recovery + bulk `--ids`. (salvage [#29464](https://github.com/NousResearch/hermes-agent/pull/29464)) ([#31334](https://github.com/NousResearch/hermes-agent/pull/31334))
### Dashboard
- Drag-to-delete trash zone + bulk delete. ([#28468](https://github.com/NousResearch/hermes-agent/pull/28468))
- Surface per-task `model_override` in show + tool output. ([#28442](https://github.com/NousResearch/hermes-agent/pull/28442))
- Cross-profile notification delivery via `kanban.notification_sources`. ([#28395](https://github.com/NousResearch/hermes-agent/pull/28395))
- Scratch-workspace deletion warning for users. ([#30949](https://github.com/NousResearch/hermes-agent/pull/30949))
- Mobile dashboard UX polish. ([#28127](https://github.com/NousResearch/hermes-agent/pull/28127))
### Reliability
- Worker log retention configurable. ([#27867](https://github.com/NousResearch/hermes-agent/pull/27867))
- Configurable claim TTL. ([#28392](https://github.com/NousResearch/hermes-agent/pull/28392))
- Fingerprint crash errors to prevent fleet-wide retry exhaustion. ([#28380](https://github.com/NousResearch/hermes-agent/pull/28380))
- Reset failure counters on `unblock_task`. ([#28379](https://github.com/NousResearch/hermes-agent/pull/28379))
- Detect cycles in `decompose_triage_task` sibling-link pre-validation. ([#28088](https://github.com/NousResearch/hermes-agent/pull/28088))
- Surface unusable triage auxiliary model (auto-decompose aware). ([#27871](https://github.com/NousResearch/hermes-agent/pull/27871))
- Align failure diagnostics with retry limit. ([#27868](https://github.com/NousResearch/hermes-agent/pull/27868))
- Align worker terminal timeout with task runtime. ([#27864](https://github.com/NousResearch/hermes-agent/pull/27864))
- Auto-install bundled skills (kanban-worker) on init. ([#28368](https://github.com/NousResearch/hermes-agent/pull/28368))
- Make legacy task migration idempotent. ([#28397](https://github.com/NousResearch/hermes-agent/pull/28397))
- Serialize DB initialization. ([#28383](https://github.com/NousResearch/hermes-agent/pull/28383))
- Persist worker session metadata on completion. ([#28387](https://github.com/NousResearch/hermes-agent/pull/28387))
- Pass `accept-hooks` to worker chat subprocess. ([#28393](https://github.com/NousResearch/hermes-agent/pull/28393))
- Preserve worker tools with restricted toolsets. ([#28396](https://github.com/NousResearch/hermes-agent/pull/28396))
- Avoid unsafe Windows worker Hermes shim resolution. ([#28398](https://github.com/NousResearch/hermes-agent/pull/28398))
- Sync slash subcommands with live parser. ([#28376](https://github.com/NousResearch/hermes-agent/pull/28376))
- Show scheduled kanban tasks in dashboard. ([#28400](https://github.com/NousResearch/hermes-agent/pull/28400))
- Assign single-task kanban decompositions. ([#28401](https://github.com/NousResearch/hermes-agent/pull/28401))
- Configurable `max_tokens` for kanban specify. ([#28374](https://github.com/NousResearch/hermes-agent/pull/28374))
- Per-job profile support for cron. ([#28124](https://github.com/NousResearch/hermes-agent/pull/28124))
- Codex app-server: include every Kanban-pinned path in `writable_roots`. ([#28435](https://github.com/NousResearch/hermes-agent/pull/28435))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
---
## ⚡ Performance
- `openai._base_client` import deferred — 240ms / 17MB off every CLI cold start. ([#28864](https://github.com/NousResearch/hermes-agent/pull/28864))
- Agent-loop hot-path optimizations — 47% fewer per-conversation function calls (399k → 213k for 31-turn chat). ([#28866](https://github.com/NousResearch/hermes-agent/pull/28866))
- Compression-feasibility check deferred — 170-290ms off every agent construction. ([#28957](https://github.com/NousResearch/hermes-agent/pull/28957))
- Adaptive subprocess poll — ~195ms off every tool call, 1+ second per turn. ([#29006](https://github.com/NousResearch/hermes-agent/pull/29006))
- Termux TUI cold start speedup. ([#29419](https://github.com/NousResearch/hermes-agent/pull/29419))
- Termux non-TUI cold start speedup. (salvage [#29438](https://github.com/NousResearch/hermes-agent/pull/29438)) ([#30121](https://github.com/NousResearch/hermes-agent/pull/30121))
- Termux fast-path version + deferred bare-prompt agent startup. ([#30609](https://github.com/NousResearch/hermes-agent/pull/30609))
- Cut hermes `--version` wall time 63% — flips head-to-head vs Codex CLI. ([#31968](https://github.com/NousResearch/hermes-agent/pull/31968))
- Date-only timestamp + loud gateway-DB roundtrip logging — improves prompt-cache hit rate. ([#27675](https://github.com/NousResearch/hermes-agent/pull/27675))
- Cache kanban worker guidance at session init for prompt-cache reuse. ([#28425](https://github.com/NousResearch/hermes-agent/pull/28425))
---
## 🔧 Tool System
### Tool surface
- `patch`: indent preservation, CRLF preservation, per-file failure escalation. ([#32273](https://github.com/NousResearch/hermes-agent/pull/32273))
- `terminal`: warn at call time when `background=true` runs silently. ([#31289](https://github.com/NousResearch/hermes-agent/pull/31289))
- `terminal`: nudge homebrewed CI pollers at the tool surface. ([#33142](https://github.com/NousResearch/hermes-agent/pull/33142))
- `x_search`: surface degraded results + validate dates. ([#29484](https://github.com/NousResearch/hermes-agent/pull/29484))
- `x_search`: auto-enable toolset when xAI credentials are configured. ([#27376](https://github.com/NousResearch/hermes-agent/pull/27376))
- `computer_use`: route SOM/vision captures via auxiliary.vision. ([#30126](https://github.com/NousResearch/hermes-agent/pull/30126))
- `transcription`: reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
- TTS: prevent double `[pause]` in xAI auto speech tags. ([#32237](https://github.com/NousResearch/hermes-agent/pull/32237))
- TTS: preserve native audio outside Telegram voice delivery. ([#28512](https://github.com/NousResearch/hermes-agent/pull/28512))
- TTS: opt-in xAI `auto_speech_tags` speech-tag pauses for natural voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- Voice: chunk oversized CLI recordings. ([#30044](https://github.com/NousResearch/hermes-agent/pull/30044))
- Voice: honor `PULSE_SERVER` / `PIPEWIRE_REMOTE` inside Docker. ([#22534](https://github.com/NousResearch/hermes-agent/pull/22534))
### Browser
- All cloud browser providers (Browserbase, Anchor, Camofox, Hyperbrowser, etc.) migrated to image_gen-style plugins. (salvages [#25580](https://github.com/NousResearch/hermes-agent/pull/25580)) ([#27403](https://github.com/NousResearch/hermes-agent/pull/27403))
- Auto-launch Chromium-family browser for CDP. ([#29106](https://github.com/NousResearch/hermes-agent/pull/29106))
- Docker: discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
### Image generation
- **Krea** provider plugin (Krea 2 Medium + Large). ([#33236](https://github.com/NousResearch/hermes-agent/pull/33236))
- FAL backend ported to `plugins/image_gen/fal`. (salvage [#27966](https://github.com/NousResearch/hermes-agent/pull/27966)) ([#30380](https://github.com/NousResearch/hermes-agent/pull/30380))
- Cache xAI ephemeral URL responses to disk. ([#31759](https://github.com/NousResearch/hermes-agent/pull/31759))
### Web search
- **xAI Web Search** as a provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
### MCP
- **Nous-approved MCP catalog** with interactive picker. ([#30870](https://github.com/NousResearch/hermes-agent/pull/30870))
- **TLS client certificate (mTLS) support** for HTTP and SSE MCP servers. ([#33721](https://github.com/NousResearch/hermes-agent/pull/33721))
- Stdin paste-back fallback for headless OAuth flow. ([#32053](https://github.com/NousResearch/hermes-agent/pull/32053))
- `skip` at paste prompt bypasses auth without disabling server. ([#32069](https://github.com/NousResearch/hermes-agent/pull/32069))
- Registry-aware `mcp_` prefix on both ends of round-trip. ([#31700](https://github.com/NousResearch/hermes-agent/pull/31700))
---
## 🧩 Skills Ecosystem
### Skills system
- **Skill bundles** — `/<name>` loads multiple skills. ([#28373](https://github.com/NousResearch/hermes-agent/pull/28373))
- Skills Hub: health checks, freshness badge, and a watchdog cron. ([#32345](https://github.com/NousResearch/hermes-agent/pull/32345))
- Opt-in AST deep diagnostics on skill writes. (salvage of [#30918](https://github.com/NousResearch/hermes-agent/pull/30918)) ([#31198](https://github.com/NousResearch/hermes-agent/pull/31198))
- Bundled/pinned skill protection in background-review prompts. ([#28338](https://github.com/NousResearch/hermes-agent/pull/28338))
- Show user-modified skill names in bundled skill sync summary. ([#28671](https://github.com/NousResearch/hermes-agent/pull/28671))
- Load symlinked skill slash commands. ([#27759](https://github.com/NousResearch/hermes-agent/pull/27759))
- Deduplicate Skills Hub search results by identifier, not name. ([#29490](https://github.com/NousResearch/hermes-agent/pull/29490))
### New skills
- `openhands` — delegate-to-OpenHands orchestration skill (closes [#477](https://github.com/NousResearch/hermes-agent/issues/477)) ([#32261](https://github.com/NousResearch/hermes-agent/pull/32261))
- `code-wiki` — persistent indexed dev wiki (closes [#486](https://github.com/NousResearch/hermes-agent/issues/486)) ([#32240](https://github.com/NousResearch/hermes-agent/pull/32240))
- `web-pentest` — OWASP recipes (closes [#400](https://github.com/NousResearch/hermes-agent/issues/400)) ([#32265](https://github.com/NousResearch/hermes-agent/pull/32265))
- `baoyu-article-illustrator` ([#28287](https://github.com/NousResearch/hermes-agent/pull/28287))
---
## ☁️ Providers
### xAI deep integration
- **xAI Web Search** as a `plugins/web/xai/` provider plugin. ([#29042](https://github.com/NousResearch/hermes-agent/pull/29042))
- **`hermes proxy` xAI upstream** — OpenAI-compatible local proxy backed by xai-oauth. ([#28356](https://github.com/NousResearch/hermes-agent/pull/28356))
- **May 15 model retirement detection + `hermes migrate xai`** for grok-4 / grok-3 / grok-code-fast-1 / grok-imagine-image-pro. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- **Opt-in `auto_speech_tags`** for natural xAI TTS voice replies. ([#29376](https://github.com/NousResearch/hermes-agent/pull/29376))
- **xai-oauth base_url pinned to x.ai origin** — closes silent credential-leak vector. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- **OpenAI-style execution guidance** applied to Grok / xai-oauth models. ([#27797](https://github.com/NousResearch/hermes-agent/pull/27797))
- xAI: detect retired May 15 models in doctor/chat startup. ([#29277](https://github.com/NousResearch/hermes-agent/pull/29277))
- xAI: resolve Grok Build context for OAuth. ([#30579](https://github.com/NousResearch/hermes-agent/pull/30579))
- xAI OAuth: tier-gated 403 with API-key fallback. ([#28351](https://github.com/NousResearch/hermes-agent/pull/28351))
- xAI OAuth: PKCE `code_challenge` echo. ([#27560](https://github.com/NousResearch/hermes-agent/pull/27560))
- xAI OAuth: quarantine dead tokens on terminal refresh failure. ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116))
- xAI OAuth: honor `WKE=unauthenticated` disambiguator at both classifier sites. ([#30872](https://github.com/NousResearch/hermes-agent/pull/30872))
- xAI OAuth: accept bare-code manual paste (state=None). (closes [#26923](https://github.com/NousResearch/hermes-agent/issues/26923)) ([#33880](https://github.com/NousResearch/hermes-agent/pull/33880))
- xAI OAuth: fall back to manual paste on loopback timeout. ([#33231](https://github.com/NousResearch/hermes-agent/pull/33231))
- xAI proxy: handle 429 rate-limit responses in proxy retry path. ([#33743](https://github.com/NousResearch/hermes-agent/pull/33743))
### Other providers
- **OpenAI API as a first-class provider** (distinct from Codex runtime). ([#31898](https://github.com/NousResearch/hermes-agent/pull/31898))
- **Microsoft Entra ID** auth for Azure Foundry (with 1M Anthropic-Messages beta preserved on Bearer). (salvages [#27509](https://github.com/NousResearch/hermes-agent/pull/27509), [#27022](https://github.com/NousResearch/hermes-agent/pull/27022)) ([#28101](https://github.com/NousResearch/hermes-agent/pull/28101), [#28084](https://github.com/NousResearch/hermes-agent/pull/28084))
- **OpenRouter** sticky routing — `session_id` passed via `extra_body` so a long-running session keeps landing on the same upstream provider. (@Cybourgeoisie) ([#33939](https://github.com/NousResearch/hermes-agent/pull/33939))
- Nous: JWT token for inference; stop replaying invalid Nous refresh tokens. (@rewbs) ([#27663](https://github.com/NousResearch/hermes-agent/pull/27663))
- Nous Portal: one-shot setup, status CLI, and Nous-included markers. ([#30860](https://github.com/NousResearch/hermes-agent/pull/30860))
- Anthropic adapter: extract 7 helpers from `convert_messages_to_anthropic`. (salvage [#27784](https://github.com/NousResearch/hermes-agent/pull/27784)) ([#30386](https://github.com/NousResearch/hermes-agent/pull/30386))
- Catalog: add `qwen3.7-max` to Alibaba + Alibaba-Coding-Plan model lists. ([#33129](https://github.com/NousResearch/hermes-agent/pull/33129))
- opencode-go: route `qwen3.7-max` via `anthropic_messages`. (@beardthelion) ([#32780](https://github.com/NousResearch/hermes-agent/pull/32780))
- opencode-go: expose Kimi K2 + DeepSeek reasoning controls. ([#30845](https://github.com/NousResearch/hermes-agent/pull/30845))
- Remove Vercel AI Gateway and Vercel Sandbox.
- MiniMax OAuth: refresh short-lived access tokens per request. ([#30619](https://github.com/NousResearch/hermes-agent/pull/30619))
- Codex OAuth: quarantine terminal refresh errors. ([#28118](https://github.com/NousResearch/hermes-agent/pull/28118))
- Codex: drop dead model slugs that HTTP 400 on ChatGPT Pro. ([#33424](https://github.com/NousResearch/hermes-agent/pull/33424))
- Codex: sync `manual:device_code` pool entries on re-auth. ([#33744](https://github.com/NousResearch/hermes-agent/pull/33744))
- MiniMax OAuth: quarantine terminal refresh errors. ([#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
---
## 🔑 Secrets
- **Bitwarden Secrets Manager** integration with lazy `bws` install. ([#30035](https://github.com/NousResearch/hermes-agent/pull/30035))
- Bitwarden: EU Cloud + self-hosted server URL support. ([#31378](https://github.com/NousResearch/hermes-agent/pull/31378))
- Label detected credentials with their source (Bitwarden). ([#30364](https://github.com/NousResearch/hermes-agent/pull/30364))
---
## 📱 Messaging Platforms (Gateway)
### Gateway core
- **Deliverable mode** — agents ship artifacts as native uploads from any platform (Slack/Discord/Telegram/Teams/Email). ([#27813](https://github.com/NousResearch/hermes-agent/pull/27813))
- `hermes send` — pipe any script's output to any messaging platform. (salvage of [#19631](https://github.com/NousResearch/hermes-agent/pull/19631)) ([#27188](https://github.com/NousResearch/hermes-agent/pull/27188))
- Debounce queued text follow-ups during active sessions. (salvage of [#31235](https://github.com/NousResearch/hermes-agent/pull/31235)) ([#31341](https://github.com/NousResearch/hermes-agent/pull/31341))
- Plugin-transformed final_response delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
- Refresh cached agent tools on `/reload-mcp`. ([#32815](https://github.com/NousResearch/hermes-agent/pull/32815))
- Harden kanban + provider cleanup races on long-running workloads. ([#29479](https://github.com/NousResearch/hermes-agent/pull/29479))
### New / reorganized adapters
- **ntfy** — 23rd platform, push notifications, plugin shape, zero core edits. (salvages [#30625](https://github.com/NousResearch/hermes-agent/pull/30625) → [#4043](https://github.com/NousResearch/hermes-agent/pull/4043)) ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- **Discord** adapter migrated to bundled plugin. (salvage of [#24356](https://github.com/NousResearch/hermes-agent/pull/24356)) ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591))
- **Mattermost** adapter migrated to bundled plugin. (salvage of [#30916](https://github.com/NousResearch/hermes-agent/pull/30916)) ([#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
### Telegram
- Edit status messages in place instead of appending. (based on [#30141](https://github.com/NousResearch/hermes-agent/pull/30141) by @qike-ms) ([#30864](https://github.com/NousResearch/hermes-agent/pull/30864))
- Skip-STT audio path + 2GB cap via local Bot API server. ([#28541](https://github.com/NousResearch/hermes-agent/pull/28541))
- Route image documents (.png/.jpg/.webp/.gif) through vision pipeline. ([#28519](https://github.com/NousResearch/hermes-agent/pull/28519))
- Route audio file attachments away from STT pipeline. ([#28478](https://github.com/NousResearch/hermes-agent/pull/28478))
- `disable_topic_auto_rename` gateway flag. ([#28523](https://github.com/NousResearch/hermes-agent/pull/28523))
- `ignore_root_dm` config to drop messages without thread_id. ([#28536](https://github.com/NousResearch/hermes-agent/pull/28536))
- Chat-scoped auth without sender user_id. ([#28525](https://github.com/NousResearch/hermes-agent/pull/28525))
- Fail-closed auth fallback when `TELEGRAM_ALLOWED_USERS` is empty. ([#28494](https://github.com/NousResearch/hermes-agent/pull/28494))
- Roll over tool progress bubbles + scope audio_file_paths. ([#28482](https://github.com/NousResearch/hermes-agent/pull/28482))
- Avoid duplicate text after auto-TTS voice replies. ([#28509](https://github.com/NousResearch/hermes-agent/pull/28509))
- Mark final voice reply notify-worthy so Telegram delivers it audibly. ([#28504](https://github.com/NousResearch/hermes-agent/pull/28504))
### Discord
- Recover Windows voice opus decoding. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- `allow_any_attachment` config to accept arbitrary file types. ([#27245](https://github.com/NousResearch/hermes-agent/pull/27245))
- Transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Define UI view classes after lazy install. ([#28817](https://github.com/NousResearch/hermes-agent/pull/28817))
### Signal / Matrix / Feishu / Slack / WeCom
- Signal: `require_mention` filter for group chats. ([#28574](https://github.com/NousResearch/hermes-agent/pull/28574))
- Matrix: warn on clock-skew silent message drops. ([#27330](https://github.com/NousResearch/hermes-agent/pull/27330))
- Matrix E2EE installs full dep set; plugins respect `is_connected`. ([#31688](https://github.com/NousResearch/hermes-agent/pull/31688))
- Feishu: require webhook auth secret + honor config extras. ([#30746](https://github.com/NousResearch/hermes-agent/pull/30746))
- Feishu: enforce auth and chat binding for approval buttons. ([#30744](https://github.com/NousResearch/hermes-agent/pull/30744))
- Slack: socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- WeCom: safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
### DingTalk / Webhooks / Microsoft Graph
- DingTalk: transcribe native voice notes. ([#28993](https://github.com/NousResearch/hermes-agent/pull/28993))
- Webhook: enforce `INSECURE_NO_AUTH` safety rail on dynamic route reloads. ([#30863](https://github.com/NousResearch/hermes-agent/pull/30863))
- Webhook: restrict default toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Microsoft Graph: harden webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
---
## 🖥️ CLI & TUI
### CLI
- `/update` slash command in CLI and TUI. ([#23854](https://github.com/NousResearch/hermes-agent/pull/23854))
- Update auto-rollback when post-pull syntax check fails. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- `--branch` flag for `hermes update`. (@jquesnelle) ([#29591](https://github.com/NousResearch/hermes-agent/pull/29591))
- `/exit --delete` flag to remove session on quit. (salvage of [#17665](https://github.com/NousResearch/hermes-agent/pull/17665)) ([#27101](https://github.com/NousResearch/hermes-agent/pull/27101))
- `▶ N` indicator in status bar for running `/background` tasks. ([#27175](https://github.com/NousResearch/hermes-agent/pull/27175))
- Live background terminal-process count in status bar. ([#32061](https://github.com/NousResearch/hermes-agent/pull/32061))
- Append session recap to `/status` output. (salvage of [#18587](https://github.com/NousResearch/hermes-agent/pull/18587)) ([#27176](https://github.com/NousResearch/hermes-agent/pull/27176))
- Configurable paste-collapse thresholds (TUI + CLI). (salvage [#29723](https://github.com/NousResearch/hermes-agent/pull/29723)) ([#32087](https://github.com/NousResearch/hermes-agent/pull/32087))
- `/resume` accepts position numbers. ([#31709](https://github.com/NousResearch/hermes-agent/pull/31709))
- Bring tool-call display back — verbose mode, specific failure reasons, todo progress. ([#31293](https://github.com/NousResearch/hermes-agent/pull/31293))
- Validate runtime token refresh in Qwen auth status. ([#31196](https://github.com/NousResearch/hermes-agent/pull/31196))
### TUI
- **TUI session orchestrator** — multiple live sessions in one TUI window. (salvages [#27642](https://github.com/NousResearch/hermes-agent/pull/27642)) ([#32980](https://github.com/NousResearch/hermes-agent/pull/32980))
- `mouse_tracking` DEC mode presets. (salvage of [#26681](https://github.com/NousResearch/hermes-agent/pull/26681) by @OutThisLife) ([#30084](https://github.com/NousResearch/hermes-agent/pull/30084))
- Termux scrollback preservation + touch-friendly defaults. ([#28910](https://github.com/NousResearch/hermes-agent/pull/28910))
- Full assistant text in scrollback (no history truncation). ([#28829](https://github.com/NousResearch/hermes-agent/pull/28829))
- Preserve scrollback when branching sessions. ([#30162](https://github.com/NousResearch/hermes-agent/pull/30162))
- Preserve Python dunder identifiers in markdown. ([#28582](https://github.com/NousResearch/hermes-agent/pull/28582))
- Active profile shown in TUI prompt. ([#28581](https://github.com/NousResearch/hermes-agent/pull/28581))
- Improve Charizard completion menu contrast. ([#28346](https://github.com/NousResearch/hermes-agent/pull/28346))
- Stop slash dropdown chopping last char of `/goal`. ([#31311](https://github.com/NousResearch/hermes-agent/pull/31311))
- Clipboard copy on linux/wayland. ([#29342](https://github.com/NousResearch/hermes-agent/pull/29342))
- Anchor `splitReasoning` unclosed-tag regex; stop eating last paragraph. ([#29426](https://github.com/NousResearch/hermes-agent/pull/29426))
- Surface verbose tool details. ([#30225](https://github.com/NousResearch/hermes-agent/pull/30225))
- Load Linux skills on Termux + salvage @adybag14-cyber's Termux gates. ([#30166](https://github.com/NousResearch/hermes-agent/pull/30166))
- Handle images with codex app-server. ([#31220](https://github.com/NousResearch/hermes-agent/pull/31220))
- Refresh virtual transcript on viewport resize. ([#31077](https://github.com/NousResearch/hermes-agent/pull/31077))
- Ignore late thinking deltas after completion. ([#31055](https://github.com/NousResearch/hermes-agent/pull/31055))
- Commit composer input bursts immediately. ([#31053](https://github.com/NousResearch/hermes-agent/pull/31053))
- Log parent gateway lifecycle exits. ([#31051](https://github.com/NousResearch/hermes-agent/pull/31051))
- Clear TTS env var on voice off + TTS indicator in status bar. ([#30987](https://github.com/NousResearch/hermes-agent/pull/30987))
- Pass `--expose-gc` as node argv instead of NODE_OPTIONS. ([#29998](https://github.com/NousResearch/hermes-agent/pull/29998))
- Align composer cursorLayout with wrap-ansi to kill multiline cursor drift. ([#27489](https://github.com/NousResearch/hermes-agent/pull/27489))
- Harden Terminal.app rendering and color paths. ([#27251](https://github.com/NousResearch/hermes-agent/pull/27251))
- Keep `/goal` verdict out of compact status row. ([#27971](https://github.com/NousResearch/hermes-agent/pull/27971))
- Clamp curses color 8 for 8-color terminals (Docker). ([#30260](https://github.com/NousResearch/hermes-agent/pull/30260))
---
## 🔒 Security & Reliability
### Promptware & memory hardening
- **Promptware defense** — shared threat patterns + memory load-time scan + tool-result delimiters. ([#32269](https://github.com/NousResearch/hermes-agent/pull/32269))
- Expand memory content scanning patterns to parity with skills guard. ([#9151](https://github.com/NousResearch/hermes-agent/pull/9151))
- Harden Skills Guard multi-word prompt patterns. (@YLChen-007) ([#26852](https://github.com/NousResearch/hermes-agent/pull/26852))
- Split cron scanner so skill prose stops false-positiving exfil patterns. ([#32339](https://github.com/NousResearch/hermes-agent/pull/32339))
### File safety
- Protect Hermes control-plane files from prompt injection (`auth.json`, `config.yaml`, `webhook_subscriptions.json`, `mcp-tokens/`). (salvages @PratikRai0101's [#14157](https://github.com/NousResearch/hermes-agent/pull/14157)) ([#30397](https://github.com/NousResearch/hermes-agent/pull/30397))
- Write-deny `<root>/.env` when running under a profile. ([#29687](https://github.com/NousResearch/hermes-agent/pull/29687))
- Defense-in-depth read-deny on credential stores. (salvages [#17659](https://github.com/NousResearch/hermes-agent/pull/17659) + [#8055](https://github.com/NousResearch/hermes-agent/pull/8055)) ([#30721](https://github.com/NousResearch/hermes-agent/pull/30721))
- TTS `output_path` traversal + update ZIP symlink reject. (salvage [#6693](https://github.com/NousResearch/hermes-agent/pull/6693) + [#15881](https://github.com/NousResearch/hermes-agent/pull/15881)) ([#32056](https://github.com/NousResearch/hermes-agent/pull/32056))
- Reject symlinked audio inputs. ([#10082](https://github.com/NousResearch/hermes-agent/pull/10082))
### Credential safety
- Avoid persisting borrowed credential secrets — runtime env-sourced keys no longer leak into `auth.json`. ([#31416](https://github.com/NousResearch/hermes-agent/pull/31416))
- Validate Nous Portal `inference_base_url` against host allowlist. (salvages [#27612](https://github.com/NousResearch/hermes-agent/pull/27612)) ([#30611](https://github.com/NousResearch/hermes-agent/pull/30611))
- Harden API server key placeholder handling. ([#30738](https://github.com/NousResearch/hermes-agent/pull/30738))
- Harden Google Chat OAuth credential persistence. (@Zyrixtrex) ([#24788](https://github.com/NousResearch/hermes-agent/pull/24788))
- xAI OAuth: pin inference `base_url` to x.ai origin. ([#28952](https://github.com/NousResearch/hermes-agent/pull/28952))
- Quarantine dead OAuth tokens on terminal refresh failure (xAI, Codex, MiniMax). ([#28116](https://github.com/NousResearch/hermes-agent/pull/28116), [#28118](https://github.com/NousResearch/hermes-agent/pull/28118), [#28119](https://github.com/NousResearch/hermes-agent/pull/28119))
### Supply-chain
- **On-demand supply-chain audit via OSV.dev** — `hermes audit`. ([#31460](https://github.com/NousResearch/hermes-agent/pull/31460))
- `hermes update` syntax-validates critical files post-pull, auto-rollback on failure. ([#28669](https://github.com/NousResearch/hermes-agent/pull/28669))
- Quarantine `hermes.exe` vs concurrent Windows instance. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
### Other hardening
- Restrict default webhook toolset capabilities. ([#30745](https://github.com/NousResearch/hermes-agent/pull/30745))
- Harden Microsoft Graph webhook auth requirements. ([#30169](https://github.com/NousResearch/hermes-agent/pull/30169))
- Require source CIDR allowlisting for public msgraph webhook binds. ([#33722](https://github.com/NousResearch/hermes-agent/pull/33722))
- Require `API_SERVER_KEY` before dispatching API server work. ([#33232](https://github.com/NousResearch/hermes-agent/pull/33232))
- env_passthrough: apply GHSA-rhgp-j443-p4rf filter to config.yaml path. (@roadhero) ([#27794](https://github.com/NousResearch/hermes-agent/pull/27794))
- Dashboard + WeCom: restrict markdown link schemes; safe-parse untrusted XML. ([#32442](https://github.com/NousResearch/hermes-agent/pull/32442))
- Salvage project-plugin RCE bypass fix from PR [#29311](https://github.com/NousResearch/hermes-agent/pull/29311) (GHSA-5qr3-c538-wm9j). ([#30837](https://github.com/NousResearch/hermes-agent/pull/30837))
- Cross-profile soft guard on file-write tools + system-prompt hint. ([#31290](https://github.com/NousResearch/hermes-agent/pull/31290))
- Reject unsafe tar members in Android psutil compatibility installer. ([#33742](https://github.com/NousResearch/hermes-agent/pull/33742))
- Reject non-regular tar members during tirith auto-install. ([#33786](https://github.com/NousResearch/hermes-agent/pull/33786))
---
## 🪟 Native Windows (Beta Continued)
- Complete Windows bootstrap — `dep_ensure` + `install.ps1` + detection. (@alt-glitch) ([#27845](https://github.com/NousResearch/hermes-agent/pull/27845))
- `install.ps1`: strip BOM, `-Commit`/`-Tag` pin params, harden git ops. (@jquesnelle) ([#28169](https://github.com/NousResearch/hermes-agent/pull/28169))
- Consolidate ACP browser bootstrap into `install.{sh,ps1}`. (@alt-glitch) ([#27851](https://github.com/NousResearch/hermes-agent/pull/27851))
- `hermes update` quarantines live `hermes.exe`. ([#26677](https://github.com/NousResearch/hermes-agent/pull/26677))
- Discord voice opus decoding on Windows. ([#33182](https://github.com/NousResearch/hermes-agent/pull/33182))
- Windows Docker Desktop compatible compose file. (@Sunil123135) ([#31031](https://github.com/NousResearch/hermes-agent/pull/31031))
---
## 🖥️ Web Dashboard
- Hardened Slack socket recovery + Windows restart dedupe. ([#28873](https://github.com/NousResearch/hermes-agent/pull/28873))
- Web dashboard: migrate checkboxes to `@nous-research/ui` + design-system polish. (@austinpickett) ([#28814](https://github.com/NousResearch/hermes-agent/pull/28814))
- Web dashboard: collapsible sidebar. (@austinpickett) ([#33421](https://github.com/NousResearch/hermes-agent/pull/33421))
- Dashboard typography & contrast pass. (salvage of [#28832](https://github.com/NousResearch/hermes-agent/pull/28832)) ([#30714](https://github.com/NousResearch/hermes-agent/pull/30714))
- Skills page: lazy-fetch catalog instead of bundling 34MB into JS. ([#33809](https://github.com/NousResearch/hermes-agent/pull/33809))
---
## 🐳 Docker
- **s6-overlay container supervision** — abstract `ServiceManager` protocol (systemd/launchd/Windows/s6 backends), per-profile gateway supervision in-container, container-restart reconciliation, hadolint/shellcheck CI. (salvage of [#30136](https://github.com/NousResearch/hermes-agent/pull/30136), @benbarclay) ([#31760](https://github.com/NousResearch/hermes-agent/pull/31760))
- Auto-redirect `gateway run` to supervised mode inside the s6 image. (@benbarclay) ([#33583](https://github.com/NousResearch/hermes-agent/pull/33583))
- Tee supervised gateway stdout to docker logs. (@benbarclay) ([#33621](https://github.com/NousResearch/hermes-agent/pull/33621))
- Drop `docker exec` to hermes uid before invoking the CLI. (@benbarclay) ([#33628](https://github.com/NousResearch/hermes-agent/pull/33628))
- Align HOME for dashboard and s6 gateway services. (@Dusk1e) ([#33481](https://github.com/NousResearch/hermes-agent/pull/33481))
- Bake build-time git SHA into image so `hermes dump` reports it. (@benbarclay) ([#33655](https://github.com/NousResearch/hermes-agent/pull/33655))
- `hermes update` prints `docker pull` guidance instead of bogus git error. (@benbarclay) ([#33659](https://github.com/NousResearch/hermes-agent/pull/33659))
- Upgrade Node to 22 LTS via multi-stage from `node:22-bookworm-slim`. (@benbarclay) ([#33060](https://github.com/NousResearch/hermes-agent/pull/33060))
- Drop `build-essential` from apt install. (@benbarclay) ([#33028](https://github.com/NousResearch/hermes-agent/pull/33028))
- Propagate env through s6 to cont-init and main CMD. ([#32412](https://github.com/NousResearch/hermes-agent/pull/32412))
- Targeted chown to preserve host file ownership in `HERMES_HOME`. ([#33033](https://github.com/NousResearch/hermes-agent/pull/33033))
- `mkdir HERMES_HOME` as root in stage2 before chown / privilege drop. ([#33078](https://github.com/NousResearch/hermes-agent/pull/33078))
- chown `ui-tui` and `node_modules` on UID remap so TUI esbuild works. ([#33045](https://github.com/NousResearch/hermes-agent/pull/33045))
- Include `anthropic`, `bedrock`, `azure-identity` extras in image. ([#30504](https://github.com/NousResearch/hermes-agent/pull/30504))
- Stop pushing per-commit SHA tags to Docker Hub. ([#29387](https://github.com/NousResearch/hermes-agent/pull/29387))
- Simplify Docker tagging — push both `:main` and `:latest` on main push. ([#33225](https://github.com/NousResearch/hermes-agent/pull/33225))
- Test slicing across GH actions jobs. (@ethernet8023) ([#30575](https://github.com/NousResearch/hermes-agent/pull/30575))
- Discover agent-browser Chromium binary at boot. ([#33184](https://github.com/NousResearch/hermes-agent/pull/33184))
---
## 🌐 API Server
- **Session control API** — `/api/sessions/*` (list/create/read/patch/delete/fork) + SSE-streaming chat. (salvages [#29302](https://github.com/NousResearch/hermes-agent/pull/29302) by @Codename-11 + multimodal followup by @Schwartz10) ([#33134](https://github.com/NousResearch/hermes-agent/pull/33134))
- `GET /v1/skills` and `/v1/toolsets`. ([#33016](https://github.com/NousResearch/hermes-agent/pull/33016))
- Coerce stringified booleans in stream/store/approval payloads. (salvage [#26639](https://github.com/NousResearch/hermes-agent/pull/26639)) ([#27293](https://github.com/NousResearch/hermes-agent/pull/27293))
- Honor `key_env` in auth-failure fallback resolution. ([#30840](https://github.com/NousResearch/hermes-agent/pull/30840))
---
## 🎟️ ACP (VS Code / Zed / JetBrains)
- Session edit auto-approval modes. (salvage of [#27034](https://github.com/NousResearch/hermes-agent/pull/27034)) ([#27862](https://github.com/NousResearch/hermes-agent/pull/27862))
- Enrich Zed permission cards — command in title + `reject_always`. ([#28148](https://github.com/NousResearch/hermes-agent/pull/28148))
- Replay session history before responding to `session/load`. ([#26957](https://github.com/NousResearch/hermes-agent/pull/26957), [#26943](https://github.com/NousResearch/hermes-agent/pull/26943))
- Plugin-transformed final_response delivered through streaming gate. ([#31433](https://github.com/NousResearch/hermes-agent/pull/31433))
---
## 🔌 Plugin Surface
- `register_tts_provider()` plugin hook. (salvage of [#30420](https://github.com/NousResearch/hermes-agent/pull/30420)) ([#31745](https://github.com/NousResearch/hermes-agent/pull/31745))
- `register_transcription_provider()` hook + `stt.providers` command-provider registry. (salvage of [#30493](https://github.com/NousResearch/hermes-agent/pull/30493)) ([#31907](https://github.com/NousResearch/hermes-agent/pull/31907))
- `register_auxiliary_task()` in PluginContext API. (salvage [#29817](https://github.com/NousResearch/hermes-agent/pull/29817)) ([#31177](https://github.com/NousResearch/hermes-agent/pull/31177))
- Bundled `security-guidance` plugin. ([#33131](https://github.com/NousResearch/hermes-agent/pull/33131))
- Discord and Mattermost migrated to bundled plugins. ([#30591](https://github.com/NousResearch/hermes-agent/pull/30591), [#31748](https://github.com/NousResearch/hermes-agent/pull/31748))
- ntfy as platform plugin. ([#30867](https://github.com/NousResearch/hermes-agent/pull/30867))
- Surface category-namespaced plugins in `hermes plugins list`. ([#27187](https://github.com/NousResearch/hermes-agent/pull/27187))
- Plugin discovery failures raised to WARNING level. ([#28318](https://github.com/NousResearch/hermes-agent/pull/28318))
- `hermes_plugins` included in gateway.log component filter. ([#28313](https://github.com/NousResearch/hermes-agent/pull/28313))
- Seed plugin extras before `is_connected` gate. ([#31703](https://github.com/NousResearch/hermes-agent/pull/31703))
- Dashboard: allowlist plugin assets + denylist subprocess-influencing env vars. ([#32277](https://github.com/NousResearch/hermes-agent/pull/32277))
---
## 📦 Distribution & Install
- Install-method stamping + Docker detection. (@alt-glitch) ([#27843](https://github.com/NousResearch/hermes-agent/pull/27843))
- Nix `#messaging` and `#full` package variants. (@alt-glitch) ([#33108](https://github.com/NousResearch/hermes-agent/pull/33108))
- Pre-load messaging gateway deps via `--extra messaging`. (salvage [#26394](https://github.com/NousResearch/hermes-agent/pull/26394)) ([#27558](https://github.com/NousResearch/hermes-agent/pull/27558))
- Avoid piping installer directly into `iex` (Windows). ([#28347](https://github.com/NousResearch/hermes-agent/pull/28347))
- Ship bundled skills in wheel. ([#28421](https://github.com/NousResearch/hermes-agent/pull/28421))
- Ship dashboard plugin assets in wheel. ([#28406](https://github.com/NousResearch/hermes-agent/pull/28406))
- Make Camofox lazy-installed instead of eager. ([#27055](https://github.com/NousResearch/hermes-agent/pull/27055))
- Wire STT lazy-install into transcription_tools.py. ([#30256](https://github.com/NousResearch/hermes-agent/pull/30256))
---
## 🐛 Notable Bug Fixes (highlights only)
- Match bare custom provider by active base URL in `hermes model`. ([#28908](https://github.com/NousResearch/hermes-agent/pull/28908))
- Route `auxiliary.vision.provider=openai` to api.openai.com, skip text-only main. ([#31452](https://github.com/NousResearch/hermes-agent/pull/31452))
- Lint: skip per-file shell linter when LSP will handle the file. ([#29054](https://github.com/NousResearch/hermes-agent/pull/29054))
- Treat empty credential pool entries as unauthenticated in `/model` picker. ([#28312](https://github.com/NousResearch/hermes-agent/pull/28312))
- Reverted within window: Firecrawl integration tag, send_message @username auto-mentions, Telegram quick-command-only menus, Telegram pin-on-turn.
---
## 🧪 Testing
- Disarm lazy-install probe so `_HAS_FASTER_WHISPER` patches work. ([#30334](https://github.com/NousResearch/hermes-agent/pull/30334))
- Cover default board dashboard pin. ([#28361](https://github.com/NousResearch/hermes-agent/pull/28361))
- Cover `_task_dict` `task_age` fallback. ([#28365](https://github.com/NousResearch/hermes-agent/pull/28365))
- Allowlist `tmp_path` for `kanban_notify` artifact delivery tests. ([#30851](https://github.com/NousResearch/hermes-agent/pull/30851), [#30852](https://github.com/NousResearch/hermes-agent/pull/30852))
- Cover null output stream terminal events in Codex. ([#33137](https://github.com/NousResearch/hermes-agent/pull/33137))
---
## 📚 Documentation
- **30-day docs overhaul** — full correctness audit, every PR in the window covered, Nous Portal weave, sidebar reorg. ([#33782](https://github.com/NousResearch/hermes-agent/pull/33782))
- Dedicated Nous Portal integration page and setup guide. ([#31296](https://github.com/NousResearch/hermes-agent/pull/31296))
- Providers: move Nous Portal first, Google Gemini OAuth last. ([#31287](https://github.com/NousResearch/hermes-agent/pull/31287))
- `session_search` rewrite for single-shape tool. ([#27840](https://github.com/NousResearch/hermes-agent/pull/27840))
- Kanban: document failure_limit, max_retries, inline create shortcuts, goals & kanban settings. ([#28357](https://github.com/NousResearch/hermes-agent/pull/28357), [#28358](https://github.com/NousResearch/hermes-agent/pull/28358), [#28359](https://github.com/NousResearch/hermes-agent/pull/28359), [#28360](https://github.com/NousResearch/hermes-agent/pull/28360), [#28362](https://github.com/NousResearch/hermes-agent/pull/28362))
- Kanban Codex lane skill. ([#28430](https://github.com/NousResearch/hermes-agent/pull/28430))
- xAI OAuth: note X Premium+ also unlocks Grok OAuth. ([#29055](https://github.com/NousResearch/hermes-agent/pull/29055))
- Docs site: Docker audio bridge notes, "Installing more tools in the container", xurl auth HOME in Docker.
- Email: clarify gateway vs Himalaya setup. (@helix4u) ([#33634](https://github.com/NousResearch/hermes-agent/pull/33634))
- Auth docs: replace stale `hermes login` references with `hermes auth add`. ([#32859](https://github.com/NousResearch/hermes-agent/pull/32859))
---
## 👥 Contributors
### Core
- @teknium1 (lead)
### Notable salvages & cherry-picks
- **@benbarclay** — s6-overlay container supervision (29 commits salvaged), Node 22 LTS upgrade, build-essential cleanup, `gateway run` auto-redirect in s6, tee supervised stdout to docker logs, `hermes update` Docker guidance, build-time SHA stamping
- **@OutThisLife** — `mouse_tracking` DEC mode presets
- **@jquesnelle** — Windows installer hardening, `--branch` flag for `hermes update`, install.ps1 BOM strip / commit-pin
- **@alt-glitch** — Windows `dep_ensure` bootstrap, Nix package variants (`.#messaging`, `.#full`), install-method stamping, ACP browser bootstrap consolidation
- **@austinpickett** — `/update` slash command, dashboard checkboxes → `@nous-research/ui`, mobile dashboard polish, collapsible sidebar
- **@ethernet8023** — CI test slicing across GH Actions jobs, TUI clipboard copy fix
- **@kshitijk4poor** — doctor section banner + fail-and-issue helpers extraction, post-tag salvage cluster (curator-fallout, kanban SQLite hardening, install world-readable uv dirs, xAI bare-code paste)
- **@rewbs** — Nous JWT inference switch + refresh-token replay fix
- **@Codename-11** + **@Schwartz10** — session control API (REST + SSE + multimodal followup)
- **@Niraven** — kanban swarm topology helper
- **@Interstellar-code** — kanban worker visibility endpoints
- **@adybag14-cyber** — termux cold-start optimizations (multiple PRs)
- **@qike-ms** — Telegram in-place status edits design
- **@sprmn24** — ntfy adapter
- **@Jaaneek** — xAI Web Search provider plugin
- **@yannsunn** — xAI upstream adapter for `hermes proxy`
- **@Cybourgeoisie** — OpenRouter sticky routing via session_id
- **@memosr** — Nous Portal base_url allowlist validation
- **@Sunil123135** — Windows Docker Desktop compose file
- **@Dusk1e** — Docker HOME alignment for dashboard + s6 gateway services
- **@beardthelion** — opencode-go anthropic_messages routing
- **@YLChen-007** — Skills Guard multi-word prompt patterns
- **@roadhero** — env_passthrough GHSA-rhgp-j443-p4rf filter
- **@Zyrixtrex** — Google Chat OAuth credential persistence hardening
- **@briandevans**, **@tomqiaozc** — defense-in-depth read-deny on credential stores
- **@PratikRai0101** — control-plane file write protection
- **@helix4u**, **@Bartok9**, **@zccyman** — auxiliary fallback ladder components
- **@ms-alan**, **@ticketclosed-wontfix**, **@donovan-yohan** — TUI session orchestrator + follow-ups
- **@daimon-nous[bot]** — cron per-job profile support
- **@bisko** — re-pad `reasoning_content` on cross-provider fallback
### All Contributors
@02356abc, @0xchainer, @0xDevNinja, @0xjackyang, @0xsir0000, @0z1-ghb, @8bit64k, @aaronlab, @AceWattGit,
@ACR27, @adam91holt, @AdamPlatin123, @Ade5954, @AdityaRajeshGadgil, @adybag14-cyber, @AhmetArif0, @ai-hana-ai,
@alaamohanad169-ship-it, @alber70g, @albert748, @alt-glitch, @aqilaziz, @argabor, @asdlem, @austinpickett,
@avifenesh, @awizemann, @B0Tch1, @Bartok9, @BaxBit, @Beandon13, @beardthelion, @benbarclay, @bensargotest-sys,
@binhnt92, @bird, @bisko, @BlackishGreen33, @booker1207, @bradhallett, @briandevans, @Brixyy, @brndnsvr,
@BROCCOLO1D, @btorresgil, @burjorjee, @carltonawong, @Carry00, @chaconne67, @chdlc, @chromalinx, @ChyuWei,
@CipherFrame, @cmullins70, @CNSeniorious000, @codeblackhole1024, @Codename-11, @colin-chang, @counterposition,
@cresslank, @CryptoByz, @cyb0rgk1tty, @Cybourgeoisie, @daizhonggeng, @darvsum, @davidcampbelldc, @deas,
@dgians, @dillweed, @DoGMaTiiC, @donovan-yohan, @draplater, @Drexuxux, @dskwe, @dsr-restyn, @Dusk1e,
@dusterbloom, @duyua9, @egilewski, @el-analista, @eliteworkstation94-ai, @eloklam, @EloquentBrush0x, @emonty,
@emozilla, @erhnysr, @erikengervall, @Erosika, @ether-btc, @ethernet8023, @EvilHumphrey, @fabiosiqueira,
@falasi, @falconexe, @fardoche6, @felix-windsor, @Fewmanism, @ffr31mr, @flamiinngo, @flanny7, @flooryyyy,
@fonhal, @francip, @fujinice, @gianfrancopiana, @glennc, @Glucksberg, @godlin-gh, @Grogger, @guillaumemeyer,
@Gutslabs, @H-Ali13381, @hanzckernel, @haran2001, @hawknewton, @hayka-pacha, @hehehe0803, @helix4u, @HenkDz,
@Hermes, @hermesagent26, @Hinotoi-agent, @hongchen1993, @honor2030, @houenyang-momo, @ht1072, @hueilau,
@iamfoz, @ilonagaja509-glitch, @InB4DevOps, @indigokarasu, @Interstellar-code, @iqdoctor, @iRonin, @Jaaneek,
@JabberELF, @jacevys, @jackey8616, @jackjin1997, @jdelmerico, @jfuenmayor, @Jiahui-Gu, @JimLiu, @joe102084,
@JohnC1009, @jonpol01, @Jpalmer95, @Julientalbot, @justemu, @justincc, @jvinals, @karthikeyann, @kasunvinod,
@kchuang1015, @kenyonxu, @khungate, @kiranvk-2011, @kjames2001, @konsisumer, @kpadilha, @kriscolab,
@krislidimo, @kronexoi, @kshitijk4poor, @kunci115, @Kylejeong2, @kylekahraman, @LaPhilosophie, @leeseoki0,
@lemassykoi, @Lempkey, @LeonJS, @LeonSGP43, @lidge-jun, @LifeJiggy, @liuhao1024, @LizerAIDev, @loicnico96,
@loongfay, @m0n3r0, @malaiwah, @matthewlai, @mavrickdeveloper, @maxmilian, @McClean-Edison, @memosr,
@Mind-Dragon, @momowind, @MoonJuhan, @MoonRay305, @moortekweb-art, @MorAlekss, @ms-alan, @Nami4D,
@nehaaprasaad, @nekwo, @nftpoetrist, @NickLarcombe, @nidhi-singh02, @Niraven, @nnnet, @noctilust, @novax635,
@nthrow, @nv-kasikritc, @nycomar, @OCWC22, @oemtalks, @OmX, @ooovenenoso, @orcool, @oseftg, @outsourc-e,
@OutThisLife, @Paperclip, @PaTTeeL, @pepelax, @phoenixshen, @Pluviobyte, @pnascimento9596, @pochi-gio, @pr7426,
@PratikRai0101, @Prithvi1994, @psionic73, @ptichalouf, @Que0x, @QuenVix, @quocanh261997, @qWaitCrypto, @Qwinty,
@r266-tech, @rak135, @rdasilva1016-ui, @rewbs, @roadhero, @rodrigoeqnit, @RonHillDev, @roycepersonalassistant,
@rudi193-cmd, @RyanRana, @sadiksaifi, @samahn0601, @samggggflynn, @SamuelZ12, @sanghyuk-seo-nexcube,
@Saurav0989, @savanne-kham, @Schrotti77, @Schwartz10, @SerenityTn, @sgtworkman, @sharziki, @shaun0927,
@shellybotmoyer, @shunsuke-hikiyama, @SimbaKingjoe, @SimoKiihamaki, @sir-ad, @Slimydog21, @slowtokki0409,
@Soju06, @someaka, @soynchux, @sprmn24, @Stark-X, @steezkelly, @stepanov1975, @stephenschoettler,
@stevehq26-bot, @steveonjava, @Strontvod, @subtract0, @Sunil123135, @superearn-fisher, @Sylw3ster, @tchanee,
@that-ambuj, @thedavidmurray, @TheOnlyMika, @therahul-yo, @thewillhuang, @ticketclosed-wontfix, @Timur00Kh,
@tomqiaozc, @Tosko4, @Tranquil-Flow, @tw2818, @uzunkuyruk, @vaddisrinivas, @vanthinh6886, @vgocoder,
@victorGPT, @vynxevainglory-ai, @waefrebeorn, @walli, @wangpuv, @wanwan2qq, @wesleysimplicio, @worlldz,
@wpengpeng168, @WuKongAI-CMU, @wuli666, @Wysie, @wysie, @xxxigm, @yannsunn, @YanzhongSu, @YarrowQiao, @ygd58,
@YLChen-007, @yoniebans, @yu-xin-c, @YuanHanzhong, @zapabob, @zccyman, @ziliangpeng, @zwolniony, @Zyrixtrex
---
**Full Changelog**: [v2026.5.16...v2026.5.28](https://github.com/NousResearch/hermes-agent/compare/v2026.5.16...v2026.5.28)

View File

@@ -1,110 +0,0 @@
# Hermes Agent v0.15.1 (v2026.5.29)
**Release Date:** May 29, 2026
**Since v0.15.0:** 28 commits · 21 merged PRs · hotfix release · 9 contributors
> **The Patch Release.** A same-day hotfix for v0.15.0. Headline fix: the dashboard infinite-reload loop that hit anyone running v0.15.0 in loopback mode (Docker, hosted Hermes, fresh installs). A handful of other v0.15.0 follow-ups go along for the ride — kanban worker SIGTERM, `/model` picker unification, `/yolo` session bypass, the full 19,932-entry skills.sh catalog, `.md` media delivery restoration, gateway probe-stepdown safety, web-URL redaction passthrough, kanban worker vision on referenced images, hindsight observation-default. Docker users get an explicit `--insecure` opt-in env var (no more bind-host inference), MCP server bare-command PATH resolution, and arm64 PR-build cache fixes.
---
## ✨ Highlights
- **Dashboard 401 reload loop fixed** — In loopback mode the dashboard's identity probe (`/api/auth/me`) returns 401 by design, but v0.15.0's stale-token reload guard treated every 401 as a rotated session token and full-page-reloaded to pick up a fresh one. Every successful sibling call cleared the one-shot reload guard, so the page reload-looped forever (Firefox: "Navigated to /sessions" storm; Chrome: React re-render storm). Fix adds an `allowUnauthorized` opt-out to `fetchJSON` that skips only the loopback stale-token reload — 401 still throws so `AuthWidget` swallows it, gated-mode `login_url` redirects are unaffected. Closes [#34206](https://github.com/NousResearch/hermes-agent/issues/34206), [#34202](https://github.com/NousResearch/hermes-agent/issues/34202). ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Docker dashboard `--insecure` is now an explicit env opt-in, never derived from bind host** — Previously the Docker entrypoint inferred `--insecure` when the dashboard bound to a non-loopback host. That conflated "I want LAN access" with "I want to disable the same-origin guard." The fix splits them: bind host is bind host, and disabling the dashboard's loopback auth requires an explicit `HERMES_DASHBOARD_INSECURE=1`. Existing setups that genuinely wanted insecure binding must now set the env var. ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188), [#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **MCP bare command resolution under Docker** — MCP servers configured with bare commands (`npx`, `npm`, `node`) now resolve against `/usr/local/bin` so they actually launch inside the Docker image where those binaries live. v0.15.0 left these failing silently in containers when the agent's effective PATH didn't include the Node toolchain location. ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
- **Skills page sidebar / source pills restored** — A stale `useMemo` dependency in the new dashboard skills page collapsed the source pills and category sidebar to "All" only. Fixed; both surfaces now reflect the live catalog state. ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
- **Kanban worker can be killed again** — `SIGTERM` on a kanban worker was being absorbed by an intermediate process and the worker stayed running. Closes [#28181](https://github.com/NousResearch/hermes-agent/issues/28181). ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Full skills.sh catalog (858 → 19,932 entries)** — The skills hub page was pulling a partial paginated catalog. The fetch now walks the sitemap, so all 19,932 skills.sh entries surface in the picker instead of just the first 858. ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
---
## 🐛 Bug Fixes
### Dashboard / Web
- **`/api/auth/me` 401 no longer triggers reload loop** in loopback mode — ([#30698](https://github.com/NousResearch/hermes-agent/pull/30698) — @austinpickett)
- **Skills page source pills + category sidebar restored** — stale `useMemo` dep ([#34194](https://github.com/NousResearch/hermes-agent/pull/34194))
### Docker
- **`--insecure` is now explicit opt-in via env var**, not derived from bind host ([#34188](https://github.com/NousResearch/hermes-agent/pull/34188) — @benbarclay)
- **Dashboard test suite repaired** to match the insecure-opt-in fix ([#34204](https://github.com/NousResearch/hermes-agent/pull/34204) — @benbarclay)
- **arm64 PR builds skip the GHA cache** to avoid cache-thrash on cross-arch builders ([#33704](https://github.com/NousResearch/hermes-agent/pull/33704) — @BROCCOLO1D)
### MCP
- **Bare `npx`/`npm`/`node` resolve against `/usr/local/bin`** for Docker compatibility ([#34186](https://github.com/NousResearch/hermes-agent/pull/34186) — @benbarclay)
### Kanban
- **Worker SIGTERM actually terminates the process** ([#34045](https://github.com/NousResearch/hermes-agent/pull/34045))
- **Workers receive images referenced in task bodies** for vision-capable models ([#34210](https://github.com/NousResearch/hermes-agent/pull/34210))
### Gateway
- **`.md` files deliver again** — media-delivery validation defaults to denylist-only instead of an overly-narrow allowlist ([#34022](https://github.com/NousResearch/hermes-agent/pull/34022))
- **Probe stepdown safety** — on a context-overflow without an explicit provider context limit, the agent no longer steps down to a smaller model based on an unknown ceiling (salvage of [#33673](https://github.com/NousResearch/hermes-agent/pull/33673)) ([#33826](https://github.com/NousResearch/hermes-agent/pull/33826))
### CLI
- **`/yolo` mid-session enables the per-session bypass** instead of just toggling the env var (which the running agent had already snapshotted) ([#33931](https://github.com/NousResearch/hermes-agent/pull/33931) — @kshitijk4poor)
- **`/model` and `hermes model` show the same list**, plus disk cache for picker startup ([#33867](https://github.com/NousResearch/hermes-agent/pull/33867))
### Skills
- **Full skills.sh catalog via sitemap** — 858 → 19,932 entries ([#34025](https://github.com/NousResearch/hermes-agent/pull/34025))
### Redaction
- **Web URLs pass through unchanged** — the redactor was eating query parameters that looked credential-shaped ([#34029](https://github.com/NousResearch/hermes-agent/pull/34029))
---
## ✨ Small Features
- **Hindsight default narrowed to observation-only** for `recall_types` — tool path is also narrowed ([#34079](https://github.com/NousResearch/hermes-agent/pull/34079) — @nicoloboschi, follow-up [#34091](https://github.com/NousResearch/hermes-agent/pull/4df62d239e38bf8c212a595721c9c01e176f6c3a) — @kshitijk4poor)
- **Memory providers receive completed-turn message context** — salvage of [#28065](https://github.com/NousResearch/hermes-agent/pull/28065) ([#34097](https://github.com/NousResearch/hermes-agent/pull/34097) — @kshitijk4poor, credit to @devwdave)
---
## 📚 Documentation
- **`--no-supervise` / `HERMES_GATEWAY_NO_SUPERVISE` documented** in the reference docs (follow-up to [#33583](https://github.com/NousResearch/hermes-agent/pull/33583)) ([#33751](https://github.com/NousResearch/hermes-agent/pull/33751) — @r266-tech)
---
## 🛠️ Infrastructure
- **Vercel deploy workflow accepts `workflow_dispatch`** so docs deploys can be manually triggered ([#34081](https://github.com/NousResearch/hermes-agent/pull/34081))
- **`@nous-research/ui` bumped to 0.18.2** (Nix `npmDepsHash` also updated to match) ([#34193](https://github.com/NousResearch/hermes-agent/pull/34193) follow-ups — @austinpickett)
---
## 👥 Contributors
### Core
- @teknium1
### Community
- @austinpickett — dashboard 401 reload-loop fix (the headline), `@nous-research/ui` bump, Nix `npmDepsHash` updates
- @benbarclay — Docker `--insecure` opt-in, MCP bare-command resolution, dashboard test repair
- @kshitijk4poor`/yolo` session bypass, completed-turn memory context salvage, hindsight follow-up docs
- @nicoloboschi — hindsight `recall_types` observation default
- @BROCCOLO1D — arm64 PR build cache fix
- @r266-tech — `--no-supervise` reference docs
- @yangguangjin — probe stepdown safety (salvage of @yanghd's #33673)
- @devwdave — completed-turn memory context (credited via salvage)
- @andrewhosf — co-author
### Issue Reporters (the 401 loop)
- @routesmith ([#34206](https://github.com/NousResearch/hermes-agent/issues/34206))
- @beeaton ([#34202](https://github.com/NousResearch/hermes-agent/issues/34202))
---
**Full Changelog**: [v2026.5.28...v2026.5.29](https://github.com/NousResearch/hermes-agent/compare/v2026.5.28...v2026.5.29)

View File

@@ -1,32 +1,18 @@
"""ACP auth helpers — detect and advertise Hermes authentication methods."""
"""ACP auth helpers — detect the currently configured Hermes provider."""
from __future__ import annotations
from typing import Any, Optional
TERMINAL_SETUP_AUTH_METHOD_ID = "hermes-setup"
from typing import Optional
def detect_provider() -> Optional[str]:
"""Resolve the active Hermes runtime provider, or None if unavailable.
Treats a ``Callable`` ``api_key`` (Azure Foundry Entra ID bearer
token provider — see :mod:`agent.azure_identity_adapter`) as a valid
credential. Without this, ACP sessions for Entra-configured Foundry
deployments silently default to ``"openrouter"`` and the ACP auth
handshake rejects the legitimate provider.
"""
"""Resolve the active Hermes runtime provider, or None if unavailable."""
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
runtime = resolve_runtime_provider()
api_key = runtime.get("api_key")
provider = runtime.get("provider")
if not isinstance(provider, str) or not provider.strip():
return None
is_string_key = isinstance(api_key, str) and api_key.strip()
is_callable_provider = callable(api_key) and not isinstance(api_key, str)
if is_string_key or is_callable_provider:
if isinstance(api_key, str) and api_key.strip() and isinstance(provider, str) and provider.strip():
return provider.strip().lower()
except Exception:
return None
@@ -36,44 +22,3 @@ def detect_provider() -> Optional[str]:
def has_provider() -> bool:
"""Return True if Hermes can resolve any runtime provider credentials."""
return detect_provider() is not None
def build_auth_methods() -> list[Any]:
"""Return registry-compatible ACP auth methods for Hermes.
The official ACP registry validates that agents advertise at least one
usable auth method during the initial handshake. A fresh Zed install may
not have Hermes provider credentials configured yet, so Hermes always
advertises a terminal setup method. When credentials are already present,
it also advertises the resolved provider as the default agent-managed
runtime credential method.
"""
from acp.schema import AuthMethodAgent, TerminalAuthMethod
methods: list[Any] = []
provider = detect_provider()
if provider:
methods.append(
AuthMethodAgent(
id=provider,
name=f"{provider} runtime credentials",
description=(
"Authenticate Hermes using the currently configured "
f"{provider} runtime credentials."
),
)
)
methods.append(
TerminalAuthMethod(
id=TERMINAL_SETUP_AUTH_METHOD_ID,
name="Configure Hermes provider",
description=(
"Open Hermes' interactive model/provider setup in a terminal. "
"Use this when Hermes has not been configured on this machine yet."
),
type="terminal",
args=["--setup"],
)
)
return methods

View File

@@ -1,286 +0,0 @@
"""Pre-execution ACP edit approval helpers.
This module is intentionally isolated from the generic tool registry. ACP binds
an edit approval requester in a ContextVar for the duration of one ACP agent run;
CLI, gateway, and other sessions leave it unset and therefore bypass this guard.
"""
from __future__ import annotations
import asyncio
import json
import logging
import tempfile
from concurrent.futures import TimeoutError as FutureTimeout
from contextvars import ContextVar, Token
from dataclasses import dataclass
from itertools import count
from pathlib import Path
from typing import Any, Callable
logger = logging.getLogger(__name__)
@dataclass(frozen=True)
class EditProposal:
"""A proposed single-file edit that can be shown to an ACP client."""
tool_name: str
path: str
old_text: str | None
new_text: str
arguments: dict[str, Any]
EditApprovalRequester = Callable[[EditProposal], bool]
_EDIT_APPROVAL_REQUESTER: ContextVar[EditApprovalRequester | None] = ContextVar(
"ACP_EDIT_APPROVAL_REQUESTER",
default=None,
)
_PERMISSION_REQUEST_IDS = count(1)
SENSITIVE_AUTO_APPROVE_NAMES = {".env", ".env.local", ".env.production", "id_rsa", "id_ed25519"}
AUTO_APPROVE_ASK = "ask"
AUTO_APPROVE_WORKSPACE = "workspace_session"
AUTO_APPROVE_SESSION = "session"
def set_edit_approval_requester(requester: EditApprovalRequester | None) -> Token:
"""Bind an ACP edit approval requester for the current context."""
return _EDIT_APPROVAL_REQUESTER.set(requester)
def reset_edit_approval_requester(token: Token) -> None:
"""Restore a previous edit approval requester binding."""
_EDIT_APPROVAL_REQUESTER.reset(token)
def clear_edit_approval_requester() -> None:
"""Clear the current requester; primarily used by tests."""
_EDIT_APPROVAL_REQUESTER.set(None)
def get_edit_approval_requester() -> EditApprovalRequester | None:
return _EDIT_APPROVAL_REQUESTER.get()
def _read_text_if_exists(path: str) -> str | None:
p = Path(path).expanduser()
if not p.exists():
return None
if not p.is_file():
raise OSError(f"Cannot edit non-file path: {path}")
return p.read_text(encoding="utf-8", errors="replace")
def _proposal_for_write_file(arguments: dict[str, Any]) -> EditProposal:
path = str(arguments.get("path") or "")
if not path:
raise ValueError("path required")
content = arguments.get("content")
if content is None:
raise ValueError("content required")
return EditProposal(
tool_name="write_file",
path=path,
old_text=_read_text_if_exists(path),
new_text=str(content),
arguments=dict(arguments),
)
def _proposal_for_patch_replace(arguments: dict[str, Any]) -> EditProposal:
path = str(arguments.get("path") or "")
if not path:
raise ValueError("path required")
old_string = arguments.get("old_string")
new_string = arguments.get("new_string")
if old_string is None or new_string is None:
raise ValueError("old_string and new_string required")
old_text = _read_text_if_exists(path)
if old_text is None:
raise ValueError(f"Failed to read file: {path}")
from tools.fuzzy_match import fuzzy_find_and_replace
new_text, match_count, _strategy, error = fuzzy_find_and_replace(
old_text,
str(old_string),
str(new_string),
bool(arguments.get("replace_all", False)),
)
if error or match_count == 0:
raise ValueError(error or f"Could not find match for old_string in {path}")
return EditProposal(
tool_name="patch",
path=path,
old_text=old_text,
new_text=new_text,
arguments=dict(arguments),
)
def build_edit_proposal(tool_name: str, arguments: dict[str, Any]) -> EditProposal | None:
"""Return an edit proposal for supported file mutation calls."""
if tool_name == "write_file":
return _proposal_for_write_file(arguments)
if tool_name == "patch" and arguments.get("mode", "replace") == "replace":
return _proposal_for_patch_replace(arguments)
return None
def _is_sensitive_auto_approve_path(path: str) -> bool:
parts = Path(path).expanduser().parts
lowered = {part.lower() for part in parts}
if ".git" in lowered or ".ssh" in lowered:
return True
return Path(path).name.lower() in SENSITIVE_AUTO_APPROVE_NAMES
def should_auto_approve_edit(proposal: EditProposal, policy: str, cwd: str | None = None) -> bool:
"""Return whether an ACP edit proposal may bypass the prompt for this session.
This is intentionally session-scoped and conservative: sensitive paths still
ask even under autonomous policies.
"""
policy = str(policy or AUTO_APPROVE_ASK).strip()
if policy == AUTO_APPROVE_ASK or _is_sensitive_auto_approve_path(proposal.path):
return False
path = Path(proposal.path).expanduser().resolve(strict=False)
if policy == AUTO_APPROVE_SESSION:
return True
if policy == AUTO_APPROVE_WORKSPACE:
# `/tmp` is the POSIX path but tempfile.gettempdir() is the real one on
# every platform: `/private/tmp` on macOS (because `/tmp` is a symlink
# and Path.resolve() follows it) and the per-user Temp dir on Windows.
tmp_root = Path(tempfile.gettempdir()).resolve(strict=False)
try:
path.relative_to(tmp_root)
return True
except ValueError:
pass
if cwd:
root = Path(cwd).expanduser().resolve(strict=False)
try:
path.relative_to(root)
return True
except ValueError:
return False
return False
def maybe_require_edit_approval(tool_name: str, arguments: dict[str, Any]) -> str | None:
"""Run ACP edit approval if bound.
Returns a JSON tool-error string when the edit must be blocked, otherwise
``None`` so dispatch can continue. Requester exceptions deny by default.
"""
requester = get_edit_approval_requester()
if requester is None:
return None
try:
proposal = build_edit_proposal(tool_name, arguments)
except Exception as exc:
logger.warning("Could not build ACP edit approval proposal for %s: %s", tool_name, exc)
return json.dumps({"error": f"Edit approval denied: could not prepare diff ({exc})"}, ensure_ascii=False)
if proposal is None:
return None
try:
approved = bool(requester(proposal))
except Exception as exc:
logger.warning("ACP edit approval requester failed: %s", exc)
approved = False
if approved:
return None
return json.dumps({"error": "Edit approval denied by ACP client; file was not modified."}, ensure_ascii=False)
def build_acp_edit_tool_call(proposal: EditProposal):
"""Build the ToolCallUpdate payload for ACP request_permission."""
import acp
tool_call_id = f"edit-approval-{next(_PERMISSION_REQUEST_IDS)}"
return acp.update_tool_call(
tool_call_id,
title=f"Approve edit: {proposal.path}",
kind="edit",
status="pending",
content=[
acp.tool_diff_content(
path=proposal.path,
old_text=proposal.old_text,
new_text=proposal.new_text,
)
],
raw_input={"tool": proposal.tool_name, "arguments": proposal.arguments},
)
def make_acp_edit_approval_requester(
request_permission_fn: Callable,
loop: asyncio.AbstractEventLoop,
session_id: str,
timeout: float = 60.0,
auto_approve_getter: Callable[[], tuple[str, str | None]] | None = None,
) -> EditApprovalRequester:
"""Return a sync requester that bridges edit proposals to ACP permissions."""
def _requester(proposal: EditProposal) -> bool:
from acp.schema import PermissionOption
from agent.async_utils import safe_schedule_threadsafe
if auto_approve_getter is not None:
try:
policy, cwd = auto_approve_getter()
if should_auto_approve_edit(proposal, policy, cwd):
logger.info("Auto-approved ACP edit under policy %s: %s", policy, proposal.path)
return True
except Exception:
logger.debug("ACP edit auto-approval policy check failed", exc_info=True)
options = [
PermissionOption(option_id="allow_once", kind="allow_once", name="Allow edit"),
PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
]
tool_call = build_acp_edit_tool_call(proposal)
coro = request_permission_fn(
session_id=session_id,
tool_call=tool_call,
options=options,
)
future = safe_schedule_threadsafe(
coro,
loop,
logger=logger,
log_message="Edit approval request: failed to schedule on loop",
)
if future is None:
return False
try:
response = future.result(timeout=timeout)
except (FutureTimeout, Exception) as exc:
future.cancel()
logger.warning("Edit approval request timed out or failed: %s", exc)
return False
outcome = getattr(response, "outcome", None)
return (
getattr(outcome, "outcome", None) == "selected"
and getattr(outcome, "option_id", None) == "allow_once"
)
return _requester

View File

@@ -24,7 +24,6 @@ except ModuleNotFoundError:
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
import argparse
import asyncio
import logging
import sys
@@ -108,125 +107,8 @@ def _load_env() -> None:
)
def _parse_args(argv: list[str] | None = None) -> argparse.Namespace:
parser = argparse.ArgumentParser(
prog="hermes-acp",
description="Run Hermes Agent as an ACP stdio server.",
)
parser.add_argument("--version", action="store_true", help="Print Hermes version and exit")
parser.add_argument(
"--check",
action="store_true",
help="Verify ACP dependencies and adapter imports, then exit",
)
parser.add_argument(
"--setup",
action="store_true",
help="Run interactive Hermes provider/model setup for ACP terminal auth",
)
parser.add_argument(
"--setup-browser",
action="store_true",
help="Install agent-browser + Playwright Chromium into ~/.hermes/node/ "
"for browser tool support. Idempotent.",
)
parser.add_argument(
"--yes",
"-y",
action="store_true",
dest="assume_yes",
help="Accept all prompts (currently used by --setup-browser to skip the "
"~400 MB Chromium download confirmation).",
)
return parser.parse_args(argv)
def _print_version() -> None:
from hermes_cli import __version__ as hermes_version
print(hermes_version)
def _run_check() -> None:
import acp # noqa: F401
from acp_adapter.server import HermesACPAgent # noqa: F401
print("Hermes ACP check OK")
def _run_setup() -> None:
from hermes_cli.main import main as hermes_main
old_argv = sys.argv[:]
try:
sys.argv = [old_argv[0] if old_argv else "hermes", "model"]
hermes_main()
finally:
sys.argv = old_argv
# Offer browser-tools install as a follow-up. The terminal auth method
# is the one supported first-run UX for registry installs, so this is
# the natural moment to ask. Skip silently if stdin isn't a TTY (the
# answer can't be collected anyway).
if not sys.stdin.isatty():
return
try:
reply = input(
"\nInstall browser tools? Downloads agent-browser (npm) and "
"optionally Playwright Chromium (~400 MB). [y/N] "
).strip().lower()
except (EOFError, KeyboardInterrupt):
return
if reply in {"y", "yes"}:
_run_setup_browser(assume_yes=False)
def _run_setup_browser(assume_yes: bool = False) -> int:
"""Bootstrap agent-browser + Chromium.
Routes through dep_ensure -> install.{sh,ps1} --ensure, sharing code
with ``hermes postinstall`` and the runtime lazy installer.
Returns 0 on success, 1 on failure.
"""
from hermes_cli.dep_ensure import ensure_dependency
try:
node_ok = ensure_dependency("node", interactive=not assume_yes)
if not node_ok:
print("Node.js installation failed — cannot proceed with browser tools.",
file=sys.stderr)
return 1
browser_ok = ensure_dependency("browser", interactive=not assume_yes)
if not browser_ok:
print("Browser tools installation failed.", file=sys.stderr)
return 1
return 0
except OSError as exc:
print(f"Browser bootstrap failed: {exc}", file=sys.stderr)
return 1
def main(argv: list[str] | None = None) -> None:
def main() -> None:
"""Entry point: load env, configure logging, run the ACP agent."""
args = _parse_args(argv)
if args.version:
_print_version()
return
if args.check:
_run_check()
return
if args.setup:
_run_setup()
return
if args.setup_browser:
rc = _run_setup_browser(assume_yes=args.assume_yes)
if rc != 0:
sys.exit(rc)
return
_setup_logging()
_load_env()

View File

@@ -14,7 +14,6 @@ from collections import deque
from typing import Any, Callable, Deque, Dict
import acp
from acp.schema import AgentPlanUpdate, PlanEntry
from .tools import (
build_tool_complete,
@@ -25,65 +24,6 @@ from .tools import (
logger = logging.getLogger(__name__)
def _json_loads_maybe_prefix(value: str) -> Any:
"""Parse a JSON object even when Hermes appended a human hint after it."""
text = value.strip()
try:
return json.loads(text)
except Exception:
decoder = json.JSONDecoder()
data, _ = decoder.raw_decode(text)
return data
def _build_plan_update_from_todo_result(result: Any) -> AgentPlanUpdate | None:
"""Translate Hermes' todo tool result into ACP's native plan update.
Zed renders ``sessionUpdate: plan`` as its first-class task/todo panel. The
Hermes agent already maintains task state through the ``todo`` tool, so the
ACP adapter should expose that state natively instead of only as a generic
tool-call transcript block.
"""
if not isinstance(result, str) or not result.strip():
return None
try:
data = _json_loads_maybe_prefix(result)
except Exception:
return None
if not isinstance(data, dict) or not isinstance(data.get("todos"), list):
return None
todos = data["todos"]
if not todos:
return AgentPlanUpdate(session_update="plan", entries=[])
status_map = {
"pending": "pending",
"in_progress": "in_progress",
"completed": "completed",
# ACP plans only support pending/in_progress/completed. Preserve
# cancelled tasks as terminal entries instead of dropping them and
# making the client's full-list replacement lose visible context.
"cancelled": "completed",
}
entries: list[PlanEntry] = []
for item in todos:
if not isinstance(item, dict):
continue
content = str(item.get("content") or item.get("id") or "").strip()
if not content:
continue
raw_status = str(item.get("status") or "pending").strip()
status = status_map.get(raw_status, "pending")
if raw_status == "cancelled":
content = f"[cancelled] {content}"
entries.append(PlanEntry(content=content, priority="medium", status=status))
return AgentPlanUpdate(session_update="plan", entries=entries)
def _send_update(
conn: acp.Client,
session_id: str,
@@ -91,17 +31,10 @@ def _send_update(
update: Any,
) -> None:
"""Fire-and-forget an ACP session update from a worker thread."""
from agent.async_utils import safe_schedule_threadsafe
future = safe_schedule_threadsafe(
conn.session_update(session_id, update),
loop,
logger=logger,
log_message="Failed to send ACP update",
)
if future is None:
return
try:
future = asyncio.run_coroutine_threadsafe(
conn.session_update(session_id, update), loop
)
future.result(timeout=5)
except Exception:
logger.debug("Failed to send ACP update", exc_info=True)
@@ -117,7 +50,6 @@ def make_tool_progress_cb(
loop: asyncio.AbstractEventLoop,
tool_call_ids: Dict[str, Deque[str]],
tool_call_meta: Dict[str, Dict[str, Any]],
edit_approval_policy_getter: Callable[[], tuple[str, str | None]] | None = None,
) -> Callable:
"""Create a ``tool_progress_callback`` for AIAgent.
@@ -163,20 +95,7 @@ def make_tool_progress_cb(
logger.debug("Failed to capture ACP edit snapshot for %s", name, exc_info=True)
tool_call_meta[tc_id] = {"args": args, "snapshot": snapshot}
edit_diff = None
if name in {"write_file", "patch"} and edit_approval_policy_getter is not None:
try:
from acp_adapter.edit_approval import build_edit_proposal, should_auto_approve_edit
proposal = build_edit_proposal(name, args)
if proposal is not None:
policy, cwd = edit_approval_policy_getter()
if should_auto_approve_edit(proposal, policy, cwd):
edit_diff = proposal
except Exception:
logger.debug("Failed to prepare auto-approved ACP edit diff for %s", name, exc_info=True)
update = build_tool_start(tc_id, name, args, edit_diff=edit_diff)
update = build_tool_start(tc_id, name, args)
_send_update(conn, session_id, loop, update)
return _tool_progress
@@ -249,10 +168,6 @@ def make_step_cb(
snapshot=meta.get("snapshot"),
)
_send_update(conn, session_id, loop, update)
if tool_name == "todo":
plan_update = _build_plan_update_from_todo_result(result)
if plan_update is not None:
_send_update(conn, session_id, loop, plan_update)
if not queue:
tool_call_ids.pop(tool_name, None)

View File

@@ -1,11 +1,10 @@
"""ACP permission bridging for Hermes dangerous-command approvals."""
"""ACP permission bridging — maps ACP approval requests to hermes approval callbacks."""
from __future__ import annotations
import asyncio
import logging
from concurrent.futures import TimeoutError as FutureTimeout
from itertools import count
from typing import Callable
from acp.schema import (
@@ -15,107 +14,24 @@ from acp.schema import (
logger = logging.getLogger(__name__)
# Maps ACP permission option ids to Hermes approval result strings.
# Option ids are stable across both the ``allow_permanent=True`` and
# ``allow_permanent=False`` paths even though the option list differs.
_OPTION_ID_TO_HERMES = {
# Maps ACP PermissionOptionKind -> hermes approval result strings
_KIND_TO_HERMES = {
"allow_once": "once",
"allow_session": "session",
"allow_always": "always",
"deny": "deny",
"deny_always": "deny",
"reject_once": "deny",
"reject_always": "deny",
}
_PERMISSION_REQUEST_IDS = count(1)
def _permission_option_supports_kind(kind: str) -> bool:
"""Return whether the installed ACP SDK accepts a permission option kind."""
try:
PermissionOption(option_id="__probe__", kind=kind, name="probe")
except Exception:
return False
return True
def _build_permission_options(*, allow_permanent: bool) -> list[PermissionOption]:
"""Return ACP options that match Hermes approval semantics."""
options = [
PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
PermissionOption(
option_id="allow_session",
# ACP has no session-scoped kind, so use the closest persistent
# hint while keeping Hermes semantics in the option id.
kind="allow_always",
name="Allow for session",
),
]
if allow_permanent:
options.append(
PermissionOption(
option_id="allow_always",
kind="allow_always",
name="Allow always",
),
)
options.append(PermissionOption(option_id="deny", kind="reject_once", name="Deny"))
if _permission_option_supports_kind("reject_always"):
options.append(
PermissionOption(
option_id="deny_always",
kind="reject_always",
name="Deny always",
),
)
return options
def _build_permission_tool_call(command: str, description: str):
"""Return the ACP tool-call update attached to a permission request.
``request_permission`` expects a ``ToolCallUpdate`` payload — produced
by ``_acp.update_tool_call`` — not a ``ToolCallStart``. Each request
gets a unique ``perm-check-N`` id so concurrent requests don't collide.
"""
import acp as _acp
tool_call_id = f"perm-check-{next(_PERMISSION_REQUEST_IDS)}"
title = f"{description}: {command}" if description else command
content_text = f"{description}\n$ {command}" if description else f"$ {command}"
return _acp.update_tool_call(
tool_call_id,
title=title,
kind="execute",
status="pending",
content=[_acp.tool_content(_acp.text_block(content_text))],
raw_input={"command": command, "description": description},
)
def _map_outcome_to_hermes(outcome: object, *, allowed_option_ids: set[str]) -> str:
"""Map an ACP permission outcome into Hermes approval strings."""
if not isinstance(outcome, AllowedOutcome):
return "deny"
option_id = outcome.option_id
if option_id not in allowed_option_ids:
logger.warning("Permission request returned unknown option_id: %s", option_id)
return "deny"
return _OPTION_ID_TO_HERMES.get(option_id, "deny")
def make_approval_callback(
request_permission_fn: Callable,
loop: asyncio.AbstractEventLoop,
session_id: str,
timeout: float = 60.0,
) -> Callable[..., str]:
) -> Callable[[str, str], str]:
"""
Return a Hermes-compatible approval callback that bridges to ACP.
The callback accepts ``command`` and ``description`` plus optional
keyword arguments such as ``allow_permanent`` used by
``tools.approval.prompt_dangerous_approval()``.
Return a hermes-compatible ``approval_callback(command, description) -> str``
that bridges to the ACP client's ``request_permission`` call.
Args:
request_permission_fn: The ACP connection's ``request_permission`` coroutine.
@@ -124,45 +40,41 @@ def make_approval_callback(
timeout: Seconds to wait for a response before auto-denying.
"""
def _callback(
command: str,
description: str,
*,
allow_permanent: bool = True,
**_: object,
) -> str:
from agent.async_utils import safe_schedule_threadsafe
def _callback(command: str, description: str) -> str:
options = [
PermissionOption(option_id="allow_once", kind="allow_once", name="Allow once"),
PermissionOption(option_id="allow_always", kind="allow_always", name="Allow always"),
PermissionOption(option_id="deny", kind="reject_once", name="Deny"),
]
import acp as _acp
options = _build_permission_options(allow_permanent=allow_permanent)
tool_call = _acp.start_tool_call("perm-check", command, kind="execute")
tool_call = _build_permission_tool_call(command, description)
coro = request_permission_fn(
session_id=session_id,
tool_call=tool_call,
options=options,
)
future = safe_schedule_threadsafe(
coro, loop,
logger=logger,
log_message="Permission request: failed to schedule on loop",
)
if future is None:
return "deny"
try:
future = asyncio.run_coroutine_threadsafe(coro, loop)
response = future.result(timeout=timeout)
except (FutureTimeout, Exception) as exc:
future.cancel()
logger.warning("Permission request timed out or failed: %s", exc)
return "deny"
if response is None:
return "deny"
allowed_option_ids = {option.option_id for option in options}
return _map_outcome_to_hermes(
response.outcome,
allowed_option_ids=allowed_option_ids,
)
outcome = response.outcome
if isinstance(outcome, AllowedOutcome):
option_id = outcome.option_id
# Look up the kind from our options list
for opt in options:
if opt.option_id == option_id:
return _KIND_TO_HERMES.get(opt.kind, "deny")
return "once" # fallback for unknown option_id
else:
return "deny"
return _callback

View File

@@ -3,7 +3,6 @@
from __future__ import annotations
import asyncio
from datetime import datetime, timezone
import base64
import contextvars
import json
@@ -19,7 +18,6 @@ import acp
from acp.schema import (
AgentCapabilities,
AgentMessageChunk,
AgentThoughtChunk,
AuthenticateResponse,
AvailableCommand,
AvailableCommandsUpdate,
@@ -47,10 +45,7 @@ from acp.schema import (
ResourceContentBlock,
SessionCapabilities,
SessionForkCapabilities,
SessionInfoUpdate,
SessionListCapabilities,
SessionMode,
SessionModeState,
SessionModelState,
SessionResumeCapabilities,
SessionInfo,
@@ -62,9 +57,14 @@ from acp.schema import (
UserMessageChunk,
)
from acp_adapter.auth import TERMINAL_SETUP_AUTH_METHOD_ID, build_auth_methods, detect_provider
# AuthMethodAgent was renamed from AuthMethod in agent-client-protocol 0.9.0
try:
from acp.schema import AuthMethodAgent
except ImportError:
from acp.schema import AuthMethod as AuthMethodAgent # type: ignore[attr-defined]
from acp_adapter.auth import detect_provider
from acp_adapter.events import (
_build_plan_update_from_todo_result,
make_message_cb,
make_step_cb,
make_thinking_cb,
@@ -499,20 +499,6 @@ class HermesACPAgent(acp.Agent):
},
)
_EDIT_APPROVAL_POLICY_CONFIG_ID = "edit_approval_policy"
_EDIT_APPROVAL_POLICY_DEFAULT = "ask"
_MODE_DEFAULT = "default"
_MODE_ACCEPT_EDITS = "accept_edits"
_MODE_DONT_ASK = "dont_ask"
_MODE_TO_EDIT_APPROVAL_POLICY = {
_MODE_DEFAULT: "ask",
_MODE_ACCEPT_EDITS: "workspace_session",
_MODE_DONT_ASK: "session",
}
_EDIT_APPROVAL_POLICY_TO_MODE = {
value: key for key, value in _MODE_TO_EDIT_APPROVAL_POLICY.items()
}
def __init__(self, session_manager: SessionManager | None = None):
super().__init__()
self.session_manager = session_manager or SessionManager()
@@ -525,45 +511,6 @@ class HermesACPAgent(acp.Agent):
self._conn = conn
logger.info("ACP client connected")
def _session_modes(self, state: SessionState) -> SessionModeState:
"""Return ACP session modes while preserving Zed's separate model picker.
Zed renders ``config_options`` in the prominent selector slot where the
model picker was visible. Claude/Codex expose policy-like controls as ACP
modes, which coexist with the model picker, so Hermes maps edit approval
policy onto modes instead of advertising config options.
"""
current = str(getattr(state, "mode", "") or self._MODE_DEFAULT)
if current not in self._MODE_TO_EDIT_APPROVAL_POLICY:
current = self._MODE_DEFAULT
return SessionModeState(
current_mode_id=current,
available_modes=[
SessionMode(
id=self._MODE_DEFAULT,
name="Default",
description="Ask before edits.",
),
SessionMode(
id=self._MODE_ACCEPT_EDITS,
name="Accept Edits",
description="Auto-allow workspace and /tmp edits; still asks for sensitive paths.",
),
SessionMode(
id=self._MODE_DONT_ASK,
name="Don't Ask",
description="Auto-allow file edits for this session except sensitive paths.",
),
],
)
def _edit_approval_policy_for_state(self, state: SessionState) -> tuple[str, str | None]:
mode = str(getattr(state, "mode", "") or self._MODE_DEFAULT)
policy = self._MODE_TO_EDIT_APPROVAL_POLICY.get(mode, self._EDIT_APPROVAL_POLICY_DEFAULT)
return policy, state.cwd
@staticmethod
def _encode_model_choice(provider: str | None, model: str | None) -> str:
"""Encode a model selection so ACP clients can keep provider context."""
@@ -709,37 +656,6 @@ class HermesACPAgent(acp.Agent):
exc_info=True,
)
async def _send_session_info_update(self, session_id: str) -> None:
"""Send ACP native session metadata after Hermes changes it."""
if not self._conn:
return
try:
row = self.session_manager._get_db().get_session(session_id)
except Exception:
logger.debug("Could not read ACP session info for %s", session_id, exc_info=True)
return
if not row:
return
title = row.get("title")
# The `sessions` table does not have an `updated_at` column (see
# hermes_state.py schema — only started_at/ended_at). Use "now" as
# the updated_at since we're emitting this notification precisely
# because the title was just refreshed.
updated_at = datetime.now(timezone.utc).isoformat()
update = SessionInfoUpdate(
session_update="session_info_update",
title=title if isinstance(title, str) and title.strip() else None,
updated_at=updated_at,
)
try:
await self._conn.session_update(
session_id=session_id,
update=update,
)
except Exception:
logger.debug("Could not send ACP session info update for %s", session_id, exc_info=True)
def _schedule_usage_update(self, state: SessionState) -> None:
"""Schedule native context indicator refresh after ACP responses."""
if not self._conn:
@@ -828,7 +744,16 @@ class HermesACPAgent(acp.Agent):
resolved_protocol_version = (
protocol_version if isinstance(protocol_version, int) else acp.PROTOCOL_VERSION
)
auth_methods = build_auth_methods()
provider = detect_provider()
auth_methods = None
if provider:
auth_methods = [
AuthMethodAgent(
id=provider,
name=f"{provider} runtime credentials",
description=f"Authenticate Hermes using the currently configured {provider} runtime credentials.",
)
]
client_name = client_info.name if client_info else "unknown"
logger.info(
@@ -859,38 +784,24 @@ class HermesACPAgent(acp.Agent):
# server has provider credentials configured — harmless under
# Hermes' threat model (ACP is stdio-only, local-trust), but poor
# API hygiene and confusing if ACP ever grows multi-method auth.
if not isinstance(method_id, str):
return None
normalized_method = method_id.strip().lower()
provider = detect_provider()
if normalized_method == TERMINAL_SETUP_AUTH_METHOD_ID:
# Terminal auth launches Hermes setup/model selection out-of-band.
# Only report success once that flow has produced usable runtime
# credentials for the normal ACP session.
return AuthenticateResponse() if provider else None
if not provider or normalized_method != provider:
if not provider:
return None
if not isinstance(method_id, str) or method_id.strip().lower() != provider:
return None
return AuthenticateResponse()
# ---- Session management -------------------------------------------------
@staticmethod
def _flatten_history_text(value: Any) -> str:
"""Normalize a persisted text-or-text-parts value into a single string.
OpenAI-style assistant content (and provider reasoning fields) can arrive
as either a scalar string or a list of ``{"text": ...}`` /
``{"type": "text", "content": ...}`` parts. Whitespace-only inputs
collapse to an empty string so callers can treat ``""`` as "nothing to
emit".
"""
if isinstance(value, str):
return value.strip()
if isinstance(value, list):
def _history_message_text(message: dict[str, Any]) -> str:
"""Extract displayable text from a persisted OpenAI-style message."""
content = message.get("content")
if isinstance(content, str):
return content.strip()
if isinstance(content, list):
parts: list[str] = []
for item in value:
for item in content:
if isinstance(item, dict):
text = item.get("text")
if isinstance(text, str):
@@ -902,29 +813,6 @@ class HermesACPAgent(acp.Agent):
return "\n".join(part.strip() for part in parts if part and part.strip()).strip()
return ""
@classmethod
def _history_message_text(cls, message: dict[str, Any]) -> str:
"""Extract displayable text from a persisted OpenAI-style message."""
return cls._flatten_history_text(message.get("content"))
@classmethod
def _history_reasoning_text(cls, message: dict[str, Any]) -> str:
"""Extract displayable reasoning/thought text from a persisted assistant message.
Returns the first non-empty value among ``reasoning_content`` (the
canonical field used by DeepSeek / Moonshot and the post-#16892
chat-completions normalizer) and ``reasoning`` (used by the codex
event projector and several other transports). Both keys are
actively written by live code paths, so neither branch is
deprecated — they cover different transports rather than old vs.
new sessions.
"""
for key in ("reasoning_content", "reasoning"):
text = cls._flatten_history_text(message.get(key))
if text:
return text
return ""
@staticmethod
def _history_message_update(
*,
@@ -945,11 +833,6 @@ class HermesACPAgent(acp.Agent):
)
return None
@staticmethod
def _history_thought_update(text: str) -> AgentThoughtChunk:
"""Build an ACP history replay update for an assistant thought."""
return acp.update_agent_thought_text(text)
@staticmethod
def _history_tool_call_name_args(tool_call: dict[str, Any]) -> tuple[str, dict[str, Any]]:
"""Extract function name/arguments from an OpenAI-style tool_call."""
@@ -977,17 +860,13 @@ class HermesACPAgent(acp.Agent):
).strip()
async def _replay_session_history(self, state: SessionState) -> None:
"""Replay persisted user/assistant history during session/load or session/resume.
"""Send persisted user/assistant history to clients during session/load.
Invoked inline (``await``) from both ``load_session`` and
``resume_session`` so that spec-compliant ACP clients receive the
full transcript within the request's lifetime — see the comment at
the call sites for the rationale and prior-art citations.
Replays the conversation as user/assistant chunks, thinking-mode
thought chunks, plus reconstructed tool-call start/completion
notifications. Merely restoring server-side state makes Hermes
remember context, but leaves the editor looking like a clean thread.
Zed's ACP history UI calls ``session/load`` after the user picks an item
from the Agents sidebar. The agent must then replay the full conversation
as user/assistant chunks plus reconstructed tool-call start/completion
notifications; merely restoring server-side state makes Hermes remember
context, but leaves the editor looking like a clean thread.
"""
if not self._conn or not state.history:
return
@@ -1009,37 +888,24 @@ class HermesACPAgent(acp.Agent):
for message in state.history:
role = str(message.get("role") or "")
if role == "user":
text = self._history_message_text(message)
if text:
update = self._history_message_update(role=role, text=text)
if update is not None and not await _send(update):
return
continue
if role == "assistant":
thought = self._history_reasoning_text(message)
if thought and not await _send(self._history_thought_update(thought)):
return
if role in {"user", "assistant"}:
text = self._history_message_text(message)
if text:
update = self._history_message_update(role=role, text=text)
if update is not None and not await _send(update):
return
tool_calls = message.get("tool_calls")
if isinstance(tool_calls, list):
for tool_call in tool_calls:
if not isinstance(tool_call, dict):
continue
tool_call_id = self._history_tool_call_id(tool_call)
if not tool_call_id:
continue
tool_name, args = self._history_tool_call_name_args(tool_call)
active_tool_calls[tool_call_id] = (tool_name, args)
if not await _send(build_tool_start(tool_call_id, tool_name, args)):
return
if role == "assistant" and isinstance(message.get("tool_calls"), list):
for tool_call in message["tool_calls"]:
if not isinstance(tool_call, dict):
continue
tool_call_id = self._history_tool_call_id(tool_call)
if not tool_call_id:
continue
tool_name, args = self._history_tool_call_name_args(tool_call)
active_tool_calls[tool_call_id] = (tool_name, args)
if not await _send(build_tool_start(tool_call_id, tool_name, args)):
return
continue
if role == "tool":
@@ -1051,20 +917,15 @@ class HermesACPAgent(acp.Agent):
if not tool_call_id or not tool_name:
continue
result = message.get("content")
result_text = result if isinstance(result, str) else None
if not await _send(
build_tool_complete(
tool_call_id,
tool_name,
result=result_text,
result=result if isinstance(result, str) else None,
function_args=function_args,
)
):
return
if tool_name == "todo":
plan_update = _build_plan_update_from_todo_result(result_text)
if plan_update is not None and not await _send(plan_update):
return
async def new_session(
self,
@@ -1080,9 +941,20 @@ class HermesACPAgent(acp.Agent):
return NewSessionResponse(
session_id=state.session_id,
models=self._build_model_state(state),
modes=self._session_modes(state),
)
def _schedule_history_replay(self, state: SessionState) -> None:
"""Replay persisted history after session/load or session/resume returns.
Zed only attaches streamed transcript/tool updates once the load/resume
response has completed. Sending replay notifications while the request is
still in-flight can make the server look correct in logs while the editor
drops or fails to attach the tool-call history.
"""
loop = asyncio.get_running_loop()
replay_coro = self._replay_session_history(state)
loop.call_soon(asyncio.create_task, replay_coro)
async def load_session(
self,
cwd: str,
@@ -1096,36 +968,10 @@ class HermesACPAgent(acp.Agent):
return None
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("Loaded session %s", session_id)
# Per ACP spec, `session/load` must stream the prior conversation back
# to the client via `session/update` notifications BEFORE responding,
# so the client receives the full transcript within the load request's
# lifetime. Awaiting the replay here matches Codex / Claude Code /
# OpenCode / Pi and the Zed client (which registers the session-update
# routing entry before awaiting the loadSession RPC specifically so
# in-call history replay updates can find the thread). Deferring this
# via `loop.call_soon` (as we did briefly in May 2026) broke every
# spec-compliant ACP client that measures notifications synchronously
# against the load response — see #12285 follow-up.
try:
await self._replay_session_history(state)
except Exception:
# Replay is best-effort — a corrupted or unexpected message shape
# must not turn a successful session/load into a JSON-RPC error
# response. Per-notification failures are already caught inside
# ``_replay_session_history``; this outer guard covers anything
# raised by the helpers themselves before reaching ``_send``.
logger.warning(
"ACP history replay raised during session/load for %s"
"load will still succeed, partial transcript may be missing",
session_id,
exc_info=True,
)
self._schedule_history_replay(state)
self._schedule_available_commands_update(session_id)
self._schedule_usage_update(state)
return LoadSessionResponse(
models=self._build_model_state(state),
modes=self._session_modes(state),
)
return LoadSessionResponse(models=self._build_model_state(state))
async def resume_session(
self,
@@ -1140,24 +986,10 @@ class HermesACPAgent(acp.Agent):
state = self.session_manager.create_session(cwd=cwd)
await self._register_session_mcp_servers(state, mcp_servers)
logger.info("Resumed session %s", state.session_id)
# See `load_session` above for the spec rationale — replay must
# complete before the response so clients receive the full transcript
# within the request's lifetime.
try:
await self._replay_session_history(state)
except Exception:
logger.warning(
"ACP history replay raised during session/resume for %s"
"resume will still succeed, partial transcript may be missing",
state.session_id,
exc_info=True,
)
self._schedule_history_replay(state)
self._schedule_available_commands_update(state.session_id)
self._schedule_usage_update(state)
return ResumeSessionResponse(
models=self._build_model_state(state),
modes=self._session_modes(state),
)
return ResumeSessionResponse(models=self._build_model_state(state))
async def cancel(self, session_id: str, **kwargs: Any) -> None:
state = self.session_manager.get_session(session_id)
@@ -1187,11 +1019,7 @@ class HermesACPAgent(acp.Agent):
logger.info("Forked session %s -> %s", session_id, new_id)
if new_id:
self._schedule_available_commands_update(new_id)
return ForkSessionResponse(
session_id=new_id,
models=self._build_model_state(state) if state is not None else None,
modes=self._session_modes(state) if state is not None else None,
)
return ForkSessionResponse(session_id=new_id)
async def list_sessions(
self,
@@ -1342,19 +1170,11 @@ class HermesACPAgent(acp.Agent):
tool_call_ids: dict[str, Deque[str]] = defaultdict(deque)
tool_call_meta: dict[str, dict[str, Any]] = {}
previous_approval_cb = None
edit_approval_requester = None
streamed_message = False
if conn:
tool_progress_cb = make_tool_progress_cb(
conn,
session_id,
loop,
tool_call_ids,
tool_call_meta,
edit_approval_policy_getter=lambda: self._edit_approval_policy_for_state(state),
)
tool_progress_cb = make_tool_progress_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
reasoning_cb = make_thinking_cb(conn, session_id, loop)
step_cb = make_step_cb(conn, session_id, loop, tool_call_ids, tool_call_meta)
message_cb = make_message_cb(conn, session_id, loop)
@@ -1366,17 +1186,6 @@ class HermesACPAgent(acp.Agent):
message_cb(text)
approval_cb = make_approval_callback(conn.request_permission, loop, session_id)
try:
from acp_adapter.edit_approval import make_acp_edit_approval_requester
edit_approval_requester = make_acp_edit_approval_requester(
conn.request_permission,
loop,
session_id,
auto_approve_getter=lambda: self._edit_approval_policy_for_state(state),
)
except Exception:
logger.debug("Could not create ACP edit approval requester", exc_info=True)
else:
tool_progress_cb = None
reasoning_cb = None
@@ -1406,11 +1215,9 @@ class HermesACPAgent(acp.Agent):
# which requires a notify_cb registered in _gateway_notify_cbs.
previous_approval_cb = None
previous_interactive = None
edit_approval_token = None
previous_session_id = None
def _run_agent() -> dict:
nonlocal previous_approval_cb, previous_interactive, edit_approval_token, previous_session_id
nonlocal previous_approval_cb, previous_interactive
# Bind HERMES_SESSION_KEY for this session so per-session caches
# (e.g. the interactive sudo password cache in tools.terminal_tool)
# scope to the ACP session rather than leaking across sessions
@@ -1434,24 +1241,10 @@ class HermesACPAgent(acp.Agent):
_terminal_tool.set_approval_callback(approval_cb)
except Exception:
logger.debug("Could not set ACP approval callback", exc_info=True)
if edit_approval_requester:
try:
from acp_adapter.edit_approval import set_edit_approval_requester
edit_approval_token = set_edit_approval_requester(edit_approval_requester)
except Exception:
logger.debug("Could not set ACP edit approval requester", exc_info=True)
# Signal to tools.approval that we have an interactive callback
# and the non-interactive auto-approve path must not fire.
previous_interactive = os.environ.get("HERMES_INTERACTIVE")
os.environ["HERMES_INTERACTIVE"] = "1"
# Propagate the originating ACP session id to tools that want to
# tag side-effects with it (e.g. ``kanban_create`` stamps it on
# the new task so clients can render a per-session board). Save
# and restore around the agent call so a re-used executor thread
# never leaks one session's id into the next session's tools.
previous_session_id = os.environ.get("HERMES_SESSION_ID")
os.environ["HERMES_SESSION_ID"] = session_id
try:
result = agent.run_conversation(
user_message=user_content,
@@ -1469,24 +1262,12 @@ class HermesACPAgent(acp.Agent):
os.environ.pop("HERMES_INTERACTIVE", None)
else:
os.environ["HERMES_INTERACTIVE"] = previous_interactive
# Restore HERMES_SESSION_ID symmetrically.
if previous_session_id is None:
os.environ.pop("HERMES_SESSION_ID", None)
else:
os.environ["HERMES_SESSION_ID"] = previous_session_id
if approval_cb:
try:
from tools import terminal_tool as _terminal_tool
_terminal_tool.set_approval_callback(previous_approval_cb)
except Exception:
logger.debug("Could not restore approval callback", exc_info=True)
if edit_approval_token is not None:
try:
from acp_adapter.edit_approval import reset_edit_approval_requester
reset_edit_approval_requester(edit_approval_token)
except Exception:
logger.debug("Could not restore ACP edit approval requester", exc_info=True)
if session_tokens is not None and clear_session_vars is not None:
try:
clear_session_vars(session_tokens)
@@ -1517,28 +1298,16 @@ class HermesACPAgent(acp.Agent):
try:
from agent.title_generator import maybe_auto_title
def _notify_title_update(_title: str) -> None:
if conn:
loop.call_soon_threadsafe(
asyncio.create_task,
self._send_session_info_update(session_id),
)
maybe_auto_title(
self.session_manager._get_db(),
session_id,
user_text,
final_response,
state.history,
title_callback=_notify_title_update,
)
except Exception:
logger.debug("Failed to auto-title ACP session %s", session_id, exc_info=True)
if final_response and conn and (not streamed_message or result.get("response_transformed")):
# Deliver the final response when streaming did not already send it,
# or when a plugin hook transformed the response after streaming
# finished (e.g. transform_llm_output) — otherwise the appended /
# rewritten text never reaches the client.
if final_response and conn and not streamed_message:
update = acp.update_agent_message_text(final_response)
await conn.session_update(session_id, update)
@@ -1921,12 +1690,9 @@ class HermesACPAgent(acp.Agent):
if state is None:
logger.warning("Session %s: mode switch requested for missing session", session_id)
return None
normalized_mode = str(mode_id or "").strip()
if normalized_mode not in self._MODE_TO_EDIT_APPROVAL_POLICY:
normalized_mode = self._MODE_DEFAULT
setattr(state, "mode", normalized_mode)
setattr(state, "mode", mode_id)
self.session_manager.save_session(session_id)
logger.info("Session %s: mode switched to %s", session_id, normalized_mode)
logger.info("Session %s: mode switched to %s", session_id, mode_id)
return SetSessionModeResponse()
async def set_config_option(
@@ -1938,15 +1704,11 @@ class HermesACPAgent(acp.Agent):
logger.warning("Session %s: config update requested for missing session", session_id)
return None
if str(config_id) == self._EDIT_APPROVAL_POLICY_CONFIG_ID:
mode = self._EDIT_APPROVAL_POLICY_TO_MODE.get(str(value), self._MODE_DEFAULT)
setattr(state, "mode", mode)
else:
options = getattr(state, "config_options", None)
if not isinstance(options, dict):
options = {}
options[str(config_id)] = value
setattr(state, "config_options", options)
options = getattr(state, "config_options", None)
if not isinstance(options, dict):
options = {}
options[str(config_id)] = value
setattr(state, "config_options", options)
self.session_manager.save_session(session_id)
logger.info("Session %s: config option %s updated", session_id, config_id)
return SetSessionConfigOptionResponse(config_options=[])

View File

@@ -202,44 +202,6 @@ def _json_loads_maybe(value: Optional[str]) -> Any:
return None
def _tool_result_failed(result: Optional[str], tool_name: str | None = None) -> bool:
"""Return True when a structured Hermes tool result clearly failed.
Keep this deliberately conservative. Plain text can contain words like
"error" because tests failed or a command printed diagnostics; Zed should
only receive ACP failed status for structured tool-level failures.
"""
# Raised exceptions from the agent's tool executor get wrapped in a
# canonical "Error executing tool '<name>': ..." prefix (see
# agent/tool_executor.py around the try/except). That prefix is uniquely
# produced by the wrapper itself — it cannot legitimately appear in
# well-behaved tool output. Catch it so a tool that blew up shows as
# failed in Zed instead of misleadingly green.
if isinstance(result, str) and result.startswith("Error executing tool '"):
return True
data = _json_loads_maybe(result)
if not isinstance(data, dict):
return False
for key in ("success", "ok"):
if data.get(key) is False:
return True
exit_code = data.get("exit_code", data.get("returncode"))
if isinstance(exit_code, int) and exit_code != 0:
return True
# Hermes core/polished tools commonly report tool-level failures as a
# structured {"error": "..."} payload without an explicit success flag.
# Keep generic plugin/unknown tool payloads conservative to avoid marking
# optional diagnostic messages as failed.
if tool_name in _POLISHED_TOOLS and data.get("error") and not data.get("content"):
return True
return False
def _truncate_text(text: str, limit: int = 5000) -> str:
if len(text) <= limit:
return text
@@ -316,26 +278,6 @@ def _format_search_files_result(result: Optional[str]) -> Optional[str]:
data = _json_loads_maybe(result)
if not isinstance(data, dict):
return None
files = data.get("files")
if isinstance(files, list):
total = data.get("total_count", len(files))
shown = min(len(files), 20)
truncated = bool(data.get("truncated")) or len(files) > shown
lines = [
"File search results",
f"Found {total} file{'s' if total != 1 else ''}; showing {shown}.",
"",
]
for path in files[:shown]:
lines.append(f"- {path}")
if truncated:
lines.extend([
"",
"Results truncated. Narrow the search, add path/file_glob, or use offset to page.",
])
return _truncate_text("\n".join(lines), limit=7000)
matches = data.get("matches")
if not isinstance(matches, list):
return None
@@ -726,114 +668,14 @@ def _format_media_or_cron_result(tool_name: str, result: Optional[str]) -> Optio
return "\n".join(lines)
def _format_structured_value(
key: str,
value: Any,
*,
indent: int = 0,
max_depth: int = 3,
max_items: int = 8,
) -> List[str]:
"""Render nested JSON-ish values as compact Markdown bullets, not inline blobs."""
prefix = " " * indent
bullet = f"{prefix}- "
label = f"**{key}:**" if key else ""
if value in (None, "", [], {}):
return []
if max_depth <= 0:
if isinstance(value, (dict, list)):
preview = json.dumps(value, ensure_ascii=False, default=str)
else:
preview = str(value)
return [f"{bullet}{label} {_truncate_text(preview, limit=240)}" if label else f"{bullet}{_truncate_text(preview, limit=240)}"]
if isinstance(value, dict):
lines = [f"{bullet}{label}" if label else f"{bullet}{len(value)} fields"]
shown = 0
for child_key, child_value in value.items():
if child_value in (None, "", [], {}):
continue
lines.extend(
_format_structured_value(
str(child_key),
child_value,
indent=indent + 1,
max_depth=max_depth - 1,
max_items=max_items,
)
)
shown += 1
if shown >= max_items:
remaining = max(0, len(value) - shown)
if remaining:
lines.append(f"{' ' * (indent + 1)}- ... {remaining} more fields")
break
return lines
if isinstance(value, list):
lines = [f"{bullet}{label} {len(value)} item{'s' if len(value) != 1 else ''}" if label else f"{bullet}{len(value)} item{'s' if len(value) != 1 else ''}"]
for idx, item in enumerate(value[:max_items], 1):
if isinstance(item, dict):
headline = str(item.get("content") or item.get("message") or item.get("title") or item.get("name") or item.get("id") or "").strip()
if headline:
lines.append(f"{' ' * (indent + 1)}{idx}. {_truncate_text(headline, limit=220)}")
for child_key in ("id", "status", "type", "scope", "quality_score", "score", "path", "url"):
child_value = item.get(child_key)
if child_value not in (None, "", [], {}):
lines.append(f"{' ' * (indent + 2)}- **{child_key}:** {_truncate_text(str(child_value), limit=180)}")
else:
lines.append(f"{' ' * (indent + 1)}{idx}.")
for child_key, child_value in list(item.items())[:max_items]:
lines.extend(
_format_structured_value(
str(child_key),
child_value,
indent=indent + 2,
max_depth=max_depth - 1,
max_items=max_items,
)
)
elif isinstance(item, list):
lines.append(f"{' ' * (indent + 1)}{idx}. {len(item)} items")
for nested in item[:max_items]:
lines.extend(
_format_structured_value(
"",
nested,
indent=indent + 2,
max_depth=max_depth - 1,
max_items=max_items,
)
)
else:
lines.append(f"{' ' * (indent + 1)}{idx}. {_truncate_text(str(item), limit=240)}")
if len(value) > max_items:
lines.append(f"{' ' * (indent + 1)}... {len(value) - max_items} more items")
return lines
return [f"{bullet}{label} {_truncate_text(str(value), limit=500)}" if label else f"{bullet}{_truncate_text(str(value), limit=500)}"]
def _format_generic_structured_result(
tool_name: str,
result: Optional[str],
*,
fallback_to_text: bool = True,
) -> Optional[str]:
def _format_generic_structured_result(tool_name: str, result: Optional[str]) -> Optional[str]:
data = _json_loads_maybe(result)
if not isinstance(data, (dict, list)):
return result if fallback_to_text and isinstance(result, str) and result.strip() else None
return result if isinstance(result, str) and result.strip() else None
if isinstance(data, list):
lines = [f"{tool_name}: {len(data)} item{'s' if len(data) != 1 else ''}"]
for item in data[:12]:
if isinstance(item, (dict, list)):
lines.extend(_format_structured_value("", item, indent=0, max_depth=2, max_items=6))
else:
lines.append(f"- {_truncate_text(str(item), limit=240)}")
if len(data) > 12:
lines.append(f"... {len(data) - 12} more items")
lines.append(f"- {_truncate_text(str(item), limit=240)}")
return _truncate_text("\n".join(lines), limit=5000)
if data.get("success") is False or data.get("error"):
@@ -857,9 +699,12 @@ def _format_generic_structured_result(
continue
if value in (None, "", [], {}):
continue
lines.extend(_format_structured_value(str(key), value, indent=0, max_depth=3, max_items=8))
if len(lines) >= 40:
lines.append("- ... more fields truncated")
if isinstance(value, (dict, list)):
preview = json.dumps(value, ensure_ascii=False, default=str)
else:
preview = str(value)
lines.append(f"- **{key}:** {_truncate_text(preview, limit=500)}")
if len(lines) >= 14:
break
content = data.get("content")
@@ -899,14 +744,79 @@ def _build_polished_completion_content(
if formatter is None and tool_name in _POLISHED_TOOLS:
formatter = lambda: _format_generic_structured_result(tool_name, result)
if formatter is None:
text = _format_generic_structured_result(tool_name, result, fallback_to_text=False)
else:
text = formatter()
return None
text = formatter()
if not text:
return None
return [_text(text)]
def _build_patch_mode_content(patch_text: str) -> List[Any]:
"""Parse V4A patch mode input into ACP diff blocks when possible."""
if not patch_text:
return [acp.tool_content(acp.text_block(""))]
try:
from tools.patch_parser import OperationType, parse_v4a_patch
operations, error = parse_v4a_patch(patch_text)
if error or not operations:
return [acp.tool_content(acp.text_block(patch_text))]
content: List[Any] = []
for op in operations:
if op.operation == OperationType.UPDATE:
old_chunks: list[str] = []
new_chunks: list[str] = []
for hunk in op.hunks:
old_lines = [line.content for line in hunk.lines if line.prefix in {" ", "-"}]
new_lines = [line.content for line in hunk.lines if line.prefix in {" ", "+"}]
if old_lines or new_lines:
old_chunks.append("\n".join(old_lines))
new_chunks.append("\n".join(new_lines))
old_text = "\n...\n".join(chunk for chunk in old_chunks if chunk)
new_text = "\n...\n".join(chunk for chunk in new_chunks if chunk)
if old_text or new_text:
content.append(
acp.tool_diff_content(
path=op.file_path,
old_text=old_text or None,
new_text=new_text or "",
)
)
continue
if op.operation == OperationType.ADD:
added_lines = [line.content for hunk in op.hunks for line in hunk.lines if line.prefix == "+"]
content.append(
acp.tool_diff_content(
path=op.file_path,
new_text="\n".join(added_lines),
)
)
continue
if op.operation == OperationType.DELETE:
content.append(
acp.tool_diff_content(
path=op.file_path,
old_text=f"Delete file: {op.file_path}",
new_text="",
)
)
continue
if op.operation == OperationType.MOVE:
content.append(
acp.tool_content(acp.text_block(f"Move file: {op.file_path} -> {op.new_path}"))
)
return content or [acp.tool_content(acp.text_block(patch_text))]
except Exception:
return [acp.tool_content(acp.text_block(patch_text))]
def _strip_diff_prefix(path: str) -> str:
raw = str(path or "").strip()
if raw.startswith(("a/", "b/")):
@@ -985,7 +895,7 @@ def _build_tool_complete_content(
if len(display_result) > 5000:
display_result = display_result[:4900] + f"\n... ({len(result)} chars total, truncated)"
if tool_name == "skill_manage":
if tool_name in {"write_file", "patch", "skill_manage"}:
try:
from agent.display import extract_edit_diff
@@ -1018,8 +928,6 @@ def build_tool_start(
tool_call_id: str,
tool_name: str,
arguments: Dict[str, Any],
*,
edit_diff: Any = None,
) -> ToolCallStart:
"""Create a ToolCallStart event for the given hermes tool invocation."""
kind = get_tool_kind(tool_name)
@@ -1027,34 +935,23 @@ def build_tool_start(
locations = extract_locations(arguments)
if tool_name == "patch":
if edit_diff is not None:
content = [
acp.tool_diff_content(
path=edit_diff.path,
old_text=edit_diff.old_text,
new_text=edit_diff.new_text,
)
]
mode = arguments.get("mode", "replace")
if mode == "replace":
path = arguments.get("path", "")
old = arguments.get("old_string", "")
new = arguments.get("new_string", "")
content = [acp.tool_diff_content(path=path, new_text=new, old_text=old)]
else:
mode = arguments.get("mode", "replace")
path = arguments.get("path") or "patch input"
content = [_text(f"Preparing {mode} edit for {path}. Approval prompt shows the diff.")]
patch_text = arguments.get("patch", "")
content = _build_patch_mode_content(patch_text)
return acp.start_tool_call(
tool_call_id, title, kind=kind, content=content, locations=locations,
)
if tool_name == "write_file":
if edit_diff is not None:
content = [
acp.tool_diff_content(
path=edit_diff.path,
old_text=edit_diff.old_text,
new_text=edit_diff.new_text,
)
]
else:
path = arguments.get("path", "")
content = [_text(f"Preparing write to {path}. Approval prompt shows the diff." if path else "Preparing file write. Approval prompt shows the diff.")]
path = arguments.get("path", "")
file_content = arguments.get("content", "")
content = [acp.tool_diff_content(path=path, new_text=file_content)]
return acp.start_tool_call(
tool_call_id, title, kind=kind, content=content, locations=locations,
)
@@ -1225,12 +1122,8 @@ def build_tool_start(
tool_call_id, title, kind=kind, content=content, locations=locations,
)
if not arguments:
return acp.start_tool_call(
tool_call_id, title, kind=kind, content=None, locations=locations, raw_input=None,
)
# Generic fallback
import json
try:
args_text = json.dumps(arguments, indent=2, default=str)
except (TypeError, ValueError):
@@ -1242,10 +1135,6 @@ def build_tool_start(
)
def _is_structured_json_result(result: Optional[str]) -> bool:
return isinstance(_json_loads_maybe(result), (dict, list))
def build_tool_complete(
tool_call_id: str,
tool_name: str,
@@ -1268,9 +1157,9 @@ def build_tool_complete(
return acp.update_tool_call(
tool_call_id,
kind=kind,
status="failed" if _tool_result_failed(result, tool_name) else "completed",
status="completed",
content=content,
raw_output=None if tool_name in _POLISHED_TOOLS or _is_structured_json_result(result) else result,
raw_output=None if tool_name in _POLISHED_TOOLS else result,
)

View File

@@ -1,16 +1,12 @@
{
"id": "hermes-agent",
"name": "Hermes Agent",
"version": "0.15.1",
"description": "Self-improving open-source AI agent by Nous Research with ACP editor integration, persistent memory, skills, and rich tool support.",
"repository": "https://github.com/NousResearch/hermes-agent",
"website": "https://hermes-agent.nousresearch.com/docs/user-guide/features/acp",
"authors": ["Nous Research"],
"license": "MIT",
"schema_version": 1,
"name": "hermes-agent",
"display_name": "Hermes Agent",
"description": "AI agent by Nous Research with 90+ tools, persistent memory, and multi-platform support",
"icon": "icon.svg",
"distribution": {
"uvx": {
"package": "hermes-agent[acp]==0.15.1",
"args": ["hermes-acp"]
}
"type": "command",
"command": "hermes",
"args": ["acp"]
}
}

View File

@@ -1,8 +1,25 @@
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16" fill="none">
<path d="M8 1.5v13" stroke="currentColor" stroke-width="1.5" stroke-linecap="round"/>
<path d="M8 3.25c-2.35-1.4-4.7-.95-6.25.35 1.85-.2 3.8.2 5.55 1.55" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M8 3.25c2.35-1.4 4.7-.95 6.25.35-1.85-.2-3.8.2-5.55 1.55" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M8 13.25c-2.3-1-3.05-2.65-1.35-4.15-2 .8-2.35 2.95-.35 4" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
<path d="M8 13.25c2.3-1 3.05-2.65 1.35-4.15 2 .8 2.35 2.95.35 4" stroke="currentColor" stroke-width="1.1" stroke-linecap="round" stroke-linejoin="round"/>
<circle cx="8" cy="1.8" r="1.1" fill="currentColor"/>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 64 64" width="64" height="64">
<defs>
<linearGradient id="gold" x1="0%" y1="0%" x2="0%" y2="100%">
<stop offset="0%" style="stop-color:#F5C542;stop-opacity:1" />
<stop offset="100%" style="stop-color:#D4961C;stop-opacity:1" />
</linearGradient>
</defs>
<!-- Staff -->
<rect x="30" y="10" width="4" height="46" rx="2" fill="url(#gold)" />
<!-- Wings (left) -->
<path d="M30 18 C24 14, 14 14, 10 18 C14 16, 22 16, 28 20" fill="#F5C542" opacity="0.9" />
<path d="M30 22 C26 19, 18 19, 14 22 C18 20, 24 20, 28 24" fill="#D4961C" opacity="0.8" />
<!-- Wings (right) -->
<path d="M34 18 C40 14, 50 14, 54 18 C50 16, 42 16, 36 20" fill="#F5C542" opacity="0.9" />
<path d="M34 22 C38 19, 46 19, 50 22 C46 20, 40 20, 36 24" fill="#D4961C" opacity="0.8" />
<!-- Left serpent -->
<path d="M32 48 C22 44, 20 38, 26 34 C20 36, 18 42, 24 46 C18 40, 22 30, 30 28 C24 32, 22 38, 28 42"
fill="none" stroke="#F5C542" stroke-width="2.5" stroke-linecap="round" />
<!-- Right serpent -->
<path d="M32 48 C42 44, 44 38, 38 34 C44 36, 46 42, 40 46 C46 40, 42 30, 34 28 C40 32, 42 38, 36 42"
fill="none" stroke="#D4961C" stroke-width="2.5" stroke-linecap="round" />
<!-- Orb at top -->
<circle cx="32" cy="10" r="4" fill="#F5C542" />
<circle cx="32" cy="10" r="2" fill="#FFF8E1" opacity="0.7" />
</svg>

Before

Width:  |  Height:  |  Size: 882 B

After

Width:  |  Height:  |  Size: 1.4 KiB

View File

@@ -4,5 +4,3 @@ These modules contain pure utility functions and self-contained classes
that were previously embedded in the 3,600-line run_agent.py. Extracting
them makes run_agent.py focused on the AIAgent orchestrator class.
"""
from . import jiter_preload as _jiter_preload # noqa: F401

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,68 +0,0 @@
"""Async/sync bridging helpers.
The codebase has ~30 sites that schedule a coroutine onto an event loop from a
worker thread via :func:`asyncio.run_coroutine_threadsafe`. That function can
raise :class:`RuntimeError` (e.g. the loop was closed during a shutdown race),
and when it does the coroutine object is never awaited and never closed —
which triggers a ``"coroutine '<name>' was never awaited"`` RuntimeWarning and
leaks the coroutine's frame until GC.
:func:`safe_schedule_threadsafe` wraps the call, closes the coroutine on
scheduling failure, and returns ``None`` (instead of a half-formed future) so
callers can branch cleanly:
fut = safe_schedule_threadsafe(coro, loop)
if fut is None:
return # or fallback behavior
fut.result(timeout=5)
The helper deliberately does NOT also handle ``future.result()`` failures —
that is a separate concern. Once the loop has accepted the coroutine, its
lifecycle belongs to the loop, not the scheduling thread.
"""
from __future__ import annotations
import asyncio
import logging
from concurrent.futures import Future
from typing import Any, Coroutine, Optional
_DEFAULT_LOGGER = logging.getLogger(__name__)
def safe_schedule_threadsafe(
coro: Coroutine[Any, Any, Any],
loop: Optional[asyncio.AbstractEventLoop],
*,
logger: Optional[logging.Logger] = None,
log_message: str = "Failed to schedule coroutine on loop",
log_level: int = logging.DEBUG,
) -> Optional[Future]:
"""Schedule ``coro`` on ``loop`` from a sync context, leak-safe.
Returns the :class:`concurrent.futures.Future` on success, or ``None`` if
the loop is missing or :func:`asyncio.run_coroutine_threadsafe` raised
(e.g. the loop was closed during a shutdown race). In all failure paths
the coroutine is :meth:`close`-d so it does not trigger
``"coroutine was never awaited"`` warnings or leak its frame.
Callers retain full control over what to do with the returned future
(call ``.result(timeout=...)``, attach ``add_done_callback``, ignore it
fire-and-forget, etc.).
"""
log = logger if logger is not None else _DEFAULT_LOGGER
if loop is None:
if asyncio.iscoroutine(coro):
coro.close()
log.log(log_level, "%s: loop is None", log_message)
return None
try:
return asyncio.run_coroutine_threadsafe(coro, loop)
except Exception as exc:
if asyncio.iscoroutine(coro):
coro.close()
log.log(log_level, "%s: %s", log_message, exc)
return None

File diff suppressed because it is too large Load Diff

View File

@@ -1,555 +0,0 @@
"""Microsoft Entra ID adapter for Microsoft Foundry.
Provides keyless authentication for Microsoft Foundry deployments using the
`azure-identity` SDK's `DefaultAzureCredential` chain (env service principal
→ workload identity → managed identity → VS Code → Azure CLI → azd →
PowerShell → broker).
Architecture mirrors `agent/bedrock_adapter.py`:
* Lazy import. `azure-identity` is only loaded when ``model.auth_mode =
entra_id`` is selected. Users who stick with `AZURE_FOUNDRY_API_KEY`
never pay the import cost.
* SDK-callable contract. The public entry point ``build_token_provider``
returns a zero-arg callable produced by ``get_bearer_token_provider`` —
this is exactly the value Microsoft's documented sample plugs into
``OpenAI(api_key=token_provider, base_url=...)``. The OpenAI SDK calls
it before every request, so token refresh is transparent.
* Three explicit consumer-side helpers (display / cache / http-bearer)
rather than one generic "materialize" function — splitting them by
purpose prevents accidental token-minting in logging paths or token
leakage into cache keys / dashboard JSON.
* No persisted JWT. ``azure-identity`` caches in-process and (where
available) in the OS keychain or ``~/.IdentityService``. Hermes does
not duplicate that storage in ``auth.json``.
Reference: https://learn.microsoft.com/azure/ai-foundry/foundry-models/how-to/configure-entra-id
Requires: ``azure-identity`` (optional dependency — only needed when
``model.auth_mode = entra_id``).
"""
from __future__ import annotations
import functools
import logging
import os
import threading
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional
logger = logging.getLogger(__name__)
# Microsoft-documented scope for Foundry inference auth. Both the new
# Foundry portal and the legacy Azure OpenAI managed-identity docs use
# this scope for ALL Foundry endpoint shapes (*.openai.azure.com,
# *.services.ai.azure.com, *.ai.azure.com). The older control-plane
# scope ``https://cognitiveservices.azure.com/.default`` is for ARM
# resource management and is rejected for inference by newer
# resources — users with that requirement override via
# ``model.entra.scope`` in config.yaml.
SCOPE_AI_AZURE_DEFAULT = "https://ai.azure.com/.default"
# ---------------------------------------------------------------------------
# Lazy SDK import — only loaded when the Entra path is actually used.
# ---------------------------------------------------------------------------
_AZURE_IDENTITY_FEATURE = "provider.azure_identity"
def has_azure_identity_installed() -> bool:
"""Return True if `azure-identity` can be imported right now.
Cheap check — does not walk the credential chain.
"""
try:
import azure.identity # noqa: F401
return True
except Exception:
return False
def _require_azure_identity():
"""Import ``azure.identity``, lazy-installing it if allowed.
Raises ``ImportError`` with a clear actionable message when the
package is missing and lazy installs are disabled.
"""
try:
import azure.identity as _ai
return _ai
except ImportError:
try:
from tools.lazy_deps import ensure, FeatureUnavailable
except ImportError as exc:
raise ImportError(
"The 'azure-identity' package is required for Azure AI "
"Foundry Entra ID authentication. Install it with: "
"pip install azure-identity"
) from exc
try:
ensure(_AZURE_IDENTITY_FEATURE, prompt=False)
except FeatureUnavailable as exc:
raise ImportError(
"The 'azure-identity' package is required for Azure AI "
"Foundry Entra ID authentication. " + str(exc)
) from exc
# Retry import after lazy install.
import azure.identity as _ai # noqa: WPS440
return _ai
def reset_credential_cache() -> None:
"""Clear the cached ``DefaultAzureCredential``. Used by tests and
profile switches.
Defensive against tests that ``monkeypatch.setattr`` over
``build_credential`` with a plain (non-lru-cached) function — those
won't expose ``cache_clear()`` until pytest reverts the patch.
"""
cache_clear = getattr(build_credential, "cache_clear", None)
if callable(cache_clear):
cache_clear()
# ---------------------------------------------------------------------------
# Token-provider construction
# ---------------------------------------------------------------------------
@dataclass(frozen=True)
class EntraIdentityConfig:
"""Serializable Entra ID config.
Captures the Hermes-managed Entra knobs we need outside Azure SDK
environment configuration. Everything else
(tenant ID, service principal secret, federated token file, sovereign
cloud authority, etc.) flows through azure-identity's standard
``AZURE_*`` env vars — see the Bedrock pattern in
``hermes_cli/runtime_provider.py:1310-1377`` for the analogous
"let the SDK read env" approach.
``scope`` is Microsoft's documented Foundry inference audience. Almost
everyone uses the default; sovereign-cloud / non-standard tenants can
override via ``model.entra.scope``. Identity selection (user-assigned
managed identity, workload identity, service principal, tenant, authority)
stays in the standard Azure SDK env vars such as ``AZURE_CLIENT_ID``.
``exclude_interactive_browser`` is kept as an internal constructor knob
so probes stay non-interactive by default. It is not written by the setup
wizard.
The dataclass is frozen so it's hashable for ``functools.lru_cache``
keying, and serializable across multiprocessing boundaries (workers
rebuild the credential inside their own process).
"""
scope: str = SCOPE_AI_AZURE_DEFAULT
exclude_interactive_browser: bool = True
def __post_init__(self) -> None:
scope = str(self.scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
object.__setattr__(self, "scope", scope)
def to_dict(self) -> Dict[str, Any]:
return {
"scope": self.scope,
"exclude_interactive_browser": self.exclude_interactive_browser,
}
@classmethod
def from_dict(cls, data: Optional[Dict[str, Any]],
*, default_scope: Optional[str] = None) -> "EntraIdentityConfig":
data = data or {}
scope = str(data.get("scope") or "").strip() or default_scope or SCOPE_AI_AZURE_DEFAULT
exclude_browser = bool(data.get("exclude_interactive_browser", True))
return cls(
scope=scope,
exclude_interactive_browser=exclude_browser,
)
def _build_default_credential(config: EntraIdentityConfig) -> Any:
"""Construct a ``DefaultAzureCredential`` for ``config``.
Only Hermes-selected knobs are passed as kwargs. Everything else
(tenant, service principal secret, federated token file, sovereign
cloud authority, etc.) is read by ``azure-identity`` from the
standard ``AZURE_*`` environment variables — see Microsoft's
documented credential resolution chain. Users configure those in
``~/.hermes/.env`` or the deployment environment.
"""
ai = _require_azure_identity()
kwargs: Dict[str, Any] = {}
# SDK default is True (browser excluded); only pass when the user
# explicitly opts in to interactive browser auth.
if not config.exclude_interactive_browser:
kwargs["exclude_interactive_browser_credential"] = False
return ai.DefaultAzureCredential(**kwargs)
@functools.lru_cache(maxsize=1)
def build_credential(config: EntraIdentityConfig) -> Any:
"""Return the cached ``DefaultAzureCredential`` for ``config``.
Hermes processes use exactly one Entra config at a time (the
``model.entra.*`` block in config.yaml drives every aux task,
subagent, and credential probe in the session). ``maxsize=1`` is
intentional: it reflects the actual usage pattern and keeps the
cache trivially small.
``EntraIdentityConfig`` is a frozen dataclass, so it's hashable and
safe as an LRU-cache key. ``functools.lru_cache`` is thread-safe in
CPython.
If two distinct configs are ever passed (tests do this; production
rarely), the LRU eviction handles it correctly — each call still
returns a credential matching its config; only one is cached at a
time. Use :func:`reset_credential_cache` to clear (e.g. in tests).
"""
return _build_default_credential(config)
def build_token_provider(scope: Optional[str] = None,
*,
config: Optional[EntraIdentityConfig] = None,
base_url: Optional[str] = None,
exclude_interactive_browser: bool = True,
) -> Callable[[], str]:
"""Return a zero-arg callable that mints a fresh Entra bearer JWT.
The returned callable is exactly what Microsoft's documented Foundry
sample expects::
from openai import OpenAI
client = OpenAI(
base_url="https://my-resource.openai.azure.com/openai/v1/",
api_key=build_token_provider(),
)
Scope resolution order:
1. ``config.scope`` when a config object is supplied
2. explicit ``scope`` kwarg
3. ``SCOPE_AI_AZURE_DEFAULT`` (Microsoft's documented Foundry scope)
``base_url`` is unused today and kept for back-compat. Tenant /
service-principal / sovereign-cloud configuration flows through
``azure-identity``'s standard ``AZURE_*`` environment variables —
see :func:`_build_default_credential` for the rationale.
NOT serializable across process boundaries. For multiprocessing
workers, serialize the ``EntraIdentityConfig`` and rebuild the
provider inside the worker.
"""
ai = _require_azure_identity()
if config is None:
config = EntraIdentityConfig(
scope=scope or SCOPE_AI_AZURE_DEFAULT,
exclude_interactive_browser=exclude_interactive_browser,
)
credential = build_credential(config)
return ai.get_bearer_token_provider(credential, config.scope)
# ---------------------------------------------------------------------------
# Credential probing
# ---------------------------------------------------------------------------
def has_azure_identity_credentials(scope: Optional[str] = None,
*,
config: Optional[EntraIdentityConfig] = None,
timeout_seconds: float = 10.0,
allow_install: bool = True,
**overrides: Any) -> bool:
"""Best-effort probe: can `DefaultAzureCredential` mint a token now?
Runs ``credential.get_token(scope)`` under a thread-based timeout so
a slow token service can't hang the caller. Returns False on any
error — never raises. Use for ``hermes doctor`` /
``hermes auth status`` / wizard preflight.
``allow_install``: when True (default) and ``azure-identity`` is not
importable, the adapter triggers the standard lazy-install path
(subject to ``security.allow_lazy_installs``) before probing. Set
False to make this strictly an "is installed?" check — used on hot
paths like CLI startup where we never want pip to run.
NOT used by ``is_provider_configured()`` — that path is structural
only (no token mint), so CLI startup doesn't pay this latency.
"""
if not has_azure_identity_installed():
if not allow_install:
return False
try:
_require_azure_identity()
except ImportError as exc:
logger.debug("azure-identity lazy install unavailable: %s", exc)
return False
if config is None:
effective_scope = (scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
config = EntraIdentityConfig(scope=effective_scope, **overrides)
result = {"ok": False}
def _probe() -> None:
try:
credential = build_credential(config)
tok = credential.get_token(config.scope)
result["ok"] = bool(getattr(tok, "token", None))
except Exception as exc:
logger.debug("Entra credential probe failed: %s", exc)
result["ok"] = False
thread = threading.Thread(target=_probe, daemon=True)
thread.start()
thread.join(timeout=max(0.01, timeout_seconds))
if thread.is_alive():
logger.debug("Entra token service probe timed out after %ss", timeout_seconds)
return False
return bool(result.get("ok"))
def describe_active_credential(config: Optional[EntraIdentityConfig] = None,
*,
scope: Optional[str] = None,
timeout_seconds: float = 10.0,
allow_install: bool = True,
**overrides: Any) -> Dict[str, Any]:
"""Return diagnostic info about the active credential chain.
Best-effort: runs ``get_token()`` and inspects what came back.
Designed for ``hermes doctor`` and the wizard preflight — never
raises, returns ``{"ok": False, "error": ...}`` on failure.
``allow_install``: when True (default) and ``azure-identity`` is not
importable, the adapter triggers the standard lazy-install path
(subject to ``security.allow_lazy_installs``) before probing. The
install failure is surfaced as the diagnostic error when it fails.
Set False for hot CLI paths that should never trigger pip.
``azure-identity`` doesn't expose the winning inner credential as
a public field, so we report a coarse picture (env vars present,
token expiry, claims-derived tenant) rather than the credential
class name. Users wanting the precise class can run with
``AZURE_LOG_LEVEL=DEBUG``.
"""
info: Dict[str, Any] = {"ok": False}
if not has_azure_identity_installed():
if not allow_install:
info["error"] = "azure-identity not installed"
info["hint"] = (
"pip install azure-identity (or rely on lazy install at "
"first use)"
)
return info
try:
_require_azure_identity()
except ImportError as exc:
info["error"] = str(exc) or "azure-identity not installed"
info["hint"] = (
"pip install azure-identity manually, or enable lazy "
"installs (security.allow_lazy_installs: true in "
"config.yaml)."
)
return info
if config is None:
effective_scope = (scope or "").strip() or SCOPE_AI_AZURE_DEFAULT
config = EntraIdentityConfig(scope=effective_scope, **overrides)
info["scope"] = config.scope
# Tenant / authority / service-principal config flow through the
# standard ``AZURE_*`` env vars; surface them below.
if os.environ.get("AZURE_TENANT_ID", "").strip():
info["tenant_id_env"] = os.environ["AZURE_TENANT_ID"].strip()
# Surface which env-var sources are present without minting yet.
env_sources = []
if os.environ.get("AZURE_FEDERATED_TOKEN_FILE", "").strip():
env_sources.append("WorkloadIdentityCredential (AZURE_FEDERATED_TOKEN_FILE)")
if (os.environ.get("AZURE_CLIENT_ID", "").strip()
and os.environ.get("AZURE_CLIENT_SECRET", "").strip()
and os.environ.get("AZURE_TENANT_ID", "").strip()):
env_sources.append("EnvironmentCredential (client secret)")
if os.environ.get("IDENTITY_ENDPOINT", "").strip() or os.environ.get("MSI_ENDPOINT", "").strip():
env_sources.append("ManagedIdentityCredential (IDENTITY_ENDPOINT)")
info["env_sources"] = env_sources
# Now try minting.
result: Dict[str, Any] = {}
def _probe() -> None:
try:
credential = build_credential(config)
tok = credential.get_token(config.scope)
result["token"] = tok
except Exception as exc:
result["error"] = str(exc)
thread = threading.Thread(target=_probe, daemon=True)
thread.start()
thread.join(timeout=max(0.01, timeout_seconds))
if thread.is_alive():
info["error"] = f"Token probe timed out after {timeout_seconds:.0f}s"
info["hint"] = (
"DefaultAzureCredential can be slow when the token service is unreachable "
"or when az login state is stale. Try `az login` or set "
"AZURE_CLIENT_ID / AZURE_TENANT_ID / AZURE_CLIENT_SECRET."
)
return info
if "error" in result:
info["error"] = result["error"]
return info
token = result.get("token")
if token is None:
info["error"] = "credential chain exhausted"
return info
info["ok"] = True
info["expires_on"] = getattr(token, "expires_on", None)
return info
# ---------------------------------------------------------------------------
# Consumer-side helpers — split by purpose to prevent accidental token
# minting in logging / cache-key / dashboard paths.
# ---------------------------------------------------------------------------
def is_token_provider(value: Any) -> bool:
"""Return True when ``value`` is a callable Entra token provider.
Used at the seams where a consumer must decide between
string-API-key semantics and bearer-callable semantics.
"""
return callable(value) and not isinstance(value, str)
def materialize_bearer_for_http(value: Any) -> str:
"""Return a fresh Bearer JWT for a manual HTTP request.
Only call this at sites that must construct an ``Authorization``
header outside the OpenAI SDK (e.g. ``hermes_cli/azure_detect.py``).
Calls the callable exactly once and returns the resulting token.
**Anthropic SDK integration:** the Anthropic Python SDK does not
accept a ``Callable[[], str]`` for ``auth_token``. Instead,
:func:`build_bearer_http_client` returns an ``httpx.Client`` whose
request event hook calls this function and rewrites the
``Authorization`` header per request — and that client is passed to
the Anthropic SDK via ``http_client=...``. See
:func:`agent.anthropic_adapter.build_anthropic_client` for the
consumer.
Raises ``ValueError`` if ``value`` is not a callable token provider
or non-empty string.
"""
if is_token_provider(value):
token = value()
if not isinstance(token, str) or not token:
raise ValueError("token provider returned empty value")
return token
if isinstance(value, str) and value:
return value
raise ValueError("no usable api_key / token provider")
def build_bearer_http_client(token_provider: Callable[[], str], **httpx_kwargs: Any) -> Any:
"""Return an ``httpx.Client`` that mints a fresh Entra bearer JWT
per outbound request.
The Anthropic SDK (≤ 0.86.0 at the time of writing) stores
``api_key`` / ``auth_token`` as static strings and computes the
``Authorization`` header at construction time. To get per-request
token refresh (the Microsoft-recommended Foundry pattern for
callable bearer providers), we install an httpx ``request`` event
hook on a custom client and pass that client to the SDK via
``http_client=...``. The hook:
1. Calls :func:`materialize_bearer_for_http` to mint a fresh JWT
(azure-identity caches internally — this is cheap when the
cached token is still valid).
2. Strips any pre-set ``Authorization`` / ``api-key`` /
``x-api-key`` headers the SDK may have added (avoids
conflicting auth values).
3. Sets ``Authorization: Bearer <fresh-jwt>``.
``token_provider`` must be a zero-arg callable returning a string —
typically the result of :func:`build_token_provider`.
``httpx_kwargs`` are forwarded verbatim to ``httpx.Client(...)`` so
callers can attach a ``timeout``, ``transport``, ``proxy``, etc.
Raises ``ImportError`` if ``httpx`` is not installed (it is a
transitive dependency of both ``openai`` and ``anthropic`` SDKs, so
in practice always available when this helper is reached).
"""
if not is_token_provider(token_provider):
raise ValueError(
"build_bearer_http_client requires a zero-arg callable "
"token provider"
)
try:
import httpx
except ImportError as exc: # pragma: no cover — httpx ships with openai/anthropic
raise ImportError(
"httpx is required for Entra ID bearer auth on Microsoft Foundry "
"Anthropic-style endpoints. It is normally a transitive "
"dependency of the openai/anthropic SDKs."
) from exc
def _inject_bearer(request: "httpx.Request") -> None:
try:
token = materialize_bearer_for_http(token_provider)
except ValueError as exc:
# Token provider failed (chain exhausted, token service unreachable,
# az login expired, etc.). Strip any auth headers the SDK
# may have set — including our own placeholder sentinel
# ``entra-id-bearer-via-http-hook`` from
# ``_build_anthropic_client_with_bearer_hook`` — so the
# outbound request hits Azure with NO Authorization rather
# than with the placeholder. Azure returns a clean 401
# "missing auth" that is easier to diagnose than a 401
# against the sentinel string, and the sentinel never
# appears in upstream access logs.
#
# Log at WARNING (not DEBUG) so the misconfiguration is
# visible at default log levels.
logger.warning(
"Bearer hook: Entra ID token provider returned empty (%s) "
"— stripping Authorization headers. Azure will respond 401. "
"Run `hermes doctor` or `az login` to recover.",
exc,
)
for header_name in ("Authorization", "authorization", "Api-Key", "api-key", "X-Api-Key", "x-api-key"):
request.headers.pop(header_name, None)
return
for header_name in ("Authorization", "authorization", "Api-Key", "api-key", "X-Api-Key", "x-api-key"):
request.headers.pop(header_name, None)
request.headers["Authorization"] = f"Bearer {token}"
return httpx.Client(
event_hooks={"request": [_inject_bearer]},
**httpx_kwargs,
)
__all__ = [
"EntraIdentityConfig",
"SCOPE_AI_AZURE_DEFAULT",
"build_bearer_http_client",
"build_credential",
"build_token_provider",
"describe_active_credential",
"has_azure_identity_credentials",
"has_azure_identity_installed",
"is_token_provider",
"materialize_bearer_for_http",
"reset_credential_cache",
]

View File

@@ -1,597 +0,0 @@
"""Background memory/skill review — fork the agent to evaluate the turn.
After every turn, ``AIAgent.run_conversation`` may call
:func:`spawn_background_review` to fire off a daemon thread that replays
the conversation snapshot in a forked :class:`AIAgent` and asks itself
"should any skill/memory be saved or updated?". Writes go straight to
the memory + skill stores. Main conversation and prompt cache are never
touched.
The fork inherits the parent's live runtime (provider, model, base_url,
credentials, cached system prompt) so it hits the same prefix cache and
uses the same auth. It runs with a tool whitelist limited to memory and
skill management tools; everything else is denied at runtime.
See the ``hermes-agent-dev`` skill (``references/self-improvement-loop.md``)
for invariants and PR review criteria.
"""
from __future__ import annotations
import contextlib
import json
import logging
import os
from typing import Any, Dict, List, Optional
logger = logging.getLogger(__name__)
# Review-prompt strings — used by ``spawn_background_review_thread`` to build
# the user-message that the forked review agent receives. AIAgent exposes
# them as class attributes (``_MEMORY_REVIEW_PROMPT`` etc.) for back-compat;
# the actual text lives here so future edits are one-place.
_MEMORY_REVIEW_PROMPT = (
"Review the conversation above and consider saving to memory if appropriate.\n\n"
"Focus on:\n"
"1. Has the user revealed things about themselves — their persona, desires, "
"preferences, or personal details worth remembering?\n"
"2. Has the user expressed expectations about how you should behave, their work "
"style, or ways they want you to operate?\n\n"
"If something stands out, save it using the memory tool. "
"If nothing is worth saving, just say 'Nothing to save.' and stop."
)
_SKILL_REVIEW_PROMPT = (
"Review the conversation above and update the skill library. Be "
"ACTIVE — most sessions produce at least one skill update, even if "
"small. A pass that does nothing is a missed learning opportunity, "
"not a neutral outcome.\n\n"
"Target shape of the library: CLASS-LEVEL skills, each with a rich "
"SKILL.md and a `references/` directory for session-specific detail. "
"Not a long flat list of narrow one-session-one-skill entries. This "
"shapes HOW you update, not WHETHER you update.\n\n"
"Signals to look for (any one of these warrants action):\n"
" • User corrected your style, tone, format, legibility, or "
"verbosity. Frustration signals like 'stop doing X', 'this is too "
"verbose', 'don't format like this', 'why are you explaining', "
"'just give me the answer', 'you always do Y and I hate it', or an "
"explicit 'remember this' are FIRST-CLASS skill signals, not just "
"memory signals. Update the relevant skill(s) to embed the "
"preference so the next session starts already knowing.\n"
" • User corrected your workflow, approach, or sequence of steps. "
"Encode the correction as a pitfall or explicit step in the skill "
"that governs that class of task.\n"
" • Non-trivial technique, fix, workaround, debugging path, or "
"tool-usage pattern emerged that a future session would benefit "
"from. Capture it.\n"
" • A skill that got loaded or consulted this session turned out "
"to be wrong, missing a step, or outdated. Patch it NOW.\n\n"
"Preference order — prefer the earliest action that fits, but do "
"pick one when a signal above fired:\n"
" 1. UPDATE A CURRENTLY-LOADED SKILL. Look back through the "
"conversation for skills the user loaded via /skill-name or you "
"read via skill_view. If any of them covers the territory of the "
"new learning, PATCH that one first. It is the skill that was in "
"play, so it's the right one to extend.\n"
" 2. UPDATE AN EXISTING UMBRELLA (via skills_list + skill_view). "
"If no loaded skill fits but an existing class-level skill does, "
"patch it. Add a subsection, a pitfall, or broaden a trigger.\n"
" 3. ADD A SUPPORT FILE under an existing umbrella. Skills can be "
"packaged with three kinds of support files — use the right "
"directory per kind:\n"
" • `references/<topic>.md` — session-specific detail (error "
"transcripts, reproduction recipes, provider quirks) AND "
"condensed knowledge banks: quoted research, API docs, external "
"authoritative excerpts, or domain notes you found while working "
"on the problem. Write it concise and for the value of the task, "
"not as a full mirror of upstream docs.\n"
" • `templates/<name>.<ext>` — starter files meant to be "
"copied and modified (boilerplate configs, scaffolding, a "
"known-good example the agent can `reproduce with modifications`).\n"
" • `scripts/<name>.<ext>` — statically re-runnable actions "
"the skill can invoke directly (verification scripts, fixture "
"generators, deterministic probes, anything the agent should run "
"rather than hand-type each time).\n"
" Add support files via skill_manage action=write_file with "
"file_path starting 'references/', 'templates/', or 'scripts/'. "
"The umbrella's SKILL.md should gain a one-line pointer to any "
"new support file so future agents know it exists.\n"
" 4. CREATE A NEW CLASS-LEVEL UMBRELLA SKILL when no existing "
"skill covers the class. The name MUST be at the class level. "
"The name MUST NOT be a specific PR number, error string, feature "
"codename, library-alone name, or 'fix-X / debug-Y / audit-Z-today' "
"session artifact. If the proposed name only makes sense for "
"today's task, it's wrong — fall back to (1), (2), or (3).\n\n"
"User-preference embedding (important): when the user expressed a "
"style/format/workflow preference, the update belongs in the "
"SKILL.md body, not just in memory. Memory captures 'who the user "
"is and what the current situation and state of your operations "
"are'; skills capture 'how to do this class of task for this "
"user'. When they complain about how you handled a task, the "
"skill that governs that task needs to carry the lesson.\n\n"
"If you notice two existing skills that overlap, note it in your "
"reply — the background curator handles consolidation at scale.\n\n"
"Protected skills (DO NOT edit these):\n"
" • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
" • Hub-installed skills (installed via 'hermes skills install').\n"
"Pinned skills (marked via 'hermes curator pin') CAN be improved — "
"pin only blocks deletion/archive/consolidation by the curator, not "
"content updates. Patch them when a pitfall or missing step turns up, "
"same as any other agent-created skill.\n"
"If the only skills that need updating are protected, say\n"
"'Nothing to save.' and stop.\n\n"
"Do NOT capture (these become persistent self-imposed constraints "
"that bite you later when the environment changes):\n"
" • Environment-dependent failures: missing binaries, fresh-install "
"errors, post-migration path mismatches, 'command not found', "
"unconfigured credentials, uninstalled packages. The user can fix "
"these — they are not durable rules.\n"
" • Negative claims about tools or features ('browser tools do not "
"work', 'X tool is broken', 'cannot use Y from execute_code'). These "
"harden into refusals the agent cites against itself for months "
"after the actual problem was fixed.\n"
" • Session-specific transient errors that resolved before the "
"conversation ended. If retrying worked, the lesson is the retry "
"pattern, not the original failure.\n"
" • One-off task narratives. A user asking 'summarize today's "
"market' or 'analyze this PR' is not a class of work that warrants "
"a skill.\n\n"
"If a tool failed because of setup state, capture the FIX (install "
"command, config step, env var to set) under an existing setup or "
"troubleshooting skill — never 'this tool does not work' as a "
"standalone constraint.\n\n"
"'Nothing to save.' is a real option but should NOT be the "
"default. If the session ran smoothly with no corrections and "
"produced no new technique, just say 'Nothing to save.' and stop. "
"Otherwise, act."
)
_COMBINED_REVIEW_PROMPT = (
"Review the conversation above and update two things:\n\n"
"**Memory**: who the user is. Did the user reveal persona, "
"desires, preferences, personal details, or expectations about "
"how you should behave? Save facts about the user and durable "
"preferences with the memory tool.\n\n"
"**Skills**: how to do this class of task. Be ACTIVE — most "
"sessions produce at least one skill update. A pass that does "
"nothing is a missed learning opportunity, not a neutral outcome.\n\n"
"Target shape of the skill library: CLASS-LEVEL skills with a rich "
"SKILL.md and a `references/` directory for session-specific detail. "
"Not a long flat list of narrow one-session-one-skill entries.\n\n"
"Signals that warrant a skill update (any one is enough):\n"
" • User corrected your style, tone, format, legibility, "
"verbosity, or approach. Frustration is a FIRST-CLASS skill "
"signal, not just a memory signal. 'stop doing X', 'don't format "
"like this', 'I hate when you Y' — embed the lesson in the skill "
"that governs that task so the next session starts fixed.\n"
" • Non-trivial technique, fix, workaround, or debugging path "
"emerged.\n"
" • A skill that was loaded or consulted turned out wrong, "
"missing, or outdated — patch it now.\n\n"
"Preference order for skills — pick the earliest that fits:\n"
" 1. UPDATE A CURRENTLY-LOADED SKILL. Check what skills were "
"loaded via /skill-name or skill_view in the conversation. If one "
"of them covers the learning, PATCH it first. It was in play; "
"it's the right place.\n"
" 2. UPDATE AN EXISTING UMBRELLA (skills_list + skill_view to "
"find the right one). Patch it.\n"
" 3. ADD A SUPPORT FILE under an existing umbrella via "
"skill_manage action=write_file. Three kinds: "
"`references/<topic>.md` for session-specific detail OR condensed "
"knowledge banks (quoted research, API docs excerpts, domain "
"notes) written concise and task-focused; `templates/<name>.<ext>` "
"for starter files meant to be copied and modified; "
"`scripts/<name>.<ext>` for statically re-runnable actions "
"(verification, fixture generators, probes). Add a one-line "
"pointer in SKILL.md so future agents find them.\n"
" 4. CREATE A NEW CLASS-LEVEL UMBRELLA when nothing exists. "
"Name at the class level — NOT a PR number, error string, "
"codename, library-alone name, or 'fix-X / debug-Y' session "
"artifact. If the name only fits today's task, fall back to (1), "
"(2), or (3).\n\n"
"User-preference embedding: when the user complains about how "
"you handled a task, update the skill that governs that task — "
"memory alone isn't enough. Memory says 'who the user is and "
"what the current situation and state of your operations are'; "
"skills say 'how to do this class of task for this user'. Both "
"should carry user-preference lessons when relevant.\n\n"
"If you notice overlapping existing skills, mention it — the "
"background curator handles consolidation.\n\n"
"Protected skills (DO NOT edit these):\n"
" • Bundled skills (shipped with Hermes, e.g. 'hermes-agent').\n"
" • Hub-installed skills (installed via 'hermes skills install').\n"
"Pinned skills (marked via 'hermes curator pin') CAN be improved — "
"pin only blocks deletion/archive/consolidation by the curator, not "
"content updates. Patch them when a pitfall or missing step turns up, "
"same as any other agent-created skill.\n"
"If the only skills that need updating are protected, say\n"
"'Nothing to save.' and stop.\n\n"
"Do NOT capture as skills (these become persistent self-imposed "
"constraints that bite you later when the environment changes):\n"
" • Environment-dependent failures: missing binaries, fresh-install "
"errors, post-migration path mismatches, 'command not found', "
"unconfigured credentials, uninstalled packages. The user can fix "
"these — they are not durable rules.\n"
" • Negative claims about tools or features ('browser tools do not "
"work', 'X tool is broken', 'cannot use Y from execute_code'). These "
"harden into refusals the agent cites against itself for months "
"after the actual problem was fixed.\n"
" • Session-specific transient errors that resolved before the "
"conversation ended. If retrying worked, the lesson is the retry "
"pattern, not the original failure.\n"
" • One-off task narratives. A user asking 'summarize today's "
"market' or 'analyze this PR' is not a class of work that warrants "
"a skill.\n\n"
"If a tool failed because of setup state, capture the FIX (install "
"command, config step, env var to set) under an existing setup or "
"troubleshooting skill — never 'this tool does not work' as a "
"standalone constraint.\n\n"
"Act on whichever of the two dimensions has real signal. If "
"genuinely nothing stands out on either, say 'Nothing to save.' "
"and stop — but don't reach for that conclusion as a default."
)
def summarize_background_review_actions(
review_messages: List[Dict],
prior_snapshot: List[Dict],
) -> List[str]:
"""Build the human-facing action summary for a background review pass.
Walks the review agent's session messages and collects "successful tool
action" descriptions to surface to the user (e.g. "Memory updated").
Tool messages already present in ``prior_snapshot`` are skipped so we
don't re-surface stale results from the prior conversation that the
review agent inherited via ``conversation_history`` (issue #14944).
Matching is by ``tool_call_id`` when available, with a content-equality
fallback for tool messages that lack one.
"""
existing_tool_call_ids = set()
existing_tool_contents = set()
for prior in prior_snapshot or []:
if not isinstance(prior, dict) or prior.get("role") != "tool":
continue
tcid = prior.get("tool_call_id")
if tcid:
existing_tool_call_ids.add(tcid)
else:
content = prior.get("content")
if isinstance(content, str):
existing_tool_contents.add(content)
actions: List[str] = []
for msg in review_messages or []:
if not isinstance(msg, dict) or msg.get("role") != "tool":
continue
tcid = msg.get("tool_call_id")
if tcid and tcid in existing_tool_call_ids:
continue
if not tcid:
content_str = msg.get("content")
if isinstance(content_str, str) and content_str in existing_tool_contents:
continue
try:
data = json.loads(msg.get("content", "{}"))
except (json.JSONDecodeError, TypeError):
continue
if not isinstance(data, dict) or not data.get("success"):
continue
message = data.get("message", "")
target = data.get("target", "")
if "created" in message.lower():
actions.append(message)
elif "updated" in message.lower():
actions.append(message)
elif "added" in message.lower() or (target and "add" in message.lower()):
label = "Memory" if target == "memory" else "User profile" if target == "user" else target
actions.append(f"{label} updated")
elif "Entry added" in message:
label = "Memory" if target == "memory" else "User profile" if target == "user" else target
actions.append(f"{label} updated")
elif "removed" in message.lower() or "replaced" in message.lower():
label = "Memory" if target == "memory" else "User profile" if target == "user" else target
actions.append(f"{label} updated")
return actions
def build_memory_write_metadata(
agent: Any,
*,
write_origin: Optional[str] = None,
execution_context: Optional[str] = None,
task_id: Optional[str] = None,
tool_call_id: Optional[str] = None,
) -> Dict[str, Any]:
"""Build provenance metadata for external memory-provider mirrors."""
metadata: Dict[str, Any] = {
"write_origin": write_origin or getattr(agent, "_memory_write_origin", "assistant_tool"),
"execution_context": (
execution_context
or getattr(agent, "_memory_write_context", "foreground")
),
"session_id": agent.session_id or "",
"parent_session_id": agent._parent_session_id or "",
"platform": agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
"tool_name": "memory",
}
if task_id:
metadata["task_id"] = task_id
if tool_call_id:
metadata["tool_call_id"] = tool_call_id
return {k: v for k, v in metadata.items() if v not in {None, ""}}
def _run_review_in_thread(
agent: Any,
messages_snapshot: List[Dict],
prompt: str,
) -> None:
"""Worker function executed in the background-review daemon thread.
Spawns a forked ``AIAgent`` inheriting the parent's runtime, runs the
review prompt, and surfaces a compact action summary back to the user
via ``agent._safe_print`` and ``agent.background_review_callback``.
"""
# Local import to avoid a hard circular dep at module load.
from run_agent import AIAgent
from tools.terminal_tool import set_approval_callback as _set_approval_callback
# Install a non-interactive approval callback on this worker
# thread so any dangerous-command guard the review agent trips
# resolves to "deny" instead of falling back to input() -- which
# deadlocks against the parent's prompt_toolkit TUI (#15216).
# Same pattern as _subagent_auto_deny in tools/delegate_tool.py.
def _bg_review_auto_deny(command, description, **kwargs):
logger.warning(
"Background review auto-denied dangerous command: %s (%s)",
command, description,
)
return "deny"
try:
_set_approval_callback(_bg_review_auto_deny)
except Exception:
pass
review_agent = None
review_messages: List[Dict] = []
try:
with open(os.devnull, "w", encoding="utf-8") as _devnull, \
contextlib.redirect_stdout(_devnull), \
contextlib.redirect_stderr(_devnull):
# Inherit the parent agent's live runtime (provider, model,
# base_url, api_key, api_mode) so the fork uses the exact
# same credentials the main turn is using. Without this,
# AIAgent.__init__ re-runs auto-resolution from env vars,
# which fails for OAuth-only providers, session-scoped
# creds, or credential-pool setups where the resolver can't
# reconstruct auth from scratch -- producing the spurious
# "No LLM provider configured" warning at end of turn.
_parent_runtime = agent._current_main_runtime()
_parent_api_mode = _parent_runtime.get("api_mode") or None
# The review fork needs to call agent-loop tools (memory,
# skill_manage). Those tools require Hermes' own dispatch,
# which the codex_app_server runtime bypasses entirely
# (it runs the turn inside codex's subprocess). So when
# the parent is on codex_app_server, downgrade the review
# fork to codex_responses — same auth/credentials, but
# talks to the OpenAI Responses API directly so Hermes
# owns the loop and the agent-loop tools dispatch.
if _parent_api_mode == "codex_app_server":
_parent_api_mode = "codex_responses"
# skip_memory=True keeps the review fork from
# touching external memory plugins (honcho, mem0,
# supermemory, etc.). Without it, the fork's
# __init__ rebuilds its own _memory_manager from
# config, scoped to the parent's session_id, and
# run_conversation() then leaks the harness prompt
# into the user's real memory namespace via three
# ingestion sites: on_turn_start (cadence + turn
# message), prefetch_all (recall query), and
# sync_all (harness prompt + review output recorded
# as a (user, assistant) turn pair). Built-in
# MEMORY.md / USER.md state is re-bound from the
# parent below so memory(action="add") writes from
# the review still land on disk; the review just
# has zero side effects on external providers.
# Match parent's toolset config so ``tools[]`` is byte-identical
# in the request body — Anthropic's cache key includes it.
# (The runtime whitelist below still restricts dispatch.)
review_agent = AIAgent(
model=agent.model,
max_iterations=16,
quiet_mode=True,
platform=agent.platform,
provider=agent.provider,
api_mode=_parent_api_mode,
base_url=_parent_runtime.get("base_url") or None,
api_key=_parent_runtime.get("api_key") or None,
credential_pool=getattr(agent, "_credential_pool", None),
parent_session_id=agent.session_id,
enabled_toolsets=getattr(agent, "enabled_toolsets", None),
disabled_toolsets=getattr(agent, "disabled_toolsets", None),
skip_memory=True,
)
review_agent._memory_write_origin = "background_review"
review_agent._memory_write_context = "background_review"
review_agent._memory_store = agent._memory_store
review_agent._memory_enabled = agent._memory_enabled
review_agent._user_profile_enabled = agent._user_profile_enabled
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Suppress all status/warning emits from the fork so the
# user only sees the final successful-action summary.
# Without this, mid-review "Iteration budget exhausted",
# rate-limit retries, compression warnings, and other
# lifecycle messages bubble up through _emit_status ->
# _vprint and leak past the stdout redirect (they go via
# _print_fn/status_callback, which bypass sys.stdout).
review_agent.suppress_status_output = True
# Inherit the parent's cached system prompt verbatim so
# the review fork's outbound HTTP request hits the same
# Anthropic/OpenRouter prefix cache the parent warmed.
# Without this, the fork rebuilds the system prompt from
# scratch (fresh _hermes_now() timestamp, fresh
# session_id, narrower toolset → different skills_prompt)
# and the byte-exact prefix-cache key misses. See
# issue #25322 and PR #17276 for the full analysis +
# measured impact (~26% end-to-end cost reduction on
# Sonnet 4.5).
review_agent._cached_system_prompt = agent._cached_system_prompt
# Defensive: pin session_start + session_id to the
# parent's so any code path that re-renders parts of
# the system prompt (compression, plugin hooks) still
# produces byte-identical output. The cached-prompt
# assignment above already short-circuits the normal
# rebuild path, but these pins guarantee parity even
# if a future code path bypasses the cache.
review_agent.session_start = agent.session_start
review_agent.session_id = agent.session_id
from model_tools import get_tool_definitions
from hermes_cli.plugins import (
set_thread_tool_whitelist,
clear_thread_tool_whitelist,
)
review_whitelist = {
t["function"]["name"]
for t in get_tool_definitions(
enabled_toolsets=["memory", "skills"],
quiet_mode=True,
)
}
set_thread_tool_whitelist(
review_whitelist,
deny_msg_fmt=(
"Background review denied non-whitelisted tool: "
"{tool_name}. Only memory/skill tools are allowed."
),
)
try:
review_agent.run_conversation(
user_message=(
prompt
+ "\n\nYou can only call memory and skill "
"management tools. Other tools will be denied "
"at runtime — do not attempt them."
),
conversation_history=messages_snapshot,
)
finally:
clear_thread_tool_whitelist()
# Snapshot review actions before teardown. close() is allowed to
# clean per-session state, but the user-visible self-improvement
# summary still needs the completed review agent's tool results.
review_messages = list(getattr(review_agent, "_session_messages", []))
# Tear down memory providers while stdout is still
# redirected so background thread teardown (Honcho flush,
# Hindsight sync, etc.) stays silent. The finally block
# below is a safety net for the exception path.
try:
review_agent.shutdown_memory_provider()
except Exception:
pass
try:
review_agent.close()
except Exception:
pass
review_agent = None
# Scan the review agent's messages for successful tool actions
# and surface a compact summary to the user. Tool messages
# already present in messages_snapshot must be skipped, since
# the review agent inherits that history and would otherwise
# re-surface stale "created"/"updated" messages from the prior
# conversation as if they just happened (issue #14944).
actions = summarize_background_review_actions(
review_messages,
messages_snapshot,
)
if actions:
summary = " · ".join(dict.fromkeys(actions))
agent._safe_print(
f" 💾 Self-improvement review: {summary}"
)
_bg_cb = agent.background_review_callback
if _bg_cb:
try:
_bg_cb(
f"💾 Self-improvement review: {summary}"
)
except Exception:
pass
except Exception as e:
logger.warning("Background memory/skill review failed: %s", e)
agent._emit_auxiliary_failure("background review", e)
finally:
# Safety-net cleanup for the exception path. Normal
# completion already shut down inside redirect_stdout above.
# Re-open devnull here so any teardown output (Honcho flush,
# Hindsight sync, background thread joins) stays silent even
# on the exception path where redirect_stdout already exited.
if review_agent is not None:
try:
with open(os.devnull, "w", encoding="utf-8") as _fn, \
contextlib.redirect_stdout(_fn), \
contextlib.redirect_stderr(_fn):
try:
review_agent.shutdown_memory_provider()
except Exception:
pass
try:
review_agent.close()
except Exception:
pass
except Exception:
pass
# Clear the approval callback on this bg-review thread so a
# recycled thread-id doesn't inherit a stale reference.
try:
_set_approval_callback(None)
except Exception:
pass
def spawn_background_review_thread(
agent: Any,
messages_snapshot: List[Dict],
review_memory: bool = False,
review_skills: bool = False,
):
"""Build the review thread target and prompt for a background review.
Returns a ``(target, prompt)`` tuple. The caller (``AIAgent._spawn_background_review``)
owns the actual ``threading.Thread`` construction so test-level patches
of ``run_agent.threading.Thread`` keep working.
"""
# Pick the right prompt based on which triggers fired. Allow per-agent
# override (the prompts moved to module-level constants but old code paths
# that set agent._MEMORY_REVIEW_PROMPT etc. directly keep working).
if review_memory and review_skills:
prompt = getattr(agent, "_COMBINED_REVIEW_PROMPT", _COMBINED_REVIEW_PROMPT)
elif review_memory:
prompt = getattr(agent, "_MEMORY_REVIEW_PROMPT", _MEMORY_REVIEW_PROMPT)
else:
prompt = getattr(agent, "_SKILL_REVIEW_PROMPT", _SKILL_REVIEW_PROMPT)
def _target() -> None:
_run_review_in_thread(agent, messages_snapshot, prompt)
return _target, prompt
__all__ = [
"_MEMORY_REVIEW_PROMPT",
"_SKILL_REVIEW_PROMPT",
"_COMBINED_REVIEW_PROMPT",
"spawn_background_review_thread",
"summarize_background_review_actions",
"build_memory_write_metadata",
]

View File

@@ -36,19 +36,6 @@ from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Ensure boto3/botocore are installed before any code in this module runs.
# Upstream removed boto3 from [all] extras (PRs #24220, #24515); lazy_deps
# handles on-demand installation so the Bedrock provider still works in the
# EKS deployment without baking boto3 into the base image.
# ---------------------------------------------------------------------------
try:
from tools.lazy_deps import ensure
ensure("provider.bedrock", prompt=False)
except Exception:
pass # lazy_deps unavailable or install failed — let downstream imports surface the real error
# ---------------------------------------------------------------------------
# Lazy boto3 import — only loaded when the Bedrock provider is actually used.
# This keeps startup fast for users who don't use Bedrock.
@@ -1167,6 +1154,18 @@ def _extract_provider_from_arn(arn: str) -> str:
"""
match = re.search(r"foundation-model/([^.]+)", arn)
return match.group(1) if match else ""
def get_bedrock_model_ids(region: str) -> List[str]:
"""Return a flat list of available Bedrock model IDs for the given region.
Convenience wrapper around ``discover_bedrock_models()`` for use in
the model selection UI.
"""
models = discover_bedrock_models(region)
return [m["id"] for m in models]
# ---------------------------------------------------------------------------
# Error classification — Bedrock-specific exceptions
# ---------------------------------------------------------------------------

View File

@@ -1,175 +0,0 @@
"""
Browser Provider ABC
====================
Defines the pluggable-backend interface for cloud browser providers
(Browserbase, Browser Use, Firecrawl, …). Providers register instances via
:meth:`PluginContext.register_browser_provider`; the active one (selected via
``browser.cloud_provider`` in ``config.yaml``) services every cloud-mode
``browser_*`` tool call.
Providers live in ``<repo>/plugins/browser/<name>/`` (built-in, auto-loaded as
``kind: backend``) or ``~/.hermes/plugins/browser/<name>/`` (user, opt-in via
``plugins.enabled``).
This ABC mirrors :class:`agent.web_search_provider.WebSearchProvider` (PR
#25182) — same shape, same registration flow, same picker integration. The
legacy in-tree ``tools.browser_providers.base.CloudBrowserProvider`` ABC was
deleted in PR #25214 (this work) along with the per-vendor inline modules in
``tools/browser_providers/``; the lifecycle contract documented below is
preserved bit-for-bit so the tool wrapper (:mod:`tools.browser_tool`) does
not have to translate.
Session metadata contract (preserved from the legacy ``CloudBrowserProvider``)::
{
"session_name": str, # unique name for agent-browser --session
"bb_session_id": str, # provider session ID (for close/cleanup)
"cdp_url": str, # CDP websocket URL
"features": dict, # feature flags that were enabled
"external_call_id": str, # optional, managed-gateway billing key
}
``bb_session_id`` is a legacy key name kept verbatim for backward compat with
:mod:`tools.browser_tool` — it holds the provider's session ID regardless of
which provider is in use.
"""
from __future__ import annotations
import abc
from typing import Any, Dict
# ---------------------------------------------------------------------------
# ABC
# ---------------------------------------------------------------------------
class BrowserProvider(abc.ABC):
"""Abstract base class for a cloud browser backend.
Subclasses must implement :meth:`name`, :meth:`is_available`, and the
three lifecycle methods: :meth:`create_session`, :meth:`close_session`,
:meth:`emergency_cleanup`.
The lifecycle shape preserves the legacy ``CloudBrowserProvider`` contract
bit-for-bit so the dispatcher in :mod:`tools.browser_tool` is a pure
registry lookup — no per-provider conditionals, no shape translation.
"""
@property
@abc.abstractmethod
def name(self) -> str:
"""Stable short identifier used in the ``browser.cloud_provider``
config key.
Lowercase, hyphens permitted to preserve existing user-visible names.
Examples: ``browserbase``, ``browser-use``, ``firecrawl``.
"""
@property
def display_name(self) -> str:
"""Human-readable label shown in ``hermes tools``. Defaults to ``name``."""
return self.name
@abc.abstractmethod
def is_available(self) -> bool:
"""Return True when this provider can service calls.
Typically a cheap check (env var present, managed-gateway token
readable, optional Python dep importable). Must NOT make network
calls — this runs at tool-registration time and on every
``hermes tools`` paint.
Mirrors the legacy ``CloudBrowserProvider.is_configured()`` method;
renamed for parity with :class:`agent.web_search_provider.WebSearchProvider`.
"""
@abc.abstractmethod
def create_session(self, task_id: str) -> Dict[str, object]:
"""Create a cloud browser session and return session metadata.
Must return a dict with at least::
{
"session_name": str, # unique name for agent-browser --session
"bb_session_id": str, # provider session ID (for close/cleanup)
"cdp_url": str, # CDP websocket URL
"features": dict, # feature flags that were enabled
}
``bb_session_id`` is a legacy key name kept for backward compat with
the rest of :mod:`tools.browser_tool` — it holds the provider's
session ID regardless of which provider is in use.
May raise ``ValueError`` (missing credentials) or ``RuntimeError``
(network / API failure); the dispatcher surfaces these to the user.
"""
@abc.abstractmethod
def close_session(self, session_id: str) -> bool:
"""Release / terminate a cloud session by its provider session ID.
Returns True on success, False on failure. Should not raise — log and
return False on any exception so the dispatcher's cleanup loop keeps
moving across sessions.
"""
@abc.abstractmethod
def emergency_cleanup(self, session_id: str) -> None:
"""Best-effort session teardown during process exit.
Called from atexit / signal handlers. Must tolerate missing
credentials, network errors, etc. — log and move on. Must not raise.
"""
def get_setup_schema(self) -> Dict[str, Any]:
"""Return provider metadata for the ``hermes tools`` picker.
Used by :mod:`hermes_cli.tools_config` to inject this provider as a
row in the Browser Automation picker. Shape mirrors the existing
hardcoded entries in ``TOOL_CATEGORIES["browser"]``::
{
"name": "Browserbase",
"badge": "paid",
"tag": "Cloud browser with stealth and proxies",
"env_vars": [
{"key": "BROWSERBASE_API_KEY",
"prompt": "Browserbase API key",
"url": "https://browserbase.com"},
],
"post_setup": "agent_browser",
}
Default: minimal entry derived from :attr:`display_name`. Override to
expose API key prompts, badges, managed-Nous gating, and the
``post_setup`` install hook.
"""
return {
"name": self.display_name,
"badge": "",
"tag": "",
"env_vars": [],
}
# ------------------------------------------------------------------
# Backward-compat shims for the legacy CloudBrowserProvider API
# ------------------------------------------------------------------
#
# The pre-PR-#25214 ABC exposed ``is_configured()`` and ``provider_name()``;
# ``tools.browser_tool`` has ~6 callers that still use those names. Rather
# than churn every callsite (and break out-of-tree downstream code that
# subclassed CloudBrowserProvider), we expose the old names as thin
# delegations to the new API. Subclasses MUST implement :meth:`is_available`
# and :attr:`name`; they may override ``is_configured`` / ``provider_name``
# for compatibility with the legacy ABC but it is not required.
def is_configured(self) -> bool:
"""Backward-compat alias for :meth:`is_available`."""
return self.is_available()
def provider_name(self) -> str:
"""Backward-compat alias returning :attr:`display_name`."""
return self.display_name

View File

@@ -1,192 +0,0 @@
"""
Browser Provider Registry
=========================
Central map of registered cloud browser providers. Populated by plugins at
import-time via :meth:`PluginContext.register_browser_provider`; consumed by
:func:`tools.browser_tool._get_cloud_provider` to route each cloud-mode
``browser_*`` tool call to the active backend.
Active selection
----------------
The active provider is chosen by configuration with this precedence:
1. ``browser.cloud_provider`` in ``config.yaml`` (explicit override).
2. Legacy preference order — ``browser-use`` → ``browserbase`` — filtered by
availability. Matches the historic auto-detect order in
:func:`tools.browser_tool._get_cloud_provider` (Browser Use checked first
because it covers both the managed Nous gateway and direct API key path;
Browserbase as the older direct-credentials fallback). ``firecrawl`` is
intentionally NOT in the legacy walk — users only get Firecrawl as a
cloud browser when they explicitly set ``browser.cloud_provider:
firecrawl``, matching pre-migration behaviour where Firecrawl was never
auto-selected.
3. Otherwise ``None`` — the dispatcher falls back to local browser mode.
The explicit-config branch (rule 1) intentionally ignores ``is_available()``
so the dispatcher surfaces a typed "X_API_KEY is not set" error to the user
instead of silently switching backends. Matches the legacy
:func:`tools.browser_tool._get_cloud_provider` behaviour for configured names.
Note: there is no "capability" split here (unlike the web subsystem, which
has search/extract/crawl). Every browser provider implements the full
:class:`agent.browser_provider.BrowserProvider` lifecycle; the registry's
job is purely selection, not capability routing.
"""
from __future__ import annotations
import logging
import threading
from typing import Dict, List, Optional
from agent.browser_provider import BrowserProvider
logger = logging.getLogger(__name__)
_providers: Dict[str, BrowserProvider] = {}
_lock = threading.Lock()
def register_provider(provider: BrowserProvider) -> None:
"""Register a cloud browser provider.
Re-registration (same ``name``) overwrites the previous entry and logs
a debug message — makes hot-reload scenarios (tests, dev loops) behave
predictably.
"""
if not isinstance(provider, BrowserProvider):
raise TypeError(
f"register_provider() expects a BrowserProvider instance, "
f"got {type(provider).__name__}"
)
name = provider.name
if not isinstance(name, str) or not name.strip():
raise ValueError("Browser provider .name must be a non-empty string")
with _lock:
existing = _providers.get(name)
_providers[name] = provider
if existing is not None:
logger.debug(
"Browser provider '%s' re-registered (was %r)",
name, type(existing).__name__,
)
else:
logger.debug(
"Registered browser provider '%s' (%s)",
name, type(provider).__name__,
)
def list_providers() -> List[BrowserProvider]:
"""Return all registered providers, sorted by name."""
with _lock:
items = list(_providers.values())
return sorted(items, key=lambda p: p.name)
def get_provider(name: str) -> Optional[BrowserProvider]:
"""Return the provider registered under *name*, or None."""
if not isinstance(name, str):
return None
with _lock:
return _providers.get(name.strip())
# ---------------------------------------------------------------------------
# Active-provider resolution
# ---------------------------------------------------------------------------
# Legacy auto-detect order — used when no ``browser.cloud_provider`` is set.
# Matches the pre-migration walk in :func:`tools.browser_tool._get_cloud_provider`.
# Firecrawl is intentionally absent so users with ``FIRECRAWL_API_KEY`` set
# for web-extract don't get silently routed to a paid cloud browser. See
# :func:`_resolve` for the full rationale.
_LEGACY_PREFERENCE = (
"browser-use",
"browserbase",
)
def _resolve(configured: Optional[str]) -> Optional[BrowserProvider]:
"""Resolve the active browser provider.
Resolution rules (in order):
1. **Explicit "local".** Returns None — the dispatcher disables cloud
mode entirely. Mirrors legacy short-circuit in
:func:`tools.browser_tool._get_cloud_provider`.
2. **Explicit config wins, ignoring availability.** If ``configured``
names a registered provider, return it even if its
:meth:`is_available` returns False — the dispatcher will surface a
precise "X_API_KEY is not set" error instead of silently routing
somewhere else.
3. **Legacy preference walk, filtered by availability.** Walk
:data:`_LEGACY_PREFERENCE` (``browser-use`` → ``browserbase``) looking
for a provider whose ``is_available()`` is True.
There is intentionally NO "single-eligible shortcut" rule here (unlike
:func:`agent.web_search_registry._resolve`). Pre-migration, the
auto-detect branch in ``tools.browser_tool._get_cloud_provider`` only
considered Browser Use and Browserbase; Firecrawl was reachable only
via an explicit ``browser.cloud_provider: firecrawl`` config key.
Preserving that gate matters because Firecrawl shares its API key with
the *web* extract plugin (``plugins/web/firecrawl/``), so users who set
``FIRECRAWL_API_KEY`` for web extract must NOT get silently routed to a
paid cloud browser on a fresh install. Third-party browser-provider
plugins added under ``~/.hermes/plugins/browser/<vendor>/`` are subject
to the same gate — they must be explicitly configured to take effect.
Returns None when no provider is configured AND no available provider
matches the legacy preference; the dispatcher then falls back to local
browser mode.
"""
with _lock:
snapshot = dict(_providers)
def _is_available_safe(p: BrowserProvider) -> bool:
"""Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
try:
return bool(p.is_available())
except Exception as exc: # noqa: BLE001
logger.warning(
"Browser provider %s.is_available() raised %s — treating as unavailable",
p.name, exc, exc_info=True,
)
return False
# 1. Explicit "local" short-circuit.
if configured == "local":
return None
# 2. Explicit config wins — return regardless of is_available() so the
# user gets a precise downstream error message rather than a silent
# backend switch. Matches _get_cloud_provider() in browser_tool.py.
if configured:
provider = snapshot.get(configured)
if provider is not None:
return provider
logger.debug(
"browser cloud_provider '%s' configured but not registered; "
"falling back to auto-detect",
configured,
)
# 3. Legacy preference walk — only providers in _LEGACY_PREFERENCE are
# auto-eligible. Filtered by availability so we don't surface a
# provider the user has no credentials for. See docstring for why
# we do NOT fall back to "any single-eligible registered provider".
for legacy in _LEGACY_PREFERENCE:
provider = snapshot.get(legacy)
if provider is not None and _is_available_safe(provider):
return provider
return None
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:
_providers.clear()

File diff suppressed because it is too large Load Diff

View File

@@ -23,38 +23,6 @@ from agent.prompt_builder import DEFAULT_AGENT_IDENTITY
logger = logging.getLogger(__name__)
def _classify_responses_issuer(
*,
is_xai_responses: bool = False,
is_github_responses: bool = False,
is_codex_backend: bool = False,
base_url: Optional[str] = None,
) -> str:
"""Stable identifier for the Responses endpoint that mints encrypted_content.
``reasoning.encrypted_content`` is sealed to the endpoint that issued it:
replaying a Codex-minted blob against xAI (or vice versa) deterministically
returns HTTP 400 ``invalid_encrypted_content``. Stamping the issuer on
persisted reasoning items and filtering at replay time lets a single
conversation switch models without poisoning history with un-decryptable
reasoning blocks.
"""
if is_xai_responses:
return "xai_responses"
if is_github_responses:
return "github_responses"
if is_codex_backend:
return "codex_backend"
if base_url:
return f"other:{base_url}"
return "other"
# Throttle the per-process cross-issuer skip warning so we don't flood logs
# when a long history contains many stale-issuer reasoning blocks.
_CROSS_ISSUER_WARN_EMITTED = False
# Matches Codex/Harmony tool-call serialization that occasionally leaks into
# assistant-message content when the model fails to emit a structured
# ``function_call`` item. Accepts the common forms:
@@ -276,47 +244,8 @@ def _normalize_responses_message_status(value: Any, *, default: str = "completed
return default
def _chat_messages_to_responses_input(
messages: List[Dict[str, Any]],
*,
is_xai_responses: bool = False,
replay_encrypted_reasoning: bool = True,
current_issuer_kind: Optional[str] = None,
) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items.
``is_xai_responses`` is kept for transport signature compatibility but
no longer suppresses encrypted reasoning replay. Earlier (PR #26644,
May 2026) we believed xAI's OAuth/SuperGrok ``/v1/responses`` surface
rejected replayed ``encrypted_content`` reasoning items minted by
prior turns, and we stripped them. That decision was wrong — xAI
explicitly relies on Hermes threading encrypted reasoning back across
turns for cross-turn coherence (the whole point of their partnership
integration). We now replay encrypted reasoning on every Responses
transport (xAI, native Codex, custom relays) and let xAI tell us
explicitly if a specific surface ever rejects a payload.
``replay_encrypted_reasoning`` is the per-session kill switch. Some
OpenAI-compatible relays accept the request but later reject the
replayed encrypted blob with HTTP 400 ``invalid_encrypted_content``;
when that happens the retry loop calls
``AIAgent._disable_codex_reasoning_replay`` which both strips cached
items from the conversation history and threads ``replay_enabled=False``
through this converter so subsequent turns send no reasoning items.
``current_issuer_kind`` enables a per-item cross-issuer guard. The
Responses API's ``encrypted_content`` blob is decryptable only by the
endpoint that minted it — replaying a Codex-issued blob against xAI
(or vice versa) always yields HTTP 400 ``invalid_encrypted_content``
and breaks every subsequent turn in the same session. When this
argument is provided and a reasoning item carries an ``_issuer_kind``
stamp from a different endpoint, the item is dropped from the replayed
input. Legacy items without a stamp are still replayed
(backwards-compatible). The two guards compose:
``replay_encrypted_reasoning=False`` is the session-wide kill switch
(drops ALL replay); ``current_issuer_kind`` is the per-item filter
that runs only when replay is still enabled.
"""
def _chat_messages_to_responses_input(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Convert internal chat-style messages to Responses input items."""
items: List[Dict[str, Any]] = []
seen_item_ids: set = set()
@@ -342,14 +271,7 @@ def _chat_messages_to_responses_input(
if role == "assistant":
# Replay encrypted reasoning items from previous turns
# so the API can maintain coherent reasoning chains.
# This applies to every Responses transport including
# xAI — see _chat_messages_to_responses_input docstring
# for the May 2026 reversal of the earlier xAI gate.
codex_reasoning = (
msg.get("codex_reasoning_items")
if replay_encrypted_reasoning
else None
)
codex_reasoning = msg.get("codex_reasoning_items")
has_codex_reasoning = False
if isinstance(codex_reasoning, list):
for ri in codex_reasoning:
@@ -357,40 +279,11 @@ def _chat_messages_to_responses_input(
item_id = ri.get("id")
if item_id and item_id in seen_item_ids:
continue
# Cross-issuer guard: drop reasoning blocks that
# were minted by a different Responses endpoint.
# The current endpoint cannot decrypt foreign
# encrypted_content and would reject the whole
# request with HTTP 400 invalid_encrypted_content.
# Unstamped (legacy) items pass through.
item_issuer = ri.get("_issuer_kind")
if (
current_issuer_kind is not None
and item_issuer is not None
and item_issuer != current_issuer_kind
):
global _CROSS_ISSUER_WARN_EMITTED
if not _CROSS_ISSUER_WARN_EMITTED:
logger.warning(
"Dropping reasoning item minted by %s while "
"calling %s — encrypted_content is sealed to "
"its issuer. This happens when a session "
"switches model providers mid-conversation.",
item_issuer, current_issuer_kind,
)
_CROSS_ISSUER_WARN_EMITTED = True
continue
# Strip the "id" field — with store=False the
# Responses API cannot look up items by ID and
# returns 404. The encrypted_content blob is
# self-contained for reasoning chain continuity.
# Also strip the internal "_issuer_kind" stamp;
# it is a Hermes-side metadata key and not part
# of the Responses API schema.
replay_item = {
k: v for k, v in ri.items()
if k not in ("id", "_issuer_kind")
}
replay_item = {k: v for k, v in ri.items() if k != "id"}
items.append(replay_item)
if item_id:
seen_item_ids.add(item_id)
@@ -833,7 +726,7 @@ def _preflight_codex_api_kwargs(
"model", "instructions", "input", "tools", "store",
"reasoning", "include", "max_output_tokens", "temperature",
"tool_choice", "parallel_tool_calls", "prompt_cache_key", "service_tier",
"extra_headers", "extra_body", "timeout",
"extra_headers",
}
normalized: Dict[str, Any] = {
"model": model,
@@ -859,13 +752,6 @@ def _preflight_codex_api_kwargs(
max_output_tokens = api_kwargs.get("max_output_tokens")
if isinstance(max_output_tokens, (int, float)) and max_output_tokens > 0:
normalized["max_output_tokens"] = int(max_output_tokens)
timeout = api_kwargs.get("timeout")
if (
isinstance(timeout, (int, float))
and not isinstance(timeout, bool)
and 0 < float(timeout) < float("inf")
):
normalized["timeout"] = float(timeout)
temperature = api_kwargs.get("temperature")
if isinstance(temperature, (int, float)):
normalized["temperature"] = float(temperature)
@@ -890,19 +776,6 @@ def _preflight_codex_api_kwargs(
if normalized_headers:
normalized["extra_headers"] = normalized_headers
extra_body = api_kwargs.get("extra_body")
if extra_body is not None:
if not isinstance(extra_body, dict):
raise ValueError("Codex Responses request 'extra_body' must be an object.")
# Pass extra_body through verbatim — used by xAI Responses to
# carry `prompt_cache_key` as a body-level field (the documented
# cache-routing surface on /v1/responses). The openai SDK
# serializes extra_body into the JSON body without per-field
# type checks, so it survives Responses.stream() kwarg-signature
# changes that would otherwise raise TypeError before the wire.
if extra_body:
normalized["extra_body"] = dict(extra_body)
if allow_stream:
stream = api_kwargs.get("stream")
if stream is not None and stream is not True:
@@ -913,26 +786,6 @@ def _preflight_codex_api_kwargs(
elif "stream" in api_kwargs:
raise ValueError("Codex Responses stream flag is only allowed in fallback streaming requests.")
# Safety-net sanitization for xAI Responses (#28490): defense-in-depth
# for the same slash-enum strip that ``chat_completion_helpers`` and
# ``auxiliary_client`` apply at request-build time. If a future code
# path forgets to sanitize before calling us, this catches the bypass
# so xAI doesn't 400 with ``Invalid arguments passed to the model``
# (HuggingFace IDs like ``Qwen/Qwen3.5-0.8B`` from MCP tool schemas).
#
# Gated on the model name pattern because native Codex (OpenAI) DOES
# accept slash-containing enum values — stripping them there would
# silently degrade tool-schema constraints. xAI is the only
# Responses-API surface that rejects the shape.
model_name_for_provider_check = str(api_kwargs.get("model") or "").lower()
is_xai_model = model_name_for_provider_check.startswith(("grok-", "x-ai/grok-"))
if is_xai_model and normalized.get("tools"):
try:
from tools.schema_sanitizer import strip_slash_enum
normalized["tools"], _ = strip_slash_enum(normalized["tools"])
except Exception:
pass # Best-effort — the caller-level sanitization should have handled it
unexpected = sorted(key for key in api_kwargs if key not in allowed_keys)
if unexpected:
raise ValueError(
@@ -980,64 +833,12 @@ def _extract_responses_reasoning_text(item: Any) -> str:
return ""
def _format_responses_error(error_obj: Any, response_status: str) -> str:
"""Build a human-readable error string from a Responses ``response.error`` payload.
The OpenAI Responses API carries failure details under ``response.error``
on terminal ``response.failed`` events, in the shape
``{"code": "rate_limit_exceeded", "message": "Slow down", "param": ...}``.
Earlier code only surfaced ``message``, which left users staring at bare
strings like ``"Slow down"`` while the failure mode (rate limit vs
context-length vs internal_error vs model-overloaded) was hidden in
``code``. We now prefix ``code`` when both are present so consumers can
distinguish failure modes without parsing the bare message.
Falls back to ``code`` alone when ``message`` is empty, and to a stable
default referencing the response status when no error payload is
available at all. Adapted from anomalyco/opencode#28757.
"""
# Pull code and message from either dict or attribute-style payloads.
code: Any = None
message: Any = None
if isinstance(error_obj, dict):
code = error_obj.get("code")
message = error_obj.get("message")
elif error_obj is not None:
code = getattr(error_obj, "code", None)
message = getattr(error_obj, "message", None)
code_str = str(code).strip() if isinstance(code, str) else (str(code).strip() if code else "")
message_str = str(message).strip() if isinstance(message, str) else (str(message).strip() if message else "")
if code_str and message_str:
return f"{code_str}: {message_str}"
if message_str:
return message_str
if code_str:
return code_str
if error_obj:
# Last-resort: stringify whatever the provider sent so it's at least
# visible in logs/UI rather than silently swallowed.
return str(error_obj)
return f"Responses API returned status '{response_status}'"
# ---------------------------------------------------------------------------
# Full response normalization
# ---------------------------------------------------------------------------
def _normalize_codex_response(
response: Any,
*,
issuer_kind: Optional[str] = None,
) -> tuple[Any, str]:
"""Normalize a Responses API object to an assistant_message-like object.
``issuer_kind`` (when provided) is stamped onto each reasoning item the
response yields, so future replays can detect when the active endpoint
differs from the one that minted the encrypted_content blob and drop
the item instead of triggering HTTP 400 invalid_encrypted_content.
"""
def _normalize_codex_response(response: Any) -> tuple[Any, str]:
"""Normalize a Responses API object to an assistant_message-like object."""
output = getattr(response, "output", None)
if not isinstance(output, list) or not output:
# The Codex backend can return empty output when the answer was
@@ -1065,7 +866,10 @@ def _normalize_codex_response(
if response_status in {"failed", "cancelled"}:
error_obj = getattr(response, "error", None)
error_msg = _format_responses_error(error_obj, response_status)
if isinstance(error_obj, dict):
error_msg = error_obj.get("message") or str(error_obj)
else:
error_msg = str(error_obj) if error_obj else f"Responses API returned status '{response_status}'"
raise RuntimeError(error_msg)
content_parts: List[str] = []
@@ -1076,7 +880,6 @@ def _normalize_codex_response(
has_incomplete_items = response_status in {"queued", "in_progress", "incomplete"}
saw_commentary_phase = False
saw_final_answer_phase = False
saw_reasoning_item = False
for item in output:
item_type = getattr(item, "type", None)
@@ -1114,7 +917,6 @@ def _normalize_codex_response(
raw_message_item["phase"] = normalized_phase
message_items_raw.append(raw_message_item)
elif item_type == "reasoning":
saw_reasoning_item = True
reasoning_text = _extract_responses_reasoning_text(item)
if reasoning_text:
reasoning_parts.append(reasoning_text)
@@ -1124,19 +926,7 @@ def _normalize_codex_response(
encrypted = getattr(item, "encrypted_content", None)
if isinstance(encrypted, str) and encrypted:
raw_item = {"type": "reasoning", "encrypted_content": encrypted}
# Stamp the issuer so future turns can detect when a
# model swap moved the conversation to an endpoint that
# cannot decrypt this blob — see _chat_messages_to_responses_input
# cross-issuer guard.
if issuer_kind:
raw_item["_issuer_kind"] = issuer_kind
item_id = getattr(item, "id", None)
if isinstance(item_id, str) and item_id.startswith("rs_tmp_"):
logger.debug(
"Skipping transient Codex reasoning item during normalization: %s",
item_id,
)
continue
if isinstance(item_id, str) and item_id:
raw_item["id"] = item_id
# Capture summary — required by the API when replaying reasoning items
@@ -1247,13 +1037,13 @@ def _normalize_codex_response(
finish_reason = "incomplete"
elif has_incomplete_items or (saw_commentary_phase and not saw_final_answer_phase):
finish_reason = "incomplete"
elif (reasoning_items_raw or reasoning_parts or saw_reasoning_item) and not final_text:
# Response contains only reasoning (encrypted thinking state and/or
# human-readable summary) with no visible content or tool calls. The
# model is still thinking and needs another turn to produce the actual
# answer. Marking this as "stop" would send it into the empty-content
# retry loop which burns retries then fails — treat it as incomplete so
# the Codex continuation path handles it correctly.
elif reasoning_items_raw and not final_text:
# Response contains only reasoning (encrypted thinking state) with
# no visible content or tool calls. The model is still thinking and
# needs another turn to produce the actual answer. Marking this as
# "stop" would send it into the empty-content retry loop which burns
# 3 retries then fails — treat it as incomplete instead so the Codex
# continuation path handles it correctly.
finish_reason = "incomplete"
else:
finish_reason = "stop"

View File

@@ -1,535 +0,0 @@
"""Codex API runtime — App Server and Responses-API streaming paths.
Extracted from :class:`AIAgent` to keep the agent loop file focused.
Each function takes the parent ``AIAgent`` as its first argument
(``agent``). AIAgent keeps thin forwarder methods for backward
compatibility.
* ``run_codex_app_server_turn`` — drives one turn through the
``codex_app_server`` subprocess client (used when a Codex CLI install
is the active provider).
* ``run_codex_stream`` — streams a Codex Responses API call (the
``codex_responses`` api_mode).
* ``run_codex_create_stream_fallback`` — recovery path when the
Responses ``stream=True`` initial create fails.
"""
from __future__ import annotations
import logging
import os
import time
from types import SimpleNamespace
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
def run_codex_app_server_turn(
agent,
*,
user_message: str,
original_user_message: Any,
messages: List[Dict[str, Any]],
effective_task_id: str,
should_review_memory: bool = False,
) -> Dict[str, Any]:
"""Codex app-server runtime path. Hands the entire turn to a `codex
app-server` subprocess and projects its events back into Hermes'
messages list so memory/skill review keep working.
Called from run_conversation() when agent.api_mode == "codex_app_server".
Returns the same dict shape as the chat_completions path.
"""
from agent.transports.codex_app_server_session import CodexAppServerSession
# Lazy session: one CodexAppServerSession per AIAgent instance.
# Spawned on first turn, reused across turns, closed at AIAgent
# shutdown (see _cleanup hook).
if not hasattr(agent, "_codex_session") or agent._codex_session is None:
cwd = getattr(agent, "session_cwd", None) or os.getcwd()
# Approval callback: defer to Hermes' standard prompt flow if a
# CLI thread has installed one. Gateway / cron contexts get the
# codex-side fail-closed default.
try:
from tools.terminal_tool import _get_approval_callback
approval_callback = _get_approval_callback()
except Exception:
approval_callback = None
agent._codex_session = CodexAppServerSession(
cwd=cwd,
approval_callback=approval_callback,
)
# NOTE: the user message is ALREADY appended to messages by the
# standard run_conversation() flow (line ~11823) before the early
# return reaches us. Do NOT append again — that would duplicate.
try:
turn = agent._codex_session.run_turn(user_input=user_message)
except Exception as exc:
logger.exception("codex app-server turn failed")
# Crash → unconditionally drop the session so the next turn
# respawns from scratch instead of reusing a dead client.
try:
agent._codex_session.close()
except Exception:
pass
agent._codex_session = None
return {
"final_response": (
f"Codex app-server turn failed: {exc}. "
f"Fall back to default runtime with `/codex-runtime auto`."
),
"messages": messages,
"api_calls": 0,
"completed": False,
"partial": True,
"error": str(exc),
}
# If the turn signalled the underlying client is wedged (deadline
# blown, post-tool watchdog tripped, OAuth refresh died, subprocess
# exited), retire the session so the next turn respawns codex
# rather than riding the broken process. Mirrors openclaw beta.8's
# "retire timed-out app-server clients" fix.
if getattr(turn, "should_retire", False):
logger.warning(
"codex app-server session retired (turn error: %s)",
turn.error,
)
try:
agent._codex_session.close()
except Exception:
pass
agent._codex_session = None
# Splice projected messages into the conversation. The projector emits
# standard {role, content, tool_calls, tool_call_id} entries, which
# is exactly what curator.py / sessions DB expect.
if turn.projected_messages:
messages.extend(turn.projected_messages)
# Counter ticks for the agent-improvement loop.
# _turns_since_memory and _user_turn_count are ALREADY incremented
# in the run_conversation() pre-loop block (lines ~11793-11817) so we
# do NOT touch them here — that would double-count.
# Only _iters_since_skill needs explicit increment, since the
# chat_completions loop bumps it per tool iteration (line ~12110)
# and that loop is bypassed on this path.
agent._iters_since_skill = (
getattr(agent, "_iters_since_skill", 0) + turn.tool_iterations
)
# Now check the skill nudge AFTER iters were incremented — same
# pattern the chat_completions path uses (line ~15432).
should_review_skills = False
if (
agent._skill_nudge_interval > 0
and agent._iters_since_skill >= agent._skill_nudge_interval
and "skill_manage" in agent.valid_tool_names
):
should_review_skills = True
agent._iters_since_skill = 0
# External memory provider sync (mirrors line ~15439). Skipped on
# interrupt/error to avoid feeding partial transcripts to memory.
if not turn.interrupted and turn.error is None:
try:
agent._sync_external_memory_for_turn(
original_user_message=original_user_message,
final_response=turn.final_text,
interrupted=False,
)
except Exception:
logger.debug("external memory sync raised", exc_info=True)
# Background review fork — same cadence + signature as the default
# path (line ~15449). Only fires when a trigger actually tripped AND
# we have a real final response.
if (
turn.final_text
and not turn.interrupted
and (should_review_memory or should_review_skills)
):
try:
agent._spawn_background_review(
messages_snapshot=list(messages),
review_memory=should_review_memory,
review_skills=should_review_skills,
)
except Exception:
logger.debug("background review spawn raised", exc_info=True)
return {
"final_response": turn.final_text,
"messages": messages,
"api_calls": 1, # one app-server "turn" maps to one logical API call
"completed": not turn.interrupted and turn.error is None,
"partial": turn.interrupted or turn.error is not None,
"error": turn.error,
"codex_thread_id": turn.thread_id,
"codex_turn_id": turn.turn_id,
}
# ---------------------------------------------------------------------------
# Event-driven Responses streaming
#
# OpenAI ships its consumer Codex backend (chatgpt.com/backend-api/codex) on
# a different schedule from the openai Python SDK. The high-level
# ``client.responses.stream(...)`` helper reconstructs a typed Response from
# the terminal ``response.completed`` event's ``response.output`` field, and
# when that field drifts to ``null`` (gpt-5.5, May 2026) the SDK raises
# ``TypeError: 'NoneType' object is not iterable`` mid-iteration.
#
# We sidestep the whole class of failure by going one level lower:
# ``client.responses.create(stream=True)`` returns the raw AsyncIterable of
# SSE events, and we assemble the final response object purely from
# ``response.output_item.done`` events as they arrive. We never read
# ``response.completed.response.output`` for content reconstruction, so the
# backend can return ``null``, ``[]``, a string, or omit the field entirely
# and we don't care.
#
# This mirrors what the OpenClaw TS implementation does for the same backend
# and is structurally immune to the bug class rather than patched.
# ---------------------------------------------------------------------------
_TERMINAL_EVENT_TYPES = frozenset({
"response.completed",
"response.incomplete",
"response.failed",
})
def _event_field(event: Any, name: str, default: Any = None) -> Any:
"""Field access that handles both attr-style (SDK objects) and dict (raw JSON) events."""
value = getattr(event, name, None)
if value is None and isinstance(event, dict):
value = event.get(name, default)
return value if value is not None else default
def _raise_stream_error(event: Any) -> None:
"""Raise a ``_StreamErrorEvent`` from a ``type=error`` SSE frame.
Imported lazily so this module stays importable from places that don't
pull in ``run_agent`` (e.g. plugin code, doc tools).
"""
from run_agent import _StreamErrorEvent
message = (_event_field(event, "message", "") or "stream emitted error event").strip()
raise _StreamErrorEvent(
message,
code=_event_field(event, "code"),
param=_event_field(event, "param"),
)
def _consume_codex_event_stream(
event_iter: Any,
*,
model: str,
on_text_delta=None,
on_reasoning_delta=None,
on_first_delta=None,
on_event=None,
interrupt_check=None,
) -> SimpleNamespace:
"""Consume a Codex Responses SSE event stream and return a final response.
The returned object is a ``SimpleNamespace`` shaped like the SDK's typed
``Response`` for the fields downstream code actually reads:
* ``output``: list of output items, assembled from ``response.output_item.done``.
For tool-call turns this contains the function_call items; for plain-text
turns it contains a synthesized ``message`` item built from streamed deltas
if no message item was emitted directly.
* ``output_text``: assembled text from ``response.output_text.delta`` deltas.
* ``usage``: copied from the terminal event's ``response.usage`` (when present).
* ``status``: ``completed`` / ``incomplete`` / ``failed`` (or ``completed`` if
the stream ended without a terminal frame but produced content).
* ``id``: ``response.id`` when present.
* ``incomplete_details``: passed through for ``response.incomplete`` frames.
* ``error``: passed through for ``response.failed`` frames.
* ``model``: from kwargs (the wire model name is not authoritative).
Critically, we never read ``response.output`` from the terminal event for
content reconstruction — only ``usage``, ``status``, ``id``. That field
being ``null`` / ``[]`` / missing is fine.
Callbacks:
* ``on_text_delta(str)`` — fires per ``response.output_text.delta``, suppressed
once a function_call event is seen (so tool-call turns don't bleed text
into the chat).
* ``on_reasoning_delta(str)`` — fires per ``response.reasoning.*.delta``.
* ``on_first_delta()`` — one-shot, fires on the first text delta only.
* ``on_event(event)`` — fires for every event before any other processing.
Used for watchdog activity, debug logging, anything wire-shape-agnostic.
* ``interrupt_check()`` — returns True to break the loop early.
"""
collected_output_items: List[Any] = []
collected_text_deltas: List[str] = []
has_tool_calls = False
first_delta_fired = False
terminal_status: str = "completed"
terminal_usage: Any = None
terminal_response_id: str = None
terminal_incomplete_details: Any = None
terminal_error: Any = None
saw_terminal = False
for event in event_iter:
if on_event is not None:
try:
on_event(event)
except (TimeoutError, InterruptedError):
# Control-flow signals from watchdog/cancellation hooks must
# propagate, not get swallowed as "debug noise".
raise
except Exception:
# Genuine bugs in third-party debug/log hooks shouldn't break
# stream consumption.
logger.debug("Codex stream on_event hook raised", exc_info=True)
if interrupt_check is not None and interrupt_check():
break
event_type = _event_field(event, "type", "")
if not isinstance(event_type, str):
event_type = ""
# ``error`` SSE frames carry the provider's real failure reason
# (subscription / quota / model-not-available / rejected-reasoning-replay)
# but never appear in the terminal set. Surface them as a structured
# exception so the credential pool + error classifier see the body.
if event_type == "error":
_raise_stream_error(event)
if "output_text.delta" in event_type or event_type == "response.output_text.delta":
delta_text = _event_field(event, "delta", "")
if delta_text:
collected_text_deltas.append(delta_text)
if not has_tool_calls:
if not first_delta_fired:
first_delta_fired = True
if on_first_delta is not None:
try:
on_first_delta()
except Exception:
logger.debug("Codex stream on_first_delta raised", exc_info=True)
if on_text_delta is not None:
try:
on_text_delta(delta_text)
except Exception:
logger.debug("Codex stream on_text_delta raised", exc_info=True)
continue
if "function_call" in event_type:
has_tool_calls = True
# fall through — function_call items still get added on output_item.done
if "reasoning" in event_type and "delta" in event_type:
reasoning_text = _event_field(event, "delta", "")
if reasoning_text and on_reasoning_delta is not None:
try:
on_reasoning_delta(reasoning_text)
except Exception:
logger.debug("Codex stream on_reasoning_delta raised", exc_info=True)
continue
if event_type == "response.output_item.done":
done_item = _event_field(event, "item")
if done_item is not None:
collected_output_items.append(done_item)
continue
if event_type in _TERMINAL_EVENT_TYPES:
saw_terminal = True
resp_obj = _event_field(event, "response")
if resp_obj is not None:
terminal_usage = getattr(resp_obj, "usage", None)
if terminal_usage is None and isinstance(resp_obj, dict):
terminal_usage = resp_obj.get("usage")
rid = getattr(resp_obj, "id", None)
if rid is None and isinstance(resp_obj, dict):
rid = resp_obj.get("id")
terminal_response_id = rid
rstatus = getattr(resp_obj, "status", None)
if rstatus is None and isinstance(resp_obj, dict):
rstatus = resp_obj.get("status")
if isinstance(rstatus, str):
terminal_status = rstatus
if event_type == "response.incomplete":
terminal_incomplete_details = getattr(resp_obj, "incomplete_details", None)
if terminal_incomplete_details is None and isinstance(resp_obj, dict):
terminal_incomplete_details = resp_obj.get("incomplete_details")
if event_type == "response.failed":
terminal_error = getattr(resp_obj, "error", None)
if terminal_error is None and isinstance(resp_obj, dict):
terminal_error = resp_obj.get("error")
if event_type == "response.completed":
terminal_status = terminal_status or "completed"
elif event_type == "response.incomplete":
terminal_status = terminal_status or "incomplete"
elif event_type == "response.failed":
terminal_status = terminal_status or "failed"
# Stop on terminal event.
break
# Build the final output list. Prefer items observed via output_item.done;
# if none arrived but we streamed plain text deltas (no tool calls), synthesize
# a single message item so downstream normalization has something to work with.
if collected_output_items:
output = list(collected_output_items)
elif collected_text_deltas and not has_tool_calls:
assembled = "".join(collected_text_deltas)
output = [SimpleNamespace(
type="message",
role="assistant",
status="completed",
content=[SimpleNamespace(type="output_text", text=assembled)],
)]
else:
output = []
# If the stream ended without any terminal event AND produced no usable
# content (no items, no text deltas), surface that as a RuntimeError so
# callers can distinguish "stream truncated mid-flight / provider rejected
# the call" from "stream completed with empty body". This preserves the
# signal the SDK's high-level helper used to raise as
# ``RuntimeError("Didn't receive a `response.completed` event.")``.
if not saw_terminal and not output:
raise RuntimeError(
"Codex Responses stream did not emit a terminal response"
)
assembled_text = "".join(collected_text_deltas)
final = SimpleNamespace(
output=output,
output_text=assembled_text,
usage=terminal_usage,
status=terminal_status,
id=terminal_response_id,
model=model,
incomplete_details=terminal_incomplete_details,
error=terminal_error,
)
return final
def run_codex_stream(agent, api_kwargs: dict, client: Any = None, on_first_delta=None):
"""Execute one streaming Responses API request and return the final response.
Uses ``responses.create(stream=True)`` (low-level raw event iteration)
rather than the high-level ``responses.stream(...)`` helper. This makes
us structurally immune to backend drift in the ``response.completed``
payload shape — we never let the SDK reconstruct a typed object from
the terminal event's ``output`` field.
"""
import httpx as _httpx
active_client = client or agent._ensure_primary_openai_client(reason="codex_stream_direct")
max_stream_retries = 1
# Accumulate streamed text so callers / compat shims can read it.
agent._codex_streamed_text_parts: list = []
def _on_text_delta(text: str) -> None:
agent._codex_streamed_text_parts.append(text)
agent._fire_stream_delta(text)
def _on_reasoning_delta(text: str) -> None:
agent._fire_reasoning_delta(text)
def _on_event(event: Any) -> None:
# TTFB watchdog and activity touch — runs once per SSE event.
agent._codex_stream_last_event_ts = time.time()
agent._touch_activity("receiving stream response")
def _interrupt_check() -> bool:
return bool(agent._interrupt_requested)
for attempt in range(max_stream_retries + 1):
if agent._interrupt_requested:
raise InterruptedError("Agent interrupted before Codex stream retry")
stream_kwargs = dict(api_kwargs)
stream_kwargs["stream"] = True
try:
event_stream = active_client.responses.create(**stream_kwargs)
except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
if attempt < max_stream_retries:
logger.debug(
"Codex Responses stream connect failed (attempt %s/%s); retrying. %s error=%s",
attempt + 1, max_stream_retries + 1,
agent._client_log_context(), exc,
)
continue
raise
try:
# Compatibility: some mocks/providers return a concrete response
# instead of an iterable. Pass it straight through.
if hasattr(event_stream, "output") and not hasattr(event_stream, "__iter__"):
return event_stream
try:
final = _consume_codex_event_stream(
event_stream,
model=api_kwargs.get("model"),
on_text_delta=_on_text_delta,
on_reasoning_delta=_on_reasoning_delta,
on_first_delta=on_first_delta,
on_event=_on_event,
interrupt_check=_interrupt_check,
)
except (_httpx.RemoteProtocolError, _httpx.ReadTimeout, _httpx.ConnectError, ConnectionError) as exc:
if attempt < max_stream_retries:
logger.debug(
"Codex Responses stream transport failed mid-iteration "
"(attempt %s/%s); retrying. %s error=%s",
attempt + 1, max_stream_retries + 1,
agent._client_log_context(), exc,
)
continue
raise
if final.status in {"incomplete", "failed"}:
logger.warning(
"Codex Responses stream terminal status=%s "
"(incomplete_details=%s, error=%s, streamed_chars=%d). %s",
final.status, final.incomplete_details, final.error,
sum(len(p) for p in agent._codex_streamed_text_parts),
agent._client_log_context(),
)
return final
finally:
close_fn = getattr(event_stream, "close", None)
if callable(close_fn):
try:
close_fn()
except Exception:
pass
def run_codex_create_stream_fallback(agent, api_kwargs: dict, client: Any = None):
"""Backward-compatible alias for the unified event-driven path.
Historically this was the fallback when the SDK's high-level
``responses.stream(...)`` helper raised on shape drift. The primary
path now does exactly what the fallback did, so this just forwards.
Kept as a public symbol because tests and a small number of call sites
still reference it by name.
"""
return run_codex_stream(agent, api_kwargs, client=client)
__all__ = [
"run_codex_app_server_turn",
"run_codex_stream",
"run_codex_create_stream_fallback",
"_consume_codex_event_stream",
]

View File

@@ -75,44 +75,6 @@ _IMAGE_TOKEN_ESTIMATE = 1600
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
# Hard ceiling for the deterministic summary-failure handoff. The fallback is
# only meant to preserve continuity anchors from the dropped window, not to
# become another unbounded transcript copy after the LLM summarizer failed.
_FALLBACK_SUMMARY_MAX_CHARS = 8_000
_FALLBACK_TURN_MAX_CHARS = 700
_PATH_MENTION_RE = re.compile(r"(?:/|~/?|[A-Za-z]:\\)[^\s`'\")\]}<>]+")
def _dedupe_append(items: list[str], value: str, *, limit: int) -> None:
value = value.strip()
if value and value not in items and len(items) < limit:
items.append(value)
def _extract_tool_call_name_and_args(tool_call: Any) -> tuple[str, str]:
"""Return a best-effort ``(name, arguments)`` pair for dict/object tool calls."""
if isinstance(tool_call, dict):
fn = tool_call.get("function") or {}
return str(fn.get("name") or "unknown"), str(fn.get("arguments") or "")
fn = getattr(tool_call, "function", None)
if fn is None:
return "unknown", ""
return str(getattr(fn, "name", None) or "unknown"), str(getattr(fn, "arguments", None) or "")
def _extract_tool_call_id(tool_call: Any) -> str:
if isinstance(tool_call, dict):
return str(tool_call.get("id") or "")
return str(getattr(tool_call, "id", "") or "")
def _collect_path_mentions(text: str, relevant_files: list[str], *, limit: int = 12) -> None:
for match in _PATH_MENTION_RE.findall(text):
_dedupe_append(relevant_files, match.rstrip(".,:;"), limit=limit)
def _content_length_for_budget(raw_content: Any) -> int:
"""Return the effective char-length of a message's content for token budgeting.
@@ -259,114 +221,6 @@ def _truncate_tool_call_args_json(args: str, head_chars: int = 200) -> str:
return json.dumps(shrunken, ensure_ascii=False)
_IMAGE_PART_TYPES = frozenset({"image_url", "input_image", "image"})
def _is_image_part(part: Any) -> bool:
"""True if ``part`` is a multimodal image content block.
Recognizes all three shapes the agent handles:
- OpenAI chat.completions: ``{"type": "image_url", "image_url": ...}``
- OpenAI Responses API: ``{"type": "input_image", "image_url": "..."}``
- Anthropic native: ``{"type": "image", "source": {...}}``
"""
if not isinstance(part, dict):
return False
return part.get("type") in _IMAGE_PART_TYPES
def _content_has_images(content: Any) -> bool:
"""True if a message's ``content`` is a multimodal list with image parts."""
if not isinstance(content, list):
return False
return any(_is_image_part(p) for p in content)
def _strip_images_from_content(content: Any) -> Any:
"""Return a copy of ``content`` with every image part replaced by a
short text placeholder.
- String content is returned unchanged.
- Non-list, non-string content is returned unchanged.
- List content: image parts become ``{"type": "text", "text": "[Attached
image — stripped after compression]"}``; other parts are preserved as-is.
Input is never mutated.
"""
if not isinstance(content, list):
return content
if not any(_is_image_part(p) for p in content):
return content
new_parts: List[Any] = []
for p in content:
if _is_image_part(p):
new_parts.append({
"type": "text",
"text": "[Attached image — stripped after compression]",
})
else:
new_parts.append(p)
return new_parts
def _strip_historical_media(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Replace image parts in older messages with placeholder text.
The anchor is the *last* user message that has any image content. Every
message before that anchor gets its image parts replaced with a short
placeholder so the outgoing request stops re-shipping the same multi-MB
base-64 image blobs on every turn.
If no user message carries images, the list is returned unchanged.
If the only user message with images is the very first one (nothing
earlier to strip), the list is returned unchanged.
Shallow copies of touched messages only; input is never mutated.
Port of Kilo-Org/kilocode#9434 (adapted for the OpenAI-style message
shape the hermes compressor emits).
"""
if not messages:
return messages
# Find the newest user message that carries at least one image part.
# We anchor on image-bearing user messages (not all user messages) so
# a plain text follow-up after a big-image turn still strips the old
# image — matching the problem kilocode#9434 set out to solve.
anchor = -1
for i in range(len(messages) - 1, -1, -1):
msg = messages[i]
if not isinstance(msg, dict):
continue
if msg.get("role") != "user":
continue
if _content_has_images(msg.get("content")):
anchor = i
break
if anchor <= 0:
# No image-bearing user message, or it's the very first message —
# nothing before it to strip.
return messages
changed = False
result: List[Dict[str, Any]] = []
for i, msg in enumerate(messages):
if i >= anchor or not isinstance(msg, dict):
result.append(msg)
continue
content = msg.get("content")
if not _content_has_images(content):
result.append(msg)
continue
new_msg = msg.copy()
new_msg["content"] = _strip_images_from_content(content)
result.append(new_msg)
changed = True
return result if changed else messages
def _summarize_tool_result(tool_name: str, tool_args: str, tool_content: str) -> str:
"""Create an informative 1-line summary of a tool call + result.
@@ -524,7 +378,7 @@ class ContextCompressor(ContextEngine):
model: str,
context_length: int,
base_url: str = "",
api_key: Any = "",
api_key: str = "",
provider: str = "",
api_mode: str = "",
) -> None:
@@ -561,7 +415,6 @@ class ContextCompressor(ContextEngine):
config_context_length: int | None = None,
provider: str = "",
api_mode: str = "",
abort_on_summary_failure: bool = False,
):
self.model = model
self.base_url = base_url
@@ -573,11 +426,6 @@ class ContextCompressor(ContextEngine):
self.protect_last_n = protect_last_n
self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
self.quiet_mode = quiet_mode
# When True, summary-generation failure aborts compression entirely
# (returns messages unchanged, sets _last_compress_aborted=True).
# When False (default = historical behavior), insert a
# deterministic "summary unavailable" handoff and drop the middle window.
self.abort_on_summary_failure = abort_on_summary_failure
self.context_length = get_model_context_length(
model, base_url=base_url, api_key=api_key,
@@ -630,12 +478,6 @@ class ContextCompressor(ContextEngine):
# (gateway hygiene, /compress) can surface a visible warning.
self._last_summary_dropped_count: int = 0
self._last_summary_fallback_used: bool = False
# When summary generation fails we now ABORT compression entirely
# and return the original messages unchanged instead of dropping
# the middle window with a static placeholder. Callers inspect
# this flag to know "compression was attempted but aborted, freeze
# the chat until the user manually retries via /compress".
self._last_compress_aborted: bool = False
# When a user-configured summary model fails and we recover by
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
@@ -647,7 +489,6 @@ class ContextCompressor(ContextEngine):
"""Update tracked token usage from API response."""
self.last_prompt_tokens = usage.get("prompt_tokens", 0)
self.last_completion_tokens = usage.get("completion_tokens", 0)
self.last_total_tokens = usage.get("total_tokens", self.last_prompt_tokens + self.last_completion_tokens)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold.
@@ -922,195 +763,6 @@ class ContextCompressor(ContextEngine):
return "\n\n".join(parts)
def _build_static_fallback_summary(
self,
turns_to_summarize: List[Dict[str, Any]],
reason: str | None = None,
) -> str:
"""Build a deterministic handoff when the LLM summarizer is unavailable.
This is intentionally much less rich than an LLM-written summary, but it
is still better than a bare "N messages were removed" marker. It keeps
the most useful continuity anchors that can be extracted locally:
recent user asks, assistant/tool actions, files/commands mentioned in
tool calls, and any error text. The result uses the normal summary
structure so downstream prompts can recover gracefully after a provider
outage or summary-model failure.
"""
user_asks: list[str] = []
assistant_actions: list[str] = []
tool_actions: list[str] = []
relevant_files: list[str] = []
blockers: list[str] = []
last_dropped_turns: list[str] = []
def _compact_fallback_turn(value: Any) -> str:
text = redact_sensitive_text(_content_text_for_contains(value))
text = re.sub(r"\bgh[pousr]_[A-Za-z0-9_]{8,}\b", "[REDACTED]", text)
text = re.sub(r"\s+", " ", text).strip()
if len(text) > _FALLBACK_TURN_MAX_CHARS:
text = text[: _FALLBACK_TURN_MAX_CHARS - 15].rstrip() + " ...[truncated]"
return re.sub(r"\bgh[pousr]_[A-Za-z0-9_.-]+", "[REDACTED]", text)
def _remember_dropped_turn(label: str, text: str, *, limit: int = 8) -> None:
text = text.strip()
if not text:
return
last_dropped_turns.append(f"{label}: {text}")
if len(last_dropped_turns) > limit:
del last_dropped_turns[0]
def _collect_paths_from_jsonish(obj: Any) -> None:
if isinstance(obj, dict):
for key, val in obj.items():
if key in {"path", "workdir", "file_path", "output_path"} and isinstance(val, str):
_dedupe_append(relevant_files, val, limit=12)
_collect_paths_from_jsonish(val)
elif isinstance(obj, list):
for val in obj:
_collect_paths_from_jsonish(val)
elif isinstance(obj, str):
_collect_path_mentions(obj, relevant_files)
call_id_to_tool: dict[str, tuple[str, str]] = {}
for msg in turns_to_summarize:
if msg.get("role") == "assistant" and msg.get("tool_calls"):
for tc in msg.get("tool_calls") or []:
name, raw_args = _extract_tool_call_name_and_args(tc)
args = redact_sensitive_text(raw_args)
call_id = _extract_tool_call_id(tc)
if call_id:
call_id_to_tool[call_id] = (name, args)
if args:
try:
parsed = json.loads(args)
except Exception:
parsed = args
_collect_paths_from_jsonish(parsed)
for msg in turns_to_summarize:
role = msg.get("role", "unknown")
text = _compact_fallback_turn(msg.get("content"))
_collect_path_mentions(text, relevant_files)
turn_text = text
turn_tool_names: list[str] = []
if role == "assistant" and msg.get("tool_calls"):
for tc in msg.get("tool_calls") or []:
name, _args = _extract_tool_call_name_and_args(tc)
turn_tool_names.append(name)
if turn_tool_names:
prefix = "tool calls: " + ", ".join(turn_tool_names[:6])
turn_text = f"{prefix}; {turn_text}" if turn_text else prefix
_remember_dropped_turn(str(role).upper(), turn_text)
if len(text) > 600:
text = text[:420].rstrip() + " ... " + text[-160:].lstrip()
if role == "user" and text:
user_asks.append(text)
elif role == "assistant":
tool_names: list[str] = []
for tc in msg.get("tool_calls") or []:
name, _args = _extract_tool_call_name_and_args(tc)
tool_names.append(name)
if tool_names:
assistant_actions.append(
"Called tool(s): " + ", ".join(tool_names[:6])
)
elif text:
assistant_actions.append(text)
elif role == "tool":
call_id = str(msg.get("tool_call_id") or "")
tool_name, tool_args = call_id_to_tool.get(call_id, ("unknown", ""))
tool_actions.append(
_summarize_tool_result(tool_name, tool_args, text or "")
)
if re.search(
r"\b(error|failed|exception|traceback|timeout|timed out|fatal)\b",
text,
re.I,
):
blockers.append(text[:500])
def _bullets(items: list[str], limit: int = 8) -> str:
unique: list[str] = []
seen: set[str] = set()
for item in items:
item = item.strip()
if not item or item in seen:
continue
seen.add(item)
unique.append(item)
if len(unique) >= limit:
break
return "\n".join(f"- {item}" for item in unique) if unique else "None."
completed: list[str] = []
for idx, item in enumerate((assistant_actions + tool_actions)[:12], start=1):
completed.append(f"{idx}. {item}")
active_task = (
f"User asked: {user_asks[-1]!r}"
if user_asks
else "Unknown from deterministic fallback."
)
previous_summary_note = ""
if self._previous_summary:
previous_summary_note = (
"\n\nPrevious compaction summary was present and should still be treated as "
"background continuity context, but the latest LLM summary update failed."
)
reason_text = f" Summary failure reason: {reason}." if reason else ""
body = f"""## Active Task
{active_task}
## Goal
Recovered from a deterministic fallback because the LLM context summarizer was unavailable. Continue from the protected recent messages after this summary and use current file/system state for exact details.{previous_summary_note}
## Constraints & Preferences
- This fallback was generated locally without an LLM summary call.
- Secrets and credentials were redacted before preservation.
- The summary may be incomplete; prefer verifying current files, git state, processes, and test results instead of assuming omitted details.
## Completed Actions
{chr(10).join(completed) if completed else "None recoverable from compacted turns."}
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{active_task}
## Blocked
{_bullets(blockers, limit=5)}
## Key Decisions
None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{active_task}
## Relevant Files
{_bullets(relevant_files, limit=12)}
## Remaining Work
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
{_bullets(last_dropped_turns, limit=8)}
## Critical Context
Summary generation was unavailable, so this is a best-effort deterministic fallback for {len(turns_to_summarize)} compacted message(s).{reason_text}"""
summary = self._with_summary_prefix(redact_sensitive_text(body.strip()))
if len(summary) > _FALLBACK_SUMMARY_MAX_CHARS:
summary = summary[: _FALLBACK_SUMMARY_MAX_CHARS - 42].rstrip() + "\n...[fallback summary truncated]"
return summary
def _fallback_to_main_for_compression(self, e: Exception, reason: str) -> None:
"""Switch from a separate ``summary_model`` back to the main model.
@@ -1125,7 +777,7 @@ Summary generation was unavailable, so this is a best-effort deterministic fallb
into the warning log.
"""
self._summary_model_fallen_back = True
logger.warning(
logging.warning(
"Summary model '%s' %s (%s). "
"Falling back to main model '%s' for compression.",
self.summary_model, reason, e, self.model,
@@ -1138,11 +790,7 @@ Summary generation was unavailable, so this is a best-effort deterministic fallb
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown — retry immediately
def _generate_summary(
self,
turns_to_summarize: List[Dict[str, Any]],
focus_topic: Optional[str] = None,
) -> Optional[str]:
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topic: str = None) -> Optional[str]:
"""Generate a structured summary of conversation turns.
Uses a structured template (Goal, Progress, Decisions, Resolved/Pending
@@ -1318,7 +966,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# No provider configured — long cooldown, unlikely to self-resolve
self._summary_failure_cooldown_until = time.monotonic() + _SUMMARY_FAILURE_COOLDOWN_SECONDS
self._last_summary_error = "no auxiliary LLM provider configured"
logger.warning("Context compression: no provider available for "
logging.warning("Context compression: no provider available for "
"summary. Middle turns will be dropped without summary "
"for %d seconds.",
_SUMMARY_FAILURE_COOLDOWN_SECONDS)
@@ -1414,7 +1062,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
logger.warning(
logging.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
e,
@@ -1537,26 +1185,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
idx += 1
return idx
def _protect_head_size(self, messages: List[Dict[str, Any]]) -> int:
"""Total count of head messages to protect.
``protect_first_n`` is defined as *additional* messages protected
beyond the system prompt. The system prompt (if present at index 0)
is always implicitly protected — it's load-bearing context that
must never be summarised away. This keeps semantics stable across
call paths where the system prompt may or may not be included in
the ``messages`` list (e.g. the gateway ``/compress`` handler
strips it before calling compress()).
Examples:
protect_first_n=0 → system prompt only (or nothing if no system msg)
protect_first_n=3 → system + first 3 non-system messages
"""
head = 0
if messages and messages[0].get("role") == "system":
head = 1
return head + self.protect_first_n
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Pull a compress-end boundary backward to avoid splitting a
tool_call / result group.
@@ -1715,7 +1343,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
skip the LLM call when the transcript is still entirely inside
the protected head/tail.
"""
compress_start = self._align_boundary_forward(messages, self._protect_head_size(messages))
compress_start = self._align_boundary_forward(messages, self.protect_first_n)
compress_end = self._find_tail_cut_by_tokens(messages, compress_start)
return compress_start < compress_end
@@ -1723,7 +1351,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Main compression entry point
# ------------------------------------------------------------------
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None, focus_topic: str = None, force: bool = False) -> List[Dict[str, Any]]:
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None, focus_topic: str = None) -> List[Dict[str, Any]]:
"""Compress conversation messages by summarizing middle turns.
Algorithm:
@@ -1741,9 +1369,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
provided, the summariser will prioritise preserving information
related to this topic and be more aggressive about compressing
everything else. Inspired by Claude Code's ``/compact``.
force: If True, clear any active summary-failure cooldown before
running so a manual ``/compress`` can retry immediately after
an auto-compression abort. Auto-compress callers pass False.
"""
# Reset per-call summary failure state — callers inspect these fields
# after compress() returns to decide whether to surface a warning.
@@ -1752,16 +1377,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
self._last_summary_error = None
self._last_aux_model_failure_error = None
self._last_aux_model_failure_model = None
self._last_compress_aborted = False
# Manual /compress (force=True) bypasses the failure cooldown so the
# user can retry immediately after an auto-compress abort. Without
# this, /compress would silently no-op for 30-60s after a failure.
if force and self._summary_failure_cooldown_until > 0.0:
self._summary_failure_cooldown_until = 0.0
n_messages = len(messages)
# Only need head + 3 tail messages minimum (token budget decides the real tail size)
_min_for_compress = self._protect_head_size(messages) + 3 + 1
_min_for_compress = self.protect_first_n + 3 + 1
if n_messages <= _min_for_compress:
if not self.quiet_mode:
logger.warning(
@@ -1781,7 +1399,7 @@ The user has requested that this compaction PRIORITISE preserving all informatio
logger.info("Pre-compression: pruned %d old tool result(s)", pruned_count)
# Phase 2: Determine boundaries
compress_start = self._protect_head_size(messages)
compress_start = self.protect_first_n
compress_start = self._align_boundary_forward(messages, compress_start)
# Use token-budget tail protection instead of fixed message count
@@ -1791,23 +1409,15 @@ The user has requested that this compaction PRIORITISE preserving all informatio
return messages
turns_to_summarize = messages[compress_start:compress_end]
# A persisted handoff summary can sit in the protected head after a
# resume (commonly immediately after the system prompt). Search from
# the first non-system message through the compression window so we can
# rehydrate iterative-summary state without serializing that handoff as
# a new turn. Protected messages after the handoff remain live context,
# so only summarize messages that are both after the handoff and inside
# the current compression window.
summary_search_start = 1 if messages and messages[0].get("role") == "system" else 0
summary_idx, summary_body = self._find_latest_context_summary(
messages,
summary_search_start,
compress_start,
compress_end,
)
if summary_idx is not None:
if summary_body and not self._previous_summary:
self._previous_summary = summary_body
turns_to_summarize = messages[max(compress_start, summary_idx + 1):compress_end]
turns_to_summarize = messages[summary_idx + 1:compress_end]
if not self.quiet_mode:
logger.info(
@@ -1834,32 +1444,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# Phase 3: Generate structured summary
summary = self._generate_summary(turns_to_summarize, focus_topic=focus_topic)
# If summary generation failed, behavior splits on
# ``abort_on_summary_failure`` (config: compression.abort_on_summary_failure):
# True → ABORT compression entirely. Return messages unchanged
# and set _last_compress_aborted=True so callers can warn
# the user and stop the auto-compress retry loop.
# False → Fall through to the default fallback path below: insert
# a deterministic "summary unavailable" handoff and drop
# the middle window. Records _last_summary_fallback_used /
# _last_summary_dropped_count for gateway hygiene to
# surface a warning.
# Default is False (historical behavior).
if not summary and self.abort_on_summary_failure:
n_skipped = compress_end - compress_start
self._last_summary_dropped_count = 0 # nothing actually dropped
self._last_summary_fallback_used = False
self._last_compress_aborted = True
if not self.quiet_mode:
logger.warning(
"Summary generation failed — aborting compression "
"(compression.abort_on_summary_failure=true). "
"%d message(s) preserved unchanged. Conversation is "
"frozen until the next /compress or /new.",
n_skipped,
)
return messages
# Phase 4: Assemble compressed message list
compressed = []
for i in range(compress_start):
@@ -1874,18 +1458,20 @@ The user has requested that this compaction PRIORITISE preserving all informatio
)
compressed.append(msg)
# If LLM summary failed, insert a deterministic fallback so the model
# gets at least locally recoverable continuity anchors instead of a
# content-free "N messages were removed" marker.
# If LLM summary failed, insert a static fallback so the model
# knows context was lost rather than silently dropping everything.
if not summary:
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting deterministic fallback context summary")
logger.warning("Summary generation failed — inserting static fallback context marker")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = self._build_static_fallback_summary(
turns_to_summarize,
reason=self._last_summary_error,
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"removed to free context space but could not be summarized. The removed "
f"messages contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
)
_merge_summary_into_tail = False
@@ -1945,14 +1531,6 @@ The user has requested that this compaction PRIORITISE preserving all informatio
compressed = self._sanitize_tool_pairs(compressed)
# Replace image parts in all compressed messages before the newest
# image-bearing user turn with a short text placeholder. Without
# this, tail messages keep their original multi-MB base-64 image
# payloads forever, which can push every subsequent API request
# past the provider's body-size limit and wedge the session.
# Port of Kilo-Org/kilocode#9434.
compressed = _strip_historical_media(compressed)
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate

View File

@@ -55,11 +55,6 @@ class ContextEngine(ABC):
# These control the preflight compression check. Subclasses may
# override via __init__ or property; defaults are sensible for most
# engines.
#
# protect_first_n semantics (since PR #13754): count of non-system head
# messages always preserved verbatim, IN ADDITION to the system prompt
# which is always implicitly protected. Default 3 keeps the
# historical "system + first 3 non-system messages" head shape.
threshold_percent: float = 0.75
protect_first_n: int = 3
@@ -71,12 +66,7 @@ class ContextEngine(ABC):
def update_from_response(self, usage: Dict[str, Any]) -> None:
"""Update tracked token usage from an API response.
Called after every LLM call with a normalized usage dict. The legacy
keys ``prompt_tokens``, ``completion_tokens``, and ``total_tokens``
are always present. Newer hosts also include canonical buckets:
``input_tokens``, ``output_tokens``, ``cache_read_tokens``,
``cache_write_tokens``, and ``reasoning_tokens``. Engines should
treat those fields as optional for compatibility with older hosts.
Called after every LLM call with the usage dict from the response.
"""
@abstractmethod
@@ -205,7 +195,6 @@ class ContextEngine(ABC):
base_url: str = "",
api_key: str = "",
provider: str = "",
api_mode: str = "",
) -> None:
"""Called when the user switches models or on fallback activation.

View File

@@ -1,733 +0,0 @@
"""Context compression — extract the AIAgent methods that drive summarisation.
Three concerns live here:
* :func:`check_compression_model_feasibility` — startup probe of the
configured auxiliary compression model. Warns when the aux context
window can't fit the main model's compression threshold; auto-lowers
the session threshold when possible; hard-rejects auxes below
``MINIMUM_CONTEXT_LENGTH``.
* :func:`replay_compression_warning` — re-emit a stored warning through
the gateway ``status_callback`` once it's wired up (the callback is
set after :class:`AIAgent` construction).
* :func:`compress_context` — the actual compression call. Runs the
configured compressor, splits the SQLite session, rotates the
session_id, notifies plugin context engines / memory providers, and
returns the compressed message list and freshly-built system prompt.
* :func:`try_shrink_image_parts_in_messages` — image-too-large recovery
helper that re-encodes ``data:image/...;base64,...`` parts at a smaller
size so retries can fit under provider ceilings (Anthropic's 5 MB).
``run_agent`` keeps thin wrappers for each so existing call sites
(``self._compress_context(...)``) keep working. Tests that exercise
these paths see no behavioural change.
"""
from __future__ import annotations
import logging
import os
import tempfile
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Optional, Tuple
from agent.model_metadata import estimate_request_tokens_rough
logger = logging.getLogger(__name__)
def _compression_lock_holder(agent: Any) -> str:
"""Build a unique holder id for the lock: pid:tid:agent-instance:uuid.
The pid+tid prefix lets ops tell crashed/abandoned holders apart from
live ones (expiry-based recovery uses the timestamp, but ``holder``
is what shows up in diagnostics + log lines). The agent instance id
and a per-acquire uuid disambiguate two co-resident agents on the
same thread (background_review forks run on a worker thread, but
on machines where compression itself dispatches to a thread pool
we want each acquire to be unique).
"""
import threading
return (
f"pid={os.getpid()}"
f":tid={threading.get_ident()}"
f":agent={id(agent):x}"
f":nonce={uuid.uuid4().hex[:8]}"
)
def check_compression_model_feasibility(agent: Any) -> None:
"""Warn at session start if the auxiliary compression model's context
window is smaller than the main model's compression threshold.
When the auxiliary model cannot fit the content that needs summarising,
compression will either fail outright (the LLM call errors) or produce
a severely truncated summary.
Called during ``AIAgent.__init__`` so CLI users see the warning
immediately (via ``_vprint``). The gateway sets ``status_callback``
*after* construction, so :func:`replay_compression_warning` re-sends
the stored warning through the callback on the first
``run_conversation()`` call.
"""
if not agent.compression_enabled:
return
try:
from agent.auxiliary_client import (
_resolve_task_provider_model,
get_text_auxiliary_client,
)
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
get_model_context_length,
)
client, aux_model = get_text_auxiliary_client(
"compression",
main_runtime=agent._current_main_runtime(),
)
# Best-effort aux provider label for the warning message. The
# configured provider may be "auto", in which case we fall back
# to the client's base_url hostname so the user can still tell
# where the compression model is actually being called.
try:
_aux_cfg_provider, _, _, _, _ = _resolve_task_provider_model("compression")
except Exception:
_aux_cfg_provider = ""
if client is None or not aux_model:
if _aux_cfg_provider and _aux_cfg_provider != "auto":
msg = (
"⚠ Configured auxiliary compression provider "
f"'{_aux_cfg_provider}' is unavailable — context "
"compression will drop middle turns without a summary. "
"Check auxiliary.compression in config.yaml and "
"reauthenticate that provider."
)
else:
msg = (
"⚠ No auxiliary LLM provider configured — context "
"compression will drop middle turns without a summary. "
"Run `hermes setup` or set OPENROUTER_API_KEY."
)
agent._compression_warning = msg
agent._emit_status(msg)
logger.warning(
"No auxiliary LLM provider for compression — "
"summaries will be unavailable."
)
return
aux_base_url = str(getattr(client, "base_url", ""))
# ``client.api_key`` may be a callable (Azure Foundry Entra ID
# bearer provider). The context-length resolver chain expects a
# string, but it only needs a key for live catalogue probes
# (provider model lists). For Entra clients the model-metadata
# chain still resolves via models.dev + hardcoded family
# fallbacks, which don't require auth — pass empty string rather
# than minting a bearer JWT just to look up a context length.
_raw_aux_key = getattr(client, "api_key", "")
aux_api_key = "" if (callable(_raw_aux_key) and not isinstance(_raw_aux_key, str)) else str(_raw_aux_key or "")
aux_context = get_model_context_length(
aux_model,
base_url=aux_base_url,
api_key=aux_api_key,
config_context_length=getattr(agent, "_aux_compression_context_length_config", None),
# Each model must be resolved with its own provider so that
# provider-specific paths (e.g. Bedrock static table, OpenRouter API)
# are invoked for the correct client, not inherited from the main model.
provider=(_aux_cfg_provider if _aux_cfg_provider and _aux_cfg_provider != "auto" else getattr(agent, "provider", "")),
custom_providers=agent._custom_providers,
)
# Hard floor: the auxiliary compression model must have at least
# MINIMUM_CONTEXT_LENGTH (64K) tokens of context. The main model
# is already required to meet this floor (checked earlier in
# __init__), so the compression model must too — otherwise it
# cannot summarise a full threshold-sized window of main-model
# content. Mirrors the main-model rejection pattern.
if aux_context and aux_context < MINIMUM_CONTEXT_LENGTH:
raise ValueError(
f"Auxiliary compression model {aux_model} has a context "
f"window of {aux_context:,} tokens, which is below the "
f"minimum {MINIMUM_CONTEXT_LENGTH:,} required by Hermes "
f"Agent. Choose a compression model with at least "
f"{MINIMUM_CONTEXT_LENGTH // 1000}K context (set "
f"auxiliary.compression.model in config.yaml), or set "
f"auxiliary.compression.context_length to override the "
f"detected value if it is wrong."
)
threshold = agent.context_compressor.threshold_tokens
if aux_context < threshold:
# Auto-correct: lower the live session threshold so
# compression actually works this session. The hard floor
# above guarantees aux_context >= MINIMUM_CONTEXT_LENGTH,
# so the new threshold is always >= 64K.
#
# The compression summariser sends a single user-role
# prompt (no system prompt, no tools) to the aux model, so
# new_threshold == aux_context is safe: the request is
# the raw messages plus a small summarisation instruction.
old_threshold = threshold
new_threshold = aux_context
agent.context_compressor.threshold_tokens = new_threshold
# Keep threshold_percent in sync so future main-model
# context_length changes (update_model) re-derive from a
# sensible number rather than the original too-high value.
main_ctx = agent.context_compressor.context_length
if main_ctx:
agent.context_compressor.threshold_percent = (
new_threshold / main_ctx
)
safe_pct = int((aux_context / main_ctx) * 100) if main_ctx else 50
# Build human-readable "model (provider)" labels for both
# the main model and the compression model so users can
# tell at a glance which provider each side is actually
# using. When the configured provider is empty or "auto",
# fall back to the client's base_url hostname.
_main_model = getattr(agent, "model", "") or "?"
_main_provider = getattr(agent, "provider", "") or ""
_aux_provider_label = (
_aux_cfg_provider
if _aux_cfg_provider and _aux_cfg_provider != "auto"
else ""
)
if not _aux_provider_label:
try:
from urllib.parse import urlparse
_aux_provider_label = (
urlparse(aux_base_url).hostname or aux_base_url
)
except Exception:
_aux_provider_label = aux_base_url or "auto"
_main_label = (
f"{_main_model} ({_main_provider})"
if _main_provider
else _main_model
)
_aux_label = f"{aux_model} ({_aux_provider_label})"
msg = (
f"⚠ Compression model {_aux_label} context is "
f"{aux_context:,} tokens, but the main model "
f"{_main_label}'s compression threshold was "
f"{old_threshold:,} tokens. "
f"Auto-lowered this session's threshold to "
f"{new_threshold:,} tokens so compression can run.\n"
f" To make this permanent, edit config.yaml — either:\n"
f" 1. Use a larger compression model:\n"
f" auxiliary:\n"
f" compression:\n"
f" model: <model-with-{old_threshold:,}+-context>\n"
f" 2. Lower the compression threshold:\n"
f" compression:\n"
f" threshold: 0.{safe_pct:02d}"
)
agent._compression_warning = msg
agent._emit_status(msg)
logger.warning(
"Auxiliary compression model %s has %d token context, "
"below the main model's compression threshold of %d "
"tokens — auto-lowered session threshold to %d to "
"keep compression working.",
aux_model,
aux_context,
old_threshold,
new_threshold,
)
except ValueError:
# Hard rejections (aux below minimum context) must propagate
# so the session refuses to start.
raise
except Exception as exc:
logger.debug(
"Compression feasibility check failed (non-fatal): %s", exc
)
def replay_compression_warning(agent: Any) -> None:
"""Re-send the compression warning through ``status_callback``.
During ``__init__`` the gateway's ``status_callback`` is not yet
wired, so ``_emit_status`` only reaches ``_vprint`` (CLI). This
method is called once at the start of the first
``run_conversation()`` — by then the gateway has set the callback,
so every platform (Telegram, Discord, Slack, etc.) receives the
warning.
"""
msg = getattr(agent, "_compression_warning", None)
if msg and agent.status_callback:
try:
agent.status_callback("lifecycle", msg)
except Exception:
pass
def compress_context(
agent: Any,
messages: list,
system_message: str,
*,
approx_tokens: Optional[int] = None,
task_id: str = "default",
focus_topic: Optional[str] = None,
force: bool = False,
) -> Tuple[list, str]:
"""Compress conversation context and split the session in SQLite.
Args:
agent: The owning :class:`AIAgent`.
messages: Current message history (will be summarised).
system_message: Current system prompt; rebuilt after compression.
approx_tokens: Pre-compression token estimate, logged for ops.
task_id: Tool task scope (used for clearing file-read dedup state).
focus_topic: Optional focus string for guided compression — the
summariser will prioritise preserving information related to
this topic. Inspired by Claude Code's ``/compact <focus>``.
force: If True, bypass any active summary-failure cooldown. Set
by the manual ``/compress`` slash command so users can retry
immediately after an auto-compress abort. Auto-compress
callers use the default ``False``.
Returns:
``(compressed_messages, new_system_prompt)`` tuple. When
compression aborts (aux LLM failed to produce a usable summary),
returns the original messages unchanged and the existing system
prompt — the session is NOT rotated. Callers should detect the
no-op via ``len(returned) == len(input)`` and stop the retry loop.
"""
# Lazy feasibility check — run the auxiliary-provider probe + context
# length lookup just-in-time on the first compression attempt instead of
# at AIAgent.__init__. Saves ~400ms cold off every short session that
# never reaches the threshold (the vast majority of ``chat -q`` runs).
# The check itself sets ``agent._compression_warning`` so the
# status-callback replay machinery still emits the warning to the user
# the first time it would matter.
if not getattr(agent, "_compression_feasibility_checked", True):
try:
check_compression_model_feasibility(agent)
finally:
agent._compression_feasibility_checked = True
_pre_msg_count = len(messages)
logger.info(
"context compression started: session=%s messages=%d tokens=~%s model=%s focus=%r",
agent.session_id or "none", _pre_msg_count,
f"{approx_tokens:,}" if approx_tokens else "unknown", agent.model,
focus_topic,
)
agent._emit_status(
"🗜️ Compacting context — summarizing earlier conversation so I can continue..."
)
# ── Compression lock ────────────────────────────────────────────────
# Atomic, state.db-backed lock per session_id. Without this, two
# AIAgent instances that share the same session_id (most commonly the
# parent-turn agent and its background-review fork — see
# ``agent/background_review.py``: ``review_agent.session_id =
# agent.session_id``) can each call compress() on overlapping
# snapshots of the same conversation. Both succeed, both rotate
# ``agent.session_id`` to a fresh id, both create child sessions in
# state.db parented to the same old id. The gateway's SessionEntry
# only catches one rotation, so the other child becomes an orphan
# that silently accumulates writes — Damien's repro shape.
#
# Acquire keyed on the OLD session_id (the rotation target's parent),
# because that's the id that competing paths see and read from
# SessionEntry at the start of their own compression attempt.
#
# If we can't acquire the lock, another path is mid-compression on
# this session. Aborting is correct: the messages are unchanged, the
# other path's rotation will produce the canonical new session_id,
# and our caller's auto-compress loop sees ``len(returned) == len(input)``
# and stops retrying for this cycle. The session is NOT corrupted —
# we just sit out this round and let the winner finish.
_lock_db = getattr(agent, "_session_db", None)
_lock_sid = agent.session_id or ""
_lock_holder: Optional[str] = None
# Probe whether the lock subsystem is actually available on this
# SessionDB instance. A process running mismatched module versions
# (e.g. ``conversation_compression.py`` reloaded after a pull but the
# long-lived ``hermes_state.SessionDB`` class still bound to the
# pre-#34351 version in memory) has the call site but not the method.
# In that case ``try_acquire_compression_lock`` raises AttributeError —
# NOT a ``sqlite3.Error`` — so the method's own fail-open guard never
# runs and the exception propagates to the outer agent loop, which
# prints the error and retries. Because compression never succeeds,
# the token count never drops and the loop re-triggers compaction
# forever (the "API call #47/#48/#49 ... has no attribute
# try_acquire_compression_lock" spin). Fail OPEN here: if the lock
# subsystem is missing or broken in any unexpected way, skip locking
# and proceed with compression. Skipping the lock risks a rare
# concurrent-compression session fork; an infinite no-progress loop
# that never compresses at all is strictly worse.
if _lock_db is not None and _lock_sid:
_lock_holder = _compression_lock_holder(agent)
try:
_lock_acquired = _lock_db.try_acquire_compression_lock(
_lock_sid, _lock_holder
)
except Exception as _lock_err:
# Broken/absent lock subsystem (version skew, etc.). Log once
# per session and proceed WITHOUT the lock rather than letting
# the exception spin the outer loop.
_lock_holder = None # we don't own anything to release
if getattr(agent, "_last_compression_lock_error_sid", None) != _lock_sid:
agent._last_compression_lock_error_sid = _lock_sid
logger.warning(
"compression lock subsystem unavailable for session=%s "
"(%s: %s) — proceeding without lock. This usually means a "
"stale in-memory module after an update; restart the "
"process (or `hermes update`) to resync.",
_lock_sid, type(_lock_err).__name__, _lock_err,
)
_lock_acquired = True # treat as acquired-but-unlocked; proceed
if not _lock_acquired:
try:
existing = _lock_db.get_compression_lock_holder(_lock_sid)
except Exception:
existing = None
logger.warning(
"compression skipped: another path is compressing session=%s "
"(holder=%s) — returning messages unchanged to avoid session fork",
_lock_sid, existing,
)
_lock_holder = None # don't release a lock we don't own
# Surface to the user once — quiet for downstream auto-compress loops
if getattr(agent, "_last_compression_lock_warning_sid", None) != _lock_sid:
agent._last_compression_lock_warning_sid = _lock_sid
try:
agent._emit_warning(
"⚠ Skipping concurrent compression — another path "
"is already compressing this session. Will retry "
"after it finishes."
)
except Exception:
pass
_existing_sp = getattr(agent, "_cached_system_prompt", None)
if not _existing_sp:
_existing_sp = agent._build_system_prompt(system_message)
return messages, _existing_sp
def _release_lock() -> None:
"""Release the lock keyed on the OLD session_id (before rotation)."""
if _lock_db is not None and _lock_sid and _lock_holder:
try:
_lock_db.release_compression_lock(_lock_sid, _lock_holder)
except Exception as _rel_err:
logger.debug("compression lock release failed: %s", _rel_err)
# Notify external memory provider before compression discards context
if agent._memory_manager:
try:
agent._memory_manager.on_pre_compress(messages)
except Exception:
pass
try:
compressed = agent.context_compressor.compress(messages, current_tokens=approx_tokens, focus_topic=focus_topic, force=force)
except TypeError:
# Plugin context engine with strict signature that doesn't accept
# focus_topic / force — fall back to calling without them.
compressed = agent.context_compressor.compress(messages, current_tokens=approx_tokens)
except BaseException:
# ANY exception during compress() must release the lock so the
# session isn't permanently blocked from future compression.
_release_lock()
raise
# If compression aborted (aux LLM failed to produce a usable summary)
# the compressor returns the input messages unchanged. Surface the
# error to the user, skip the session-rotation work entirely (no
# session has logically ended), and let auto-compress callers detect
# the no-op via len(returned) == len(input).
if getattr(agent.context_compressor, "_last_compress_aborted", False):
_err = getattr(agent.context_compressor, "_last_summary_error", None) or "unknown error"
if getattr(agent, "_last_compression_summary_warning", None) != _err:
agent._last_compression_summary_warning = _err
agent._emit_warning(
f"⚠ Compression aborted: {_err}. "
"No messages were dropped — conversation continues unchanged. "
"Run /compress to retry, or /new to start a fresh session."
)
_existing_sp = getattr(agent, "_cached_system_prompt", None)
if not _existing_sp:
_existing_sp = agent._build_system_prompt(system_message)
_release_lock() # compression aborted — no rotation will happen
return messages, _existing_sp
summary_error = getattr(agent.context_compressor, "_last_summary_error", None)
if summary_error:
if getattr(agent, "_last_compression_summary_warning", None) != summary_error:
agent._last_compression_summary_warning = summary_error
agent._emit_warning(
f"⚠ Compression summary failed: {summary_error}. "
"Inserted a fallback context marker."
)
else:
# No hard failure — but did the configured aux model error out
# and get recovered by retrying on main? Surface that so users
# know their auxiliary.compression.model setting is broken even
# though compression succeeded.
_aux_fail_model = getattr(agent.context_compressor, "_last_aux_model_failure_model", None)
_aux_fail_err = getattr(agent.context_compressor, "_last_aux_model_failure_error", None)
if _aux_fail_model:
# Dedup on (model, error) so we don't spam on every compaction
_aux_key = (_aux_fail_model, _aux_fail_err)
if getattr(agent, "_last_aux_fallback_warning_key", None) != _aux_key:
agent._last_aux_fallback_warning_key = _aux_key
agent._emit_warning(
f" Configured compression model '{_aux_fail_model}' failed "
f"({_aux_fail_err or 'unknown error'}). Recovered using main model — "
"check auxiliary.compression.model in config.yaml."
)
todo_snapshot = agent._todo_store.format_for_injection()
if todo_snapshot:
compressed.append({"role": "user", "content": todo_snapshot})
agent._invalidate_system_prompt()
new_system_prompt = agent._build_system_prompt(system_message)
agent._cached_system_prompt = new_system_prompt
if agent._session_db:
try:
# Propagate title to the new session with auto-numbering
old_title = agent._session_db.get_session_title(agent.session_id)
# Trigger memory extraction on the old session before it rotates.
agent.commit_memory_session(messages)
agent._session_db.end_session(agent.session_id, "compression")
old_session_id = agent.session_id
agent.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
try:
from gateway.session_context import set_current_session_id
set_current_session_id(agent.session_id)
except Exception:
os.environ["HERMES_SESSION_ID"] = agent.session_id
agent._session_db_created = False
agent._session_db.create_session(
session_id=agent.session_id,
source=agent.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
model=agent.model,
model_config=agent._session_init_model_config,
parent_session_id=old_session_id,
)
agent._session_db_created = True
# Auto-number the title for the continuation session
if old_title:
try:
new_title = agent._session_db.get_next_title_in_lineage(old_title)
agent._session_db.set_session_title(agent.session_id, new_title)
except (ValueError, Exception) as e:
logger.debug("Could not propagate title on compression: %s", e)
agent._session_db.update_system_prompt(agent.session_id, new_system_prompt)
# Reset flush cursor — new session starts with no messages written
agent._last_flushed_db_idx = 0
except Exception as e:
logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
# Notify the context engine that the session_id rotated because of
# compression (not a fresh /new). Plugin engines (e.g. hermes-lcm) use
# boundary_reason="compression" to preserve DAG lineage across the
# rollover instead of re-initializing fresh per-session state.
# See hermes-lcm#68. Built-in ContextCompressor ignores kwargs.
try:
_old_sid = locals().get("old_session_id")
if _old_sid and hasattr(agent.context_compressor, "on_session_start"):
agent.context_compressor.on_session_start(
agent.session_id or "",
boundary_reason="compression",
old_session_id=_old_sid,
conversation_id=getattr(agent, "_gateway_session_key", None),
)
except Exception as _ce_err:
logger.debug("context engine on_session_start (compression): %s", _ce_err)
# Notify memory providers of the compression-driven session_id rotation
# so provider-cached per-session state (Hindsight's _document_id,
# accumulated turn buffers, counters) refreshes. reset=False because
# the logical conversation continues; only the id and DB row rolled
# over. See #6672.
try:
_old_sid = locals().get("old_session_id")
if _old_sid and agent._memory_manager:
agent._memory_manager.on_session_switch(
agent.session_id or "",
parent_session_id=_old_sid,
reset=False,
reason="compression",
)
except Exception as _me_err:
logger.debug("memory manager on_session_switch (compression): %s", _me_err)
# Warn on repeated compressions (quality degrades with each pass)
_cc = agent.context_compressor.compression_count
if _cc >= 2:
agent._vprint(
f"{agent.log_prefix}⚠️ Session compressed {_cc} times — "
f"accuracy may degrade. Consider /new to start fresh.",
force=True,
)
# Update token estimate after compaction so pressure calculations
# use the post-compression count, not the stale pre-compression one.
# Use estimate_request_tokens_rough() so tool schemas are included —
# with 50+ tools enabled, schemas alone can add 20-30K tokens, and
# omitting them delays the next compression cycle far past the
# configured threshold (issue #14695).
_compressed_est = estimate_request_tokens_rough(
compressed,
system_prompt=new_system_prompt or "",
tools=agent.tools or None,
)
agent.context_compressor.last_prompt_tokens = _compressed_est
agent.context_compressor.last_completion_tokens = 0
# Clear the file-read dedup cache. After compression the original
# read content is summarised away — if the model re-reads the same
# file it needs the full content, not a "file unchanged" stub.
try:
from tools.file_tools import reset_file_dedup
reset_file_dedup(task_id)
except Exception:
pass
logger.info(
"context compression done: session=%s messages=%d->%d tokens=~%s",
agent.session_id or "none", _pre_msg_count, len(compressed),
f"{_compressed_est:,}",
)
# Release the lock on the OLD session_id only AFTER rotation completed
# and all post-rotation bookkeeping (memory manager, context engine,
# file dedup) ran. A concurrent path that wakes up the moment we
# release will see the NEW session_id in state.db / SessionEntry and
# acquire on that — no race against our just-finished work.
_release_lock()
return compressed, new_system_prompt
def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
"""Re-encode all native image parts at a smaller size to recover from
image-too-large errors (Anthropic 5 MB, unknown other providers).
Mutates ``api_messages`` in place. Returns True if any image part was
actually replaced, False if there were no image parts to shrink or
Pillow couldn't help (caller should surface the original error).
Strategy: look for ``image_url`` / ``input_image`` parts carrying a
``data:image/...;base64,...`` payload. For each one whose encoded
size exceeds 4 MB (a safe target that slides under Anthropic's 5 MB
ceiling with header overhead), write the base64 to a tempfile, call
``vision_tools._resize_image_for_vision`` to produce a smaller data
URL, and substitute it in place.
Non-data-URL images (http/https URLs) are not touched — the provider
fetches those itself and the size limit is different.
"""
if not api_messages:
return False
try:
from tools.vision_tools import _resize_image_for_vision
except Exception as exc:
logger.warning("image-shrink recovery: vision_tools unavailable — %s", exc)
return False
# 4 MB target leaves comfortable headroom under Anthropic's 5 MB.
# Non-Anthropic providers we haven't observed rejecting are fine with
# much larger; shrinking to 4 MB here loses quality but only fires
# after a confirmed provider rejection, so the alternative is failure.
target_bytes = 4 * 1024 * 1024
changed_count = 0
def _shrink_data_url(url: str) -> Optional[str]:
"""Return a smaller data URL, or None if shrink can't help."""
if not isinstance(url, str) or not url.startswith("data:"):
return None
if len(url) <= target_bytes:
# This specific image wasn't the oversized one.
return None
try:
header, _, data = url.partition(",")
mime = "image/jpeg"
if header.startswith("data:"):
mime_part = header[len("data:"):].split(";", 1)[0].strip()
if mime_part.startswith("image/"):
mime = mime_part
import base64 as _b64
raw = _b64.b64decode(data)
suffix = {
"image/png": ".png", "image/gif": ".gif", "image/webp": ".webp",
"image/jpeg": ".jpg", "image/jpg": ".jpg", "image/bmp": ".bmp",
}.get(mime, ".jpg")
tmp = tempfile.NamedTemporaryFile(
prefix="hermes_shrink_", suffix=suffix, delete=False,
)
try:
tmp.write(raw)
tmp.close()
resized = _resize_image_for_vision(
Path(tmp.name),
mime_type=mime,
max_base64_bytes=target_bytes,
)
finally:
try:
Path(tmp.name).unlink(missing_ok=True)
except Exception:
pass
if not resized or len(resized) >= len(url):
# Shrink didn't help (or made it bigger — corrupt input?).
return None
return resized
except Exception as exc:
logger.warning("image-shrink recovery: re-encode failed — %s", exc)
return None
for msg in api_messages:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if not isinstance(content, list):
continue
for part in content:
if not isinstance(part, dict):
continue
ptype = part.get("type")
if ptype not in {"image_url", "input_image"}:
continue
image_value = part.get("image_url")
# OpenAI chat.completions: {"image_url": {"url": "data:..."}}
# OpenAI Responses: {"image_url": "data:..."}
if isinstance(image_value, dict):
url = image_value.get("url", "")
resized = _shrink_data_url(url)
if resized:
image_value["url"] = resized
changed_count += 1
elif isinstance(image_value, str):
resized = _shrink_data_url(image_value)
if resized:
part["image_url"] = resized
changed_count += 1
if changed_count:
logger.info(
"image-shrink recovery: re-encoded %d image part(s) to fit under %.0f MB",
changed_count, target_bytes / (1024 * 1024),
)
return changed_count > 0
__all__ = [
"check_compression_model_feasibility",
"replay_compression_warning",
"compress_context",
"try_shrink_image_parts_in_messages",
]

File diff suppressed because it is too large Load Diff

View File

@@ -30,28 +30,6 @@ _DEFAULT_TIMEOUT_SECONDS = 900.0
_TOOL_CALL_BLOCK_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)
_TOOL_CALL_JSON_RE = re.compile(r"\{\s*\"id\"\s*:\s*\"[^\"]+\"\s*,\s*\"type\"\s*:\s*\"function\"\s*,\s*\"function\"\s*:\s*\{.*?\}\s*\}", re.DOTALL)
# Stderr fingerprint of the deprecated `gh copilot` CLI extension
# (https://github.blog/changelog/2025-09-25-upcoming-deprecation-of-gh-copilot-cli-extension).
# We require BOTH the literal product name ("gh-copilot") AND a deprecation
# marker, so generic stderr from the NEW `@github/copilot` CLI — whose repo
# is github.com/github/copilot-cli and which legitimately mentions "copilot-cli"
# in its own banners and error messages — doesn't get misclassified as the
# deprecated extension.
_DEPRECATION_REQUIRED = ("gh-copilot",)
_DEPRECATION_MARKERS = (
"has been deprecated",
"no commands will be executed",
)
def _is_gh_copilot_deprecation_message(stderr_text: str) -> bool:
"""True iff stderr looks like the deprecated gh-copilot extension's banner."""
lower = stderr_text.lower()
if not any(req in lower for req in _DEPRECATION_REQUIRED):
return False
return any(marker in lower for marker in _DEPRECATION_MARKERS)
def _resolve_command() -> str:
return (
@@ -528,21 +506,6 @@ class CopilotACPClient:
stderr_text = "\n".join(stderr_tail).strip()
if proc.poll() is not None and stderr_text:
if _is_gh_copilot_deprecation_message(stderr_text):
raise RuntimeError(
"Hermes ACP mode requires the NEW GitHub Copilot CLI "
"(github.com/github/copilot-cli), but the binary it just "
"spawned is the deprecated `gh copilot` extension.\n\n"
"Install the new CLI:\n"
" npm install -g @github/copilot\n"
" # then verify with: copilot --help\n\n"
"If `copilot` already resolves to the new CLI but you still see this,\n"
"point Hermes at it explicitly:\n"
" export HERMES_COPILOT_ACP_COMMAND=/path/to/new/copilot\n\n"
"Alternative: use the `copilot` provider (no ACP, hits the Copilot API\n"
"directly with a Copilot subscription token) via `hermes setup`.\n\n"
f"Original error:\n{stderr_text}"
)
raise RuntimeError(f"Copilot ACP process exited early: {stderr_text}")
raise TimeoutError(f"Timed out waiting for Copilot ACP response to {method}.")
@@ -636,10 +599,7 @@ class CopilotACPClient:
block_error = get_read_block_error(str(path))
if block_error:
raise PermissionError(block_error)
try:
content = path.read_text()
except FileNotFoundError:
content = ""
content = path.read_text() if path.exists() else ""
line = params.get("line")
limit = params.get("limit")
if isinstance(line, int) and line > 1:

View File

@@ -1,174 +0,0 @@
"""Credential-pool disk-boundary sanitization helpers.
These helpers define which credential-pool entries are references to borrowed
runtime secrets and strip raw values before those entries are written to
``auth.json``. They intentionally have no dependency on ``hermes_cli.auth`` so
both the pool model and the final auth-store write boundary can share the same
policy without import cycles.
"""
from __future__ import annotations
import hashlib
import re
from typing import Any, Dict, Mapping
# Sources Hermes owns and can intentionally persist in auth.json. Everything
# else with a non-empty source is treated as borrowed/reference-only by default
# so future external secret providers fail closed at the disk boundary.
_PERSISTABLE_PROVIDER_SOURCES = frozenset({
("anthropic", "hermes_pkce"),
("minimax-oauth", "oauth"),
("nous", "device_code"),
("openai-codex", "device_code"),
("xai-oauth", "loopback_pkce"),
})
_SAFE_SECRETISH_METADATA_KEYS = frozenset({
"secret_fingerprint",
"secret_source",
"token_type",
"scope",
"client_id",
"agent_key_id",
"agent_key_expires_at",
"agent_key_expires_in",
"agent_key_reused",
"agent_key_obtained_at",
"expires_at",
"expires_at_ms",
"expires_in",
"last_refresh",
"last_status",
"last_status_at",
"last_error_code",
"last_error_reason",
"last_error_message",
"last_error_reset_at",
})
_SECRET_VALUE_KEYS = frozenset({
"access_token",
"refresh_token",
"agent_key",
"api_key",
"apikey",
"api_token",
"auth_token",
"authorization",
"bearer_token",
"client_secret",
"credential",
"credentials",
"id_token",
"oauth_token",
"private_key",
"secret_key",
"session_token",
"password",
"secret",
"token",
"tokens",
})
_SECRET_VALUE_SUFFIXES = (
"_api_key",
"_api_token",
"_access_token",
"_auth_token",
"_refresh_token",
"_bearer_token",
"_client_secret",
"_id_token",
"_oauth_token",
"_private_key",
"_session_token",
"_secret_key",
"_password",
"_secret",
"_token",
"_key",
)
_CAMEL_CASE_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])")
def _normalize_key(key: Any) -> str:
raw = str(key or "").strip()
raw = _CAMEL_CASE_BOUNDARY.sub("_", raw)
return raw.lower().replace("-", "_").replace(".", "_")
def is_borrowed_credential_source(source: Any, provider_id: Any = None) -> bool:
"""Return True when ``source`` points at a borrowed/reference-only secret."""
normalized_source = str(source or "").strip().lower()
if not normalized_source:
return False
if normalized_source == "manual" or normalized_source.startswith("manual:"):
return False
normalized_provider = str(provider_id or "").strip().lower()
return (normalized_provider, normalized_source) not in _PERSISTABLE_PROVIDER_SOURCES
def _is_secret_payload_key(key: Any) -> bool:
normalized = _normalize_key(key)
if not normalized or normalized in _SAFE_SECRETISH_METADATA_KEYS:
return False
if normalized in _SECRET_VALUE_KEYS:
return True
return normalized.endswith(_SECRET_VALUE_SUFFIXES)
def _fingerprint_value(value: Any) -> str | None:
if value is None:
return None
text = str(value)
if not text:
return None
digest = hashlib.sha256(text.encode("utf-8", errors="surrogatepass")).hexdigest()
return f"sha256:{digest[:16]}"
def _credential_secret_fingerprint(payload: Mapping[str, Any]) -> str | None:
for key in ("agent_key", "access_token", "refresh_token", "api_key", "token", "secret"):
fingerprint = _fingerprint_value(payload.get(key))
if fingerprint:
return fingerprint
for key, value in payload.items():
if _is_secret_payload_key(key):
fingerprint = _fingerprint_value(value)
if fingerprint:
return fingerprint
existing = payload.get("secret_fingerprint")
if isinstance(existing, str) and existing.startswith("sha256:"):
return existing
return None
def sanitize_borrowed_credential_payload(
payload: Mapping[str, Any],
provider_id: Any = None,
) -> Dict[str, Any]:
"""Return a disk-safe credential-pool payload.
Owned sources (manual entries and Hermes-owned OAuth/device-code state)
pass through unchanged. Borrowed/reference-only sources keep labels,
source refs, status/cooldown metadata, counters, and a non-reversible
fingerprint, but raw secret value fields are removed.
"""
result = dict(payload)
if not is_borrowed_credential_source(result.get("source"), provider_id):
return result
fingerprint = _credential_secret_fingerprint(result)
sanitized = {
key: value
for key, value in result.items()
if not _is_secret_payload_key(key)
}
if fingerprint:
sanitized["secret_fingerprint"] = fingerprint
return sanitized

View File

@@ -10,18 +10,15 @@ import time
import uuid
import re
from dataclasses import dataclass, fields, replace
from datetime import datetime, timezone
from datetime import datetime
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import load_env
from agent.credential_persistence import (
is_borrowed_credential_source,
sanitize_borrowed_credential_payload,
)
from hermes_cli.config import get_env_value, load_env
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
PROVIDER_REGISTRY,
_auth_store_lock,
_codex_access_token_is_expiring,
@@ -32,7 +29,6 @@ from hermes_cli.auth import (
_resolve_zai_base_url,
_save_auth_store,
_save_provider_state,
_store_provider_state,
read_credential_pool,
write_credential_pool,
)
@@ -54,38 +50,6 @@ def _load_config_safe() -> Optional[dict]:
STATUS_OK = "ok"
STATUS_EXHAUSTED = "exhausted"
# Terminal failure — the credential will never recover on its own. Used for
# upstream-permanent OAuth states like ``token_invalidated`` / ``token_revoked``
# where retrying after a TTL cooldown is guaranteed to fail. ``DEAD`` entries
# are excluded from rotation unconditionally and only clear when an explicit
# write-side sync (e.g. ``_save_codex_tokens`` after a fresh device-code
# login) rewrites the tokens.
STATUS_DEAD = "dead"
# OAuth error reasons that indicate the credential is permanently invalid
# server-side and cannot be recovered by retry/refresh. Sourced from
# OpenAI Codex Responses API, Anthropic, xAI, and Google OAuth spec.
_TERMINAL_AUTH_REASONS = frozenset({
"token_invalidated", # OpenAI Codex: "Your authentication token has been invalidated."
"token_revoked", # OAuth 2.0 RFC 7009: token explicitly revoked
"invalid_token", # RFC 6750: bearer token is malformed/expired/revoked
"invalid_grant", # RFC 6749: refresh_token rejected during refresh
"unauthorized_client", # RFC 6749: client no longer authorized
"refresh_token_reused", # Single-use refresh token consumed by another process
})
# How long a DEAD manual credential is preserved before being pruned.
# Manual entries (``manual:*``) are independent credentials with no singleton
# to re-seed from, so pruning them after a quiet window cleans up dead state
# without losing recoverability — the user always has the option to re-add
# via ``hermes auth add``.
#
# Singleton-seeded entries (``device_code``, ``loopback_pkce``, ``claude_code``)
# are NOT pruned because ``_seed_from_singletons`` would just re-create them
# on the next ``load_pool()`` with the same stale singleton tokens, defeating
# the cleanup. They remain in the pool marked DEAD until an explicit re-auth
# write-side sync (``_save_codex_tokens`` etc.) clears the status.
DEAD_MANUAL_PRUNE_TTL_SECONDS = 24 * 60 * 60 # 24 hours
AUTH_TYPE_OAUTH = "oauth"
AUTH_TYPE_API_KEY = "api_key"
@@ -121,7 +85,7 @@ CUSTOM_POOL_PREFIX = "custom:"
_EXTRA_KEYS = frozenset({
"token_type", "scope", "client_id", "portal_base_url", "obtained_at",
"expires_in", "agent_key_id", "agent_key_expires_in", "agent_key_reused",
"agent_key_obtained_at", "tls", "secret_source", "secret_fingerprint",
"agent_key_obtained_at", "tls",
})
@@ -164,9 +128,6 @@ class PooledCredential:
def from_dict(cls, provider: str, payload: Dict[str, Any]) -> "PooledCredential":
field_names = {f.name for f in fields(cls) if f.name != "provider"}
data = {k: payload.get(k) for k in field_names if k in payload}
# Rehydrated last_status_at may be an ISO string from to_dict() — normalize to float epoch
if "last_status_at" in data and isinstance(data["last_status_at"], str):
data["last_status_at"] = _parse_absolute_timestamp(data["last_status_at"])
extra = {k: payload[k] for k in _EXTRA_KEYS if k in payload and payload[k] is not None}
data["extra"] = extra
data.setdefault("id", uuid.uuid4().hex[:6])
@@ -196,28 +157,12 @@ class PooledCredential:
for k, v in self.extra.items():
if v is not None:
result[k] = v
return sanitize_borrowed_credential_payload(result, self.provider)
return result
@property
def runtime_api_key(self) -> str:
if self.provider == "nous":
# Nous stores the runtime inference credential in agent_key for
# compatibility. It must be a NAS invoke JWT.
for token, expires_at in (
(self.agent_key, self.agent_key_expires_at),
(self.access_token, self.expires_at),
):
if (
isinstance(token, str)
and token.strip()
and auth_mod._nous_invoke_jwt_is_usable(
token,
scope=getattr(self, "scope", None),
expires_at=expires_at,
)
):
return token.strip()
return ""
return str(self.agent_key or self.access_token or "")
return str(self.access_token or "")
@property
@@ -294,16 +239,6 @@ def _extract_retry_delay_seconds(message: str) -> Optional[float]:
sec_match = re.search(r"retry\s+(?:after\s+)?(\d+(?:\.\d+)?)\s*(?:sec|secs|seconds|s\b)", message, re.IGNORECASE)
if sec_match:
return float(sec_match.group(1))
# "Resets in 4hr 5min" format used by OpenCode Go weekly usage limits
hr_min_match = re.search(r"resets?\s+in\s+(\d+)\s*hr\s+(\d+)\s*min", message, re.IGNORECASE)
if hr_min_match:
return int(hr_min_match.group(1)) * 3600 + int(hr_min_match.group(2)) * 60
hr_only_match = re.search(r"resets?\s+in\s+(\d+)\s*hr\b", message, re.IGNORECASE)
if hr_only_match:
return int(hr_only_match.group(1)) * 3600
min_only_match = re.search(r"resets?\s+in\s+(\d+)\s*min\b", message, re.IGNORECASE)
if min_only_match:
return int(min_only_match.group(1)) * 60
return None
@@ -483,29 +418,6 @@ class CredentialPool:
[entry.to_dict() for entry in self._entries],
)
def _is_terminal_auth_failure(
self,
status_code: Optional[int],
normalized_error: Dict[str, Any],
) -> bool:
"""Detect upstream-permanent OAuth failures that won't recover on TTL.
Only fires for 401 responses whose error code/reason matches a known
terminal OAuth state (token_invalidated, token_revoked, invalid_grant,
etc.). Distinguishes permanent failures from transient ones like
token_expired (refreshable) or generic 401 without a specific reason
(could be a server-side glitch worth retrying).
Returns False for non-401 status codes — 429 rate limits and 402
billing failures are transient by nature and should keep TTL semantics.
"""
if status_code != 401:
return False
reason = normalized_error.get("reason")
if not isinstance(reason, str):
return False
return reason.strip().lower() in _TERMINAL_AUTH_REASONS
def _mark_exhausted(
self,
entry: PooledCredential,
@@ -513,20 +425,9 @@ class CredentialPool:
error_context: Optional[Dict[str, Any]] = None,
) -> PooledCredential:
normalized_error = _normalize_error_context(error_context)
# Permanent OAuth failures (token_invalidated, token_revoked, etc.)
# transition to STATUS_DEAD instead of STATUS_EXHAUSTED. Without this,
# a revoked credential gets a 1-hour TTL cooldown and then re-enters
# rotation, failing immediately every hour until the user manually
# removes it (issue #32849). DEAD entries are excluded from rotation
# unconditionally and only clear via an explicit re-auth write-side
# sync (``_save_codex_tokens`` after a fresh device-code login).
if self._is_terminal_auth_failure(status_code, normalized_error):
terminal_status = STATUS_DEAD
else:
terminal_status = STATUS_EXHAUSTED
updated = replace(
entry,
last_status=terminal_status,
last_status=STATUS_EXHAUSTED,
last_status_at=time.time(),
last_error_code=status_code,
last_error_reason=normalized_error.get("reason"),
@@ -638,64 +539,6 @@ class CredentialPool:
logger.debug("Failed to sync Codex entry from auth.json: %s", exc)
return entry
def _sync_xai_oauth_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync an xAI OAuth pool entry from auth.json if tokens differ.
xAI OAuth refresh tokens are single-use. When another Hermes process
(or another profile sharing the same auth.json) refreshes the token,
it writes the new pair to ``providers["xai-oauth"]["tokens"]`` under
``_auth_store_lock``. Without this resync, our in-memory pool entry
keeps the consumed refresh_token and the next ``_refresh_entry`` call
would replay it and get a ``refresh_token_reused``-style 4xx.
Only applies to entries seeded from the singleton (``loopback_pkce``);
manually added entries (``manual:xai_pkce``) are independent
credentials with their own refresh-token lifecycle.
"""
if self.provider != "xai-oauth" or entry.source != "loopback_pkce":
return entry
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "xai-oauth")
if not isinstance(state, dict):
return entry
tokens = state.get("tokens")
if not isinstance(tokens, dict):
return entry
store_access = tokens.get("access_token", "")
store_refresh = tokens.get("refresh_token", "")
entry_access = entry.access_token or ""
entry_refresh = entry.refresh_token or ""
if store_access and (
store_access != entry_access
or (store_refresh and store_refresh != entry_refresh)
):
logger.debug(
"Pool entry %s: syncing xAI OAuth tokens from auth.json "
"(refreshed by another process)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh or entry.refresh_token,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
"last_error_reason": None,
"last_error_message": None,
"last_error_reset_at": None,
}
if state.get("last_refresh"):
field_updates["last_refresh"] = state["last_refresh"]
updated = replace(entry, **field_updates)
self._replace_entry(entry, updated)
self._persist()
return updated
except Exception as exc:
logger.debug("Failed to sync xAI OAuth entry from auth.json: %s", exc)
return entry
def _sync_nous_entry_from_auth_store(self, entry: PooledCredential) -> PooledCredential:
"""Sync a Nous pool entry from auth.json if tokens differ.
@@ -716,35 +559,18 @@ class CredentialPool:
return entry
store_refresh = state.get("refresh_token", "")
store_access = state.get("access_token", "")
comparable_updates = {
"access_token": store_access,
"refresh_token": store_refresh,
"expires_at": state.get("expires_at"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
"inference_base_url": state.get("inference_base_url"),
}
should_sync = any(
value not in (None, "") and getattr(entry, key, None) != value
for key, value in comparable_updates.items()
)
if should_sync:
if store_refresh and store_refresh != entry.refresh_token:
logger.debug(
"Pool entry %s: syncing Nous state from auth.json",
"Pool entry %s: syncing tokens from auth.json (Nous refresh token changed)",
entry.id,
)
field_updates: Dict[str, Any] = {
"access_token": store_access,
"refresh_token": store_refresh,
"last_status": None,
"last_status_at": None,
"last_error_code": None,
"last_error_reason": None,
"last_error_message": None,
"last_error_reset_at": None,
}
if store_access:
field_updates["access_token"] = store_access
if store_refresh:
field_updates["refresh_token"] = store_refresh
if state.get("expires_at"):
field_updates["expires_at"] = state["expires_at"]
if state.get("agent_key"):
@@ -778,22 +604,9 @@ class CredentialPool:
re-seeding a consumed single-use refresh token.
Applies to any OAuth provider whose singleton lives in auth.json
(currently Nous, OpenAI Codex, and xAI Grok OAuth).
``set_active=False`` on every write: a pool sync-back is a
token-rotation side effect, not the user choosing a provider.
Using ``_save_provider_state`` (which sets ``active_provider``)
here would mean every Nous/Codex/xAI refresh in a multi-provider
setup silently flips the ``active_provider`` flag — the next
``hermes`` invocation that defaults to the active provider
(e.g. setup wizard, ``hermes auth status``) would land on
whatever provider happened to refresh last, not whatever the
user actually chose.
(currently Nous and OpenAI Codex).
"""
# Only sync entries that were seeded *from* a singleton. Manually
# added pool entries (source="manual:*") are independent credentials
# and must not write back to the singleton.
if entry.source not in {"device_code", "loopback_pkce"}:
if entry.source != "device_code":
return
try:
with _auth_store_lock():
@@ -819,7 +632,7 @@ class CredentialPool:
state[extra_key] = val
if entry.inference_base_url:
state["inference_base_url"] = entry.inference_base_url
_store_provider_state(auth_store, "nous", state, set_active=False)
_save_provider_state(auth_store, "nous", state)
elif self.provider == "openai-codex":
state = _load_provider_state(auth_store, "openai-codex")
@@ -833,21 +646,7 @@ class CredentialPool:
tokens["refresh_token"] = entry.refresh_token
if entry.last_refresh:
state["last_refresh"] = entry.last_refresh
_store_provider_state(auth_store, "openai-codex", state, set_active=False)
elif self.provider == "xai-oauth":
state = _load_provider_state(auth_store, "xai-oauth")
if not isinstance(state, dict):
return
tokens = state.get("tokens")
if not isinstance(tokens, dict):
return
tokens["access_token"] = entry.access_token
if entry.refresh_token:
tokens["refresh_token"] = entry.refresh_token
if entry.last_refresh:
state["last_refresh"] = entry.last_refresh
_store_provider_state(auth_store, "xai-oauth", state, set_active=False)
_save_provider_state(auth_store, "openai-codex", state)
else:
return
@@ -890,13 +689,6 @@ class CredentialPool:
except Exception as wexc:
logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
elif self.provider == "openai-codex":
# Adopt fresher tokens from auth.json before spending the
# refresh_token — single-use tokens consumed by another Hermes
# process sharing the same auth.json singleton would otherwise
# trigger ``refresh_token_reused`` on the next POST.
synced = self._sync_codex_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
refreshed = auth_mod.refresh_codex_oauth_pure(
entry.access_token,
entry.refresh_token,
@@ -907,33 +699,40 @@ class CredentialPool:
refresh_token=refreshed["refresh_token"],
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "xai-oauth":
# Adopt fresher tokens from auth.json before spending the
# refresh_token — single-use tokens consumed by another
# process (or another profile sharing the singleton) would
# otherwise trigger ``refresh_token_reused`` on the next
# POST. Only meaningful for singleton-seeded entries.
synced = self._sync_xai_oauth_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
refreshed = auth_mod.refresh_xai_oauth_pure(
entry.access_token,
entry.refresh_token,
)
updated = replace(
entry,
access_token=refreshed["access_token"],
refresh_token=refreshed["refresh_token"],
last_refresh=refreshed.get("last_refresh"),
)
elif self.provider == "nous":
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
auth_mod.resolve_nous_runtime_credentials(
nous_state = {
"access_token": entry.access_token,
"refresh_token": entry.refresh_token,
"client_id": entry.client_id,
"portal_base_url": entry.portal_base_url,
"inference_base_url": entry.inference_base_url,
"token_type": entry.token_type,
"scope": entry.scope,
"obtained_at": entry.obtained_at,
"expires_at": entry.expires_at,
"agent_key": entry.agent_key,
"agent_key_expires_at": entry.agent_key_expires_at,
"tls": entry.tls,
}
refreshed = auth_mod.refresh_nous_oauth_from_state(
nous_state,
min_key_ttl_seconds=DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
force_refresh=force,
force_mint=force,
)
updated = self._sync_nous_entry_from_auth_store(entry)
# Apply returned fields: dataclass fields via replace, extras via dict update
field_updates = {}
extra_updates = dict(entry.extra)
_field_names = {f.name for f in fields(entry)}
for k, v in refreshed.items():
if k in _field_names:
field_updates[k] = v
elif k in _EXTRA_KEYS:
extra_updates[k] = v
updated = replace(entry, extra=extra_updates, **field_updates)
else:
return entry
except Exception as exc:
@@ -978,140 +777,6 @@ class CredentialPool:
# Credentials file had a valid (non-expired) token — use it directly
logger.debug("Credentials file has valid token, using without refresh")
return synced
# For xai-oauth: same race as nous — another process may have
# consumed the refresh token between our proactive sync and the
# HTTP call. Re-check auth.json and adopt the fresh tokens if
# they have rotated since. Only meaningful for singleton-seeded
# (loopback_pkce) entries; manual entries don't share state with
# the singleton.
if self.provider == "xai-oauth":
synced = self._sync_xai_oauth_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug(
"xAI OAuth refresh failed but auth.json has newer tokens — adopting"
)
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
return updated
# Terminal error: auth.json has no newer tokens — the stored
# refresh_token is dead. Clear it from auth.json so the next
# session does not re-seed the same revoked credentials, and
# remove all singleton-seeded (loopback_pkce) entries from the
# in-memory pool. Mirrors the Nous quarantine path above.
if auth_mod._is_terminal_xai_oauth_refresh_error(exc):
logger.debug(
"xAI OAuth refresh token is terminally invalid; clearing local token state"
)
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "xai-oauth") or {}
if isinstance(state, dict):
tokens = state.get("tokens") or {}
if isinstance(tokens, dict):
store_refresh = str(tokens.get("refresh_token") or "").strip()
entry_refresh = str(entry.refresh_token or "").strip()
if not store_refresh or store_refresh == entry_refresh:
tokens.pop("access_token", None)
tokens.pop("refresh_token", None)
state["tokens"] = tokens
state["last_auth_error"] = {
"provider": "xai-oauth",
"code": getattr(exc, "code", "unknown"),
"message": str(exc),
"reason": "credential_pool_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
_save_provider_state(auth_store, "xai-oauth", state)
_save_auth_store(auth_store)
except Exception as clear_exc:
logger.debug(
"Failed to clear terminal xAI OAuth state: %s", clear_exc
)
self._entries = [
item for item in self._entries
if item.source != "loopback_pkce"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
return None
# For openai-codex: same race as xAI/nous — another Hermes process
# may have consumed the refresh token between our proactive sync
# and the HTTP call. Re-check auth.json and adopt the fresh tokens
# if they have rotated since.
if self.provider == "openai-codex":
synced = self._sync_codex_entry_from_auth_store(entry)
if synced.refresh_token != entry.refresh_token:
logger.debug(
"Codex OAuth refresh failed but auth.json has newer tokens — adopting"
)
updated = replace(
synced,
last_status=STATUS_OK,
last_status_at=None,
last_error_code=None,
last_error_reason=None,
last_error_message=None,
last_error_reset_at=None,
)
self._replace_entry(synced, updated)
self._persist()
return updated
# Terminal error: auth.json has no newer tokens — the stored
# refresh_token is dead. Clear it from auth.json so the next
# session does not re-seed the same revoked credentials, and
# remove all singleton-seeded (device_code) entries from the
# in-memory pool. Mirrors the xAI and Nous quarantine paths.
if auth_mod._is_terminal_codex_oauth_refresh_error(exc):
logger.debug(
"Codex OAuth refresh token is terminally invalid; clearing local token state"
)
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "openai-codex") or {}
if isinstance(state, dict):
tokens = state.get("tokens") or {}
if isinstance(tokens, dict):
store_refresh = str(tokens.get("refresh_token") or "").strip()
entry_refresh = str(entry.refresh_token or "").strip()
if not store_refresh or store_refresh == entry_refresh:
tokens.pop("access_token", None)
tokens.pop("refresh_token", None)
state["tokens"] = tokens
state["last_auth_error"] = {
"provider": "openai-codex",
"code": getattr(exc, "code", "unknown"),
"message": str(exc),
"reason": "credential_pool_refresh_failure",
"relogin_required": True,
"at": datetime.now(timezone.utc).isoformat(),
}
_save_provider_state(auth_store, "openai-codex", state)
_save_auth_store(auth_store)
except Exception as clear_exc:
logger.debug(
"Failed to clear terminal Codex OAuth state: %s", clear_exc
)
self._entries = [
item for item in self._entries
if item.source != "device_code"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
return None
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
# auth.json and adopt the fresh tokens if available.
@@ -1132,49 +797,6 @@ class CredentialPool:
self._persist()
self._sync_device_code_entry_to_auth_store(updated)
return updated
if auth_mod._is_terminal_nous_refresh_error(exc):
logger.debug("Nous refresh token is terminally invalid; clearing local token state")
try:
with _auth_store_lock():
auth_store = _load_auth_store()
state = _load_provider_state(auth_store, "nous") or {
"client_id": entry.client_id,
"portal_base_url": entry.portal_base_url,
"inference_base_url": entry.inference_base_url,
"token_type": entry.token_type,
"scope": entry.scope,
"tls": entry.tls,
}
store_refresh = str(state.get("refresh_token") or "").strip()
entry_refresh = str(entry.refresh_token or "").strip()
if not store_refresh or store_refresh == entry_refresh:
auth_mod._quarantine_nous_oauth_state(
state,
exc,
reason="credential_pool_refresh_failure",
)
auth_mod._quarantine_nous_pool_entries(
auth_store,
exc,
reason="credential_pool_refresh_failure",
)
_save_provider_state(auth_store, "nous", state)
_save_auth_store(auth_store)
except Exception as clear_exc:
logger.debug("Failed to clear terminal Nous OAuth state: %s", clear_exc)
singleton_sources = {
auth_mod.NOUS_DEVICE_CODE_SOURCE,
f"manual:{auth_mod.NOUS_DEVICE_CODE_SOURCE}",
}
self._entries = [
item for item in self._entries
if item.source not in singleton_sources
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
return None
self._mark_exhausted(entry, None)
return None
@@ -1207,13 +829,8 @@ class CredentialPool:
entry.access_token,
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
)
if self.provider == "xai-oauth":
return auth_mod._xai_access_token_is_expiring(
entry.access_token,
auth_mod.XAI_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
)
if self.provider == "nous":
# Nous refresh can require network access and should happen when
# Nous refresh/mint can require network access and should happen when
# runtime credentials are actually resolved, not merely when the pool
# is enumerated for listing, migration, or selection.
return False
@@ -1232,14 +849,13 @@ class CredentialPool:
"""
now = time.time()
cleared_any = False
entries_to_prune: List[str] = []
available: List[PooledCredential] = []
for entry in self._entries:
# For anthropic claude_code entries, sync from the credentials file
# before any status/refresh checks. This picks up tokens refreshed
# by other processes (Claude Code CLI, other Hermes profiles).
if (self.provider == "anthropic" and entry.source == "claude_code"
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_anthropic_entry_from_credentials_file(entry)
if synced is not entry:
entry = synced
@@ -1250,7 +866,7 @@ class CredentialPool:
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
@@ -1262,52 +878,11 @@ class CredentialPool:
# future for ChatGPT weekly windows).
if (self.provider == "openai-codex"
and entry.source == "device_code"
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
and entry.last_status == STATUS_EXHAUSTED):
synced = self._sync_codex_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
# For xai-oauth singleton-seeded entries, identical pattern:
# an entry frozen as exhausted may simply be holding stale
# tokens that another process (or a fresh `hermes model` ->
# xAI Grok OAuth login) has since rotated in auth.json.
if (self.provider == "xai-oauth"
and entry.source == "loopback_pkce"
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
synced = self._sync_xai_oauth_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_DEAD:
# Manual DEAD credentials get pruned after a 24h quiet window
# so the pool doesn't accumulate dead entries forever. The
# user can always re-add via ``hermes auth add``. Singleton-
# seeded DEAD entries are kept so the audit trail (label,
# last_error_reason, timestamps) stays visible — pruning them
# would just be undone by ``_seed_from_singletons`` on the
# next load anyway.
if _is_manual_source(entry.source):
dead_at = entry.last_status_at or 0
if dead_at and now - dead_at > DEAD_MANUAL_PRUNE_TTL_SECONDS:
_label = entry.label or entry.id[:8]
logger.warning(
"credential pool: pruning DEAD manual entry %s "
"(reason=%s, age=%.1fh) — re-add via `hermes auth add %s`",
_label,
entry.last_error_reason or "unknown",
(now - dead_at) / 3600.0,
self.provider,
)
# Mark for removal after the loop completes; we can't
# mutate self._entries while iterating.
entries_to_prune.append(entry.id)
cleared_any = True
# Permanently failed credentials never re-enter rotation via
# TTL. They only clear when a write-side re-auth sync rewrites
# the tokens (e.g. ``_save_codex_tokens`` after a fresh
# device-code login). The auth.json-sync paths below handle
# the re-auth case for OAuth singletons.
continue
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -1331,9 +906,6 @@ class CredentialPool:
continue
entry = refreshed
available.append(entry)
if entries_to_prune:
pruned_ids = set(entries_to_prune)
self._entries = [e for e in self._entries if e.id not in pruned_ids]
if cleared_any:
self._persist()
return available
@@ -1383,40 +955,17 @@ class CredentialPool:
*,
status_code: Optional[int],
error_context: Optional[Dict[str, Any]] = None,
api_key_hint: Optional[str] = None,
) -> Optional[PooledCredential]:
with self._lock:
entry = None
if api_key_hint:
# Prefer the specific entry whose API key matches the one that
# actually failed. When this pool was freshly loaded from disk
# (another process already rotated), current() is None and
# _select_unlocked() would return the NEXT key — the wrong one.
entry = next(
(e for e in self._entries if e.runtime_api_key == api_key_hint),
None,
)
if entry is None:
entry = self.current() or self._select_unlocked()
entry = self.current() or self._select_unlocked()
if entry is None:
return None
_label = entry.label or entry.id[:8]
self._mark_exhausted(entry, status_code, error_context)
# Re-read the updated entry to log the correct terminal state.
updated_entry = next(
(e for e in self._entries if e.id == entry.id), entry,
logger.info(
"credential pool: marking %s exhausted (status=%s), rotating",
_label, status_code,
)
if updated_entry.last_status == STATUS_DEAD:
logger.warning(
"credential pool: marking %s DEAD (status=%s, reason=%s) — "
"permanently failed, will NOT re-enter rotation until re-auth",
_label, status_code, updated_entry.last_error_reason or "unknown",
)
else:
logger.info(
"credential pool: marking %s exhausted (status=%s), rotating",
_label, status_code,
)
self._mark_exhausted(entry, status_code, error_context)
self._current_id = None
next_entry = self._select_unlocked()
if next_entry:
@@ -1578,12 +1127,8 @@ def _upsert_entry(entries: List[PooledCredential], provider: str, source: str, p
if field_updates or extra_updates:
if extra_updates:
field_updates["extra"] = {**existing.extra, **extra_updates}
updated = replace(existing, **field_updates)
entries[existing_idx] = updated
# Runtime-only borrowed secret updates should refresh the in-memory
# entry without forcing auth.json churn when the disk-safe payload is
# unchanged (for example env keys with the same fingerprint).
return existing.to_dict() != updated.to_dict()
entries[existing_idx] = replace(existing, **field_updates)
return True
return False
@@ -1646,48 +1191,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
except ImportError:
pass
# API-key vs OAuth is a user-visible choice at `hermes setup` ("Claude
# Pro/Max subscription" vs "Anthropic API key"). The signal that the
# user picked the API-key path is: ANTHROPIC_API_KEY set in the env,
# AND no OAuth env vars set — `save_anthropic_api_key()` writes the
# API key and zeros ANTHROPIC_TOKEN; `save_anthropic_oauth_token()`
# does the inverse. When that signal is present we MUST NOT seed
# autodiscovered OAuth tokens (~/.claude/.credentials.json from the
# Claude Code CLI, hermes_pkce creds from a previous OAuth login)
# into the anthropic pool — otherwise rotation on a 401/429 silently
# flips the session onto an OAuth credential, which forces the Claude
# Code identity injection, `mcp_` tool-name rewrite, and claude-cli
# User-Agent header (`agent/anthropic_adapter.py:2128`). Users who
# explicitly opted into the API-key path are explicitly opting OUT of
# that masquerade. Prefer ~/.hermes/.env over os.environ for the
# same reason `_seed_from_env` does — that's the authoritative file
# that `hermes setup` writes.
_env_file = load_env()
def _env_val(key: str) -> str:
return (_env_file.get(key) or os.environ.get(key) or "").strip()
anthropic_api_key = _env_val("ANTHROPIC_API_KEY")
anthropic_oauth_env = (
_env_val("ANTHROPIC_TOKEN") or _env_val("CLAUDE_CODE_OAUTH_TOKEN")
)
api_key_path_explicit = bool(anthropic_api_key and not anthropic_oauth_env)
if api_key_path_explicit:
# Prune any stale autodiscovered OAuth entries that may have been
# seeded into the on-disk pool during a previous OAuth session.
# Without this, switching OAuth -> API key at setup leaves the
# OAuth entries dormant in auth.json forever and rotation on a
# transient 401 could revive them.
retained = [
entry for entry in entries
if entry.source not in {"hermes_pkce", "claude_code"}
]
if len(retained) != len(entries):
entries[:] = retained
changed = True
return changed, active_sources
from agent.anthropic_adapter import read_claude_code_credentials, read_hermes_oauth_credentials
for source_name, creds in (
@@ -1714,22 +1217,7 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
elif provider == "nous":
state = _load_provider_state(auth_store, "nous")
has_runtime_material = bool(
isinstance(state, dict)
and (
str(state.get("access_token") or "").strip()
or str(state.get("agent_key") or "").strip()
)
)
if state and not has_runtime_material:
retained = [
entry for entry in entries
if entry.source not in {"device_code", "manual:device_code"}
]
if len(retained) != len(entries):
entries[:] = retained
changed = True
if state and has_runtime_material and not _is_suppressed(provider, "device_code"):
if state and not _is_suppressed(provider, "device_code"):
active_sources.add("device_code")
# Prefer a user-supplied label embedded in the singleton state
# (set by persist_nous_credentials(label=...) when the user ran
@@ -1756,9 +1244,9 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the refresh timestamps into the pool so
# Carry the mint/refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-refreshed credentials
# pruning by age) can distinguish just-minted credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).
@@ -1906,37 +1394,6 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
},
)
elif provider == "xai-oauth":
# When the user logs in via ``hermes model`` -> xAI Grok OAuth,
# tokens are written to the auth.json singleton
# (``providers["xai-oauth"]``). Surface them in the pool too so
# ``hermes auth list`` reflects the logged-in state and so the pool
# is the single source of truth for refresh during runtime resolution.
if _is_suppressed(provider, "loopback_pkce"):
return changed, active_sources
state = _load_provider_state(auth_store, "xai-oauth")
tokens = state.get("tokens") if isinstance(state, dict) else None
if isinstance(tokens, dict) and tokens.get("access_token"):
active_sources.add("loopback_pkce")
from hermes_cli.auth import DEFAULT_XAI_OAUTH_BASE_URL
base_url = DEFAULT_XAI_OAUTH_BASE_URL
changed |= _upsert_entry(
entries,
provider,
"loopback_pkce",
{
"source": "loopback_pkce",
"auth_type": AUTH_TYPE_OAUTH,
"access_token": tokens.get("access_token", ""),
"refresh_token": tokens.get("refresh_token"),
"base_url": base_url,
"last_refresh": state.get("last_refresh"),
"label": label_from_token(tokens.get("access_token", ""), "loopback_pkce"),
},
)
return changed, active_sources
@@ -1963,35 +1420,6 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
except ImportError:
def _is_source_suppressed(_p, _s): # type: ignore[misc]
return False
def _secret_source_for_env(env_var: str) -> Optional[str]:
try:
from hermes_cli.env_loader import get_secret_source
source_label = get_secret_source(env_var)
except Exception:
source_label = None
return str(source_label).strip() if source_label else None
def _env_payload(
*,
source: str,
env_var: str,
token: str,
base_url: str,
auth_type: str = AUTH_TYPE_API_KEY,
) -> Dict[str, Any]:
payload: Dict[str, Any] = {
"source": source,
"auth_type": auth_type,
"access_token": token,
"base_url": base_url,
"label": env_var,
}
secret_source = _secret_source_for_env(env_var)
if secret_source:
payload["secret_source"] = secret_source
return payload
if provider == "openrouter":
# Prefer ~/.hermes/.env over os.environ
token = _get_env_prefer_dotenv("OPENROUTER_API_KEY")
@@ -2004,12 +1432,13 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
entries,
provider,
source,
_env_payload(
source=source,
env_var="OPENROUTER_API_KEY",
token=token,
base_url=OPENROUTER_BASE_URL,
),
{
"source": source,
"auth_type": AUTH_TYPE_API_KEY,
"access_token": token,
"base_url": OPENROUTER_BASE_URL,
"label": "OPENROUTER_API_KEY",
},
)
return changed, active_sources
@@ -2048,13 +1477,13 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
entries,
provider,
source,
_env_payload(
source=source,
env_var=env_var,
token=token,
base_url=base_url,
auth_type=auth_type,
),
{
"source": source,
"auth_type": auth_type,
"access_token": token,
"base_url": base_url,
"label": env_var,
},
)
return changed, active_sources
@@ -2066,11 +1495,8 @@ def _prune_stale_seeded_entries(entries: List[PooledCredential], active_sources:
if _is_manual_source(entry.source)
or entry.source in active_sources
or not (
is_borrowed_credential_source(entry.source, entry.provider)
# Hermes PKCE is Hermes-owned/persistable while present, but it is
# still a file-backed singleton and should disappear from the pool
# when the backing OAuth file is gone.
or entry.source == "hermes_pkce"
entry.source.startswith("env:")
or entry.source in {"claude_code", "hermes_pkce"}
)
]
if len(retained) == len(entries):
@@ -2155,22 +1581,17 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
def load_pool(provider: str) -> CredentialPool:
provider = (provider or "").strip().lower()
raw_entries = read_credential_pool(provider)
raw_needs_sanitization = any(
isinstance(payload, dict)
and sanitize_borrowed_credential_payload(payload, provider) != payload
for payload in raw_entries
)
entries = [PooledCredential.from_dict(provider, payload) for payload in raw_entries]
if provider.startswith(CUSTOM_POOL_PREFIX):
# Custom endpoint pool — seed from custom_providers config and model config
custom_changed, custom_sources = _seed_custom_pool(provider, entries)
changed = raw_needs_sanitization or custom_changed
changed = custom_changed
changed |= _prune_stale_seeded_entries(entries, custom_sources)
else:
singleton_changed, singleton_sources = _seed_from_singletons(provider, entries)
env_changed, env_sources = _seed_from_env(provider, entries)
changed = raw_needs_sanitization or singleton_changed or env_changed
changed = singleton_changed or env_changed
changed |= _prune_stale_seeded_entries(entries, singleton_sources | env_sources)
changed |= _normalize_pool_priorities(provider, entries)

View File

@@ -240,11 +240,11 @@ def _clear_auth_store_provider(provider: str) -> bool:
def _remove_nous_device_code(provider: str, removed) -> RemovalResult:
"""Nous OAuth lives in auth.json providers.nous — clear it and suppress.
We suppress in addition to clearing because nothing else stops a future
`hermes auth add nous` (or any other path that writes providers.nous)
from re-seeding before the user has decided to. Suppression forces
them to go through `hermes auth add nous` to re-engage, which is the
documented re-add path and clears the suppression atomically.
We suppress in addition to clearing because nothing else stops the
user's next `hermes login` run from writing providers.nous again
before they decide to. Suppression forces them to go through
`hermes auth add nous` to re-engage, which is the documented re-add
path and clears the suppression atomically.
"""
result = RemovalResult()
if _clear_auth_store_provider(provider):
@@ -265,31 +265,6 @@ def _remove_minimax_oauth(provider: str, removed) -> RemovalResult:
return result
def _remove_xai_oauth_loopback_pkce(provider: str, removed) -> RemovalResult:
"""xAI OAuth tokens live in auth.json providers.xai-oauth — clear them.
Without this step, ``hermes auth remove xai-oauth <N>`` silently undoes
itself: the central dispatcher only removes the in-memory pool entry,
leaves ``providers.xai-oauth`` in auth.json intact, and on the next
``load_pool("xai-oauth")`` call ``_seed_from_singletons`` re-seeds the
entry from the still-present singleton — credentials reappear with no
user feedback. Clearing the singleton in step with the suppression set
by the central dispatcher makes the removal stick.
Belt-and-braces against the manual entry path: ``hermes auth add
xai-oauth`` produces a ``manual:xai_pkce`` entry whose removal step
falls through to "unregistered → nothing to clean up" (correct —
manual entries are pool-only).
"""
result = RemovalResult()
if _clear_auth_store_provider(provider):
result.cleaned.append(f"Cleared {provider} OAuth tokens from auth store")
result.hints.append(
"Run `hermes model` → xAI Grok OAuth (SuperGrok / Premium+) to re-authenticate if needed."
)
return result
def _remove_codex_device_code(provider: str, removed) -> RemovalResult:
"""Codex tokens live in TWO places: our auth store AND ~/.codex/auth.json.
@@ -422,11 +397,6 @@ def _register_all_sources() -> None:
remove_fn=_remove_codex_device_code,
description="auth.json providers.openai-codex + ~/.codex/auth.json",
))
register(RemovalStep(
provider="xai-oauth", source_id="loopback_pkce",
remove_fn=_remove_xai_oauth_loopback_pkce,
description="auth.json providers.xai-oauth",
))
register(RemovalStep(
provider="qwen-oauth", source_id="qwen-cli",
remove_fn=_remove_qwen_cli,

View File

@@ -390,26 +390,7 @@ CURATOR_REVIEW_PROMPT = (
"(verification scripts, fixture generators, probes)\n"
" Then archive the old sibling. Use `terminal` with `mkdir -p "
"~/.hermes/skills/<umbrella>/references/ && mv ... <umbrella>/"
"references/<topic>.md` (or templates/ / scripts/).\n\n"
"Package integrity — not optional:\n"
"Before demoting or archiving a skill, inspect it as a COMPLETE "
"directory package, not just SKILL.md. A skill root may include "
"`references/`, `templates/`, `scripts/`, and `assets/`; `skill_view` "
"discovers those relative to the skill root. A reference markdown file "
"inside another skill is NOT a new skill root and does not get its own "
"linked-file discovery.\n"
"If the source skill has support files OR SKILL.md contains relative "
"links such as `references/...`, `templates/...`, `scripts/...`, or "
"`assets/...`, DO NOT flatten only SKILL.md into "
"`<umbrella>/references/<old>.md`. Choose one safe path instead:\n"
" • keep it as a standalone skill, OR\n"
" • fully merge it by re-homing every needed support file into the "
"umbrella's canonical `references/`, `templates/`, `scripts/`, or "
"`assets/` directories AND rewrite the destination instructions to "
"the new paths, OR\n"
" • archive the entire original skill package unchanged.\n"
"Never leave archived/demoted instructions pointing at files that were "
"left behind under the old skill directory.\n"
"references/<topic>.md` (or templates/ / scripts/).\n"
"4. Also flag skills whose NAME is too narrow (contains a PR number, "
"a feature codename, a specific error string, an 'audit' / "
"'diagnosis' / 'salvage' session artifact). These almost always "

View File

@@ -39,15 +39,17 @@ from __future__ import annotations
import json
import logging
import os
import re
import shutil
import tarfile
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from hermes_constants import get_hermes_home
from agent.skill_utils import is_excluded_skill_path
logger = logging.getLogger(__name__)
@@ -174,9 +176,7 @@ def get_keep() -> int:
def _count_skill_files(base: Path) -> int:
try:
return sum(
1 for p in base.rglob("SKILL.md") if not is_excluded_skill_path(p)
)
return sum(1 for _ in base.rglob("SKILL.md"))
except OSError:
return 0

View File

@@ -14,7 +14,6 @@ from difflib import unified_diff
from pathlib import Path
from utils import safe_json_loads
from agent.tool_result_classification import file_mutation_result_landed
# ANSI escape codes for coloring tool failure indicators
_RED = "\033[31m"
@@ -240,6 +239,21 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
msg = msg[:17] + "..."
return f"to {target}: \"{msg}\""
if tool_name.startswith("rl_"):
rl_previews = {
"rl_list_environments": "listing envs",
"rl_select_environment": args.get("name", ""),
"rl_get_current_config": "reading config",
"rl_edit_config": f"{args.get('field', '')}={args.get('value', '')}",
"rl_start_training": "starting",
"rl_check_status": args.get("run_id", "")[:16],
"rl_stop_training": f"stopping {args.get('run_id', '')[:16]}",
"rl_get_results": args.get("run_id", "")[:16],
"rl_list_runs": "listing runs",
"rl_test_inference": f"{args.get('num_steps', 3)} steps",
}
return rl_previews.get(tool_name)
key = primary_args.get(tool_name)
if not key:
for fallback_key in ("query", "text", "command", "path", "name", "prompt", "code", "goal"):
@@ -787,65 +801,31 @@ class KawaiiSpinner:
# Cute tool message (completion line that replaces the spinner)
# =========================================================================
_ERROR_SUFFIX_MAX_LEN = 48
def _trim_error(msg: str) -> str:
"""Shrink an error message for inline display in a tool status line.
Strips overly long absolute paths down to just the filename so the
suffix stays readable on narrow terminals.
"""
msg = msg.strip()
# Common case: "File not found: /very/long/absolute/path/foo.py"
if "File not found:" in msg:
_, _, tail = msg.partition("File not found:")
tail = tail.strip()
if "/" in tail:
msg = f"File not found: {tail.rsplit('/', 1)[-1]}"
if len(msg) > _ERROR_SUFFIX_MAX_LEN:
msg = msg[: _ERROR_SUFFIX_MAX_LEN - 3] + "..."
return msg
def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]:
"""Inspect a tool result string for signs of failure.
Returns ``(is_failure, suffix)`` where *suffix* is a short informational
tag like ``" [exit 1]"`` for terminal failures, ``" [full]"`` for memory
overflow, or a trimmed error message (``" [File not found: foo.py]"``).
On success returns ``(False, "")``.
Returns ``(is_failure, suffix)`` where *suffix* is an informational tag
like ``" [exit 1]"`` for terminal failures, or ``" [error]"`` for generic
failures. On success, returns ``(False, "")``.
"""
if result is None:
return False, ""
if file_mutation_result_landed(tool_name, result):
return False, ""
data = safe_json_loads(result)
# Terminal: non-zero exit code is the canonical failure signal.
if tool_name == "terminal":
data = safe_json_loads(result)
if isinstance(data, dict):
exit_code = data.get("exit_code")
if exit_code is not None and exit_code != 0:
err_msg = data.get("error")
if err_msg:
return True, f" [{_trim_error(str(err_msg))}]"
return True, f" [exit {exit_code}]"
return False, ""
# Memory: distinguish "store full" from real errors.
# Memory-specific: distinguish "full" from real errors
if tool_name == "memory":
data = safe_json_loads(result)
if isinstance(data, dict):
if data.get("success") is False and "exceed the limit" in data.get("error", ""):
return True, " [full]"
# Structured error in JSON result (any tool that surfaces {"error": ...}).
if isinstance(data, dict):
err = data.get("error") or data.get("message")
if err and (data.get("success") is False or "error" in data):
return True, f" [{_trim_error(str(err))}]"
# Generic heuristic for non-terminal tools
# Multimodal tool results (dicts with _multimodal=True) are not strings —
# treat them as successes since failures would be JSON-encoded strings.
@@ -904,6 +884,10 @@ def get_cute_tool_message(
extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
return _wrap(f"┊ 📄 fetch {_trunc(domain, 35)}{extra} {dur}")
return _wrap(f"┊ 📄 fetch pages {dur}")
if tool_name == "web_crawl":
url = args.get("url", "")
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
return _wrap(f"┊ 🕸️ crawl {_trunc(domain, 35)} {dur}")
if tool_name == "terminal":
return _wrap(f"┊ 💻 $ {_trunc(args.get('command', ''), 42)} {dur}")
if tool_name == "process":
@@ -949,29 +933,11 @@ def get_cute_tool_message(
if tool_name == "todo":
todos_arg = args.get("todos")
merge = args.get("merge", False)
# Parse result for completion progress
total = 0
done = 0
if result:
try:
data = safe_json_loads(result)
if data:
s = data.get("summary", {})
total = s.get("total", 0)
done = s.get("completed", 0)
except Exception:
pass
if todos_arg is None:
if total > 0:
return _wrap(f"┊ 📋 plan {done}/{total} task(s) {dur}")
return _wrap(f"┊ 📋 plan reading tasks {dur}")
elif merge:
if total > 0 and done > 0:
return _wrap(f"┊ 📋 plan update {done}/{total}{dur}")
return _wrap(f"┊ 📋 plan update {len(todos_arg)} task(s) {dur}")
else:
if total > 0 and done > 0:
return _wrap(f"┊ 📋 plan {done}/{total} task(s) {dur}")
return _wrap(f"┊ 📋 plan {len(todos_arg)} task(s) {dur}")
if tool_name == "session_search":
return _wrap(f"┊ 🔍 recall \"{_trunc(args.get('query', ''), 35)}\" {dur}")
@@ -1012,6 +978,15 @@ def get_cute_tool_message(
if action == "list":
return _wrap(f"┊ ⏰ cron listing {dur}")
return _wrap(f"┊ ⏰ cron {action} {args.get('job_id', '')} {dur}")
if tool_name.startswith("rl_"):
rl = {
"rl_list_environments": "list envs", "rl_select_environment": f"select {args.get('name', '')}",
"rl_get_current_config": "get config", "rl_edit_config": f"set {args.get('field', '?')}",
"rl_start_training": "start training", "rl_check_status": f"status {args.get('run_id', '?')[:12]}",
"rl_stop_training": f"stop {args.get('run_id', '?')[:12]}", "rl_get_results": f"results {args.get('run_id', '?')[:12]}",
"rl_list_runs": "list runs", "rl_test_inference": "test inference",
}
return _wrap(f"┊ 🧪 rl {rl.get(tool_name, tool_name.replace('rl_', ''))} {dur}")
if tool_name == "execute_code":
code = args.get("code", "")
first_line = code.strip().split("\n")[0] if code.strip() else ""

View File

@@ -44,15 +44,12 @@ class FailoverReason(enum.Enum):
payload_too_large = "payload_too_large" # 413 — compress payload
image_too_large = "image_too_large" # Native image part exceeds provider's per-image limit — shrink and retry
# Model / provider policy
# Model
model_not_found = "model_not_found" # 404 or invalid model — fallback to different model
provider_policy_blocked = "provider_policy_blocked" # Aggregator (e.g. OpenRouter) blocked the only endpoint due to account data/privacy policy
content_policy_blocked = "content_policy_blocked" # Provider safety filter rejected this prompt — deterministic per-request, don't retry unchanged
# Request format
format_error = "format_error" # 400 bad request — abort or strip + retry
invalid_encrypted_content = "invalid_encrypted_content" # Responses replay blob rejected — strip replay state and retry
multimodal_tool_content_unsupported = "multimodal_tool_content_unsupported" # Provider rejected list-type content in tool messages (e.g. Xiaomi MiMo) — downgrade to text and retry
# Provider-specific
thinking_signature = "thinking_signature" # Anthropic thinking block sig invalid
@@ -98,20 +95,13 @@ _BILLING_PATTERNS = [
"insufficient_quota",
"insufficient balance",
"credit balance",
"credits exhausted",
"credits have been exhausted",
"no usable credits",
"top up your credits",
"payment required",
"billing hard limit",
"exceeded your current quota",
"account is deactivated",
"plan does not include",
"out of funds",
"run out of funds",
"balance_depleted",
"model_not_supported_on_free_tier",
"not available on the free tier",
]
# Patterns that indicate rate limiting (transient, will resolve)
@@ -175,32 +165,6 @@ _IMAGE_TOO_LARGE_PATTERNS = [
# the likely culprit; we still try the shrink path before giving up.
]
# Providers that follow the OpenAI spec strictly require tool message
# ``content`` to be a string. Some (Anthropic native, Codex Responses,
# Gemini native, first-party OpenAI) extend this to accept a content-parts
# list (text + image_url) so screenshots from computer_use survive. Others
# (Xiaomi MiMo, some Alibaba endpoints, a long tail of OpenAI-compatible
# providers) reject the list with a 400 — the patterns below are the most
# common error shapes we see. Recovery: strip image parts from tool
# messages in-place, record the (provider, model) for the rest of the
# session so we don't waste another call learning the same lesson, retry.
#
# See: https://github.com/NousResearch/hermes-agent/issues/27344
_MULTIMODAL_TOOL_CONTENT_PATTERNS = [
# Xiaomi MiMo: {"error":{"code":"400","message":"Param Incorrect","param":"text is not set"}}
"text is not set",
# Generic "tool message must be string" shapes
"tool message content must be a string",
"tool content must be a string",
"tool message must be a string",
# OpenAI-compat servers that reject list-type tool content with a
# schema-validation message
"expected string, got list",
"expected string, got array",
# Alibaba/DashScope variant
"tool_call.content must be string",
]
# Context overflow patterns
_CONTEXT_OVERFLOW_PATTERNS = [
"context length",
@@ -249,24 +213,6 @@ _MODEL_NOT_FOUND_PATTERNS = [
"unsupported model",
]
# Request-validation patterns — the request is malformed and will fail
# identically on every retry. Some OpenAI-compatible gateways (notably
# codex.nekos.me) return these as 5xx instead of the standard 4xx, which
# makes the generic "5xx → retryable server_error" rule misfire: the retry
# loop hammers the same deterministic rejection 3+ times, then the
# transport-recovery path resets the counter and does it again, producing
# a request flood. When a 5xx body carries one of these unambiguous
# request-validation signals, classify as a non-retryable format_error so
# the loop fails fast and falls back instead of looping.
_REQUEST_VALIDATION_PATTERNS = [
"unknown parameter",
"unsupported parameter",
"unrecognized request argument",
"invalid_request_error",
"unknown_parameter",
"unsupported_parameter",
]
# OpenRouter aggregator policy-block patterns.
#
# When a user's OpenRouter account privacy setting (or a per-request
@@ -290,45 +236,6 @@ _PROVIDER_POLICY_BLOCKED_PATTERNS = [
"no endpoints found matching your data policy",
]
# Provider content-policy / safety-filter blocks. Distinct from
# ``provider_policy_blocked`` above (which is an OpenRouter *account*-level
# data/privacy guardrail) — these are *per-prompt* safety decisions made by
# the upstream model provider. They are deterministic for the unchanged
# request, so retrying the same prompt three times just reproduces the same
# block and burns paid attempts on a refusal. The recovery is to switch to a
# configured fallback model/provider immediately, or surface the block to
# the user with actionable guidance if no fallback exists.
#
# Patterns are intentionally narrow — each phrase is a verbatim string from
# a specific provider's safety pipeline, not a generic word like "policy" or
# "violation" that could collide with billing/auth/format errors:
# • OpenAI Codex cybersecurity refusal (gpt-5.5, the case from #18028)
# • OpenAI moderation refusal ("violates our usage policies", with
# "usage policies" disambiguating from billing's "exceeded ... policy")
# • Anthropic safety refusal ("prompt was flagged by ... safety system")
# • OpenAI Responses content filter
_CONTENT_POLICY_BLOCKED_PATTERNS = [
# OpenAI Codex (#18028) — message may arrive without an HTTP status
"flagged for possible cybersecurity risk",
"trusted access for cyber",
# OpenAI moderation — chat completions / responses
"violates our usage policies",
"violates openai's usage policies",
"your request was flagged by",
# Anthropic safety system
"prompt was flagged by our safety",
"responses cannot be generated due to safety",
# Generic content-filter wording seen on Azure / OpenAI Responses.
# ``content_filter`` (underscore) is the OpenAI-standard error/finish
# token surfaced verbatim by their SDKs when a request is blocked.
# ``responsibleaipolicyviolation`` is Azure OpenAI's error code.
# Deliberately NOT matching the space variant ("content filter") — it
# appears in benign config descriptions and tooltip text that providers
# echo back; the underscore form is provider-specific enough.
"content_filter",
"responsibleaipolicyviolation",
]
# Auth patterns (non-status-code signals)
_AUTH_PATTERNS = [
"invalid api key",
@@ -532,20 +439,6 @@ def classify_api_error(
# ── 1. Provider-specific patterns (highest priority) ────────────
# Provider content-policy / safety-filter block. The provider has made a
# deterministic refusal decision about THIS prompt — retrying unchanged
# just reproduces the same refusal and burns paid attempts. Must run
# before status-based classification so a 400 safety block isn't
# downgraded to a generic ``format_error`` and a status-less block
# (OpenAI Codex SDK can raise without one) isn't left in the retryable
# ``unknown`` bucket. See issue #18028.
if any(p in error_msg for p in _CONTENT_POLICY_BLOCKED_PATTERNS):
return _result(
FailoverReason.content_policy_blocked,
retryable=False,
should_fallback=True,
)
# Anthropic thinking block signature invalid (400).
# Don't gate on provider — OpenRouter proxies Anthropic errors, so the
# provider may be "openrouter" even though the error is Anthropic-specific.
@@ -617,35 +510,6 @@ def classify_api_error(
should_compress=False,
)
# xAI Grok subscription entitlement errors.
#
# xAI returns "You have either run out of available resources or do not
# have an active Grok subscription" through two distinct code paths:
#
# • HTTP 403 — status_code is set; _classify_by_status (step 2) routes
# it to FailoverReason.auth correctly, and _is_entitlement_failure
# then prevents the credential-refresh loop.
#
# • SSE ``type=error`` frame — surfaced as _StreamErrorEvent with
# status_code=None. _classify_by_status is skipped entirely, and
# "grok subscription" / "out of available resources" appear in none
# of the message-pattern lists below. Without this guard the error
# falls through to FailoverReason.unknown (retryable=True), burning
# max_retries before the agent stops — and _is_entitlement_failure
# is never called because it only runs under FailoverReason.auth.
#
# Both X Premium+ and SuperGrok subscribers hit this path when their
# subscription tier does not cover the requested model or feature.
if (
"do not have an active grok subscription" in error_msg
or ("out of available resources" in error_msg and "grok" in error_msg)
):
return _result(
FailoverReason.auth,
retryable=False,
should_fallback=True,
)
# ── 2. HTTP status code classification ──────────────────────────
if status_code is not None:
@@ -751,13 +615,8 @@ def _classify_by_status(
)
if status_code == 403:
# OpenRouter 403 "key limit exceeded" is actually billing. Other
# providers also use 403 for account-plan or credit exhaustion.
if (
"key limit exceeded" in error_msg
or "spending limit" in error_msg
or any(p in error_msg for p in _BILLING_PATTERNS)
):
# OpenRouter 403 "key limit exceeded" is actually billing
if "key limit exceeded" in error_msg or "spending limit" in error_msg:
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -774,17 +633,6 @@ def _classify_by_status(
return _classify_402(error_msg, result_fn)
if status_code == 404:
# Nous API currently surfaces HA/NAS credit depletion as a paid model
# becoming unavailable on the Free Tier, returned as 404 rather than
# 402. Treat that as entitlement/billing exhaustion, not a missing
# model, so the retry loop can show credit/top-up guidance.
if any(p in error_msg for p in _BILLING_PATTERNS):
return result_fn(
FailoverReason.billing,
retryable=False,
should_rotate_credential=True,
should_fallback=True,
)
# OpenRouter policy-block 404 — distinct from "model not found".
# The model exists; the user's account privacy setting excludes the
# only endpoint serving it. Falling back to another provider won't
@@ -841,23 +689,6 @@ def _classify_by_status(
)
if status_code in {500, 502}:
# Some OpenAI-compatible gateways return request-validation errors
# with a 5xx status (codex.nekos.me returns 502 for unknown/
# unsupported parameters). These are deterministic — every retry
# gets the identical rejection — so the generic "5xx → retryable
# server_error" rule turns one bad request into a retry flood.
# Detect the unambiguous request-validation signals (in either the
# message text or the structured error code) and fail fast.
if (
any(p in error_msg for p in _REQUEST_VALIDATION_PATTERNS)
or error_code.lower() in {"invalid_request_error", "unknown_parameter",
"unsupported_parameter"}
):
return result_fn(
FailoverReason.format_error,
retryable=False,
should_fallback=True,
)
return result_fn(FailoverReason.server_error, retryable=True)
if status_code in {503, 529}:
@@ -921,19 +752,6 @@ def _classify_400(
) -> ClassifiedError:
"""Classify 400 Bad Request — context overflow, format error, or generic."""
# Multimodal tool content rejected from 400. Must be checked BEFORE
# image_too_large because the recovery is different (strip image parts
# from tool messages, mark the model as no-list-tool-content for the
# rest of the session) and BEFORE context_overflow because some of the
# patterns ("text is not set") are ambiguous in isolation but become
# specific when combined with a 400 on a request known to contain
# multimodal tool content.
if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
return result_fn(
FailoverReason.multimodal_tool_content_unsupported,
retryable=True,
)
# Image-too-large from 400 (Anthropic's 5 MB per-image check fires this way).
# Must be checked BEFORE context_overflow because messages can trip both
# patterns ("exceeds" + "image") and image-shrink is a cheaper recovery.
@@ -943,26 +761,6 @@ def _classify_400(
retryable=True,
)
# Invalid encrypted reasoning replay blob (OpenAI Responses API). Must be
# checked BEFORE context_overflow because some surfaces emit messages that
# contain context-like phrasing ("encrypted content … could not be
# verified") which could otherwise trip the context_overflow heuristics.
# ``error_msg`` is lowercased upstream — match accordingly.
error_code_lower = (error_code or "").lower()
if (
error_code_lower == "invalid_encrypted_content"
or "invalid_encrypted_content" in error_msg
or (
"encrypted content for item" in error_msg
and "could not be verified" in error_msg
)
):
return result_fn(
FailoverReason.invalid_encrypted_content,
retryable=True,
should_fallback=False,
)
# Context overflow from 400
if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
return result_fn(
@@ -1050,15 +848,7 @@ def _classify_by_error_code(
should_rotate_credential=True,
)
if code_lower in {
"insufficient_quota",
"billing_not_active",
"payment_required",
"insufficient_credits",
"no_usable_credits",
"balance_depleted",
"model_not_supported_on_free_tier",
}:
if code_lower in {"insufficient_quota", "billing_not_active", "payment_required"}:
return result_fn(
FailoverReason.billing,
retryable=False,
@@ -1080,13 +870,6 @@ def _classify_by_error_code(
should_compress=True,
)
if code_lower == "invalid_encrypted_content":
return result_fn(
FailoverReason.invalid_encrypted_content,
retryable=True,
should_fallback=False,
)
return None
@@ -1110,13 +893,6 @@ def _classify_by_message(
should_compress=True,
)
# Multimodal tool content patterns (from message text when no status_code)
if any(p in error_msg for p in _MULTIMODAL_TOOL_CONTENT_PATTERNS):
return result_fn(
FailoverReason.multimodal_tool_content_unsupported,
retryable=True,
)
# Image-too-large patterns (from message text when no status_code)
if any(p in error_msg for p in _IMAGE_TOO_LARGE_PATTERNS):
return result_fn(
@@ -1254,49 +1030,15 @@ def _extract_error_code(body: dict) -> str:
"""Extract an error code string from the response body."""
if not body:
return ""
def _code_from_payload(payload) -> str:
"""Extract a code/type from a nested error payload dict (defensive)."""
if not isinstance(payload, dict):
return ""
payload_error = payload.get("error", {})
if isinstance(payload_error, dict):
nested = payload_error.get("code") or payload_error.get("type") or ""
if isinstance(nested, str) and nested.strip() and nested.strip() != "400":
return nested.strip()
code = payload.get("code") or payload.get("error_code") or ""
if isinstance(code, (str, int)):
text = str(code).strip()
if text and text != "400":
return text
return ""
error_obj = body.get("error", {})
if isinstance(error_obj, dict):
code = error_obj.get("code") or error_obj.get("type") or ""
if isinstance(code, str) and code.strip() and code.strip() != "400":
if isinstance(code, str) and code.strip():
return code.strip()
# Some providers wrap the real JSON error body as a string inside
# error.message — peek into it for a nested code (e.g. Responses API
# surfaces ``invalid_encrypted_content`` this way).
message = error_obj.get("message")
if isinstance(message, str) and message.strip().startswith("{"):
import json
try:
inner = json.loads(message)
except (json.JSONDecodeError, TypeError):
inner = None
nested_code = _code_from_payload(inner)
if nested_code:
return nested_code
# Top-level code
code = body.get("code") or body.get("error_code") or ""
if isinstance(code, (str, int)):
text = str(code).strip()
if text and text != "400":
return text
return str(code).strip()
return ""

View File

@@ -16,19 +16,9 @@ def _hermes_home_path() -> Path:
return Path(os.path.expanduser("~/.hermes"))
def _hermes_root_path() -> Path:
"""Resolve the Hermes root dir (always the parent of any profile, never per-profile)."""
try:
from hermes_constants import get_default_hermes_root # local import to avoid cycles
return get_default_hermes_root()
except Exception:
return Path(os.path.expanduser("~/.hermes"))
def build_write_denied_paths(home: str) -> set[str]:
"""Return exact sensitive paths that must never be written."""
hermes_home = _hermes_home_path()
hermes_root = _hermes_root_path()
return {
os.path.realpath(p)
for p in [
@@ -36,16 +26,7 @@ def build_write_denied_paths(home: str) -> set[str]:
os.path.join(home, ".ssh", "id_rsa"),
os.path.join(home, ".ssh", "id_ed25519"),
os.path.join(home, ".ssh", "config"),
# Active profile .env (or top-level .env when not in profile mode).
str(hermes_home / ".env"),
# Top-level .env, even when running under a profile — overwriting it
# leaks credentials across every profile that inherits from root (#15981).
str(hermes_root / ".env"),
# Active profile Anthropic PKCE credential store.
str(hermes_home / ".anthropic_oauth.json"),
# Top-level Anthropic PKCE credential store remains sensitive even
# when a profile is active; default/non-profile sessions still read it.
str(hermes_root / ".anthropic_oauth.json"),
os.path.join(home, ".bashrc"),
os.path.join(home, ".zshrc"),
os.path.join(home, ".profile"),
@@ -55,7 +36,6 @@ def build_write_denied_paths(home: str) -> set[str]:
os.path.join(home, ".pgpass"),
os.path.join(home, ".npmrc"),
os.path.join(home, ".pypirc"),
os.path.join(home, ".git-credentials"),
"/etc/sudoers",
"/etc/passwd",
"/etc/shadow",
@@ -77,7 +57,6 @@ def build_write_denied_prefixes(home: str) -> list[str]:
os.path.join(home, ".docker"),
os.path.join(home, ".azure"),
os.path.join(home, ".config", "gh"),
os.path.join(home, ".config", "gcloud"),
]
]
@@ -104,43 +83,6 @@ def is_write_denied(path: str) -> bool:
if resolved.startswith(prefix):
return True
# Hermes control-plane files: block both the ACTIVE profile's view
# (hermes_home) AND the global root view. Without the root pass, a
# profile-mode session leaves <root>/auth.json + <root>/config.yaml
# writable — letting a prompt-injected write_file overwrite the global
# files that every profile inherits from (same shape as #15981).
control_file_names = ("auth.json", "config.yaml", "webhook_subscriptions.json")
mcp_tokens_dir_name = "mcp-tokens"
hermes_dirs = []
for base in (_hermes_home_path(), _hermes_root_path()):
try:
real = os.path.realpath(base)
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
for base_real in hermes_dirs:
for name in control_file_names:
try:
if resolved == os.path.realpath(os.path.join(base_real, name)):
return True
except Exception:
continue
try:
mcp_real = os.path.realpath(os.path.join(base_real, mcp_tokens_dir_name))
if resolved == mcp_real or resolved.startswith(mcp_real + os.sep):
return True
except Exception:
pass
try:
pairing_real = os.path.realpath(os.path.join(base_real, "pairing"))
if resolved == pairing_real or resolved.startswith(pairing_real + os.sep):
return True
except Exception:
pass
safe_root = get_safe_write_root()
if safe_root and not (resolved == safe_root or resolved.startswith(safe_root + os.sep)):
return True
@@ -148,306 +90,22 @@ def is_write_denied(path: str) -> bool:
return False
# Common secret-bearing project-local environment file basenames.
# These are blocked because .env files routinely contain API keys,
# database passwords, and other credentials.
_BLOCKED_PROJECT_ENV_BASENAMES: set[str] = {
".env",
".env.local",
".env.development",
".env.production",
".env.test",
".env.staging",
".envrc",
}
def get_read_block_error(path: str) -> Optional[str]:
"""Return an error message when a read targets a denied Hermes path.
Three categories are blocked:
* Internal Hermes cache files under ``HERMES_HOME/skills/.hub`` —
readable metadata that an attacker could use as a prompt-injection
carrier.
* Credential / secret stores under HERMES_HOME and the global Hermes
root: ``auth.json``, ``auth.lock``, ``.anthropic_oauth.json``,
``.env``, ``webhook_subscriptions.json``, ``auth/google_oauth.json``,
and anything under ``mcp-tokens/``. These hold plaintext provider keys,
OAuth tokens, and HMAC secrets that the agent never needs to read
directly — provider tools / gateway adapters consume them through
internal channels.
* Project-local environment files anywhere on disk: ``.env``,
``.env.local``, ``.env.development``, ``.env.production``,
``.env.test``, ``.env.staging``, ``.envrc``. These routinely hold
API keys, database passwords, and other credentials for the user's
own projects. The agent helping debug a project shouldn't normally
need to read these — ``.env.example`` is the documented-shape
substitute.
**This is NOT a security boundary.** The terminal tool runs as the
same OS user with shell access; the agent can still ``cat auth.json``
or ``cat ~/.hermes/.env`` and exfiltrate the file. The read-deny exists
as defense-in-depth that:
* Returns a clear error to models that respect tool denials, which
empirically prompts most modern models to stop rather than reach
for the shell.
* Surfaces a visible audit trail when something tries to read
credentials — easier to spot in logs than a generic ``cat``.
Treat any user-visible framing around this as "may help" rather than
"stops attackers." A determined model or malicious instruction can
always shell out.
Callers that resolve relative paths against a non-process cwd
(e.g. ``TERMINAL_CWD`` in ``tools/file_tools.py``) MUST pre-resolve
and pass the absolute path string. This function's own ``resolve()``
is anchored at the Python process cwd, so a relative input like
``"auth.json"`` would otherwise miss the denylist when the task's
terminal cwd differs from the process cwd.
"""
"""Return an error message when a read targets internal Hermes cache files."""
resolved = Path(path).expanduser().resolve()
# Resolve BOTH the active HERMES_HOME (profile-aware) AND the global
# Hermes root so credential stores at <root>/auth.json etc. are also
# blocked when running under a profile (HERMES_HOME points at
# <root>/profiles/<name> in profile mode). Same shape as the write
# deny widening (#15981, #14157).
hermes_dirs: list[Path] = []
for base in (_hermes_home_path(), _hermes_root_path()):
hermes_home = _hermes_home_path().resolve()
blocked_dirs = [
hermes_home / "skills" / ".hub" / "index-cache",
hermes_home / "skills" / ".hub",
]
for blocked in blocked_dirs:
try:
real = base.resolve()
if real not in hermes_dirs:
hermes_dirs.append(real)
except Exception:
continue
# Skills .hub: prompt-injection carriers.
for hd in hermes_dirs:
blocked_dirs = [
hd / "skills" / ".hub" / "index-cache",
hd / "skills" / ".hub",
]
for blocked in blocked_dirs:
try:
resolved.relative_to(blocked)
except ValueError:
continue
return (
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
)
# Credential / secret stores. Exact-file matches under either
# HERMES_HOME or <root>.
credential_file_names = (
"auth.json",
"auth.lock",
".anthropic_oauth.json",
".env",
"webhook_subscriptions.json",
os.path.join("auth", "google_oauth.json"),
# Bitwarden Secrets Manager disk cache: stores plaintext secret values
# to avoid re-fetching across back-to-back CLI invocations. The file
# was introduced by #31968 but not added to this guard.
os.path.join("cache", "bws_cache.json"),
)
for hd in hermes_dirs:
for name in credential_file_names:
try:
blocked = (hd / name).resolve()
except Exception:
continue
if resolved == blocked:
return (
f"Access denied: {path} is a Hermes credential store "
"and cannot be read directly. Provider tools consume "
"these credentials through internal channels. "
"(Defense-in-depth — not a security boundary; the "
"terminal tool can still bypass.)"
)
# mcp-tokens/: directory prefix match — anything inside is OAuth
# token material.
for hd in hermes_dirs:
try:
mcp_tokens = (hd / "mcp-tokens").resolve()
except Exception:
continue
if resolved == mcp_tokens:
return (
f"Access denied: {path} is the Hermes MCP token directory "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
)
try:
resolved.relative_to(mcp_tokens)
resolved.relative_to(blocked)
except ValueError:
continue
return (
f"Access denied: {path} is a Hermes MCP token file "
"and cannot be read directly. (Defense-in-depth — not a "
"security boundary; the terminal tool can still bypass.)"
f"Access denied: {path} is an internal Hermes cache file "
"and cannot be read directly to prevent prompt injection. "
"Use the skills_list or skill_view tools instead."
)
# Block common secret-bearing project-local .env files anywhere on disk.
# The agent helping a user with their project rarely needs to read raw
# .env contents — .env.example is the documented-shape substitute. The
# terminal tool can still ``cat .env``; this is defense-in-depth, not a
# boundary (see module docstring).
if resolved.name in _BLOCKED_PROJECT_ENV_BASENAMES:
return (
f"Access denied: {path} is a secret-bearing environment file "
"and cannot be read to prevent credential leakage. "
"If you need to check the file structure, read .env.example instead. "
"(Defense-in-depth — not a security boundary; the terminal tool can still bypass.)"
)
return None
# ---------------------------------------------------------------------------
# Cross-profile write guard (#TBD)
#
# Hermes profiles are separate HERMES_HOME dirs under
# ``<root>/profiles/<name>/``. Each profile has its own skills/, plugins/,
# cron/, memories/. When an agent runs under one profile, writing into
# ANOTHER profile's directories is almost always wrong — those skills /
# plugins / cron jobs / memories affect a different session the user runs
# from a different shell.
#
# Soft guard, NOT a security boundary: the agent runs as the same OS user
# and has unrestricted terminal access, so this returns a warning the model
# can choose to honor or override with ``cross_profile=True``. Same shape
# as the dangerous-command approval flow — the agent is told the boundary
# exists, and explicit user direction is required to cross it.
#
# Reference: May 2026 incident where a hermes-security profile session
# edited skills under both ``~/.hermes/profiles/hermes-security/skills/``
# AND ``~/.hermes/skills/`` (the default profile's skills) without realizing
# the second path belonged to a different profile.
# ---------------------------------------------------------------------------
# Profile-scoped directories under HERMES_HOME / <root> / <root>/profiles/<X>/
# that should be guarded. Adding a new area here extends the guard with no
# other code change.
PROFILE_SCOPED_AREAS = ("skills", "plugins", "cron", "memories")
def _resolve_active_profile_name() -> str:
"""Return the active profile name derived from HERMES_HOME.
``~/.hermes`` -> ``"default"``
``~/.hermes/profiles/X`` -> ``"X"``
Falls back to ``"default"`` on any resolution failure so the guard
never raises into the tool path.
"""
try:
home_real = _hermes_home_path().resolve()
root_real = _hermes_root_path().resolve()
except (OSError, RuntimeError):
return "default"
profiles_dir = root_real / "profiles"
try:
rel = home_real.relative_to(profiles_dir)
parts = rel.parts
if len(parts) >= 1:
return parts[0]
except ValueError:
pass
return "default"
def classify_cross_profile_target(path: str) -> Optional[dict]:
"""Classify a write target as cross-profile if it lands in another
profile's scoped area (skills/plugins/cron/memories).
Returns ``None`` when the target is outside Hermes scope, or is inside
the ACTIVE profile, or doesn't hit a profile-scoped area. Otherwise
returns a dict with:
* ``active_profile``: name of the profile the agent is running as
* ``target_profile``: name of the profile the path belongs to
* ``area``: which scoped area (``"skills"``, ``"plugins"``, etc.)
* ``target_path``: the resolved path string
The caller decides what to do with the result — surface a warning to
the model, prompt the user, or (with explicit consent /
``cross_profile=True``) proceed anyway.
"""
try:
target = Path(os.path.expanduser(str(path))).resolve()
root_real = _hermes_root_path().resolve()
except (OSError, RuntimeError):
return None
target_profile: Optional[str] = None
area: Optional[str] = None
try:
rel = target.relative_to(root_real)
except ValueError:
return None
parts = rel.parts
if not parts:
return None
if parts[0] in PROFILE_SCOPED_AREAS:
# ``<root>/<area>/...`` → default profile.
target_profile = "default"
area = parts[0]
elif (
parts[0] == "profiles"
and len(parts) >= 3
and parts[2] in PROFILE_SCOPED_AREAS
):
# ``<root>/profiles/<name>/<area>/...`` → named profile.
target_profile = parts[1]
area = parts[2]
else:
return None
active_profile = _resolve_active_profile_name()
if target_profile == active_profile:
# In-profile write — not a cross-profile event.
return None
return {
"active_profile": active_profile,
"target_profile": target_profile,
"area": area,
"target_path": str(target),
}
def get_cross_profile_warning(path: str) -> Optional[str]:
"""Return a model-facing warning string when ``path`` is cross-profile.
Returns ``None`` when the write is in-scope (same profile) or outside
Hermes entirely. Caller is expected to surface the warning to the
agent as a tool-result error, NOT to silently allow the write — the
agent must either get explicit user direction to proceed, or pass
``cross_profile=True`` to its write tool.
This is defense-in-depth: the terminal tool runs as the same OS user
and can write any of these paths without going through this guard.
Treat the guard as a confusion-reducer, not a security boundary.
"""
info = classify_cross_profile_target(path)
if info is None:
return None
return (
f"Cross-profile write blocked by soft guard: {info['target_path']} "
f"belongs to Hermes profile {info['target_profile']!r}, but the "
f"agent is running under profile {info['active_profile']!r}. "
f"Editing another profile's {info['area']}/ will affect that "
f"profile's future sessions, not the one you are currently in. "
f"Confirm with the user before proceeding. To bypass this guard "
f"after explicit user direction, retry the call with "
f"``cross_profile=True``. (Defense-in-depth — not a security "
f"boundary; the terminal tool can still bypass.)"
)

View File

@@ -450,13 +450,7 @@ def _make_stream_chunk(
finish_reason: Optional[str] = None,
reasoning: str = "",
) -> _GeminiStreamChunk:
delta_kwargs: Dict[str, Any] = {
"role": "assistant",
"content": None,
"tool_calls": None,
"reasoning": None,
"reasoning_content": None,
}
delta_kwargs: Dict[str, Any] = {"role": "assistant"}
if content:
delta_kwargs["content"] = content
if tool_call_delta is not None:

View File

@@ -31,6 +31,7 @@ import json
import logging
import time
import urllib.error
import urllib.parse
import urllib.request
import uuid
from dataclasses import dataclass, field

View File

@@ -59,7 +59,7 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, Optional, Tuple
from hermes_constants import get_hermes_home, secure_parent_dir
from hermes_constants import get_hermes_home
logger = logging.getLogger(__name__)
@@ -491,8 +491,10 @@ def save_credentials(creds: GoogleCredentials) -> Path:
path.parent.mkdir(parents=True, exist_ok=True)
# Tighten parent dir to 0o700 so siblings can't traverse to the creds file.
# On Windows this is a no-op (POSIX mode bits aren't enforced); ignore failures.
# secure_parent_dir refuses to chmod / or top-level dirs (#25821).
secure_parent_dir(path)
try:
os.chmod(path.parent, 0o700)
except OSError:
pass
payload = json.dumps(creds.to_dict(), indent=2, sort_keys=True) + "\n"
with _credentials_lock():
@@ -656,7 +658,7 @@ def get_valid_access_token(*, force_refresh: bool = False) -> str:
creds = load_credentials()
if creds is None:
raise GoogleOAuthError(
"No Google OAuth credentials found. Run `hermes auth add google-gemini-cli` first.",
"No Google OAuth credentials found. Run `hermes login --provider google-gemini-cli` first.",
code="google_oauth_not_logged_in",
)
@@ -899,15 +901,7 @@ def start_oauth_flow(
try:
import webbrowser
try:
from hermes_cli.auth import (
_can_open_graphical_browser as _can_open_gui,
)
except Exception:
_can_open_gui = lambda: True # noqa: E731
if _can_open_gui():
webbrowser.open(auth_url, new=1, autoraise=True)
webbrowser.open(auth_url, new=1, autoraise=True)
except Exception as exc:
logger.debug("webbrowser.open failed: %s", exc)

View File

@@ -191,88 +191,6 @@ def save_b64_image(
return path
# Extension inference for save_url_image — keep small and explicit. We don't
# want to import mimetypes for a handful of formats every image_gen provider
# actually returns, and we never want to inherit a content-type that points
# at HTML or JSON when the API gives us a degenerate response.
_URL_IMAGE_CONTENT_TYPES = {
"image/png": "png",
"image/jpeg": "jpg",
"image/jpg": "jpg",
"image/webp": "webp",
"image/gif": "gif",
}
def save_url_image(
url: str,
*,
prefix: str = "image",
timeout: float = 60.0,
max_bytes: int = 25 * 1024 * 1024,
) -> Path:
"""Download an image URL and write it under ``$HERMES_HOME/cache/images/``.
Used by providers (xAI, fallback OpenAI) whose API returns an *ephemeral*
URL instead of inline base64 — those URLs frequently expire before a
downstream consumer (Telegram ``send_photo``, browser fetch) can resolve
them, so we materialise the bytes locally at tool-completion time.
Mirrors :func:`save_b64_image`'s shape so providers can swap in one line.
Returns the absolute :class:`Path` to the saved file. Raises on any
network / HTTP / oversize / non-image-content-type error so callers can
fall back to returning the bare URL with a clear error message.
"""
import requests
response = requests.get(url, timeout=timeout, stream=True)
response.raise_for_status()
# Infer extension from the response content-type, falling back to the
# URL suffix when xAI / OpenAI omit a precise type (some CDNs return
# ``application/octet-stream``). Defaults to ``png``.
content_type = (response.headers.get("Content-Type") or "").split(";", 1)[0].strip().lower()
extension = _URL_IMAGE_CONTENT_TYPES.get(content_type)
if extension is None:
url_path = url.split("?", 1)[0].lower()
for ext in ("png", "jpg", "jpeg", "webp", "gif"):
if url_path.endswith(f".{ext}"):
extension = "jpg" if ext == "jpeg" else ext
break
if extension is None:
extension = "png"
ts = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
short = uuid.uuid4().hex[:8]
path = _images_cache_dir() / f"{prefix}_{ts}_{short}.{extension}"
bytes_written = 0
with path.open("wb") as fh:
for chunk in response.iter_content(chunk_size=64 * 1024):
if not chunk:
continue
bytes_written += len(chunk)
if bytes_written > max_bytes:
fh.close()
try:
path.unlink()
except OSError:
pass
raise ValueError(
f"Image at {url} exceeds {max_bytes // (1024 * 1024)}MB cap; refusing to cache."
)
fh.write(chunk)
if bytes_written == 0:
try:
path.unlink()
except OSError:
pass
raise ValueError(f"Image at {url} returned 0 bytes; refusing to cache.")
return path
def success_response(
*,
image: str,

View File

@@ -77,17 +77,6 @@ def get_active_provider() -> Optional[ImageGenProvider]:
Reads ``image_gen.provider`` from config.yaml; falls back per the
module docstring.
**Availability semantics** (mirrors :mod:`agent.web_search_registry`):
- When ``image_gen.provider`` is explicitly set, the configured
provider is returned even if :meth:`ImageGenProvider.is_available`
reports False — the dispatcher surfaces a precise "X_API_KEY is not
set" error rather than silently switching backends.
- When ``image_gen.provider`` is unset, the fallback path (single-
provider shortcut and the FAL legacy preference) is filtered by
``is_available()`` so we don't pick a provider the user has no
credentials for.
"""
configured: Optional[str] = None
try:
@@ -105,17 +94,6 @@ def get_active_provider() -> Optional[ImageGenProvider]:
with _lock:
snapshot = dict(_providers)
def _is_available_safe(p: ImageGenProvider) -> bool:
"""Wrap ``is_available()`` so a buggy provider doesn't kill resolution."""
try:
return bool(p.is_available())
except Exception as exc: # noqa: BLE001
logger.debug("image_gen provider %s.is_available() raised %s", p.name, exc)
return False
# 1. Explicit config wins — return regardless of is_available() so the
# user gets a precise downstream error message rather than a silent
# backend switch.
if configured:
provider = snapshot.get(configured)
if provider is not None:
@@ -125,16 +103,13 @@ def get_active_provider() -> Optional[ImageGenProvider]:
configured,
)
# 2. Fallback: single registered provider — but only if it's actually
# available (no credentials = don't surface it as "active").
available = [p for p in snapshot.values() if _is_available_safe(p)]
if len(available) == 1:
return available[0]
# Fallback: single-provider case
if len(snapshot) == 1:
return next(iter(snapshot.values()))
# 3. Fallback: prefer legacy FAL for backward compat, when available.
fal = snapshot.get("fal")
if fal is not None and _is_available_safe(fal):
return fal
# Fallback: prefer legacy FAL for backward compat
if "fal" in snapshot:
return snapshot["fal"]
return None

View File

@@ -37,8 +37,6 @@ from __future__ import annotations
import base64
import logging
import mimetypes
import os
import re
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
@@ -48,180 +46,6 @@ logger = logging.getLogger(__name__)
_VALID_MODES = frozenset({"auto", "native", "text"})
# Image extensions used by extract_image_refs(). Kept tight on purpose — we
# only auto-attach things the model can actually see. Documents/archives are
# excluded because the gateway's broader extract_local_files() also routes
# them differently (send_document), and we don't want to attach a PDF as a
# vision part.
_IMAGE_EXTS = (
".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".heic",
)
_IMAGE_EXT_PATTERN = "|".join(e.lstrip(".") for e in _IMAGE_EXTS)
# Absolute / home-relative local image path. Matches the same shape gateway's
# extract_local_files() uses: anchors to ``~/`` or ``/``, ignores matches inside
# URLs (the ``(?<![/:\w.])`` lookbehind), and case-insensitive on the extension.
_LOCAL_IMAGE_PATH_RE = re.compile(
r"(?<![/:\w.])(?:~/|/)(?:[\w.\-]+/)*[\w.\-]+\.(?:" + _IMAGE_EXT_PATTERN + r")\b",
re.IGNORECASE,
)
# http(s) URL ending in an image extension (optionally followed by a
# query string). Case-insensitive on the extension. Strict ``http(s)://``
# scheme so we don't accidentally grab ``file://`` URLs or other shapes.
_IMAGE_URL_RE = re.compile(
r"https?://[^\s<>\"']+?\.(?:" + _IMAGE_EXT_PATTERN + r")(?:\?[^\s<>\"']*)?",
re.IGNORECASE,
)
def extract_image_refs(text: str) -> Tuple[List[str], List[str]]:
"""Scan free-form text for image references the model should see.
Returns ``(local_paths, urls)``:
* ``local_paths`` — absolute (``/``) or home-relative (``~/``) paths
whose suffix is an image extension AND whose expanded form exists
on disk as a file. Order-preserving, deduplicated.
* ``urls`` — ``http(s)://…`` URLs whose path ends in an image
extension (a ``?query`` is allowed after the extension).
Order-preserving, deduplicated.
Matches inside fenced code blocks (``` ``` ```) and inline backticks
(`` `…` ``) are skipped so that snippets pasted into a task body for
reference aren't mistaken for live attachments. This mirrors the
behaviour of ``gateway.platforms.base.BaseAdapter.extract_local_files``.
Local paths are validated against the filesystem; URLs are not
(the provider fetches them at request time).
"""
if not isinstance(text, str) or not text:
return [], []
# Build spans covered by fenced code blocks and inline code so we can
# ignore references the author embedded purely as example text.
code_spans: list[tuple[int, int]] = []
for m in re.finditer(r"```[^\n]*\n.*?```", text, re.DOTALL):
code_spans.append((m.start(), m.end()))
for m in re.finditer(r"`[^`\n]+`", text):
code_spans.append((m.start(), m.end()))
def _in_code(pos: int) -> bool:
return any(s <= pos < e for s, e in code_spans)
local_paths: list[str] = []
seen_paths: set[str] = set()
for match in _LOCAL_IMAGE_PATH_RE.finditer(text):
if _in_code(match.start()):
continue
raw = match.group(0)
expanded = os.path.expanduser(raw)
try:
if not os.path.isfile(expanded):
continue
except OSError:
# ENAMETOOLONG / EINVAL on pathological inputs — skip rather than crash.
continue
if expanded in seen_paths:
continue
seen_paths.add(expanded)
local_paths.append(expanded)
urls: list[str] = []
seen_urls: set[str] = set()
for match in _IMAGE_URL_RE.finditer(text):
if _in_code(match.start()):
continue
url = match.group(0)
# Strip trailing punctuation that's almost certainly prose, not part
# of the URL (e.g. "see https://x.com/a.png." or "/a.png)").
url = url.rstrip(".,;:!?)]>")
if url in seen_urls:
continue
seen_urls.add(url)
urls.append(url)
return local_paths, urls
# Strict YAML/JSON boolean coercion for capability overrides.
#
# ``bool("false")`` is True in Python because non-empty strings are truthy, so
# a user writing ``supports_vision: "false"`` (quoted — a common YAML mistake)
# would silently enable native vision routing on a model that can't actually
# handle it. Accept only the values YAML 1.1 / 1.2 treat as booleans, plus
# real ``bool`` and integer 0/1. Anything else returns None so the caller
# falls through to models.dev rather than honouring garbage.
_TRUE_TOKENS = frozenset({"true", "yes", "on", "1"})
_FALSE_TOKENS = frozenset({"false", "no", "off", "0"})
def _coerce_capability_bool(raw: Any) -> Optional[bool]:
"""Return True/False for recognised boolean values, None otherwise."""
if isinstance(raw, bool):
return raw
if isinstance(raw, int):
if raw in (0, 1):
return bool(raw)
return None
if isinstance(raw, str):
s = raw.strip().lower()
if s in _TRUE_TOKENS:
return True
if s in _FALSE_TOKENS:
return False
return None
def _supports_vision_override(
cfg: Optional[Dict[str, Any]],
provider: str,
model: str,
) -> Optional[bool]:
"""Resolve user-declared vision capability from config.yaml.
Resolution order, first hit wins:
1. ``model.supports_vision`` (top-level shortcut for the active model)
2. ``providers.<provider>.models.<model>.supports_vision``
(named custom providers — ``provider`` may be the runtime-resolved
value ``"custom"`` and/or the user-declared name under
``model.provider``; both are tried)
Returns None when no override is set, so the caller falls through to
models.dev. Returns False explicitly only when the user wrote a
recognised boolean false token.
"""
if not isinstance(cfg, dict):
return None
# 1. Top-level shortcut
model_cfg_raw = cfg.get("model")
model_cfg: Dict[str, Any] = model_cfg_raw if isinstance(model_cfg_raw, dict) else {}
top = _coerce_capability_bool(model_cfg.get("supports_vision"))
if top is not None:
return top
# 2. Per-provider, per-model. Named custom providers (e.g. "my-vllm")
# get rewritten to provider="custom" at runtime
# (hermes_cli/runtime_provider.py:_resolve_named_custom_runtime), so the
# config still holds the user-declared name under model.provider. Try
# both as candidate provider keys.
config_provider = str(model_cfg.get("provider") or "").strip()
providers_raw = cfg.get("providers")
providers_cfg: Dict[str, Any] = providers_raw if isinstance(providers_raw, dict) else {}
for p in dict.fromkeys(filter(None, (provider, config_provider))):
entry_raw = providers_cfg.get(p)
entry: Dict[str, Any] = entry_raw if isinstance(entry_raw, dict) else {}
models_raw = entry.get("models")
models_cfg: Dict[str, Any] = models_raw if isinstance(models_raw, dict) else {}
per_model_raw = models_cfg.get(model)
per_model: Dict[str, Any] = per_model_raw if isinstance(per_model_raw, dict) else {}
coerced = _coerce_capability_bool(per_model.get("supports_vision"))
if coerced is not None:
return coerced
return None
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
@@ -257,20 +81,8 @@ def _explicit_aux_vision_override(cfg: Optional[Dict[str, Any]]) -> bool:
return True
def _lookup_supports_vision(
provider: str,
model: str,
cfg: Optional[Dict[str, Any]] = None,
) -> Optional[bool]:
"""Return True/False if we can resolve caps, None if unknown.
Consults the user's ``supports_vision`` override in config.yaml first
(so custom/local models declared as vision-capable don't fall through to
text routing in ``auto`` mode), then falls back to models.dev.
"""
override = _supports_vision_override(cfg, provider, model)
if override is not None:
return override
def _lookup_supports_vision(provider: str, model: str) -> Optional[bool]:
"""Return True/False if we can resolve caps, None if unknown."""
if not provider or not model:
return None
try:
@@ -311,7 +123,7 @@ def decide_image_input_mode(
if _explicit_aux_vision_override(cfg):
return "text"
supports = _lookup_supports_vision(provider, model, cfg)
supports = _lookup_supports_vision(provider, model)
if supports is True:
return "native"
return "text"
@@ -418,29 +230,20 @@ def _file_to_data_url(path: Path) -> Optional[str]:
def build_native_content_parts(
user_text: str,
image_paths: List[str],
image_urls: Optional[List[str]] = None,
) -> Tuple[List[Dict[str, Any]], List[str]]:
"""Build an OpenAI-style ``content`` list for a user turn.
Shape:
[{"type": "text", "text": "...\\n\\n[Image attached at: /local/path]"},
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
{"type": "image_url", "image_url": {"url": "https://example.com/a.png"}},
...]
Local paths are read from disk and embedded as base64 ``data:`` URLs.
Remote URLs (``http(s)://``) are passed through verbatim — the provider
fetches them server-side. The model still sees the pixels either way.
For each successfully attached image, a hint is appended to the text
part:
* local path → ``[Image attached at: <path>]``
* URL → ``[Image attached: <url>]``
The hint gives the model a string handle so MCP/skill tools that take
an image path or URL argument can be invoked on the same image without
an extra round-trip. This parallels the text-mode hint produced by
The local path of each successfully attached image is appended to the
text part as ``[Image attached at: <path>]``. The model still sees the
pixels via the ``image_url`` part (full native vision); the path note
just gives it a string handle so MCP/skill tools that take an image
path or URL argument can be invoked on the same image without an
extra round-trip. This parallels the text-mode hint produced by
``Runner._enrich_message_with_vision`` (``vision_analyze using image_url:
<path>``) so behaviour is consistent across both image input modes.
@@ -449,14 +252,12 @@ def build_native_content_parts(
ceiling), the agent's retry loop transparently shrinks and retries
once — see ``run_agent._try_shrink_image_parts_in_messages``.
Returns (content_parts, skipped). Skipped entries are local paths
that couldn't be read from disk; URLs are never skipped (they're
not validated here).
Returns (content_parts, skipped_paths). Skipped paths are files that
couldn't be read from disk and are NOT advertised in the path hints.
"""
skipped: List[str] = []
image_parts: List[Dict[str, Any]] = []
attached_paths: List[str] = []
attached_urls: List[str] = []
for raw_path in image_paths:
p = Path(raw_path)
@@ -473,26 +274,16 @@ def build_native_content_parts(
})
attached_paths.append(str(raw_path))
for url in image_urls or []:
url = (url or "").strip()
if not url:
continue
image_parts.append({
"type": "image_url",
"image_url": {"url": url},
})
attached_urls.append(url)
text = (user_text or "").strip()
# If at least one image attached, build a single text part that combines
# the user's caption (or a neutral default) with one hint per image.
if attached_paths or attached_urls:
# the user's caption (or a neutral default) with one path hint per image.
if attached_paths:
base_text = text or "What do you see in this image?"
hint_lines: List[str] = []
hint_lines.extend(f"[Image attached at: {p}]" for p in attached_paths)
hint_lines.extend(f"[Image attached: {u}]" for u in attached_urls)
combined_text = f"{base_text}\n\n" + "\n".join(hint_lines)
path_hints = "\n".join(
f"[Image attached at: {p}]" for p in attached_paths
)
combined_text = f"{base_text}\n\n{path_hints}"
parts: List[Dict[str, Any]] = [{"type": "text", "text": combined_text}]
parts.extend(image_parts)
return parts, skipped
@@ -507,5 +298,4 @@ def build_native_content_parts(
__all__ = [
"decide_image_input_mode",
"build_native_content_parts",
"extract_image_refs",
]

View File

@@ -1,62 +0,0 @@
"""Per-agent iteration budget — thread-safe consume/refund counter.
Extracted from ``run_agent.py``. Each ``AIAgent`` instance (parent or
subagent) holds an :class:`IterationBudget`; the parent's cap comes from
``max_iterations`` (default 90), each subagent's cap comes from
``delegation.max_iterations`` (default 50).
``run_agent`` re-exports ``IterationBudget`` so existing
``from run_agent import IterationBudget`` imports keep working unchanged.
"""
from __future__ import annotations
import threading
class IterationBudget:
"""Thread-safe iteration counter for an agent.
Each agent (parent or subagent) gets its own ``IterationBudget``.
The parent's budget is capped at ``max_iterations`` (default 90).
Each subagent gets an independent budget capped at
``delegation.max_iterations`` (default 50) — this means total
iterations across parent + subagents can exceed the parent's cap.
Users control the per-subagent limit via ``delegation.max_iterations``
in config.yaml.
``execute_code`` (programmatic tool calling) iterations are refunded via
:meth:`refund` so they don't eat into the budget.
"""
def __init__(self, max_total: int):
self.max_total = max_total
self._used = 0
self._lock = threading.Lock()
def consume(self) -> bool:
"""Try to consume one iteration. Returns True if allowed."""
with self._lock:
if self._used >= self.max_total:
return False
self._used += 1
return True
def refund(self) -> None:
"""Give back one iteration (e.g. for execute_code turns)."""
with self._lock:
if self._used > 0:
self._used -= 1
@property
def used(self) -> int:
with self._lock:
return self._used
@property
def remaining(self) -> int:
with self._lock:
return max(0, self.max_total - self._used)
__all__ = ["IterationBudget"]

View File

@@ -1,39 +0,0 @@
"""Best-effort early import for the OpenAI SDK's native streaming parser.
The OpenAI SDK imports ``jiter`` while constructing streaming chat-completion
responses. On some Windows installs the native extension can be imported
directly from the Hermes venv, but the first import fails when it happens later
inside the threaded streaming request path. Loading it once during agent
package import avoids that import-order failure while preserving the normal
SDK error path for genuinely missing or broken installs.
"""
from __future__ import annotations
import importlib
_JITER_PRELOADED = False
_JITER_PRELOAD_ERROR: Exception | None = None
def preload_jiter_native_extension() -> bool:
"""Import jiter's native extension early if it is available."""
global _JITER_PRELOADED, _JITER_PRELOAD_ERROR
if _JITER_PRELOADED:
return True
try:
importlib.import_module("jiter.jiter")
from jiter import from_json as _from_json # noqa: F401
except Exception as exc:
_JITER_PRELOAD_ERROR = exc
return False
_JITER_PRELOADED = True
_JITER_PRELOAD_ERROR = None
return True
preload_jiter_native_extension()

View File

@@ -1,106 +0,0 @@
"""Language Server Protocol (LSP) integration for Hermes Agent.
Hermes runs full language servers (pyright, gopls, rust-analyzer,
typescript-language-server, etc.) as subprocesses and pipes their
``textDocument/publishDiagnostics`` output into the post-write lint
delta filter used by ``write_file`` and ``patch``.
LSP is **gated on git workspace detection** — if the agent's cwd is
inside a git repository, LSP runs against that workspace; otherwise the
file_operations layer falls back to its existing in-process syntax
checks. This keeps users on user-home cwd's (e.g. Telegram gateway
chats) from spawning daemons they don't need.
Public API:
from agent.lsp import get_service
svc = get_service()
if svc and svc.enabled_for(path):
await svc.touch_file(path)
diags = svc.diagnostics_for(path)
The bulk of the wiring is internal — most callers only need the layer
in :func:`tools.file_operations.FileOperations._check_lint_delta`,
which is already wired (see that module).
Architecture is documented in ``website/docs/user-guide/features/lsp.md``.
"""
from __future__ import annotations
import atexit
import logging
import threading
from typing import Optional
from agent.lsp.manager import LSPService
logger = logging.getLogger("agent.lsp")
_service: Optional[LSPService] = None
_atexit_registered = False
_service_lock = threading.Lock()
def get_service() -> Optional[LSPService]:
"""Return the process-wide LSP service singleton, or None when disabled.
The service is created lazily on first call. ``None`` is returned
when LSP is disabled in config, when no workspace can be detected,
or when the platform doesn't support subprocess-based LSP servers.
On first creation, registers an :mod:`atexit` handler that tears
down spawned language servers on Python exit so a long-running
CLI or gateway session doesn't leak pyright/gopls/etc. processes
when it terminates.
"""
global _service, _atexit_registered
if _service is not None:
return _service if _service.is_active() else None
with _service_lock:
if _service is not None:
return _service if _service.is_active() else None
_service = LSPService.create_from_config()
if not _atexit_registered:
# ``atexit`` handlers run in LIFO order on normal Python
# exit and on SystemExit, but NOT on os._exit() or
# uncaught signals. Language servers are stateless
# subprocesses — losing them on SIGKILL is fine; they'll
# be reaped by the kernel along with their parent. We
# care about clean exits where Python flushes stdio
# before terminating; without this hook every
# ``hermes chat`` exit would leak pyright processes that
# outlive the parent for a few seconds while their
# stdout buffers drain.
atexit.register(_atexit_shutdown)
_atexit_registered = True
return _service if (_service is not None and _service.is_active()) else None
def shutdown_service() -> None:
"""Tear down the LSP service if one was started.
Safe to call multiple times; safe to call when no service was created.
"""
global _service
with _service_lock:
svc = _service
_service = None
if svc is not None:
try:
svc.shutdown()
except Exception as e: # noqa: BLE001
logger.debug("LSP shutdown error: %s", e)
def _atexit_shutdown() -> None:
"""atexit-registered wrapper. Logs at debug because by the time
atexit fires the user has already seen the agent's final output —
a noisy shutdown line on top of that is just clutter."""
try:
shutdown_service()
except Exception as e: # noqa: BLE001
logger.debug("atexit LSP shutdown failed: %s", e)
__all__ = ["get_service", "shutdown_service", "LSPService"]

View File

@@ -1,306 +0,0 @@
"""``hermes lsp`` CLI subcommand.
Subcommands:
- ``status`` — show service state, configured servers, install status.
- ``install <server_id>`` — eagerly install one server's binary.
- ``install-all`` — try to install every server with a known recipe.
- ``restart`` — tear down running clients so the next edit re-spawns.
- ``which <server_id>`` — print the resolved binary path for one server.
- ``list`` — print the registry of supported servers.
The handlers are kept here (rather than in
``hermes_cli/main.py``) so the LSP module ships self-contained.
"""
from __future__ import annotations
import argparse
import sys
def register_subparser(subparsers: argparse._SubParsersAction) -> None:
"""Wire the ``hermes lsp`` subcommand tree into the main argparse."""
parser = subparsers.add_parser(
"lsp",
help="Language Server Protocol management",
description=(
"Manage the LSP layer that powers post-write semantic "
"diagnostics in write_file/patch."
),
)
sub = parser.add_subparsers(dest="lsp_command")
sub_status = sub.add_parser("status", help="Show LSP service status")
sub_status.add_argument(
"--json", action="store_true", help="Emit machine-readable JSON"
)
sub_list = sub.add_parser("list", help="List supported language servers")
sub_list.add_argument(
"--installed-only",
action="store_true",
help="Only show servers whose binary is currently available",
)
sub_install = sub.add_parser("install", help="Install a server binary")
sub_install.add_argument("server", help="Server id (e.g. pyright, gopls)")
sub_install_all = sub.add_parser(
"install-all",
help="Install every server with a known auto-install recipe",
)
sub_install_all.add_argument(
"--include-manual",
action="store_true",
help="Even attempt servers marked manual-install (best effort)",
)
sub_restart = sub.add_parser(
"restart",
help="Tear down running LSP clients (next edit re-spawns)",
)
sub_which = sub.add_parser("which", help="Print binary path for a server")
sub_which.add_argument("server", help="Server id")
parser.set_defaults(func=run_lsp_command)
def run_lsp_command(args: argparse.Namespace) -> int:
"""Top-level dispatcher for ``hermes lsp <subcommand>``."""
sub = getattr(args, "lsp_command", None) or "status"
try:
if sub == "status":
return _cmd_status(getattr(args, "json", False))
if sub == "list":
return _cmd_list(getattr(args, "installed_only", False))
if sub == "install":
return _cmd_install(args.server)
if sub == "install-all":
return _cmd_install_all(getattr(args, "include_manual", False))
if sub == "restart":
return _cmd_restart()
if sub == "which":
return _cmd_which(args.server)
sys.stderr.write(f"unknown lsp subcommand: {sub}\n")
return 2
except KeyboardInterrupt:
return 130
def _cmd_status(emit_json: bool) -> int:
from agent.lsp import get_service
from agent.lsp.servers import SERVERS
from agent.lsp.install import detect_status
svc = get_service()
service_active = svc is not None
info = svc.get_status() if svc is not None else {"enabled": False}
if emit_json:
import json
payload = {
"service": info,
"registry": [
{
"server_id": s.server_id,
"extensions": list(s.extensions),
"description": s.description,
"binary_status": detect_status(_recipe_pkg_for(s.server_id)),
}
for s in SERVERS
],
}
sys.stdout.write(json.dumps(payload, indent=2) + "\n")
return 0
out = []
out.append("LSP Service")
out.append("===========")
out.append(f" enabled: {info.get('enabled', False)}")
if service_active:
out.append(f" wait_mode: {info.get('wait_mode')}")
out.append(f" wait_timeout: {info.get('wait_timeout')}s")
out.append(f" install_strategy:{info.get('install_strategy')}")
clients = info.get("clients") or []
if clients:
out.append(f" active clients: {len(clients)}")
for c in clients:
out.append(
f" - {c['server_id']:20s} state={c['state']:10s} root={c['workspace_root']}"
)
else:
out.append(" active clients: none")
broken = info.get("broken") or []
if broken:
out.append(f" broken pairs: {len(broken)}")
for b in broken:
out.append(f" - {b}")
disabled = info.get("disabled_servers") or []
if disabled:
out.append(f" disabled in cfg: {', '.join(disabled)}")
# Surface backend-tool gaps that aren't visible in the registry table:
# some servers spawn fine but emit no diagnostics without a sidecar
# binary (bash-language-server -> shellcheck).
backend_warnings = _backend_warnings()
if backend_warnings:
out.append("")
out.append("Backend warnings")
out.append("================")
for line in backend_warnings:
out.append(f" ! {line}")
out.append("")
out.append("Registered Servers")
out.append("==================")
for s in SERVERS:
pkg = _recipe_pkg_for(s.server_id)
status = detect_status(pkg)
marker = {
"installed": "",
"missing": "·",
"manual-only": "?",
}.get(status, " ")
ext_summary = ", ".join(list(s.extensions)[:5])
if len(s.extensions) > 5:
ext_summary += f", … (+{len(s.extensions) - 5})"
out.append(
f" {marker} {s.server_id:24s} [{status:11s}] {ext_summary}"
)
if s.description:
out.append(f" {s.description}")
sys.stdout.write("\n".join(out) + "\n")
return 0
def _cmd_list(installed_only: bool) -> int:
from agent.lsp.servers import SERVERS
from agent.lsp.install import detect_status
for s in SERVERS:
pkg = _recipe_pkg_for(s.server_id)
status = detect_status(pkg)
if installed_only and status != "installed":
continue
sys.stdout.write(
f"{s.server_id:24s} [{status:11s}] {','.join(s.extensions)}\n"
)
return 0
def _cmd_install(server_id: str) -> int:
from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
pkg = _recipe_pkg_for(server_id)
pre_status = detect_status(pkg)
if pre_status == "installed":
sys.stdout.write(f"{server_id} already installed\n")
return 0
sys.stdout.write(f"installing {server_id} (pkg={pkg}) ...\n")
sys.stdout.flush()
bin_path = try_install(pkg, "auto")
if bin_path is None:
recipe = INSTALL_RECIPES.get(pkg)
if recipe and recipe.get("strategy") == "manual":
sys.stderr.write(
f"{server_id}: this server requires a manual install. "
f"See documentation.\n"
)
else:
sys.stderr.write(f"{server_id}: install failed (see logs).\n")
return 1
sys.stdout.write(f"installed: {bin_path}\n")
return 0
def _cmd_install_all(include_manual: bool) -> int:
from agent.lsp.servers import SERVERS
from agent.lsp.install import try_install, INSTALL_RECIPES, detect_status
rc = 0
for s in SERVERS:
pkg = _recipe_pkg_for(s.server_id)
recipe = INSTALL_RECIPES.get(pkg)
if recipe is None:
continue
if recipe.get("strategy") == "manual" and not include_manual:
continue
if detect_status(pkg) == "installed":
sys.stdout.write(f" {s.server_id:24s} already installed\n")
continue
sys.stdout.write(f" installing {s.server_id} (pkg={pkg}) ... ")
sys.stdout.flush()
path = try_install(pkg, "auto")
if path:
sys.stdout.write(f"ok ({path})\n")
else:
sys.stdout.write("FAILED\n")
rc = 1
return rc
def _cmd_restart() -> int:
from agent.lsp import shutdown_service
shutdown_service()
sys.stdout.write("LSP service shut down. Next edit will respawn clients.\n")
return 0
def _cmd_which(server_id: str) -> int:
from agent.lsp.install import INSTALL_RECIPES, hermes_lsp_bin_dir
import shutil as _shutil
recipe = INSTALL_RECIPES.get(server_id)
bin_name = (recipe or {}).get("bin", server_id)
staged = hermes_lsp_bin_dir() / bin_name
if staged.exists():
sys.stdout.write(str(staged) + "\n")
return 0
on_path = _shutil.which(bin_name)
if on_path:
sys.stdout.write(on_path + "\n")
return 0
sys.stderr.write(f"{server_id}: not installed\n")
return 1
def _recipe_pkg_for(server_id: str) -> str:
"""Map a registry ``server_id`` to its install-recipe package key."""
# The mapping lives here (not in install.py) because it's a CLI
# convenience layer. Most server_ids are also their own recipe
# key, but a few differ (e.g. ``vue-language-server`` →
# ``@vue/language-server``).
aliases = {
"vue-language-server": "@vue/language-server",
"astro-language-server": "@astrojs/language-server",
"dockerfile-ls": "dockerfile-language-server-nodejs",
"typescript": "typescript-language-server",
}
return aliases.get(server_id, server_id)
def _backend_warnings() -> list:
"""Return human-readable notes about LSP backend tools that are missing
in a way that won't surface elsewhere.
Some language servers ship as thin wrappers around an external CLI for
actual diagnostics — they spawn cleanly but never emit any errors when
the sidecar binary isn't on PATH. bash-language-server / shellcheck
is the load-bearing example.
Returned strings are short, actionable, and include the install
suggestion across common platforms.
"""
import shutil as _shutil
from agent.lsp.install import hermes_lsp_bin_dir
notes: list = []
bash_installed = _shutil.which("bash-language-server") is not None or (
(hermes_lsp_bin_dir() / "bash-language-server").exists()
)
if bash_installed and _shutil.which("shellcheck") is None:
notes.append(
"bash-language-server is installed but shellcheck is missing — "
"diagnostics will be empty (apt: shellcheck, brew: shellcheck, "
"scoop: shellcheck)."
)
return notes

View File

@@ -1,930 +0,0 @@
"""Async LSP client over stdin/stdout.
One :class:`LSPClient` corresponds to one ``(language_server, workspace_root)``
pair — exactly what OpenCode keys clients on, and the same shape Claude
Code uses. The client owns a child process, drives the JSON-RPC
exchange, and exposes:
- :meth:`open_file` / :meth:`change_file` — text document sync
- :meth:`wait_for_diagnostics` — block until the server emits fresh
diagnostics for a specific file (or a timeout fires)
- :meth:`diagnostics_for` — read the current per-file diagnostic store
- :meth:`shutdown` — graceful close + SIGTERM/SIGKILL fallback
The class is designed for async use from a single asyncio event loop.
The :class:`agent.lsp.manager.LSPService` runs an event loop in a
background thread so the synchronous file_operations layer can call
into it via :func:`agent.lsp.manager.LSPService.touch_file`.
Implementation notes:
- Push diagnostics are stored per-URI in :attr:`_push_diagnostics` from
``textDocument/publishDiagnostics`` notifications. Pull diagnostics
go in :attr:`_pull_diagnostics`. The merged view dedupes by content.
- Whole-document sync. Even when the server advertises incremental
sync, we send a single ``contentChanges`` entry replacing the
entire document. Pretending to be incremental while sending a
full replacement is well-tolerated by every major server and saves
range bookkeeping. See OpenCode's ``client.ts:584-659`` for the
same trick.
- The "touch-file dance": every ``open_file`` call also fires a
``workspace/didChangeWatchedFiles`` notification (CREATED on the
first open, CHANGED thereafter). Some servers (clangd, eslint)
only re-scan when this notification fires, even though the LSP spec
doesn't strictly require it.
- ``ContentModified`` (-32801) errors get retried with exponential
backoff up to 3 times. This matches Claude Code's
``LSPServerInstance.sendRequest``.
"""
from __future__ import annotations
import asyncio
import logging
import os
from pathlib import Path
from typing import Any, Awaitable, Callable, Dict, List, Optional, Set
from urllib.parse import quote, unquote
from agent.lsp.protocol import (
ERROR_CONTENT_MODIFIED,
ERROR_METHOD_NOT_FOUND,
LSPProtocolError,
LSPRequestError,
classify_message,
encode_message,
make_error_response,
make_notification,
make_request,
make_response,
read_message,
)
logger = logging.getLogger("agent.lsp.client")
# Timeouts (seconds) — mirror OpenCode's constants, scaled to seconds.
INITIALIZE_TIMEOUT = 45.0
DIAGNOSTICS_DOCUMENT_WAIT = 5.0
DIAGNOSTICS_FULL_WAIT = 10.0
DIAGNOSTICS_REQUEST_TIMEOUT = 3.0
PUSH_DEBOUNCE = 0.15
SHUTDOWN_GRACE = 1.0 # seconds between SIGTERM and SIGKILL
# Retry policy for transient ContentModified errors.
MAX_CONTENT_MODIFIED_RETRIES = 3
RETRY_BASE_DELAY = 0.5 # 0.5, 1.0, 2.0 — exponential
def file_uri(path: str) -> str:
"""Return ``file://`` URI for an absolute filesystem path.
Mirrors Node's ``pathToFileURL`` — handles spaces, unicode, and
Windows drive letters (``C:\\foo`` → ``file:///C:/foo``).
"""
abs_path = os.path.abspath(path)
if os.name == "nt":
# Windows: backslash → forward slash, prepend extra slash so
# the drive letter shows up as part of the path component.
abs_path = abs_path.replace("\\", "/")
if not abs_path.startswith("/"):
abs_path = "/" + abs_path
return "file://" + quote(abs_path, safe="/:")
def uri_to_path(uri: str) -> str:
"""Inverse of :func:`file_uri`."""
if not uri.startswith("file://"):
return uri
raw = uri[len("file://"):]
if os.name == "nt" and raw.startswith("/") and len(raw) > 2 and raw[2] == ":":
raw = raw[1:] # strip leading slash before drive letter
return os.path.normpath(unquote(raw))
def _end_position(text: str) -> Dict[str, int]:
"""Return the LSP Position at the end of ``text``.
Used to construct a single-range "replace whole document" change
for ``textDocument/didChange`` regardless of the server's declared
sync mode.
"""
if not text:
return {"line": 0, "character": 0}
lines = text.splitlines(keepends=False)
last_line = len(lines) - 1
last_col = len(lines[-1]) if lines else 0
# If the text ends with a trailing newline, ``splitlines`` won't
# represent it. The end position is then the start of the next
# (empty) line — line index is len(lines), column 0.
if text.endswith(("\n", "\r")):
return {"line": last_line + 1, "character": 0}
return {"line": last_line, "character": last_col}
class LSPClient:
"""Async LSP client tied to one server process and one workspace root.
Lifecycle:
c = LSPClient(server_id, workspace_root, command, args, init_options)
await c.start() # spawn + initialize
ver = await c.open_file("/path/to/foo.py")
await c.wait_for_diagnostics("/path/to/foo.py", ver)
diags = c.diagnostics_for("/path/to/foo.py")
await c.shutdown()
"""
# ------------------------------------------------------------------
# construction + lifecycle
# ------------------------------------------------------------------
def __init__(
self,
*,
server_id: str,
workspace_root: str,
command: List[str],
env: Optional[Dict[str, str]] = None,
cwd: Optional[str] = None,
initialization_options: Optional[Dict[str, Any]] = None,
seed_diagnostics_on_first_push: bool = False,
) -> None:
self.server_id = server_id
self.workspace_root = workspace_root
self._command = list(command)
self._env = env
self._cwd = cwd or workspace_root
self._init_options = initialization_options or {}
self._seed_first_push = seed_diagnostics_on_first_push
# Process + streams
self._proc: Optional[asyncio.subprocess.Process] = None
self._stderr_task: Optional[asyncio.Task] = None
self._reader_task: Optional[asyncio.Task] = None
# Request/response correlation
self._next_id: int = 0
self._pending: Dict[int, asyncio.Future] = {}
# Server-side request handlers (server → client requests).
# Kept small and explicit; everything else returns method-not-found.
self._request_handlers: Dict[str, Callable[[Any], Awaitable[Any]]] = {
"window/workDoneProgress/create": self._handle_work_done_create,
"workspace/configuration": self._handle_workspace_configuration,
"client/registerCapability": self._handle_register_capability,
"client/unregisterCapability": self._handle_unregister_capability,
"workspace/workspaceFolders": self._handle_workspace_folders,
"workspace/diagnostic/refresh": self._handle_diagnostic_refresh,
}
# Notifications (server → client) we care about.
self._notification_handlers: Dict[str, Callable[[Any], None]] = {
"textDocument/publishDiagnostics": self._handle_publish_diagnostics,
# Everything else (window/showMessage, $/progress, etc.)
# is silently dropped by default.
}
# Tracked file state — required for didChange version bumps.
self._files: Dict[str, Dict[str, Any]] = {}
# Diagnostic stores, keyed by file path (NOT URI).
self._push_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
self._pull_diagnostics: Dict[str, List[Dict[str, Any]]] = {}
# Per-path "last published" time so wait-for-fresh logic works.
self._published: Dict[str, float] = {}
# Per-path version of the latest push (matches our didChange
# version when the server respects it).
self._published_version: Dict[str, int] = {}
# First-push seen flag, for typescript-style seed-on-first-push.
self._first_push_seen: Set[str] = set()
# Capability registrations — only diagnostic ones are tracked.
self._diagnostic_registrations: Dict[str, Dict[str, Any]] = {}
# State machine
self._state: str = "stopped"
self._initialize_result: Optional[Dict[str, Any]] = None
self._sync_kind: int = 1 # 1=Full, 2=Incremental
self._stopping: bool = False
# Push event for waiters.
self._push_event = asyncio.Event()
# Monotonic counter incremented on every publishDiagnostics push.
# Waiters snapshot it on entry and treat any increase as
# "something happened, recheck the predicate". Avoids the
# asyncio.Event sticky-state trap.
self._push_counter = 0
# Registration change event so wait_for_diagnostics can re-loop
# when the server announces a new dynamic provider.
self._registration_event = asyncio.Event()
@property
def is_running(self) -> bool:
return self._state == "running" and self._proc is not None and self._proc.returncode is None
@property
def state(self) -> str:
return self._state
async def start(self) -> None:
"""Spawn the server and complete the initialize handshake.
Raises any exception encountered during spawn/init. On failure
the process is killed and the client is left in state
``"error"`` — re-call ``start()`` to retry.
"""
if self._state in {"running", "starting"}:
return
self._state = "starting"
try:
await self._spawn()
await self._initialize()
self._state = "running"
except Exception:
self._state = "error"
await self._cleanup_process()
raise
async def _spawn(self) -> None:
env = dict(os.environ)
if self._env:
env.update(self._env)
try:
self._proc = await asyncio.create_subprocess_exec(
self._command[0],
*self._command[1:],
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
env=env,
cwd=self._cwd,
)
except FileNotFoundError as e:
raise LSPProtocolError(
f"LSP server binary not found: {self._command[0]} ({e})"
) from e
# Drain stderr at debug level — if we don't, the pipe buffer
# fills and the server hangs.
self._stderr_task = asyncio.create_task(self._drain_stderr())
# Start the reader loop.
self._reader_task = asyncio.create_task(self._reader_loop())
async def _drain_stderr(self) -> None:
if self._proc is None or self._proc.stderr is None:
return
try:
while True:
line = await self._proc.stderr.readline()
if not line:
break
text = line.decode("utf-8", errors="replace").rstrip()
if text:
logger.debug("[%s] stderr: %s", self.server_id, text[:1000])
except (asyncio.CancelledError, OSError):
pass
async def _reader_loop(self) -> None:
if self._proc is None or self._proc.stdout is None:
return
try:
while True:
msg = await read_message(self._proc.stdout)
if msg is None:
logger.debug("[%s] server closed stdout cleanly", self.server_id)
break
kind, key = classify_message(msg)
if kind == "response":
self._dispatch_response(key, msg)
elif kind == "request":
asyncio.create_task(self._dispatch_request(key, msg))
elif kind == "notification":
self._dispatch_notification(key, msg)
else:
logger.warning("[%s] dropping invalid message: %r", self.server_id, msg)
except LSPProtocolError as e:
logger.warning("[%s] protocol error in reader loop: %s", self.server_id, e)
except (asyncio.CancelledError, OSError):
pass
finally:
# Wake up any pending requests so they can fail fast.
for fut in list(self._pending.values()):
if not fut.done():
fut.set_exception(LSPProtocolError("server connection closed"))
self._pending.clear()
async def _initialize(self) -> None:
params = {
"rootUri": file_uri(self.workspace_root),
"rootPath": self.workspace_root,
"processId": os.getpid(),
"workspaceFolders": [
{"name": "workspace", "uri": file_uri(self.workspace_root)}
],
"initializationOptions": self._init_options,
"capabilities": {
"window": {"workDoneProgress": True},
"workspace": {
"configuration": True,
"workspaceFolders": True,
"didChangeWatchedFiles": {"dynamicRegistration": True},
"diagnostics": {"refreshSupport": False},
},
"textDocument": {
"synchronization": {
"dynamicRegistration": False,
"didOpen": True,
"didChange": True,
"didSave": True,
"willSave": False,
"willSaveWaitUntil": False,
},
"diagnostic": {
"dynamicRegistration": True,
"relatedDocumentSupport": True,
},
"publishDiagnostics": {
"relatedInformation": True,
"tagSupport": {"valueSet": [1, 2]},
"versionSupport": True,
"codeDescriptionSupport": True,
"dataSupport": False,
},
"hover": {"contentFormat": ["markdown", "plaintext"]},
"definition": {"linkSupport": True},
"references": {},
"documentSymbol": {"hierarchicalDocumentSymbolSupport": True},
},
"general": {"positionEncodings": ["utf-16"]},
},
}
result = await asyncio.wait_for(
self._send_request("initialize", params),
timeout=INITIALIZE_TIMEOUT,
)
self._initialize_result = result
self._sync_kind = self._extract_sync_kind(result.get("capabilities") or {})
await self._send_notification("initialized", {})
if self._init_options:
# Some servers (vtsls, eslint) want config pushed via
# didChangeConfiguration even if it was sent in
# initializationOptions.
await self._send_notification(
"workspace/didChangeConfiguration",
{"settings": self._init_options},
)
@staticmethod
def _extract_sync_kind(capabilities: dict) -> int:
sync = capabilities.get("textDocumentSync")
if isinstance(sync, int):
return sync
if isinstance(sync, dict):
change = sync.get("change")
if isinstance(change, int):
return change
return 1 # default to Full
async def shutdown(self) -> None:
"""Best-effort graceful shutdown.
Sends ``shutdown`` + ``exit``, then SIGTERMs/SIGKILLs the
process if it doesn't exit cleanly. Idempotent.
"""
if self._stopping:
return
self._stopping = True
try:
if self.is_running:
try:
await asyncio.wait_for(self._send_request("shutdown", None), timeout=2.0)
except (asyncio.TimeoutError, LSPRequestError, LSPProtocolError):
pass
try:
await self._send_notification("exit", None)
except Exception:
pass
finally:
self._state = "stopped"
await self._cleanup_process()
async def _cleanup_process(self) -> None:
if self._reader_task is not None and not self._reader_task.done():
self._reader_task.cancel()
try:
await self._reader_task
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
if self._stderr_task is not None and not self._stderr_task.done():
self._stderr_task.cancel()
try:
await self._stderr_task
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
proc = self._proc
self._proc = None
if proc is None:
return
if proc.returncode is None:
try:
proc.terminate()
try:
await asyncio.wait_for(proc.wait(), timeout=SHUTDOWN_GRACE)
except asyncio.TimeoutError:
try:
proc.kill()
await proc.wait()
except ProcessLookupError:
pass
except ProcessLookupError:
pass
# ------------------------------------------------------------------
# request / notification plumbing
# ------------------------------------------------------------------
async def _send_request(self, method: str, params: Any) -> Any:
if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
raise LSPProtocolError(f"cannot send {method!r}: stdin closed")
loop = asyncio.get_running_loop()
req_id = self._next_id
self._next_id += 1
fut: asyncio.Future = loop.create_future()
self._pending[req_id] = fut
try:
self._proc.stdin.write(encode_message(make_request(req_id, method, params)))
await self._proc.stdin.drain()
except (BrokenPipeError, ConnectionResetError, OSError) as e:
self._pending.pop(req_id, None)
raise LSPProtocolError(f"send failed for {method!r}: {e}") from e
try:
return await fut
finally:
self._pending.pop(req_id, None)
async def _send_request_with_retry(self, method: str, params: Any, *, timeout: float) -> Any:
"""Send a request, retrying on ``ContentModified`` (-32801).
Other errors propagate. The retry policy matches Claude Code's
``LSPServerInstance.sendRequest`` — 3 attempts with delays
0.5s, 1.0s, 2.0s.
"""
for attempt in range(MAX_CONTENT_MODIFIED_RETRIES + 1):
try:
return await asyncio.wait_for(self._send_request(method, params), timeout=timeout)
except LSPRequestError as e:
if e.code == ERROR_CONTENT_MODIFIED and attempt < MAX_CONTENT_MODIFIED_RETRIES:
await asyncio.sleep(RETRY_BASE_DELAY * (2 ** attempt))
continue
raise
async def _send_notification(self, method: str, params: Any) -> None:
if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
return
try:
self._proc.stdin.write(encode_message(make_notification(method, params)))
await self._proc.stdin.drain()
except (BrokenPipeError, ConnectionResetError, OSError) as e:
logger.debug("[%s] notify %s failed: %s", self.server_id, method, e)
async def _send_response(self, req_id: Any, result: Any) -> None:
if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
return
try:
self._proc.stdin.write(encode_message(make_response(req_id, result)))
await self._proc.stdin.drain()
except (BrokenPipeError, ConnectionResetError, OSError):
pass
async def _send_error_response(self, req_id: Any, code: int, message: str) -> None:
if self._proc is None or self._proc.stdin is None or self._proc.stdin.is_closing():
return
try:
self._proc.stdin.write(encode_message(make_error_response(req_id, code, message)))
await self._proc.stdin.drain()
except (BrokenPipeError, ConnectionResetError, OSError):
pass
def _dispatch_response(self, req_id: int, msg: dict) -> None:
fut = self._pending.get(req_id)
if fut is None or fut.done():
return
if "error" in msg:
err = msg["error"] or {}
fut.set_exception(
LSPRequestError(
code=int(err.get("code", -32000)),
message=str(err.get("message", "unknown")),
data=err.get("data"),
)
)
else:
fut.set_result(msg.get("result"))
async def _dispatch_request(self, req_id: Any, msg: dict) -> None:
method = msg.get("method", "")
params = msg.get("params")
handler = self._request_handlers.get(method)
if handler is None:
await self._send_error_response(req_id, ERROR_METHOD_NOT_FOUND, f"method not found: {method}")
return
try:
result = await handler(params)
except Exception as e: # noqa: BLE001 — protocol must not blow up
logger.warning("[%s] request handler %s failed: %s", self.server_id, method, e)
await self._send_error_response(req_id, -32000, f"handler failed: {e}")
return
await self._send_response(req_id, result)
def _dispatch_notification(self, method: str, msg: dict) -> None:
handler = self._notification_handlers.get(method)
if handler is None:
return
try:
handler(msg.get("params"))
except Exception as e: # noqa: BLE001
logger.debug("[%s] notification handler %s failed: %s", self.server_id, method, e)
# ------------------------------------------------------------------
# built-in server-→-client request handlers
# ------------------------------------------------------------------
async def _handle_work_done_create(self, params: Any) -> Any:
# Acknowledge progress tokens — required by some servers.
return None
async def _handle_workspace_configuration(self, params: Any) -> Any:
# Walk dotted sections through initializationOptions. Mirrors
# OpenCode's `client.ts:198-220` — return null when missing.
if not isinstance(params, dict):
return [None]
items = params.get("items") or []
out: List[Any] = []
for item in items:
if not isinstance(item, dict):
out.append(None)
continue
section = item.get("section")
if not section or not self._init_options:
out.append(self._init_options or None)
continue
cur: Any = self._init_options
for part in str(section).split("."):
if isinstance(cur, dict) and part in cur:
cur = cur[part]
else:
cur = None
break
out.append(cur)
return out
async def _handle_register_capability(self, params: Any) -> Any:
if not isinstance(params, dict):
return None
for reg in params.get("registrations") or []:
if not isinstance(reg, dict):
continue
method = reg.get("method")
reg_id = reg.get("id")
if method == "textDocument/diagnostic" and reg_id:
self._diagnostic_registrations[str(reg_id)] = reg
self._registration_event.set()
return None
async def _handle_unregister_capability(self, params: Any) -> Any:
if not isinstance(params, dict):
return None
for unreg in params.get("unregisterations") or []:
if not isinstance(unreg, dict):
continue
reg_id = unreg.get("id")
if reg_id:
self._diagnostic_registrations.pop(str(reg_id), None)
return None
async def _handle_workspace_folders(self, params: Any) -> Any:
return [{"name": "workspace", "uri": file_uri(self.workspace_root)}]
async def _handle_diagnostic_refresh(self, params: Any) -> Any:
# We don't honour refresh — we re-pull on every touchFile.
return None
# ------------------------------------------------------------------
# publishDiagnostics handler
# ------------------------------------------------------------------
def _handle_publish_diagnostics(self, params: Any) -> None:
if not isinstance(params, dict):
return
uri = params.get("uri")
if not isinstance(uri, str):
return
path = uri_to_path(uri)
diagnostics = params.get("diagnostics") or []
if not isinstance(diagnostics, list):
diagnostics = []
version = params.get("version")
loop_time = asyncio.get_event_loop().time()
if self._seed_first_push and path not in self._first_push_seen:
# First push: seed without firing the event so a waiter
# doesn't resolve on the very first push (which arrives
# before the user-triggered didChange could've produced
# fresh diagnostics).
self._first_push_seen.add(path)
self._push_diagnostics[path] = diagnostics
self._published[path] = loop_time
if isinstance(version, int):
self._published_version[path] = version
return
self._push_diagnostics[path] = diagnostics
self._published[path] = loop_time
if isinstance(version, int):
self._published_version[path] = version
self._first_push_seen.add(path)
# Bump the monotonic push counter and wake every waiter. We
# keep the Event sticky-set so any wait already in progress
# resolves; waiters re-check their predicate after waking and
# decide whether to keep waiting. ``_push_counter`` is what
# they actually compare against to detect a fresh event.
self._push_counter += 1
self._push_event.set()
# ------------------------------------------------------------------
# public file-sync API
# ------------------------------------------------------------------
async def open_file(self, path: str, *, language_id: str = "plaintext") -> int:
"""Send didOpen (first time) or didChange (subsequent) for ``path``.
Returns the new document version number that the agent's
``wait_for_diagnostics`` should match against.
"""
if not self.is_running:
raise LSPProtocolError("client not running")
abs_path = os.path.abspath(path)
try:
text = Path(abs_path).read_text(encoding="utf-8", errors="replace")
except OSError as e:
raise LSPProtocolError(f"cannot read {abs_path}: {e}") from e
uri = file_uri(abs_path)
existing = self._files.get(abs_path)
if existing is not None:
# Re-open: bump version, fire didChangeWatchedFiles + didChange.
await self._send_notification(
"workspace/didChangeWatchedFiles",
{"changes": [{"uri": uri, "type": 2}]}, # 2 = CHANGED
)
new_version = existing["version"] + 1
old_text = existing["text"]
content_changes: List[Dict[str, Any]]
if self._sync_kind == 2:
content_changes = [
{
"range": {
"start": {"line": 0, "character": 0},
"end": _end_position(old_text),
},
"text": text,
}
]
else:
content_changes = [{"text": text}]
await self._send_notification(
"textDocument/didChange",
{
"textDocument": {"uri": uri, "version": new_version},
"contentChanges": content_changes,
},
)
self._files[abs_path] = {"version": new_version, "text": text}
return new_version
# First open: didChangeWatchedFiles CREATED + didOpen.
await self._send_notification(
"workspace/didChangeWatchedFiles",
{"changes": [{"uri": uri, "type": 1}]}, # 1 = CREATED
)
# Clear any stale push/pull entries — fresh open should start
# from scratch.
self._push_diagnostics.pop(abs_path, None)
self._pull_diagnostics.pop(abs_path, None)
self._published.pop(abs_path, None)
self._published_version.pop(abs_path, None)
await self._send_notification(
"textDocument/didOpen",
{
"textDocument": {
"uri": uri,
"languageId": language_id,
"version": 0,
"text": text,
}
},
)
self._files[abs_path] = {"version": 0, "text": text}
return 0
async def save_file(self, path: str) -> None:
"""Send didSave for ``path``. Some linters re-scan only on save."""
if not self.is_running:
return
abs_path = os.path.abspath(path)
await self._send_notification(
"textDocument/didSave",
{"textDocument": {"uri": file_uri(abs_path)}},
)
# ------------------------------------------------------------------
# diagnostics: pull + wait
# ------------------------------------------------------------------
async def _pull_document_diagnostics(self, path: str) -> None:
"""Send ``textDocument/diagnostic`` for one file.
Stores results into :attr:`_pull_diagnostics`. Silently
no-ops on errors (server may not support the pull endpoint).
"""
try:
params: Dict[str, Any] = {
"textDocument": {"uri": file_uri(os.path.abspath(path))}
}
result = await self._send_request_with_retry(
"textDocument/diagnostic",
params,
timeout=DIAGNOSTICS_REQUEST_TIMEOUT,
)
except (LSPRequestError, LSPProtocolError, asyncio.TimeoutError) as e:
logger.debug("[%s] document diagnostic pull failed: %s", self.server_id, e)
return
if not isinstance(result, dict):
return
items = result.get("items")
if isinstance(items, list):
self._pull_diagnostics[os.path.abspath(path)] = items
related = result.get("relatedDocuments")
if isinstance(related, dict):
for uri, sub in related.items():
if not isinstance(sub, dict):
continue
sub_items = sub.get("items")
if isinstance(sub_items, list):
self._pull_diagnostics[uri_to_path(uri)] = sub_items
async def wait_for_diagnostics(
self,
path: str,
version: int,
*,
mode: str = "document",
) -> None:
"""Wait for the server to publish diagnostics for ``path`` at ``version``.
``mode`` is ``"document"`` (5s budget, document pulls) or
``"full"`` (10s budget, also workspace pulls). Best-effort —
returns silently on timeout. Does NOT throw if the server
doesn't support pull diagnostics; we still get the push side.
"""
budget = DIAGNOSTICS_FULL_WAIT if mode == "full" else DIAGNOSTICS_DOCUMENT_WAIT
deadline = asyncio.get_event_loop().time() + budget
abs_path = os.path.abspath(path)
while True:
remaining = deadline - asyncio.get_event_loop().time()
if remaining <= 0:
return
# Concurrent: document pull + push wait.
pull_task = asyncio.create_task(self._pull_document_diagnostics(abs_path))
push_task = asyncio.create_task(self._wait_for_fresh_push(abs_path, version, remaining))
done, pending = await asyncio.wait(
{pull_task, push_task},
timeout=remaining,
return_when=asyncio.FIRST_COMPLETED,
)
for t in pending:
t.cancel()
for t in pending:
try:
await t
except (asyncio.CancelledError, Exception): # noqa: BLE001
pass
# If we got a fresh push for our version, we're done.
current_v = self._published_version.get(abs_path)
if abs_path in self._published and (
current_v is None or current_v >= version
):
return
# Pull may have populated _pull_diagnostics — that's also
# success.
if abs_path in self._pull_diagnostics:
return
# Loop until budget runs out.
async def _wait_for_fresh_push(self, path: str, version: int, timeout: float) -> None:
"""Wait until a publishDiagnostics arrives for ``path`` at ``version``+."""
deadline = asyncio.get_event_loop().time() + timeout
baseline = self._push_counter
while True:
current_v = self._published_version.get(path)
if path in self._published and (current_v is None or current_v >= version):
# Debounce — wait a tick in case more diagnostics arrive
# immediately after. TS often emits in pairs. We
# snapshot the counter so we wake on a *new* push, not
# on the one that satisfied us a moment ago.
debounce_baseline = self._push_counter
debounce_deadline = asyncio.get_event_loop().time() + PUSH_DEBOUNCE
while self._push_counter == debounce_baseline:
remaining = debounce_deadline - asyncio.get_event_loop().time()
if remaining <= 0:
break
self._push_event.clear()
try:
await asyncio.wait_for(self._push_event.wait(), timeout=remaining)
except asyncio.TimeoutError:
break
return
remaining = deadline - asyncio.get_event_loop().time()
if remaining <= 0:
return
if self._push_counter > baseline:
# New event arrived but predicate still false — re-check
# immediately without waiting again.
baseline = self._push_counter
continue
self._push_event.clear()
try:
await asyncio.wait_for(self._push_event.wait(), timeout=min(remaining, 0.5))
except asyncio.TimeoutError:
continue
def diagnostics_for(self, path: str) -> List[Dict[str, Any]]:
"""Return current merged + deduped diagnostics for one file.
Diagnostics from push and pull stores are concatenated and
deduplicated by ``(severity, code, message, range)`` content
key. Empty list if the server hasn't published anything.
"""
abs_path = os.path.abspath(path)
push = self._push_diagnostics.get(abs_path) or []
pull = self._pull_diagnostics.get(abs_path) or []
return _dedupe(push, pull)
def _dedupe(*lists: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
seen: Set[str] = set()
out: List[Dict[str, Any]] = []
for lst in lists:
for d in lst:
if not isinstance(d, dict):
continue
key = _diagnostic_key(d)
if key in seen:
continue
seen.add(key)
out.append(d)
return out
def _diagnostic_key(d: Dict[str, Any]) -> str:
"""Content-equality key for a diagnostic.
Matches the structural-equality used in claude-code's
``areDiagnosticsEqual`` — message + severity + source + code +
range coords. The range is reduced to a tuple to keep the key
stable across dict orderings.
"""
rng = d.get("range") or {}
start = rng.get("start") or {}
end = rng.get("end") or {}
code = d.get("code")
if code is not None and not isinstance(code, str):
code = str(code)
return "\x00".join(
[
str(d.get("severity") or 1),
str(code or ""),
str(d.get("source") or ""),
str(d.get("message") or "").strip(),
f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
]
)
__all__ = [
"LSPClient",
"file_uri",
"uri_to_path",
"INITIALIZE_TIMEOUT",
"DIAGNOSTICS_DOCUMENT_WAIT",
"DIAGNOSTICS_FULL_WAIT",
]

View File

@@ -1,213 +0,0 @@
"""Structured logging with steady-state silence for the LSP layer.
The LSP layer fires on every write_file/patch. In a busy session
that's hundreds of events. We want users to be able to ``rg`` the
log for "did LSP fire on that edit?" without drowning in noise.
The level model:
- ``DEBUG`` for steady-state events that have no novel signal:
``clean``, ``feature off``, ``extension not mapped``, ``no project
root for already-announced file``, ``server unavailable for
already-announced binary``. These never reach ``agent.log`` at the
default INFO threshold.
- ``INFO`` for state transitions worth surfacing exactly once per
session: ``active for <root>`` the first time a (server_id,
workspace_root) client starts, ``no project root for <path>``
the first time we see that file. Plus every diagnostic event
(those are inherently rare and per-edit, exactly what users grep
for).
- ``WARNING`` for action-required failures: ``server unavailable``
(binary not on PATH) the first time per (server_id, binary),
``no server configured`` once per language. Per-call WARNING for
timeouts and unexpected bridge exceptions.
The dedup is in-process module-level sets. Each set grows at most by
the number of distinct (server_id, root) and (server_id, binary)
pairs touched in one Python process — bytes of memory in even an
aggressive monorepo session. Bounded LRU was rejected: evicting an
entry would risk re-firing the WARNING/INFO line we explicitly want
to suppress.
Grep recipe::
tail -f ~/.hermes/logs/agent.log | rg 'lsp\\['
"""
from __future__ import annotations
import logging
import os
import threading
from typing import Tuple
# Dedicated logger name so the documented grep recipe survives a
# ``logging.getLogger(__name__)`` rename of any internal module.
event_log = logging.getLogger("hermes.lint.lsp")
# ---------------------------------------------------------------------------
# Once-per-X dedup sets
# ---------------------------------------------------------------------------
_announce_lock = threading.Lock()
_announced_active: set = set() # keys: (server_id, workspace_root)
_announced_unavailable: set = set() # keys: (server_id, binary_path_or_name)
_announced_no_root: set = set() # keys: (server_id, file_path)
_announced_no_server: set = set() # keys: (server_id,)
def _short_path(file_path: str) -> str:
"""Render *file_path* relative to the cwd when sensible, else absolute.
Keeps log lines readable for the common case (the user is inside
the project they're editing) without emitting brittle ``../../..``
chains for the cross-tree case.
"""
if not file_path:
return file_path
try:
rel = os.path.relpath(file_path)
except ValueError:
return file_path
if rel.startswith(".." + os.sep) or rel == "..":
return file_path
return rel
def _emit(server_id: str, level: int, message: str) -> None:
event_log.log(level, "lsp[%s] %s", server_id, message)
def _announce_once(bucket: set, key: Tuple) -> bool:
"""Return True if *key* has not been announced for *bucket* yet.
Atomically marks the key as announced so concurrent callers
cannot both win the race and double-log.
"""
with _announce_lock:
if key in bucket:
return False
bucket.add(key)
return True
# ---------------------------------------------------------------------------
# Public event helpers — call these from the LSP layer.
# ---------------------------------------------------------------------------
def log_clean(server_id: str, file_path: str) -> None:
"""No diagnostics emitted for *file_path*. DEBUG (silent at default)."""
_emit(server_id, logging.DEBUG, f"clean ({_short_path(file_path)})")
def log_disabled(server_id: str, file_path: str, reason: str) -> None:
"""LSP intentionally skipped for this file (feature off, ext unmapped,
backend not local, etc.). DEBUG."""
_emit(server_id, logging.DEBUG, f"skipped: {reason} ({_short_path(file_path)})")
def log_active(server_id: str, workspace_root: str) -> None:
"""A new LSP client started for (server_id, workspace_root).
INFO once per (server_id, workspace_root); DEBUG thereafter.
Lets users verify "is LSP actually running?" with a single grep.
"""
key = (server_id, workspace_root)
if _announce_once(_announced_active, key):
_emit(server_id, logging.INFO, f"active for {workspace_root}")
else:
_emit(server_id, logging.DEBUG, f"reused client for {workspace_root}")
def log_diagnostics(server_id: str, file_path: str, count: int) -> None:
"""Diagnostics arrived for a file. INFO every time — these are the
failure signals users actually want to grep for, and they are
inherently rare per edit."""
_emit(server_id, logging.INFO, f"{count} diags ({_short_path(file_path)})")
def log_no_project_root(server_id: str, file_path: str) -> None:
"""File had no recognised project marker. INFO once per file,
DEBUG thereafter."""
key = (server_id, file_path)
if _announce_once(_announced_no_root, key):
_emit(server_id, logging.INFO, f"no project root for {_short_path(file_path)}")
else:
_emit(server_id, logging.DEBUG, f"no project root for {_short_path(file_path)}")
def log_server_unavailable(server_id: str, binary_or_pkg: str) -> None:
"""The server binary couldn't be resolved. WARNING once per
(server_id, binary), DEBUG thereafter so a hundred subsequent
.py edits don't spam the log."""
key = (server_id, binary_or_pkg)
if _announce_once(_announced_unavailable, key):
_emit(
server_id,
logging.WARNING,
f"server unavailable: {binary_or_pkg} not found "
"(install via `hermes lsp install <id>` or set lsp.servers.<id>.command)",
)
else:
_emit(server_id, logging.DEBUG, f"server still unavailable: {binary_or_pkg}")
def log_no_server_configured(server_id: str) -> None:
"""No spawn recipe for this language. WARNING once."""
if _announce_once(_announced_no_server, (server_id,)):
_emit(server_id, logging.WARNING, "no server configured")
def log_timeout(server_id: str, file_path: str, kind: str = "diagnostics") -> None:
"""A request to the server timed out. WARNING every time — these are
inherently novel events worth surfacing on each occurrence."""
_emit(
server_id,
logging.WARNING,
f"{kind} timed out for {_short_path(file_path)}",
)
def log_server_error(server_id: str, file_path: str, exc: BaseException) -> None:
"""An unexpected exception bubbled out of the LSP layer. WARNING."""
_emit(
server_id,
logging.WARNING,
f"unexpected error for {_short_path(file_path)}: {type(exc).__name__}: {exc}",
)
def log_spawn_failed(server_id: str, workspace_root: str, exc: BaseException) -> None:
"""The LSP server failed to spawn or initialize. WARNING."""
_emit(
server_id,
logging.WARNING,
f"spawn/initialize failed for {workspace_root}: {type(exc).__name__}: {exc}",
)
def reset_announce_caches() -> None:
"""Test-only: clear the dedup caches. Production code never calls this."""
with _announce_lock:
_announced_active.clear()
_announced_unavailable.clear()
_announced_no_root.clear()
_announced_no_server.clear()
__all__ = [
"event_log",
"log_clean",
"log_disabled",
"log_active",
"log_diagnostics",
"log_no_project_root",
"log_server_unavailable",
"log_no_server_configured",
"log_timeout",
"log_server_error",
"log_spawn_failed",
"reset_announce_caches",
]

View File

@@ -1,376 +0,0 @@
"""Auto-installation of LSP server binaries.
Tries to install missing servers using whatever package manager is
appropriate. All installs go to a Hermes-owned bin staging dir,
``<HERMES_HOME>/lsp/bin/``, so we don't pollute the user's global
toolchain.
Strategies:
- ``auto`` — attempt to install with the best available package
manager. This is the default.
- ``manual`` — never install; if a binary is missing, the server is
silently skipped and the user is told about it via ``hermes lsp
status``.
- ``off`` — same as ``manual`` for now (kept distinct so we can
evolve behavior later, e.g. logging differently).
The actual installs happen synchronously the first time a server is
needed and concurrent calls to :func:`try_install` for the same
package are deduplicated via a per-package lock.
Failure modes are non-fatal: every install path is wrapped in
try/except and returns ``None`` on failure. The tool layer then
falls back to its in-process syntax checker, exactly as if the user
hadn't enabled LSP at all.
"""
from __future__ import annotations
import logging
import os
import shutil
import subprocess
import sys
import threading
from pathlib import Path
from typing import Any, Dict, Optional
logger = logging.getLogger("agent.lsp.install")
# Package-name → install-strategy hint registry. Each entry is a
# tuple of strategy name + package name + executable name. When the
# install completes, we look for the executable in
# ``<HERMES_HOME>/lsp/bin/`` first, then on PATH.
#
# Optional fields:
# - ``extra_pkgs``: list of sibling packages to install alongside
# ``pkg`` in the same node_modules tree. Used when an LSP server
# has a runtime peer dependency that npm doesn't auto-pull (e.g.
# typescript-language-server needs ``typescript``).
INSTALL_RECIPES: Dict[str, Dict[str, Any]] = {
# Python
"pyright": {"strategy": "npm", "pkg": "pyright", "bin": "pyright-langserver"},
# JS/TS family
"typescript-language-server": {
"strategy": "npm",
"pkg": "typescript-language-server",
"bin": "typescript-language-server",
# typescript-language-server requires the `typescript` SDK
# (tsserver) to be importable from the same node_modules tree;
# otherwise initialize() fails with "Could not find a valid
# TypeScript installation". Install them together.
"extra_pkgs": ["typescript"],
},
"@vue/language-server": {
"strategy": "npm",
"pkg": "@vue/language-server",
"bin": "vue-language-server",
},
"svelte-language-server": {
"strategy": "npm",
"pkg": "svelte-language-server",
"bin": "svelteserver",
},
"@astrojs/language-server": {
"strategy": "npm",
"pkg": "@astrojs/language-server",
"bin": "astro-ls",
},
"yaml-language-server": {
"strategy": "npm",
"pkg": "yaml-language-server",
"bin": "yaml-language-server",
},
"bash-language-server": {
"strategy": "npm",
"pkg": "bash-language-server",
"bin": "bash-language-server",
},
"intelephense": {"strategy": "npm", "pkg": "intelephense", "bin": "intelephense"},
"dockerfile-language-server-nodejs": {
"strategy": "npm",
"pkg": "dockerfile-language-server-nodejs",
"bin": "docker-langserver",
},
# Go
"gopls": {"strategy": "go", "pkg": "golang.org/x/tools/gopls@latest", "bin": "gopls"},
# Rust — too heavy (hundreds of MB to bootstrap). We do NOT
# auto-install rust-analyzer; users install via rustup.
"rust-analyzer": {"strategy": "manual", "pkg": "", "bin": "rust-analyzer"},
# C/C++ — manual (clangd ships with LLVM, very heavy)
"clangd": {"strategy": "manual", "pkg": "", "bin": "clangd"},
# Lua — manual (LuaLS is platform-specific binaries from GitHub
# releases; complex enough that we punt to the user)
"lua-language-server": {"strategy": "manual", "pkg": "", "bin": "lua-language-server"},
}
_install_locks: Dict[str, threading.Lock] = {}
_install_results: Dict[str, Optional[str]] = {}
_install_lock_meta = threading.Lock()
def hermes_lsp_bin_dir() -> Path:
"""Return the Hermes-owned bin staging dir for LSP servers."""
home = os.environ.get("HERMES_HOME")
if home is None:
home = os.path.join(os.path.expanduser("~"), ".hermes")
p = Path(home) / "lsp" / "bin"
p.mkdir(parents=True, exist_ok=True)
return p
def _existing_binary(name: str) -> Optional[str]:
"""Probe the staging dir + PATH for a binary named ``name``."""
staged = hermes_lsp_bin_dir() / name
if staged.exists() and os.access(staged, os.X_OK):
return str(staged)
on_path = shutil.which(name)
if on_path:
return on_path
return None
def _get_lock(pkg: str) -> threading.Lock:
with _install_lock_meta:
lock = _install_locks.get(pkg)
if lock is None:
lock = threading.Lock()
_install_locks[pkg] = lock
return lock
def try_install(pkg: str, strategy: str = "auto") -> Optional[str]:
"""Try to install ``pkg`` and return the binary path if successful.
``strategy`` is ``"auto"``, ``"manual"``, or ``"off"``. In
``manual``/``off`` mode, this function only probes for an
existing binary and returns ``None`` if not found.
The install is cached per-package — a second call returns the
same path (or ``None``) without reinstalling. Concurrent calls
are serialized.
"""
if strategy not in {"auto",}:
# Only ``auto`` triggers an actual install. In manual/off,
# we still check whether the binary already exists.
recipe = INSTALL_RECIPES.get(pkg, {})
bin_name = recipe.get("bin", pkg)
return _existing_binary(bin_name)
if pkg in _install_results:
return _install_results[pkg]
lock = _get_lock(pkg)
with lock:
# Double-check after acquiring lock.
if pkg in _install_results:
return _install_results[pkg]
result = _do_install(pkg)
_install_results[pkg] = result
return result
def _do_install(pkg: str) -> Optional[str]:
recipe = INSTALL_RECIPES.get(pkg)
if recipe is None:
# Not in our registry — best-effort: just probe PATH.
return shutil.which(pkg)
strategy = recipe.get("strategy", "manual")
bin_name = recipe.get("bin", pkg)
# Check if already present (shutil.which or staging dir)
existing = _existing_binary(bin_name)
if existing:
return existing
if strategy == "manual":
logger.debug("[install] %s requires manual install (recipe=%s)", pkg, recipe)
return None
if strategy == "npm":
return _install_npm(
recipe.get("pkg", pkg),
bin_name,
extra_pkgs=recipe.get("extra_pkgs") or [],
)
if strategy == "go":
return _install_go(recipe.get("pkg", pkg), bin_name)
if strategy == "pip":
return _install_pip(recipe.get("pkg", pkg), bin_name)
logger.warning("[install] unknown strategy %r for %s", strategy, pkg)
return None
def _install_npm(
pkg: str,
bin_name: str,
extra_pkgs: Optional[list] = None,
) -> Optional[str]:
"""Install an npm package into our staging dir.
Uses ``npm install --prefix`` so the binaries land in
``<staging>/node_modules/.bin/<bin_name>`` and we symlink them up
one level for direct PATH-style access.
``extra_pkgs`` is a list of sibling packages to install in the
same ``node_modules`` tree. Used for LSP servers with runtime
peer deps that npm doesn't auto-pull (typescript-language-server
needs ``typescript`` next to it; intelephense ships standalone).
"""
npm = shutil.which("npm")
if npm is None:
logger.info("[install] cannot install %s: npm not on PATH", pkg)
return None
staging = hermes_lsp_bin_dir().parent # <HERMES_HOME>/lsp/
install_targets = [pkg] + list(extra_pkgs or [])
try:
logger.info(
"[install] npm install --prefix %s %s",
staging,
" ".join(install_targets),
)
proc = subprocess.run(
[npm, "install", "--prefix", str(staging), "--silent", "--no-fund", "--no-audit", *install_targets],
check=False,
capture_output=True,
text=True,
timeout=300,
)
if proc.returncode != 0:
logger.warning(
"[install] npm install failed for %s: %s", pkg, proc.stderr.strip()[:500]
)
return None
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("[install] npm install errored for %s: %s", pkg, e)
return None
# Find the bin
nm_bin = staging / "node_modules" / ".bin" / bin_name
if os.name == "nt":
# On Windows npm sometimes drops `.cmd` shims
candidates = [nm_bin, nm_bin.with_suffix(".cmd")]
else:
candidates = [nm_bin]
for c in candidates:
if c.exists():
# Symlink into our `lsp/bin/` for stable PATH access.
link = hermes_lsp_bin_dir() / c.name
if not link.exists():
try:
link.symlink_to(c)
except (OSError, NotImplementedError):
# Symlinks fail on some Windows setups — copy instead.
try:
shutil.copy2(c, link)
except OSError:
return str(c)
return str(link if link.exists() else c)
logger.warning("[install] npm install for %s succeeded but bin %s not found", pkg, bin_name)
return None
def _install_go(pkg: str, bin_name: str) -> Optional[str]:
"""Install a Go module to GOBIN=<staging>."""
go = shutil.which("go")
if go is None:
logger.info("[install] cannot install %s: go not on PATH", pkg)
return None
staging = hermes_lsp_bin_dir()
env = dict(os.environ)
env["GOBIN"] = str(staging)
try:
logger.info("[install] go install %s (GOBIN=%s)", pkg, staging)
proc = subprocess.run(
[go, "install", pkg],
check=False,
capture_output=True,
text=True,
timeout=600,
env=env,
)
if proc.returncode != 0:
logger.warning(
"[install] go install failed for %s: %s", pkg, proc.stderr.strip()[:500]
)
return None
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("[install] go install errored for %s: %s", pkg, e)
return None
bin_path = staging / bin_name
if os.name == "nt":
bin_path = bin_path.with_suffix(".exe")
if bin_path.exists():
return str(bin_path)
logger.warning("[install] go install for %s succeeded but bin %s not found", pkg, bin_name)
return None
def _install_pip(pkg: str, bin_name: str) -> Optional[str]:
"""Install a Python package into a hermes-owned target dir.
We avoid polluting the user's site-packages by using
``pip install --target``. Bins go into
``<staging>/python-packages/bin/`` which we symlink into
``<staging>/bin``. Note: this only works for packages that ship a
console script.
"""
pip_target = hermes_lsp_bin_dir().parent / "python-packages"
pip_target.mkdir(parents=True, exist_ok=True)
try:
logger.info("[install] pip install --target %s %s", pip_target, pkg)
proc = subprocess.run(
[sys.executable, "-m", "pip", "install", "--target", str(pip_target), "--quiet", pkg],
check=False,
capture_output=True,
text=True,
timeout=300,
)
if proc.returncode != 0:
logger.warning(
"[install] pip install failed for %s: %s", pkg, proc.stderr.strip()[:500]
)
return None
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("[install] pip install errored for %s: %s", pkg, e)
return None
# Look for the script
bin_path = pip_target / "bin" / bin_name
if bin_path.exists():
link = hermes_lsp_bin_dir() / bin_name
if not link.exists():
try:
link.symlink_to(bin_path)
except (OSError, NotImplementedError):
try:
shutil.copy2(bin_path, link)
except OSError:
return str(bin_path)
return str(link if link.exists() else bin_path)
return None
def detect_status(pkg: str) -> str:
"""Return ``installed``, ``missing``, or ``manual-only`` for a package.
Used by the ``hermes lsp status`` CLI to give users a quick
overview of what's available without spawning anything.
"""
recipe = INSTALL_RECIPES.get(pkg)
bin_name = recipe.get("bin", pkg) if recipe else pkg
if _existing_binary(bin_name):
return "installed"
if recipe and recipe.get("strategy") == "manual":
return "manual-only"
return "missing"
__all__ = [
"INSTALL_RECIPES",
"try_install",
"detect_status",
"hermes_lsp_bin_dir",
]

View File

@@ -1,639 +0,0 @@
"""Service-level orchestration for LSP clients.
The :class:`LSPService` is the bridge between the synchronous
file_operations layer and the async :class:`agent.lsp.client.LSPClient`.
Design choices:
- A **single asyncio event loop** runs in a background thread. All
client work happens on that loop. Synchronous callers from
``tools/file_operations.py`` use :meth:`get_diagnostics_sync` to
open + wait + drain in one blocking call.
- One client per ``(server_id, workspace_root)`` key. Lazy spawn:
the first request for a key spawns the client; subsequent requests
re-use it.
- A **broken-set** records ``(server_id, workspace_root)`` pairs that
failed to spawn or initialize. These are never retried for the
life of the service. Mirrors OpenCode's design.
- A **delta baseline** map keeps "diagnostics-as-of-the-last-snapshot"
per file. ``snapshot_baseline()`` is called BEFORE a write; the
next ``get_diagnostics_sync()`` returns only diagnostics that
weren't in the baseline. This is the lift from Claude Code's
``beforeFileEdited`` / ``getNewDiagnostics`` pattern, except wired
to the local LSP layer instead of MCP IDE RPC.
The service is **off by default** — call :meth:`is_active` to check
whether it's actually doing anything. When LSP is disabled in
config, when no git workspace can be detected, when all configured
servers are missing binaries and auto-install is off, ``is_active``
returns False and the file_operations layer falls through to the
in-process syntax check.
"""
from __future__ import annotations
import asyncio
import logging
import os
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple
from agent.lsp import eventlog
from agent.lsp.client import (
DIAGNOSTICS_DOCUMENT_WAIT,
LSPClient,
)
from agent.lsp.servers import (
ServerContext,
find_server_for_file,
language_id_for,
)
from agent.lsp.workspace import (
clear_cache,
resolve_workspace_for_file,
)
logger = logging.getLogger("agent.lsp.manager")
DEFAULT_IDLE_TIMEOUT = 600 # seconds; servers idle for >10min get reaped
class _BackgroundLoop:
"""A daemon thread that owns one asyncio event loop.
Provides :meth:`run` for synchronous callers — submits a coroutine
to the loop and blocks until it finishes (or a timeout fires).
"""
def __init__(self) -> None:
self._loop: Optional[asyncio.AbstractEventLoop] = None
self._thread: Optional[threading.Thread] = None
self._ready = threading.Event()
def start(self) -> None:
if self._thread is not None:
return
self._thread = threading.Thread(
target=self._run_forever,
name="hermes-lsp-loop",
daemon=True,
)
self._thread.start()
self._ready.wait(timeout=5.0)
def _run_forever(self) -> None:
loop = asyncio.new_event_loop()
self._loop = loop
asyncio.set_event_loop(loop)
self._ready.set()
try:
loop.run_forever()
finally:
try:
loop.close()
except Exception: # noqa: BLE001
pass
def run(self, coro, *, timeout: Optional[float] = None) -> Any:
"""Submit a coroutine to the loop and block until done.
Returns the coroutine's result, or raises its exception.
"""
from agent.async_utils import safe_schedule_threadsafe
if self._loop is None:
if asyncio.iscoroutine(coro):
coro.close()
raise RuntimeError("background loop not started")
fut = safe_schedule_threadsafe(coro, self._loop)
if fut is None:
raise RuntimeError("background loop not running")
try:
return fut.result(timeout=timeout)
except Exception:
fut.cancel()
raise
def stop(self) -> None:
loop = self._loop
if loop is None:
return
try:
loop.call_soon_threadsafe(loop.stop)
except RuntimeError:
pass
if self._thread is not None:
self._thread.join(timeout=2.0)
self._loop = None
self._thread = None
class LSPService:
"""The process-wide LSP service.
Created once via :meth:`create_from_config`; the
:func:`agent.lsp.get_service` accessor manages the singleton.
Most callers should use that accessor rather than constructing
:class:`LSPService` directly.
"""
# ------------------------------------------------------------------
# construction + factory
# ------------------------------------------------------------------
def __init__(
self,
*,
enabled: bool,
wait_mode: str,
wait_timeout: float,
install_strategy: str,
binary_overrides: Optional[Dict[str, List[str]]] = None,
env_overrides: Optional[Dict[str, Dict[str, str]]] = None,
init_overrides: Optional[Dict[str, Dict[str, Any]]] = None,
disabled_servers: Optional[List[str]] = None,
idle_timeout: float = DEFAULT_IDLE_TIMEOUT,
) -> None:
self._enabled = enabled
self._wait_mode = wait_mode if wait_mode in {"document", "full"} else "document"
self._wait_timeout = wait_timeout
self._install_strategy = install_strategy
self._binary_overrides = binary_overrides or {}
self._env_overrides = env_overrides or {}
self._init_overrides = init_overrides or {}
self._disabled_servers = set(disabled_servers or [])
self._idle_timeout = idle_timeout
self._loop = _BackgroundLoop()
if self._enabled:
self._loop.start()
# Per-(server_id, workspace_root) state
self._clients: Dict[Tuple[str, str], LSPClient] = {}
self._broken: set = set()
self._spawning: Dict[Tuple[str, str], asyncio.Future] = {}
self._last_used: Dict[Tuple[str, str], float] = {}
self._state_lock = threading.Lock()
# Delta baseline: file path → snapshot of diagnostics taken
# immediately before a write. ``get_diagnostics_sync`` filters
# out anything in the baseline so the agent only sees errors
# introduced by the current edit.
self._delta_baseline: Dict[str, List[Dict[str, Any]]] = {}
@classmethod
def create_from_config(cls) -> Optional["LSPService"]:
"""Build a service from ``hermes_cli.config`` settings.
Returns ``None`` if the config can't be loaded. The service
itself returns ``is_active()`` False when LSP is disabled.
"""
try:
from hermes_cli.config import load_config
cfg = load_config()
except Exception as e: # noqa: BLE001
logger.debug("LSP config load failed: %s", e)
return None
lsp_cfg = (cfg.get("lsp") or {}) if isinstance(cfg, dict) else {}
if not isinstance(lsp_cfg, dict):
lsp_cfg = {}
enabled = bool(lsp_cfg.get("enabled", True))
wait_mode = lsp_cfg.get("wait_mode", "document")
wait_timeout = float(lsp_cfg.get("wait_timeout", DIAGNOSTICS_DOCUMENT_WAIT))
install_strategy = lsp_cfg.get("install_strategy", "auto")
servers_cfg = lsp_cfg.get("servers") or {}
disabled = []
binary_overrides: Dict[str, List[str]] = {}
env_overrides: Dict[str, Dict[str, str]] = {}
init_overrides: Dict[str, Dict[str, Any]] = {}
if isinstance(servers_cfg, dict):
for name, sub in servers_cfg.items():
if not isinstance(sub, dict):
continue
if sub.get("disabled"):
disabled.append(name)
cmd = sub.get("command")
if isinstance(cmd, list) and cmd:
binary_overrides[name] = cmd
env = sub.get("env")
if isinstance(env, dict):
env_overrides[name] = {k: str(v) for k, v in env.items()}
init = sub.get("initialization_options")
if isinstance(init, dict):
init_overrides[name] = init
return cls(
enabled=enabled,
wait_mode=wait_mode,
wait_timeout=wait_timeout,
install_strategy=install_strategy,
binary_overrides=binary_overrides,
env_overrides=env_overrides,
init_overrides=init_overrides,
disabled_servers=disabled,
)
# ------------------------------------------------------------------
# public API
# ------------------------------------------------------------------
def is_active(self) -> bool:
"""Return True iff this service should be consulted at all."""
return self._enabled
def enabled_for(self, file_path: str) -> bool:
"""Return True iff LSP should run for this specific file.
Gates on workspace detection (file or cwd inside a git worktree),
on whether any registered server matches the extension, and
on whether the (server_id, workspace_root) pair is in the
broken-set from a previous spawn failure.
Files in already-broken pairs return False so the file_operations
layer skips the LSP path entirely — no spawn attempts, no
timeout cost — until the service is restarted (``hermes lsp
restart``) or the process exits.
"""
if not self._enabled:
return False
srv = find_server_for_file(file_path)
if srv is None or srv.server_id in self._disabled_servers:
return False
ws_root, gated_in = resolve_workspace_for_file(file_path)
if not (ws_root and gated_in):
return False
# Broken-set short-circuit. Use the per-server root if we can
# compute one cheaply; otherwise fall back to the workspace
# root as the broken key (which is what _get_or_spawn would
# have used anyway when it failed).
try:
per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
except Exception: # noqa: BLE001
per_server_root = ws_root
if (srv.server_id, per_server_root) in self._broken:
return False
return True
def snapshot_baseline(self, file_path: str) -> None:
"""Snapshot current diagnostics for ``file_path`` as the delta baseline.
Called BEFORE a write so the next ``get_diagnostics_sync()``
can filter out pre-existing errors. Best-effort — failures
are silently swallowed so a flaky server can't break a write.
Outer timeouts (e.g. server hangs during initialize) mark the
(server_id, workspace_root) pair as broken so subsequent edits
skip it instantly instead of re-paying the timeout cost.
"""
if not self.enabled_for(file_path):
return
try:
diags = self._loop.run(self._snapshot_async(file_path), timeout=8.0)
self._delta_baseline[os.path.abspath(file_path)] = diags or []
except Exception as e: # noqa: BLE001
logger.debug("baseline snapshot failed for %s: %s", file_path, e)
self._mark_broken_for_file(file_path, e)
self._delta_baseline[os.path.abspath(file_path)] = []
def get_diagnostics_sync(
self,
file_path: str,
*,
delta: bool = True,
timeout: Optional[float] = None,
line_shift: Optional[Callable[[int], Optional[int]]] = None,
) -> List[Dict[str, Any]]:
"""Synchronously open ``file_path`` in the right server, wait for
diagnostics, return them.
If ``delta`` is True (default), the result is filtered against
any baseline previously captured via :meth:`snapshot_baseline`.
Diagnostics present in the baseline are removed so the caller
only sees errors introduced by the current edit.
When ``line_shift`` is provided, baseline diagnostics are
remapped through it before the set-difference. This handles
the case where the edit deleted or inserted lines, causing
pre-existing diagnostics below the edit point to surface at
different line numbers in the post-edit snapshot — without
the shift, they'd all look "introduced by this edit". Pass
a callable built by
:func:`agent.lsp.range_shift.build_line_shift` (pre_text,
post_text). Omit when pre/post content isn't available;
the unshifted comparison still catches diagnostics that
didn't move.
Returns an empty list when LSP is disabled, when no workspace
can be detected, when no server matches, or when the server
can't be spawned. Never raises.
"""
if not self.enabled_for(file_path):
return []
# Resolve server_id eagerly so we can emit structured logs even
# when the request errors out below.
srv = find_server_for_file(file_path)
server_id = srv.server_id if srv else "?"
try:
t = timeout if timeout is not None else self._wait_timeout + 2.0
diags = self._loop.run(self._open_and_wait_async(file_path), timeout=t) or []
except asyncio.TimeoutError as e:
eventlog.log_timeout(server_id, file_path)
logger.debug("LSP diagnostics timeout for %s: %s", file_path, e)
self._mark_broken_for_file(file_path, e)
return []
except Exception as e: # noqa: BLE001
eventlog.log_server_error(server_id, file_path, e)
logger.debug("LSP diagnostics fetch failed for %s: %s", file_path, e)
self._mark_broken_for_file(file_path, e)
return []
abs_path = os.path.abspath(file_path)
if delta:
baseline = self._delta_baseline.get(abs_path) or []
if baseline:
if line_shift is not None:
# Remap baseline diagnostics into post-edit
# coordinates so shifted-but-otherwise-identical
# entries hash equal under _diag_key. Entries
# that mapped into a deleted region drop out
# silently — they no longer apply.
from agent.lsp.range_shift import shift_baseline
baseline = shift_baseline(baseline, line_shift)
seen = {_diag_key(d) for d in baseline}
diags = [d for d in diags if _diag_key(d) not in seen]
# Roll baseline forward — next call returns deltas relative
# to the just-emitted state, mirroring claude-code's
# diagnosticTracking.
try:
fresh = self._loop.run(self._current_diags_async(file_path), timeout=2.0) or []
except Exception: # noqa: BLE001
fresh = []
if fresh:
self._delta_baseline[abs_path] = fresh
if diags:
eventlog.log_diagnostics(server_id, file_path, len(diags))
else:
eventlog.log_clean(server_id, file_path)
return diags
def _mark_broken_for_file(self, file_path: str, exc: BaseException) -> None:
"""Mark the (server_id, workspace_root) pair as broken so subsequent
edits skip it instantly instead of re-paying timeout cost.
Called when the outer ``_loop.run`` timeout cancels an in-flight
spawn/initialize that the inner ``_get_or_spawn`` task was still
holding open. Without this, every subsequent write would re-enter
the spawn path and re-pay the full ``snapshot_baseline``
timeout (8s) until the binary is fixed.
Also kills any orphan client process that survived the cancelled
future, and emits a single eventlog WARNING so the user knows
which server gave up.
``exc`` is whatever exception the outer wrapper caught — used
only for logging, never re-raised.
"""
srv = find_server_for_file(file_path)
if srv is None:
return
ws_root, gated = resolve_workspace_for_file(file_path)
if not (ws_root and gated):
return
try:
per_server_root = srv.resolve_root(file_path, ws_root) or ws_root
except Exception: # noqa: BLE001
per_server_root = ws_root
key = (srv.server_id, per_server_root)
already_broken = key in self._broken
self._broken.add(key)
# Kill any client we managed to spawn before the timeout. The
# cancelled future never reached the broken-set add inside
# ``_get_or_spawn`` so the client may still be hanging in
# ``_clients`` with a half-initialized state.
with self._state_lock:
client = self._clients.pop(key, None)
if client is not None:
try:
# Fire-and-forget shutdown — give it a second to cleanup,
# but don't block. We're already on a slow path.
self._loop.run(client.shutdown(), timeout=1.0)
except Exception: # noqa: BLE001
pass
if not already_broken:
eventlog.log_spawn_failed(srv.server_id, per_server_root, exc)
def shutdown(self) -> None:
"""Tear down all clients and stop the background loop."""
if not self._enabled:
return
try:
self._loop.run(self._shutdown_async(), timeout=10.0)
except Exception as e: # noqa: BLE001
logger.debug("LSP shutdown error: %s", e)
self._loop.stop()
clear_cache()
# ------------------------------------------------------------------
# async internals
# ------------------------------------------------------------------
async def _snapshot_async(self, file_path: str) -> List[Dict[str, Any]]:
client = await self._get_or_spawn(file_path)
if client is None:
return []
try:
version = await client.open_file(file_path, language_id=language_id_for(file_path))
await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
except Exception as e: # noqa: BLE001
logger.debug("snapshot open/wait failed: %s", e)
return []
self._last_used[(client.server_id, client.workspace_root)] = time.time()
return list(client.diagnostics_for(file_path))
async def _open_and_wait_async(self, file_path: str) -> List[Dict[str, Any]]:
client = await self._get_or_spawn(file_path)
if client is None:
return []
try:
version = await client.open_file(file_path, language_id=language_id_for(file_path))
await client.save_file(file_path)
await client.wait_for_diagnostics(file_path, version, mode=self._wait_mode)
except Exception as e: # noqa: BLE001
logger.debug("open/wait failed for %s: %s", file_path, e)
return []
self._last_used[(client.server_id, client.workspace_root)] = time.time()
return list(client.diagnostics_for(file_path))
async def _current_diags_async(self, file_path: str) -> List[Dict[str, Any]]:
ws, gated = resolve_workspace_for_file(file_path)
srv = find_server_for_file(file_path)
if not (ws and gated and srv):
return []
with self._state_lock:
client = self._clients.get((srv.server_id, ws))
if client is None:
return []
return list(client.diagnostics_for(file_path))
async def _get_or_spawn(self, file_path: str) -> Optional[LSPClient]:
srv = find_server_for_file(file_path)
if srv is None:
return None
if srv.server_id in self._disabled_servers:
eventlog.log_disabled(srv.server_id, file_path, "disabled in config")
return None
ws_root, gated = resolve_workspace_for_file(file_path)
if not (ws_root and gated):
eventlog.log_no_project_root(srv.server_id, file_path)
return None
per_server_root = srv.resolve_root(file_path, ws_root)
if per_server_root is None:
eventlog.log_disabled(
srv.server_id, file_path, "exclude marker hit (server gated off)"
)
return None # exclude marker hit, server gated off
key = (srv.server_id, per_server_root)
if key in self._broken:
return None
with self._state_lock:
client = self._clients.get(key)
if client is not None and client.is_running:
eventlog.log_active(srv.server_id, per_server_root)
return client
spawning = self._spawning.get(key)
if spawning is not None:
try:
return await spawning
except Exception: # noqa: BLE001
return None
# Begin spawn
loop = asyncio.get_running_loop()
spawn_future: asyncio.Future = loop.create_future()
with self._state_lock:
self._spawning[key] = spawn_future
try:
ctx = ServerContext(
workspace_root=per_server_root,
install_strategy=self._install_strategy,
binary_overrides=self._binary_overrides,
env_overrides=self._env_overrides,
init_overrides=self._init_overrides,
)
spec = srv.build_spawn(per_server_root, ctx)
if spec is None:
# ``build_spawn`` returns None when the binary can't be
# located (auto-install disabled, manual-only server,
# or install attempt failed). Surface this once via
# the structured logger so the user can act on it.
eventlog.log_server_unavailable(srv.server_id, srv.server_id)
self._broken.add(key)
spawn_future.set_result(None)
return None
client = LSPClient(
server_id=srv.server_id,
workspace_root=spec.workspace_root,
command=spec.command,
env=spec.env,
cwd=spec.cwd,
initialization_options=spec.initialization_options,
seed_diagnostics_on_first_push=spec.seed_diagnostics_on_first_push or srv.seed_first_push,
)
try:
await client.start()
except Exception as e: # noqa: BLE001
eventlog.log_spawn_failed(srv.server_id, per_server_root, e)
self._broken.add(key)
spawn_future.set_result(None)
return None
with self._state_lock:
self._clients[key] = client
self._last_used[key] = time.time()
eventlog.log_active(srv.server_id, per_server_root)
spawn_future.set_result(client)
return client
finally:
with self._state_lock:
self._spawning.pop(key, None)
async def _shutdown_async(self) -> None:
with self._state_lock:
clients = list(self._clients.values())
self._clients.clear()
self._broken.clear()
self._last_used.clear()
await asyncio.gather(
*(c.shutdown() for c in clients),
return_exceptions=True,
)
# ------------------------------------------------------------------
# status / introspection (used by ``hermes lsp status``)
# ------------------------------------------------------------------
def get_status(self) -> Dict[str, Any]:
"""Return a snapshot of the service for the CLI status command."""
with self._state_lock:
clients = [
{
"server_id": k[0],
"workspace_root": k[1],
"state": c.state,
"running": c.is_running,
}
for k, c in self._clients.items()
]
broken = list(self._broken)
return {
"enabled": self._enabled,
"wait_mode": self._wait_mode,
"wait_timeout": self._wait_timeout,
"install_strategy": self._install_strategy,
"clients": clients,
"broken": broken,
"disabled_servers": sorted(self._disabled_servers),
}
def _diag_key(d: Dict[str, Any]) -> str:
"""Content equality key used for cross-edit delta filtering.
Includes the diagnostic's position range — when used together
with :func:`agent.lsp.range_shift.shift_baseline`, the baseline
is line-shifted into post-edit coordinates BEFORE this key is
computed, so identical-but-shifted diagnostics hash equal. Two
genuinely distinct diagnostics at different lines (e.g. the same
error class introduced at a second site) hash differently and
are surfaced as new.
Mirrors :func:`agent.lsp.client._diagnostic_key`; intentionally
identical so the two layers agree on diagnostic identity.
"""
rng = d.get("range") or {}
start = rng.get("start") or {}
end = rng.get("end") or {}
code = d.get("code")
if code is not None and not isinstance(code, str):
code = str(code)
return "\x00".join(
[
str(d.get("severity") or 1),
str(code or ""),
str(d.get("source") or ""),
str(d.get("message") or "").strip(),
f"{start.get('line', 0)}:{start.get('character', 0)}-{end.get('line', 0)}:{end.get('character', 0)}",
]
)
__all__ = ["LSPService"]

View File

@@ -1,196 +0,0 @@
"""Minimal LSP JSON-RPC 2.0 framer over async streams.
LSP wire format:
Content-Length: <bytes>\\r\\n
\\r\\n
<utf-8 JSON body>
The body is a JSON-RPC 2.0 envelope: request, response, or notification.
This module replaces what ``vscode-jsonrpc/node`` would do in a
TypeScript implementation. We keep it deliberately small — just the
framer + envelope helpers — so :class:`agent.lsp.client.LSPClient` can
focus on protocol semantics.
"""
from __future__ import annotations
import asyncio
import json
import logging
from typing import Any, Optional, Tuple
logger = logging.getLogger("agent.lsp.protocol")
# LSP error codes we care about. Full list in
# https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#errorCodes
ERROR_CONTENT_MODIFIED = -32801
ERROR_REQUEST_CANCELLED = -32800
ERROR_METHOD_NOT_FOUND = -32601
class LSPProtocolError(Exception):
"""Raised when the wire protocol is violated.
Distinct from :class:`LSPRequestError` which represents a server
returning a JSON-RPC error response — that's protocol-conformant.
This exception means the framing or envelope itself is broken.
"""
class LSPRequestError(Exception):
"""Raised when an LSP request returns an error response.
Carries the JSON-RPC ``code``, ``message``, and optional ``data``.
"""
def __init__(self, code: int, message: str, data: Any = None) -> None:
super().__init__(f"LSP error {code}: {message}")
self.code = code
self.message = message
self.data = data
def encode_message(obj: dict) -> bytes:
"""Encode a JSON-RPC envelope as a Content-Length framed byte string.
The body is encoded as compact UTF-8 JSON (no spaces between
separators) — matches what ``vscode-jsonrpc`` emits and keeps the
Content-Length count exact.
"""
body = json.dumps(obj, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
return header + body
async def read_message(reader: asyncio.StreamReader) -> Optional[dict]:
"""Read one Content-Length framed JSON-RPC message from the stream.
Returns ``None`` on clean EOF (server closed stdout cleanly between
messages — typical shutdown). Raises :class:`LSPProtocolError` on
malformed framing.
The reader is advanced to just past the JSON body on success.
"""
headers: dict = {}
header_bytes = 0
while True:
try:
line = await reader.readuntil(b"\r\n")
except asyncio.IncompleteReadError as e:
# EOF while reading headers. If we hadn't started a header
# block, treat as clean EOF; otherwise the framing is bad.
if not e.partial and not headers:
return None
raise LSPProtocolError(
f"unexpected EOF while reading LSP headers (partial={e.partial!r})"
) from e
# Defensive cap against a server streaming headers without ever
# emitting CRLF-CRLF. Caps total header bytes at 8 KiB — a
# well-behaved server fits in well under 200 bytes.
header_bytes += len(line)
if header_bytes > 8192:
raise LSPProtocolError(
f"LSP header block exceeded 8 KiB without terminator"
)
line = line[:-2] # strip CRLF
if not line:
break # blank line ends header block
try:
key, _, value = line.decode("ascii").partition(":")
except UnicodeDecodeError as e:
raise LSPProtocolError(f"non-ASCII LSP header: {line!r}") from e
if not key:
raise LSPProtocolError(f"malformed LSP header line: {line!r}")
headers[key.strip().lower()] = value.strip()
cl = headers.get("content-length")
if cl is None:
raise LSPProtocolError(f"LSP message missing Content-Length: {headers!r}")
try:
n = int(cl)
except ValueError as e:
raise LSPProtocolError(f"non-integer Content-Length: {cl!r}") from e
if n < 0 or n > 64 * 1024 * 1024: # 64 MiB sanity cap
raise LSPProtocolError(f"unreasonable Content-Length: {n}")
try:
body = await reader.readexactly(n)
except asyncio.IncompleteReadError as e:
raise LSPProtocolError(
f"truncated LSP body: expected {n} bytes, got {len(e.partial)}"
) from e
try:
return json.loads(body.decode("utf-8"))
except json.JSONDecodeError as e:
raise LSPProtocolError(f"invalid JSON in LSP body: {e}") from e
except UnicodeDecodeError as e:
raise LSPProtocolError(f"non-UTF-8 LSP body: {e}") from e
def make_request(req_id: int, method: str, params: Any) -> dict:
"""Build a JSON-RPC 2.0 request envelope."""
msg: dict = {"jsonrpc": "2.0", "id": req_id, "method": method}
if params is not None:
msg["params"] = params
return msg
def make_notification(method: str, params: Any) -> dict:
"""Build a JSON-RPC 2.0 notification envelope (no ``id``)."""
msg: dict = {"jsonrpc": "2.0", "method": method}
if params is not None:
msg["params"] = params
return msg
def make_response(req_id: Any, result: Any) -> dict:
"""Build a JSON-RPC 2.0 success response envelope."""
return {"jsonrpc": "2.0", "id": req_id, "result": result}
def make_error_response(req_id: Any, code: int, message: str, data: Any = None) -> dict:
"""Build a JSON-RPC 2.0 error response envelope."""
err: dict = {"code": code, "message": message}
if data is not None:
err["data"] = data
return {"jsonrpc": "2.0", "id": req_id, "error": err}
def classify_message(msg: dict) -> Tuple[str, Any]:
"""Return ``(kind, key)`` where kind is one of ``request``,
``response``, ``notification``, ``invalid``.
The key is the request id for request/response, the method name
for notifications, and ``None`` for invalid messages.
"""
if not isinstance(msg, dict):
return "invalid", None
if msg.get("jsonrpc") != "2.0":
return "invalid", None
has_id = "id" in msg
has_method = "method" in msg
if has_id and has_method:
return "request", msg["id"]
if has_id and ("result" in msg or "error" in msg):
return "response", msg["id"]
if has_method and not has_id:
return "notification", msg["method"]
return "invalid", None
__all__ = [
"ERROR_CONTENT_MODIFIED",
"ERROR_REQUEST_CANCELLED",
"ERROR_METHOD_NOT_FOUND",
"LSPProtocolError",
"LSPRequestError",
"encode_message",
"read_message",
"make_request",
"make_notification",
"make_response",
"make_error_response",
"classify_message",
]

View File

@@ -1,149 +0,0 @@
"""Diff-aware line-shift map for cross-edit LSP delta filtering.
When an edit deletes or inserts lines in the middle of a file, every
diagnostic below the edit point shifts to a new line number. The
LSPService delta filter subtracts the pre-edit baseline from the
post-edit diagnostics keyed on ``(severity, code, source, message,
range)`` — without an adjustment, the shifted-but-otherwise-identical
diagnostics look brand-new and the agent gets flooded with noise.
The fix used here is the same trick git's blame and unified diff use:
build a piecewise-linear map from pre-edit line numbers to post-edit
line numbers, then apply that map to baseline diagnostics before the
set-difference. Diagnostics whose pre-edit line is in a region the
edit deleted return ``None`` and are dropped from the baseline (they
genuinely no longer apply).
Trade-off vs. dropping range from the key entirely (the previous
fix): preserves the "new instance of an identical error at a
different line" signal — if the model introduces a second instance
of the same error class at a different location, that one will be
surfaced as new instead of swallowed by content-only dedup.
The map is derived from ``difflib.SequenceMatcher.get_opcodes()`` and
exposed as a single callable so callers don't have to reason about
diff regions.
"""
from __future__ import annotations
import difflib
from typing import Any, Callable, Dict, List, Optional
def build_line_shift(pre_text: str, post_text: str) -> Callable[[int], Optional[int]]:
"""Build a function mapping pre-edit line numbers to post-edit line numbers.
Lines are 0-indexed to match the LSP wire format
(``range.start.line`` is 0-indexed).
The returned callable takes a pre-edit 0-indexed line number and
returns the corresponding post-edit 0-indexed line number, or
``None`` if that line was deleted by the edit (no post-edit
counterpart exists).
Cost: one ``SequenceMatcher.get_opcodes()`` call up front; the
returned closure is O(log n) per call (binary search over opcode
regions). Cheap enough to call once per write/patch and apply to
every baseline diagnostic.
"""
pre_lines = pre_text.splitlines() if pre_text else []
post_lines = post_text.splitlines() if post_text else []
# Trivial case: identical content or no content — identity map.
if pre_lines == post_lines:
return lambda line: line
# SequenceMatcher.get_opcodes() returns a list of
# (tag, i1, i2, j1, j2) where tag is 'equal', 'replace', 'delete',
# or 'insert'. i1:i2 is the range in pre, j1:j2 is the range in
# post. We build a list of (i1, i2, j1, j2, tag) tuples and
# binary-search by i for each lookup.
sm = difflib.SequenceMatcher(a=pre_lines, b=post_lines, autojunk=False)
opcodes = sm.get_opcodes()
def shift(line: int) -> Optional[int]:
# Find the opcode region whose i1 <= line < i2.
# Linear scan is fine — typical opcode count is small (single
# digits for a typical patch-tool edit).
for tag, i1, i2, j1, j2 in opcodes:
if i1 <= line < i2:
if tag == "equal":
# Pre-line N → post-line (N - i1 + j1).
return line - i1 + j1
if tag == "delete":
# Pre-line is in a deleted region — no post counterpart.
return None
if tag == "replace":
# Replace == delete + insert; the pre-line has no
# post counterpart in any meaningful sense. Drop.
return None
# 'insert' has i1 == i2 so line < i2 can't be hit.
if line < i1:
# Past the relevant region — handled in earlier iteration.
break
# Past the last opcode region (line >= len(pre_lines)).
# Anchor at end of post.
return max(0, len(post_lines) - 1) if post_lines else None
return shift
def shift_diagnostic_range(diag: Dict[str, Any],
shift: Callable[[int], Optional[int]]) -> Optional[Dict[str, Any]]:
"""Return a copy of ``diag`` with its line range remapped through ``shift``.
Returns ``None`` if the diagnostic's start line maps to ``None``
(the line was deleted by the edit) — caller drops it from the
baseline since the diagnostic no longer applies.
Both ``start.line`` and ``end.line`` are remapped independently;
when only the end maps to ``None`` (rare, multi-line diagnostic
straddling the edit boundary) we collapse to a single-line range
at the shifted start to keep the diagnostic in the baseline.
The original ``diag`` is not mutated.
"""
rng = diag.get("range") or {}
start = rng.get("start") or {}
end = rng.get("end") or {}
pre_start_line = int(start.get("line", 0))
pre_end_line = int(end.get("line", pre_start_line))
new_start_line = shift(pre_start_line)
if new_start_line is None:
return None
new_end_line = shift(pre_end_line)
if new_end_line is None:
# Diagnostic straddled the deletion — collapse to start.
new_end_line = new_start_line
shifted = dict(diag)
shifted["range"] = {
"start": {
"line": new_start_line,
"character": int(start.get("character", 0)),
},
"end": {
"line": new_end_line,
"character": int(end.get("character", 0)),
},
}
return shifted
def shift_baseline(baseline: List[Dict[str, Any]],
shift: Callable[[int], Optional[int]]) -> List[Dict[str, Any]]:
"""Apply ``shift`` to every diagnostic in ``baseline``, dropping deleted entries."""
out: List[Dict[str, Any]] = []
for d in baseline:
if not isinstance(d, dict):
continue
shifted = shift_diagnostic_range(d, shift)
if shifted is not None:
out.append(shifted)
return out
__all__ = ["build_line_shift", "shift_diagnostic_range", "shift_baseline"]

View File

@@ -1,78 +0,0 @@
"""Format LSP diagnostics for inclusion in tool output.
The model sees a compact, severity-filtered, line-bounded summary of
diagnostics introduced by the latest edit. Format matches what
OpenCode's ``lsp/diagnostic.ts`` and Claude Code's
``formatDiagnosticsSummary`` produce — ``<diagnostics>`` blocks with
1-indexed line/column, capped at ``MAX_PER_FILE`` errors.
"""
from __future__ import annotations
from typing import Any, Dict, List
# Severity-1 only by default — warnings/info/hints would flood the
# agent. Lift this in config under ``lsp.severities`` if needed.
SEVERITY_NAMES = {1: "ERROR", 2: "WARN", 3: "INFO", 4: "HINT"}
DEFAULT_SEVERITIES = frozenset({1}) # ERROR only
MAX_PER_FILE = 20
MAX_TOTAL_CHARS = 4000
def format_diagnostic(d: Dict[str, Any]) -> str:
"""One-line representation of a single diagnostic."""
sev = SEVERITY_NAMES.get(d.get("severity") or 1, "ERROR")
rng = d.get("range") or {}
start = rng.get("start") or {}
line = int(start.get("line", 0)) + 1
col = int(start.get("character", 0)) + 1
msg = str(d.get("message") or "").rstrip()
code = d.get("code")
code_part = f" [{code}]" if code not in {None, ""} else ""
source = d.get("source")
source_part = f" ({source})" if source else ""
return f"{sev} [{line}:{col}] {msg}{code_part}{source_part}"
def report_for_file(
file_path: str,
diagnostics: List[Dict[str, Any]],
*,
severities: frozenset = DEFAULT_SEVERITIES,
max_per_file: int = MAX_PER_FILE,
) -> str:
"""Build a ``<diagnostics file=...>`` block for one file.
Returns an empty string when no diagnostics pass the severity
filter, so callers can do ``if block:`` to skip empty cases.
"""
if not diagnostics:
return ""
filtered = [d for d in diagnostics if (d.get("severity") or 1) in severities]
if not filtered:
return ""
limited = filtered[:max_per_file]
extra = len(filtered) - len(limited)
lines = [format_diagnostic(d) for d in limited]
body = "\n".join(lines)
if extra > 0:
body += f"\n... and {extra} more"
return f"<diagnostics file=\"{file_path}\">\n{body}\n</diagnostics>"
def truncate(s: str, *, limit: int = MAX_TOTAL_CHARS) -> str:
"""Hard-cap a formatted summary string."""
if len(s) <= limit:
return s
marker = "\n…[truncated]"
return s[: limit - len(marker)] + marker
__all__ = [
"SEVERITY_NAMES",
"DEFAULT_SEVERITIES",
"MAX_PER_FILE",
"format_diagnostic",
"report_for_file",
"truncate",
]

File diff suppressed because it is too large Load Diff

View File

@@ -1,223 +0,0 @@
"""Workspace and project-root resolution for LSP.
Two concerns live here:
1. **Workspace gate** — the upper-level "is this directory a project?"
check. Hermes only runs LSP when the cwd (or the file being edited)
sits inside a git worktree. Files outside any git root never
trigger LSP, even if a server is configured. This keeps Telegram
gateway users on user-home cwd's from spawning daemons.
2. **NearestRoot** — the per-server project-root walk. Each language
server cares about a different marker (``pyproject.toml`` for
Python, ``Cargo.toml`` for Rust, ``go.mod`` for Go, etc.) and
wants the directory containing that marker. ``nearest_root()``
walks up from a starting path looking for any of a list of marker
files, optionally bailing if an exclude marker shows up first.
"""
from __future__ import annotations
import logging
import os
from pathlib import Path
from typing import Iterable, Optional, Tuple
logger = logging.getLogger("agent.lsp.workspace")
# Cache: cwd → (worktree_root, is_git) so repeated calls don't re-stat.
# Cleared on shutdown. Keyed by absolute resolved path so symlink
# folds collapse to one entry.
_workspace_cache: dict = {}
def normalize_path(path: str) -> str:
"""Normalize a path for use as a stable map key.
Resolves ``~``, makes absolute, and collapses ``.``/``..``. We do
NOT resolve symlinks here — symlink stability matters for some
LSP servers (rust-analyzer cares about Cargo workspace identity)
and we want the canonical path the user typed when possible.
"""
return os.path.abspath(os.path.expanduser(path))
def find_git_worktree(start: str) -> Optional[str]:
"""Walk up from ``start`` looking for a ``.git`` entry (file or dir).
Returns the directory containing ``.git``, or ``None`` if no git
root is found before hitting the filesystem root.
A ``.git`` *file* (not directory) means we're inside a git
worktree set up via ``git worktree add`` — both forms count.
"""
try:
start_path = Path(normalize_path(start))
if start_path.is_file():
start_path = start_path.parent
except (OSError, RuntimeError, ValueError):
# Pathological input (loop in symlinks, encoding error, etc.) —
# bail out rather than crash the lint hook.
return None
# Cache check
cached = _workspace_cache.get(str(start_path))
if cached is not None:
root, _is_git = cached
return root
cur = start_path
# Defensive cap: the deepest reasonable monorepo is well under 64
# levels. Caps the walk so a pathological cwd or a symlink cycle
# we somehow traverse can't keep us looping.
for _ in range(64):
git_marker = cur / ".git"
try:
if git_marker.exists():
resolved = str(cur)
_workspace_cache[str(start_path)] = (resolved, True)
return resolved
except OSError:
# Permission error on a parent dir — bail out cleanly.
break
parent = cur.parent
if parent == cur:
break
cur = parent
_workspace_cache[str(start_path)] = (None, False)
return None
def is_inside_workspace(path: str, workspace_root: str) -> bool:
"""Return True iff ``path`` is inside (or equal to) ``workspace_root``.
Uses absolute paths but does not resolve symlinks — a file accessed
via a symlink that points outside the workspace still counts as
outside. This is the conservative interpretation; matches LSP
behaviour where servers reject didOpen for unrelated files.
"""
p = normalize_path(path)
root = normalize_path(workspace_root)
if p == root:
return True
# Use os.path.commonpath to handle case-insensitive filesystems
# correctly on macOS/Windows.
try:
common = os.path.commonpath([p, root])
except ValueError:
# Different drives on Windows.
return False
return common == root
def nearest_root(
start: str,
markers: Iterable[str],
*,
excludes: Optional[Iterable[str]] = None,
ceiling: Optional[str] = None,
) -> Optional[str]:
"""Walk up from ``start`` looking for any of the given marker files.
Returns the **directory containing** the first matched marker, or
``None`` if no marker is found before hitting ``ceiling`` (or the
filesystem root if no ceiling).
If ``excludes`` is provided and an exclude marker matches *first*
in the upward walk, returns ``None`` — the server is gated off
for that file. Mirrors OpenCode's NearestRoot exclude semantics
(e.g. typescript skips deno projects when ``deno.json`` is found
before ``package.json``).
"""
start_path = Path(normalize_path(start))
try:
if start_path.is_file():
start_path = start_path.parent
except (OSError, RuntimeError, ValueError):
return None
ceiling_path = Path(normalize_path(ceiling)) if ceiling else None
markers_list = list(markers)
excludes_list = list(excludes) if excludes else []
cur = start_path
# Defensive cap matching ``find_git_worktree``. Bounded walk
# protects against pathological inputs even though the
# parent-equality stop normally terminates within ~10 steps.
for _ in range(64):
# Check excludes first — if an exclude is found at this level,
# the server is gated off for this file.
for exc in excludes_list:
try:
if (cur / exc).exists():
return None
except OSError:
continue
# Then check markers.
for marker in markers_list:
try:
if (cur / marker).exists():
return str(cur)
except OSError:
continue
# Stop conditions.
if ceiling_path is not None and cur == ceiling_path:
return None
parent = cur.parent
if parent == cur:
return None
cur = parent
return None
def resolve_workspace_for_file(
file_path: str,
*,
cwd: Optional[str] = None,
) -> Tuple[Optional[str], bool]:
"""Resolve the workspace root for a file.
Returns ``(workspace_root, gated_in)`` where ``gated_in`` is True
iff LSP should run for this file at all. Currently the gate is
"file is inside a git worktree found by walking up from cwd OR
from the file itself".
The cwd path takes precedence — if the agent was launched in a
git project, that worktree is the workspace, and any edit inside
it (regardless of where the file lives) is in-scope. If the cwd
isn't in a git worktree, we try the file's own location as a
fallback.
Returns ``(None, False)`` when neither path is in a git worktree.
"""
cwd = cwd or os.getcwd()
cwd_root = find_git_worktree(cwd)
if cwd_root is not None:
if is_inside_workspace(file_path, cwd_root):
return cwd_root, True
# File is outside the cwd's worktree — try the file's own
# location as a secondary anchor. Useful for monorepos where
# the user opens an unrelated checkout.
file_root = find_git_worktree(file_path)
if file_root is not None:
return file_root, True
return None, False
def clear_cache() -> None:
"""Clear the workspace-resolution cache.
Called on service shutdown so a subsequent re-init doesn't pick
up stale results from a previous session.
"""
_workspace_cache.clear()
__all__ = [
"find_git_worktree",
"is_inside_workspace",
"nearest_root",
"normalize_path",
"resolve_workspace_for_file",
"clear_cache",
]

View File

@@ -91,12 +91,10 @@ class StreamingContextScrubber:
def __init__(self) -> None:
self._in_span: bool = False
self._buf: str = ""
self._at_block_boundary: bool = True
def reset(self) -> None:
self._in_span = False
self._buf = ""
self._at_block_boundary = True
def feed(self, text: str) -> str:
"""Return the visible portion of ``text`` after scrubbing.
@@ -123,22 +121,19 @@ class StreamingContextScrubber:
buf = buf[idx + len(self._CLOSE_TAG):]
self._in_span = False
else:
idx = self._find_boundary_open_tag(buf)
idx = buf.lower().find(self._OPEN_TAG)
if idx == -1:
# No open tag — hold back a potential partial open tag
held = (
self._max_pending_open_suffix(buf)
or self._max_partial_suffix(buf, self._OPEN_TAG)
)
held = self._max_partial_suffix(buf, self._OPEN_TAG)
if held:
self._append_visible(out, buf[:-held])
out.append(buf[:-held])
self._buf = buf[-held:]
else:
self._append_visible(out, buf)
out.append(buf)
return "".join(out)
# Emit text before the tag, enter span
if idx > 0:
self._append_visible(out, buf[:idx])
out.append(buf[:idx])
buf = buf[idx + len(self._OPEN_TAG):]
self._in_span = True
@@ -174,55 +169,6 @@ class StreamingContextScrubber:
return i
return 0
def _find_boundary_open_tag(self, buf: str) -> int:
"""Find an opening fence only when it starts a block-like span."""
buf_lower = buf.lower()
search_start = 0
while True:
idx = buf_lower.find(self._OPEN_TAG, search_start)
if idx == -1:
return -1
if self._is_block_boundary(buf, idx) and self._has_block_opener_suffix(buf, idx):
return idx
search_start = idx + 1
def _max_pending_open_suffix(self, buf: str) -> int:
"""Hold a complete boundary tag until the following char confirms it."""
if not buf.lower().endswith(self._OPEN_TAG):
return 0
idx = len(buf) - len(self._OPEN_TAG)
if not self._is_block_boundary(buf, idx):
return 0
return len(self._OPEN_TAG)
def _has_block_opener_suffix(self, buf: str, idx: int) -> bool:
after_idx = idx + len(self._OPEN_TAG)
if after_idx >= len(buf):
return False
return buf[after_idx] in "\r\n"
def _is_block_boundary(self, buf: str, idx: int) -> bool:
if idx == 0:
return self._at_block_boundary
preceding = buf[:idx]
last_newline = preceding.rfind("\n")
if last_newline == -1:
return self._at_block_boundary and preceding.strip() == ""
return preceding[last_newline + 1:].strip() == ""
def _append_visible(self, out: list[str], text: str) -> None:
if not text:
return
out.append(text)
self._update_block_boundary(text)
def _update_block_boundary(self, text: str) -> None:
last_newline = text.rfind("\n")
if last_newline != -1:
self._at_block_boundary = text[last_newline + 1:].strip() == ""
else:
self._at_block_boundary = self._at_block_boundary and text.strip() == ""
def build_memory_context_block(raw_context: str) -> str:
"""Wrap prefetched memory in a fenced block with system note."""
@@ -368,42 +314,11 @@ class MemoryManager:
# -- Sync ----------------------------------------------------------------
@staticmethod
def _provider_sync_accepts_messages(provider: MemoryProvider) -> bool:
"""Return whether sync_turn accepts a messages keyword."""
try:
signature = inspect.signature(provider.sync_turn)
except (TypeError, ValueError):
return True
params = list(signature.parameters.values())
if any(p.kind == inspect.Parameter.VAR_KEYWORD for p in params):
return True
return "messages" in signature.parameters
def sync_all(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
def sync_all(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Sync a completed turn to all providers."""
for provider in self._providers:
try:
if messages is not None and self._provider_sync_accepts_messages(provider):
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
messages=messages,
)
else:
provider.sync_turn(
user_content,
assistant_content,
session_id=session_id,
)
provider.sync_turn(user_content, assistant_content, session_id=session_id)
except Exception as e:
logger.warning(
"Memory provider '%s' sync_turn failed: %s",

View File

@@ -78,7 +78,6 @@ class MemoryProvider(ABC):
- agent_workspace (str): Shared workspace name (e.g. "hermes").
- parent_session_id (str): For subagents, the parent's session_id.
- user_id (str): Platform user identifier (gateway sessions).
- user_id_alt (str): Optional alternate stable platform user identifier.
"""
def system_prompt_block(self) -> str:
@@ -112,22 +111,11 @@ class MemoryProvider(ABC):
that do background prefetching should override this.
"""
def sync_turn(
self,
user_content: str,
assistant_content: str,
*,
session_id: str = "",
messages: Optional[List[Dict[str, Any]]] = None,
) -> None:
def sync_turn(self, user_content: str, assistant_content: str, *, session_id: str = "") -> None:
"""Persist a completed turn to the backend.
Called after each turn. Should be non-blocking — queue for
background processing if the backend has latency.
``messages`` is the OpenAI-style conversation message list as of the
completed turn, including any assistant tool calls and tool results.
Providers that do not need raw turn context can ignore it.
"""
@abstractmethod

View File

@@ -1,444 +0,0 @@
"""Message and tool-payload sanitization helpers.
Pure functions extracted from ``run_agent.py`` so the AIAgent module can
stay focused on the conversation loop. These walk OpenAI-format message
lists and structured payloads, repairing or stripping problematic
characters that would otherwise crash ``json.dumps`` inside the OpenAI
SDK or be rejected by upstream APIs.
All helpers are stateless and side-effect-free except for in-place
mutation of their input (where documented). Backward-compatible
re-exports from ``run_agent`` remain in place so existing imports
``from run_agent import _sanitize_surrogates`` keep working.
"""
from __future__ import annotations
import json
import logging
import re
from typing import Any
logger = logging.getLogger(__name__)
# Lone surrogate code points are invalid in UTF-8 and crash json.dumps
# inside the OpenAI SDK. Used by every surrogate-sanitization helper
# below as well as by run_agent and the CLI for paste-from-clipboard
# scrubbing.
_SURROGATE_RE = re.compile(r'[\ud800-\udfff]')
def _sanitize_surrogates(text: str) -> str:
"""Replace lone surrogate code points with U+FFFD (replacement character).
Surrogates are invalid in UTF-8 and will crash ``json.dumps()`` inside the
OpenAI SDK. This is a fast no-op when the text contains no surrogates.
"""
if _SURROGATE_RE.search(text):
return _SURROGATE_RE.sub('\ufffd', text)
return text
def _sanitize_structure_surrogates(payload: Any) -> bool:
"""Replace surrogate code points in nested dict/list payloads in-place.
Mirror of ``_sanitize_structure_non_ascii`` but for surrogate recovery.
Used to scrub nested structured fields (e.g. ``reasoning_details`` — an
array of dicts with ``summary``/``text`` strings) that flat per-field
checks don't reach. Returns True if any surrogates were replaced.
"""
found = False
def _walk(node):
nonlocal found
if isinstance(node, dict):
for key, value in node.items():
if isinstance(value, str):
if _SURROGATE_RE.search(value):
node[key] = _SURROGATE_RE.sub('\ufffd', value)
found = True
elif isinstance(value, (dict, list)):
_walk(value)
elif isinstance(node, list):
for idx, value in enumerate(node):
if isinstance(value, str):
if _SURROGATE_RE.search(value):
node[idx] = _SURROGATE_RE.sub('\ufffd', value)
found = True
elif isinstance(value, (dict, list)):
_walk(value)
_walk(payload)
return found
def _sanitize_messages_surrogates(messages: list) -> bool:
"""Sanitize surrogate characters from all string content in a messages list.
Walks message dicts in-place. Returns True if any surrogates were found
and replaced, False otherwise. Covers content/text, name, tool call
metadata/arguments, AND any additional string or nested structured fields
(``reasoning``, ``reasoning_content``, ``reasoning_details``, etc.) so
retries don't fail on a non-content field. Byte-level reasoning models
(xiaomi/mimo, kimi, glm) can emit lone surrogates in reasoning output
that flow through to ``api_messages["reasoning_content"]`` on the next
turn and crash json.dumps inside the OpenAI SDK.
"""
found = False
for msg in messages:
if not isinstance(msg, dict):
continue
content = msg.get("content")
if isinstance(content, str) and _SURROGATE_RE.search(content):
msg["content"] = _SURROGATE_RE.sub('\ufffd', content)
found = True
elif isinstance(content, list):
for part in content:
if isinstance(part, dict):
text = part.get("text")
if isinstance(text, str) and _SURROGATE_RE.search(text):
part["text"] = _SURROGATE_RE.sub('\ufffd', text)
found = True
name = msg.get("name")
if isinstance(name, str) and _SURROGATE_RE.search(name):
msg["name"] = _SURROGATE_RE.sub('\ufffd', name)
found = True
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
if not isinstance(tc, dict):
continue
tc_id = tc.get("id")
if isinstance(tc_id, str) and _SURROGATE_RE.search(tc_id):
tc["id"] = _SURROGATE_RE.sub('\ufffd', tc_id)
found = True
fn = tc.get("function")
if isinstance(fn, dict):
fn_name = fn.get("name")
if isinstance(fn_name, str) and _SURROGATE_RE.search(fn_name):
fn["name"] = _SURROGATE_RE.sub('\ufffd', fn_name)
found = True
fn_args = fn.get("arguments")
if isinstance(fn_args, str) and _SURROGATE_RE.search(fn_args):
fn["arguments"] = _SURROGATE_RE.sub('\ufffd', fn_args)
found = True
# Walk any additional string / nested fields (reasoning,
# reasoning_content, reasoning_details, etc.) — surrogates from
# byte-level reasoning models (xiaomi/mimo, kimi, glm) can lurk
# in these fields and aren't covered by the per-field checks above.
# Matches _sanitize_messages_non_ascii's coverage (PR #10537).
for key, value in msg.items():
if key in {"content", "name", "tool_calls", "role"}:
continue
if isinstance(value, str):
if _SURROGATE_RE.search(value):
msg[key] = _SURROGATE_RE.sub('\ufffd', value)
found = True
elif isinstance(value, (dict, list)):
if _sanitize_structure_surrogates(value):
found = True
return found
def _escape_invalid_chars_in_json_strings(raw: str) -> str:
"""Escape unescaped control chars inside JSON string values.
Walks the raw JSON character-by-character, tracking whether we are
inside a double-quoted string. Inside strings, replaces literal
control characters (0x00-0x1F) that aren't already part of an escape
sequence with their ``\\uXXXX`` equivalents. Pass-through for everything
else.
Ported from #12093 — complements the other repair passes in
``_repair_tool_call_arguments`` when ``json.loads(strict=False)`` is
not enough (e.g. llama.cpp backends that emit literal apostrophes or
tabs alongside other malformations).
"""
out: list[str] = []
in_string = False
i = 0
n = len(raw)
while i < n:
ch = raw[i]
if in_string:
if ch == "\\" and i + 1 < n:
# Already-escaped char — pass through as-is
out.append(ch)
out.append(raw[i + 1])
i += 2
continue
if ch == '"':
in_string = False
out.append(ch)
elif ord(ch) < 0x20:
out.append(f"\\u{ord(ch):04x}")
else:
out.append(ch)
else:
if ch == '"':
in_string = True
out.append(ch)
i += 1
return "".join(out)
def _repair_tool_call_arguments(raw_args: str, tool_name: str = "?") -> str:
"""Attempt to repair malformed tool_call argument JSON.
Models like GLM-5.1 via Ollama can produce truncated JSON, trailing
commas, Python ``None``, etc. The API proxy rejects these with HTTP 400
"invalid tool call arguments". This function applies common repairs;
if all fail it returns ``"{}"`` so the request succeeds (better than
crashing the session). All repairs are logged at WARNING level.
"""
raw_stripped = raw_args.strip() if isinstance(raw_args, str) else ""
# Fast-path: empty / whitespace-only -> empty object
if not raw_stripped:
logger.warning("Sanitized empty tool_call arguments for %s", tool_name)
return "{}"
# Python-literal None -> normalise to {}
if raw_stripped == "None":
logger.warning("Sanitized Python-None tool_call arguments for %s", tool_name)
return "{}"
# Repair pass 0: llama.cpp backends sometimes emit literal control
# characters (tabs, newlines) inside JSON string values. json.loads
# with strict=False accepts these and lets us re-serialise the
# result into wire-valid JSON without any string surgery. This is
# the most common local-model repair case (#12068).
try:
parsed = json.loads(raw_stripped, strict=False)
reserialised = json.dumps(parsed, separators=(",", ":"))
if reserialised != raw_stripped:
logger.warning(
"Repaired unescaped control chars in tool_call arguments for %s",
tool_name,
)
return reserialised
except (json.JSONDecodeError, TypeError, ValueError):
pass
# Attempt common JSON repairs
fixed = raw_stripped
# 1. Strip trailing commas before } or ]
fixed = re.sub(r',\s*([}\]])', r'\1', fixed)
# 2. Close unclosed structures
open_curly = fixed.count('{') - fixed.count('}')
open_bracket = fixed.count('[') - fixed.count(']')
if open_curly > 0:
fixed += '}' * open_curly
if open_bracket > 0:
fixed += ']' * open_bracket
# 3. Remove excess closing braces/brackets (bounded to 50 iterations)
for _ in range(50):
try:
json.loads(fixed)
break
except json.JSONDecodeError:
if fixed.endswith('}') and fixed.count('}') > fixed.count('{'):
fixed = fixed[:-1]
elif fixed.endswith(']') and fixed.count(']') > fixed.count('['):
fixed = fixed[:-1]
else:
break
try:
json.loads(fixed)
logger.warning(
"Repaired malformed tool_call arguments for %s: %s%s",
tool_name, raw_stripped[:80], fixed[:80],
)
return fixed
except json.JSONDecodeError:
pass
# Repair pass 4: escape unescaped control chars inside JSON strings,
# then retry. Catches cases where strict=False alone fails because
# other malformations are present too.
try:
escaped = _escape_invalid_chars_in_json_strings(fixed)
if escaped != fixed:
json.loads(escaped)
logger.warning(
"Repaired control-char-laced tool_call arguments for %s: %s%s",
tool_name, raw_stripped[:80], escaped[:80],
)
return escaped
except (json.JSONDecodeError, TypeError, ValueError):
pass
# Last resort: replace with empty object so the API request doesn't
# crash the entire session.
logger.warning(
"Unrepairable tool_call arguments for %s"
"replaced with empty object (was: %s)",
tool_name, raw_stripped[:80],
)
return "{}"
def _strip_non_ascii(text: str) -> str:
"""Remove non-ASCII characters, replacing with closest ASCII equivalent or removing.
Used as a last resort when the system encoding is ASCII and can't handle
any non-ASCII characters (e.g. LANG=C on Chromebooks).
"""
return text.encode('ascii', errors='ignore').decode('ascii')
def _sanitize_messages_non_ascii(messages: list) -> bool:
"""Strip non-ASCII characters from all string content in a messages list.
This is a last-resort recovery for systems with ASCII-only encoding
(LANG=C, Chromebooks, minimal containers). Returns True if any
non-ASCII content was found and sanitized.
"""
found = False
for msg in messages:
if not isinstance(msg, dict):
continue
# Sanitize content (string)
content = msg.get("content")
if isinstance(content, str):
sanitized = _strip_non_ascii(content)
if sanitized != content:
msg["content"] = sanitized
found = True
elif isinstance(content, list):
for part in content:
if isinstance(part, dict):
text = part.get("text")
if isinstance(text, str):
sanitized = _strip_non_ascii(text)
if sanitized != text:
part["text"] = sanitized
found = True
# Sanitize name field (can contain non-ASCII in tool results)
name = msg.get("name")
if isinstance(name, str):
sanitized = _strip_non_ascii(name)
if sanitized != name:
msg["name"] = sanitized
found = True
# Sanitize tool_calls
tool_calls = msg.get("tool_calls")
if isinstance(tool_calls, list):
for tc in tool_calls:
if isinstance(tc, dict):
fn = tc.get("function", {})
if isinstance(fn, dict):
fn_args = fn.get("arguments")
if isinstance(fn_args, str):
sanitized = _strip_non_ascii(fn_args)
if sanitized != fn_args:
fn["arguments"] = sanitized
found = True
# Sanitize any additional top-level string fields (e.g. reasoning_content)
for key, value in msg.items():
if key in {"content", "name", "tool_calls", "role"}:
continue
if isinstance(value, str):
sanitized = _strip_non_ascii(value)
if sanitized != value:
msg[key] = sanitized
found = True
return found
def _sanitize_tools_non_ascii(tools: list) -> bool:
"""Strip non-ASCII characters from tool payloads in-place."""
return _sanitize_structure_non_ascii(tools)
def _strip_images_from_messages(messages: list) -> bool:
"""Remove image_url content parts from all messages in-place.
Called when a server signals it does not support images (e.g.
"Only 'text' content type is supported."). Mutates messages so the
next API call sends text only.
Preserves message alternation invariants:
* ``tool``-role messages whose content was entirely images are replaced
with a plaintext placeholder, NOT deleted — deleting them would leave
the paired ``tool_call_id`` on the prior assistant message unmatched,
which providers reject with HTTP 400.
* Non-tool messages whose content becomes empty are dropped. In
practice this only hits synthetic image-only user messages appended
for attachment delivery; real user turns always include text.
Returns True if any image parts were removed.
"""
found = False
to_delete = []
for i, msg in enumerate(messages):
if not isinstance(msg, dict):
continue
content = msg.get("content")
if not isinstance(content, list):
continue
new_parts = []
for part in content:
if isinstance(part, dict) and part.get("type") in {"image_url", "image", "input_image"}:
found = True
else:
new_parts.append(part)
if len(new_parts) < len(content):
if new_parts:
msg["content"] = new_parts
elif msg.get("role") == "tool":
# Preserve tool_call_id linkage — providers require every
# assistant tool_call to have a matching tool response.
msg["content"] = "[image content removed — server does not support images]"
else:
# Synthetic image-only user/assistant message with no text;
# safe to drop.
to_delete.append(i)
for i in reversed(to_delete):
del messages[i]
return found
def _sanitize_structure_non_ascii(payload: Any) -> bool:
"""Strip non-ASCII characters from nested dict/list payloads in-place."""
found = False
def _walk(node):
nonlocal found
if isinstance(node, dict):
for key, value in node.items():
if isinstance(value, str):
sanitized = _strip_non_ascii(value)
if sanitized != value:
node[key] = sanitized
found = True
elif isinstance(value, (dict, list)):
_walk(value)
elif isinstance(node, list):
for idx, value in enumerate(node):
if isinstance(value, str):
sanitized = _strip_non_ascii(value)
if sanitized != value:
node[idx] = sanitized
found = True
elif isinstance(value, (dict, list)):
_walk(value)
_walk(payload)
return found
__all__ = [
"_SURROGATE_RE",
"_sanitize_surrogates",
"_sanitize_structure_surrogates",
"_sanitize_messages_surrogates",
"_escape_invalid_chars_in_json_strings",
"_repair_tool_call_arguments",
"_strip_non_ascii",
"_sanitize_messages_non_ascii",
"_sanitize_tools_non_ascii",
"_strip_images_from_messages",
"_sanitize_structure_non_ascii",
]

View File

@@ -10,7 +10,7 @@ import os
import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse
import requests
@@ -47,7 +47,7 @@ def _resolve_requests_verify() -> bool | str:
_PROVIDER_PREFIXES: frozenset[str] = frozenset({
"openrouter", "nous", "openai-codex", "copilot", "copilot-acp",
"gemini", "ollama-cloud", "zai", "kimi-coding", "kimi-coding-cn", "stepfun", "minimax", "minimax-oauth", "minimax-cn", "anthropic", "deepseek",
"opencode-zen", "opencode-go", "kilocode", "alibaba", "novita",
"opencode-zen", "opencode-go", "ai-gateway", "kilocode", "alibaba",
"qwen-oauth",
"xiaomi",
"arcee",
@@ -59,14 +59,14 @@ _PROVIDER_PREFIXES: frozenset[str] = frozenset({
"glm", "z-ai", "z.ai", "zhipu", "github", "github-copilot",
"github-models", "kimi", "moonshot", "kimi-cn", "moonshot-cn", "claude", "deep-seek",
"ollama",
"stepfun", "opencode", "zen", "go", "kilo", "dashscope", "aliyun", "qwen",
"stepfun", "opencode", "zen", "go", "vercel", "kilo", "dashscope", "aliyun", "qwen",
"mimo", "xiaomi-mimo",
"tencent", "tokenhub", "tencent-cloud", "tencentmaas",
"arcee-ai", "arceeai",
"gmi-cloud", "gmicloud",
"xai", "x-ai", "x.ai", "grok",
"nvidia", "nim", "nvidia-nim", "nemotron",
"qwen-portal", "novita-ai", "novitaai",
"qwen-portal",
})
@@ -104,8 +104,6 @@ def _strip_provider_prefix(model: str) -> str:
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
_model_metadata_cache_time: float = 0
_novita_metadata_cache: Dict[str, Dict[str, Any]] = {}
_novita_metadata_cache_time: float = 0
_MODEL_CACHE_TTL = 3600
_endpoint_model_metadata_cache: Dict[str, Dict[str, Dict[str, Any]]] = {}
_endpoint_model_metadata_cache_time: Dict[str, float] = {}
@@ -141,8 +139,6 @@ DEFAULT_CONTEXT_LENGTHS = {
# fuzzy-match collisions (e.g. "anthropic/claude-sonnet-4" is a
# substring of "anthropic/claude-sonnet-4.6").
# OpenRouter-prefixed models resolve via OpenRouter live API or models.dev.
"claude-opus-4-8": 1000000,
"claude-opus-4.8": 1000000,
"claude-opus-4-7": 1000000,
"claude-opus-4.7": 1000000,
"claude-opus-4-6": 1000000,
@@ -196,7 +192,6 @@ DEFAULT_CONTEXT_LENGTHS = {
"llama": 131072,
# Qwen — specific model families before the catch-all.
# Official docs: https://help.aliyun.com/zh/model-studio/developer-reference/
"qwen3.6-plus": 1048576, # 1M context (DashScope/Alibaba & OpenRouter)
"qwen3-coder-plus": 1000000, # 1M context
"qwen3-coder": 262144, # 256K context
"qwen": 131072,
@@ -211,12 +206,11 @@ DEFAULT_CONTEXT_LENGTHS = {
# via a custom provider. Values sourced from models.dev (2026-04).
# Keys use substring matching (longest-first), so e.g. "grok-4.20"
# matches "grok-4.20-0309-reasoning" / "-non-reasoning" / "-multi-agent-0309".
"grok-build": 256000, # grok-build-0.1
"grok-code-fast": 256000, # grok-code-fast-1
"grok-4-1-fast": 2000000, # grok-4-1-fast-(non-)reasoning
"grok-2-vision": 8192, # grok-2-vision, -1212, -latest
"grok-4-fast": 2000000, # grok-4-fast-(non-)reasoning, also matches -reasoning
"grok-4-fast": 2000000, # grok-4-fast-(non-)reasoning
"grok-4.20": 2000000, # grok-4.20-0309-(non-)reasoning, -multi-agent-0309
"grok-4.3": 1000000, # grok-4.3, grok-4.3-latest — 1M context per docs.x.ai
"grok-4": 256000, # grok-4, grok-4-0709
"grok-3": 131072, # grok-3, grok-3-mini, grok-3-fast, grok-3-mini-fast
"grok-2": 131072, # grok-2, grok-2-1212, grok-2-latest
@@ -291,7 +285,6 @@ def grok_supports_reasoning_effort(model: str) -> bool:
_CONTEXT_LENGTH_KEYS = (
"context_length",
"context_window",
"context_size",
"max_context_length",
"max_position_embeddings",
"max_model_len",
@@ -361,12 +354,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.deepseek.com": "deepseek",
"api.githubcopilot.com": "copilot",
"models.github.ai": "copilot",
# GitHub Models free tier (Azure-hosted prototyping endpoint) — same
# canonical provider as the Copilot API. Hard per-request token cap
# (often 8K) makes it unusable for Hermes' system prompt, but mapping
# it here lets us recognize the endpoint and emit a targeted hint
# instead of falling through the unknown-custom-endpoint path.
"models.inference.ai.azure.com": "copilot",
"api.fireworks.ai": "fireworks",
"opencode.ai": "opencode-go",
"api.x.ai": "xai",
@@ -374,7 +361,6 @@ _URL_TO_PROVIDER: Dict[str, str] = {
"api.xiaomimimo.com": "xiaomi",
"xiaomimimo.com": "xiaomi",
"api.gmi-serving.com": "gmi",
"api.novita.ai": "novita",
"tokenhub.tencentmaas.com": "tencent-tokenhub",
"ollama.com": "ollama-cloud",
}
@@ -571,16 +557,6 @@ def _extract_max_completion_tokens(payload: Dict[str, Any]) -> Optional[int]:
def _extract_pricing(payload: Dict[str, Any]) -> Dict[str, Any]:
novita_input = payload.get("input_token_price_per_m")
novita_output = payload.get("output_token_price_per_m")
if novita_input is not None or novita_output is not None:
pricing: Dict[str, Any] = {}
if novita_input is not None:
pricing["prompt"] = str(float(novita_input) / 10_000 / 1_000_000)
if novita_output is not None:
pricing["completion"] = str(float(novita_output) / 10_000 / 1_000_000)
return pricing
alias_map = {
"prompt": ("prompt", "input", "input_cost_per_token", "prompt_token_cost"),
"completion": ("completion", "output", "output_cost_per_token", "completion_token_cost"),
@@ -642,7 +618,7 @@ def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any
return cache
except Exception as e:
logger.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
logging.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
return _model_metadata_cache or {}
@@ -913,33 +889,12 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
return None
def get_context_length_from_provider_error(
error_msg: str,
current_context_length: int,
) -> Optional[int]:
"""Return a provider-reported lower context limit, if one is present.
Context-overflow recovery must not invent a new model window size. Some
providers only say that the input exceeds the context window without
reporting the actual maximum. In that case callers should keep the
configured context length and try compression only, rather than stepping
down through guessed probe tiers (1M → 256K → 128K → ...).
"""
parsed_limit = parse_context_limit_from_error(error_msg)
if parsed_limit is None:
return None
if parsed_limit < current_context_length:
return parsed_limit
return None
def parse_available_output_tokens_from_error(error_msg: str) -> Optional[int]:
"""Detect an "output cap too large" error and return how many output tokens are available.
Background — two distinct context errors exist:
1. "Prompt too long" — the INPUT itself exceeds the context window.
Fix: compress history, and only reduce context_length if the
provider explicitly reports the actual lower limit.
Fix: compress history and/or halve context_length.
2. "max_tokens too large" — input is fine, but input + requested_output > window.
Fix: reduce max_tokens (the output cap) for this call.
Do NOT touch context_length — the window hasn't shrunk.
@@ -1375,66 +1330,27 @@ def _resolve_codex_oauth_context_length(
return None
def _resolve_nous_context_length(
model: str,
base_url: str = "",
api_key: str = "",
) -> Tuple[Optional[int], str]:
"""Resolve Nous Portal model context length.
def _resolve_nous_context_length(model: str) -> Optional[int]:
"""Resolve Nous Portal model context length via OpenRouter metadata.
Tries the live Nous inference endpoint first (authoritative), then falls
back to OpenRouter metadata with suffix/version matching.
Nous model IDs are bare after prefix-stripping (e.g. 'qwen3.6-plus',
'claude-opus-4-6') while OpenRouter uses prefixed IDs (e.g.
'qwen/qwen3.6-plus', 'anthropic/claude-opus-4.6'). Version
normalization (dot↔dash) is applied to handle name drifts.
Returns ``(context_length, source)`` where ``source`` is one of:
- ``"portal"`` — live /v1/models response (authoritative)
- ``"openrouter"`` — OpenRouter cache fallback (non-authoritative;
callers must NOT persist this to the on-disk cache or a single
portal blip will freeze the wrong value in forever)
- ``""`` — could not resolve
Nous model IDs are bare (e.g. 'claude-opus-4-6') while OpenRouter uses
prefixed IDs (e.g. 'anthropic/claude-opus-4.6'). Try suffix matching
with version normalization (dot↔dash).
"""
# Portal first — the Nous /models endpoint is authoritative for what our
# infrastructure enforces and may differ from OR (e.g. OR reports 1M for
# qwen3.6-plus; the portal correctly says 262144). Fall back to the OR
# catalog only if the portal doesn't list the model.
if base_url:
portal_ctx = _resolve_endpoint_context_length(model, base_url, api_key=api_key)
if portal_ctx is not None:
return portal_ctx, "portal"
metadata = fetch_model_metadata()
def _safe_ctx(or_id: str, entry: dict) -> Optional[int]:
ctx = entry.get("context_length")
if ctx is None:
return None
if ctx <= 32768 and _model_name_suggests_kimi(or_id):
logger.info(
"Rejecting OpenRouter metadata context=%s for %r "
"(Kimi-family underreport, Nous path); falling through to hardcoded defaults",
ctx, or_id,
)
return None
return ctx
metadata = fetch_model_metadata() # OpenRouter cache
# Exact match first
if model in metadata:
ctx = _safe_ctx(model, metadata[model])
if ctx is not None:
return ctx, "openrouter"
return metadata[model].get("context_length")
normalized = _normalize_model_version(model).lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
if bare.lower() == model.lower() or _normalize_model_version(bare).lower() == normalized:
ctx = _safe_ctx(or_id, entry)
if ctx is not None:
return ctx, "openrouter"
return entry.get("context_length")
# Partial prefix match for cases like gemini-3-flash → gemini-3-flash-preview
# Require match to be at a word boundary (followed by -, :, or end of string)
model_lower = model.lower()
for or_id, entry in metadata.items():
bare = or_id.split("/", 1)[1] if "/" in or_id else or_id
@@ -1442,11 +1358,9 @@ def _resolve_nous_context_length(
if candidate.startswith(query) and (
len(candidate) == len(query) or candidate[len(query)] in "-:."
):
ctx = _safe_ctx(or_id, entry)
if ctx is not None:
return ctx, "openrouter"
return entry.get("context_length")
return None, ""
return None
def get_model_context_length(
@@ -1461,18 +1375,14 @@ def get_model_context_length(
Resolution order:
0. Explicit config override (model.context_length or custom_providers per-model)
1. Persistent cache (previously discovered via probing). Nous URLs
bypass the cache here so step 5b can always reconcile against
the authoritative portal /v1/models response.
1. Persistent cache (previously discovered via probing)
1b. AWS Bedrock static table (must precede custom-endpoint probe)
2. Active endpoint metadata (/models for explicit custom endpoints)
3. Local server query (for local endpoints)
4. Anthropic /v1/models API (API-key users only, not OAuth)
5. Provider-aware lookups (before generic OpenRouter cache):
a. Copilot live /models API
b. Nous: live /v1/models probe first (authoritative), then OR
cache fallback with suffix/version normalisation. Only
portal-derived values are persisted to disk.
b. Nous suffix-match via OpenRouter cache
c. Codex OAuth /models probe
d. GMI /models endpoint
e. Ollama native /api/show probe (any base_url, provider-agnostic)
@@ -1527,28 +1437,6 @@ def get_model_context_length(
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
# Invalidate stale 32k cache entries for Kimi-family models.
elif cached <= 32768 and _model_name_suggests_kimi(model):
logger.info(
"Dropping stale Kimi cache entry %s@%s -> %s (OpenRouter underreport); "
"re-resolving via hardcoded defaults",
model, base_url, f"{cached:,}",
)
_invalidate_cached_context_length(model, base_url)
# Nous Portal: the portal /v1/models endpoint is authoritative.
# Bypass the persistent cache so step 5b can always reconcile
# against it — this corrects pre-fix entries seeded from the
# OR catalog (the same OR underreport class that the Kimi/Qwen
# DEFAULT_CONTEXT_LENGTHS overrides exist to mitigate) without
# touching the on-disk file when the portal is unreachable.
# The in-memory 300s endpoint metadata cache makes the per-call
# cost amortise to ~0 within a process.
elif _infer_provider_from_url(base_url) == "nous":
logger.debug(
"Bypassing persistent cache for %s@%s (Nous portal authoritative)",
model, base_url,
)
# Fall through; step 5b reconciles and overwrites if portal responds.
else:
return cached
@@ -1572,13 +1460,6 @@ def get_model_context_length(
except ImportError:
pass # boto3 not installed — fall through to generic resolution
if provider == "novita" or (base_url and base_url_host_matches(base_url, "api.novita.ai")):
ctx = _resolve_endpoint_context_length(model, base_url or "https://api.novita.ai/openai/v1", api_key=api_key)
if ctx is not None:
if base_url:
save_context_length(model, base_url, ctx)
return ctx
# 2. Active endpoint metadata for truly custom/unknown endpoints.
# Known providers (Copilot, OpenAI, Anthropic, etc.) skip this — their
# /models endpoint may report a provider-imposed limit (e.g. Copilot
@@ -1647,18 +1528,8 @@ def get_model_context_length(
pass # Fall through to models.dev
if effective_provider == "nous":
ctx, source = _resolve_nous_context_length(
model, base_url=base_url or "", api_key=api_key or ""
)
ctx = _resolve_nous_context_length(model)
if ctx:
# Persist ONLY portal-derived values. Caching an OR-fallback
# value here would freeze in a wrong number on the first portal
# blip / auth glitch and step-1 would short-circuit it forever.
# OR's catalog is community-maintained and is precisely why the
# Kimi/Qwen DEFAULT_CONTEXT_LENGTHS overrides exist — we don't
# want it leaking into the persistent cache for Nous URLs.
if base_url and source == "portal":
save_context_length(model, base_url, ctx)
return ctx
if effective_provider == "openai-codex":
# Codex OAuth enforces lower context limits than the direct OpenAI
@@ -1704,6 +1575,14 @@ def get_model_context_length(
if model in metadata:
or_ctx = metadata[model].get("context_length", DEFAULT_FALLBACK_CONTEXT)
# Guard against stale OpenRouter metadata for Kimi-family models.
# OpenRouter reports 32768 for moonshotai/kimi-k2.6, but the model
# actually supports 262144 (models.dev + official Kimi docs agree).
# Providers that host their own Kimi endpoints (Ollama Cloud, Kimi
# Coding, Moonshot) would otherwise trip the 64k minimum-context
# guard and reject a perfectly capable model.
# The filter is narrow: only reject exactly 32768 for Kimi-named
# models. If OpenRouter ever updates its data, the stale path
# becomes dead code with no impact.
if or_ctx == 32768 and _model_name_suggests_kimi(model):
logger.info(
"Rejecting OpenRouter metadata context=%s for %r "

View File

@@ -141,7 +141,6 @@ class ProviderInfo:
# Hermes provider names → models.dev provider IDs
PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"openrouter": "openrouter",
"novita": "novita-ai",
"anthropic": "anthropic",
"openai": "openai",
"openai-codex": "openai",
@@ -158,6 +157,7 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"alibaba": "alibaba",
"qwen-oauth": "alibaba",
"copilot": "github-copilot",
"ai-gateway": "vercel",
"opencode-zen": "opencode",
"opencode-go": "opencode-go",
"kilocode": "kilo",
@@ -166,9 +166,6 @@ PROVIDER_TO_MODELS_DEV: Dict[str, str] = {
"gemini": "google",
"google": "google",
"xai": "xai",
# xAI OAuth is an authentication/transport path for the same xAI model
# catalog, so model metadata should resolve through the xAI provider.
"xai-oauth": "xai",
"xiaomi": "xiaomi",
"nvidia": "nvidia",
"groq": "groq",

View File

@@ -15,18 +15,6 @@ and MoonshotAI/kimi-cli#1595:
2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
the parent. Presence of both causes "type should be defined in anyOf
items instead of the parent schema".
3. ``enum`` arrays on scalar-typed nodes may not contain ``null`` or empty
strings. Strip those entries (drop the enum entirely if it becomes empty).
4. ``$ref`` nodes may not carry sibling keywords. Moonshot expands the
reference before validation and then rejects the node if sibling keys
like ``description`` remain on the same node as ``$ref``. Strip every
sibling from ``$ref`` nodes so only ``{"$ref": "..."}`` survives.
(Ported from anomalyco/opencode#24730.)
5. ``items`` may not be a tuple-style array (``items: [schemaA, schemaB]``
for positional element schemas). Moonshot's schema engine requires a
single object schema applied to every array element. Collapse tuple
``items`` to the first element schema (or ``{}`` if the tuple is empty).
(Ported from anomalyco/opencode#24730.)
The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
@@ -78,16 +66,6 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
}
elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
elif key == "items" and isinstance(value, list):
# Rule 5: tuple-style ``items`` arrays (positional element
# schemas) are not accepted by Moonshot. Collapse to the
# first element schema if present, else to ``{}``. This
# matches opencode's behaviour for moonshotai / kimi models.
first = value[0] if value else {}
if isinstance(first, dict):
repaired[key] = _repair_schema(first, is_schema=True)
else:
repaired[key] = first
elif key in _SCHEMA_NODE_KEYS:
# items / not / additionalProperties: single nested schema.
# additionalProperties can also be a bool — leave those alone.
@@ -152,15 +130,6 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
else:
repaired.pop("enum")
# Rule 4: $ref nodes must not have sibling keywords. Moonshot expands
# the reference before validation and then rejects the node if siblings
# like ``description`` / ``type`` / ``default`` appear alongside $ref.
# The referenced definition still carries its own description on the
# target node, which Moonshot accepts.
# (Ported from anomalyco/opencode#24730.)
if "$ref" in repaired:
return {"$ref": repaired["$ref"]}
return repaired

View File

@@ -1,64 +0,0 @@
"""Centralized Nous Portal request tags.
Every Hermes request that hits the Nous Portal — main agent loop, auxiliary
client (compression / titles / vision / web_extract / session_search / etc.),
and any future code path — must carry the same product-attribution tags so
Nous can attribute usage to Hermes Agent and bucket it by client release.
Tag shape (sent in OpenAI-compatible ``extra_body['tags']``):
[
"product=hermes-agent",
"client=hermes-client-v<__version__>",
]
The version is sourced live from ``hermes_cli.__version__`` so it auto-aligns
to whatever release is installed; the release script
(``scripts/release.py``) regex-bumps that single string, and every Portal
request picks up the new tag on the next process start.
Why one helper instead of inlining the literal at each site:
* Four call sites (main loop profile, aux client, run_agent compression
fallback, web_tools fallback) used to drift apart — see PR #24194 which
only got the aux site, leaving the main loop sending a different tag set.
* Tests should assert the same tag list everywhere; centralizing makes that
assertion a one-liner against this module.
Do NOT pre-compute these as module-level constants in the consumers. The
version can change at runtime (editable installs, hot-reload tooling), and
``hermes_cli.__version__`` is the canonical source of truth.
"""
from __future__ import annotations
from typing import List
def _hermes_version() -> str:
"""Return the current Hermes release version, e.g. ``"0.13.0"``.
Falls back to ``"unknown"`` if ``hermes_cli`` cannot be imported (should
never happen in a real install — guarded for defensive testing).
"""
try:
from hermes_cli import __version__
return __version__
except Exception:
return "unknown"
def hermes_client_tag() -> str:
"""Return the ``client=...`` tag for Nous Portal requests.
Format: ``client=hermes-client-v<MAJOR>.<MINOR>.<PATCH>``.
"""
return f"client=hermes-client-v{_hermes_version()}"
def nous_portal_tags() -> List[str]:
"""Return the canonical list of Nous Portal product tags.
Always returns a fresh list so callers can mutate it freely
(e.g. ``merged_extra.setdefault("tags", []).extend(nous_portal_tags())``).
"""
return ["product=hermes-agent", hermes_client_tag()]

View File

@@ -1,167 +0,0 @@
"""Process-level bootstrap helpers for ``run_agent``.
Three concerns, all tied to ``AIAgent`` boot-time / runtime IO setup:
1. **Lazy OpenAI SDK import** — ``_load_openai_cls`` + ``_OpenAIProxy``
defer the 240ms-ish ``from openai import OpenAI`` cost until first use,
while preserving ``isinstance(client, OpenAI)`` checks and
``patch("run_agent.OpenAI", ...)`` test patterns.
2. **Crash-resistant stdio** — ``_SafeWriter`` wraps stdout/stderr so
``OSError: Input/output error`` from broken pipes (systemd, Docker,
thread teardown races) cannot crash the agent. ``_install_safe_stdio``
applies the wrapper.
3. **HTTP proxy resolution** — ``_get_proxy_from_env`` reads
``HTTPS_PROXY`` / ``HTTP_PROXY`` / ``ALL_PROXY``;
``_get_proxy_for_base_url`` respects ``NO_PROXY`` for the given base URL.
``run_agent`` re-exports every name so existing
``from run_agent import _get_proxy_from_env`` imports keep working
unchanged.
"""
from __future__ import annotations
import os
import sys
import urllib.request
from typing import Optional
from utils import base_url_hostname, normalize_proxy_url
# Cached at module level so we only pay the OpenAI SDK import cost once
# per process (after the first lazy load).
_OPENAI_CLS_CACHE = None
def _load_openai_cls() -> type:
"""Import and cache ``openai.OpenAI``."""
global _OPENAI_CLS_CACHE
if _OPENAI_CLS_CACHE is None:
from openai import OpenAI as _cls
_OPENAI_CLS_CACHE = _cls
return _OPENAI_CLS_CACHE
class _OpenAIProxy:
"""Module-level proxy that looks like ``openai.OpenAI`` but imports lazily."""
__slots__ = ()
def __call__(self, *args, **kwargs):
return _load_openai_cls()(*args, **kwargs)
def __instancecheck__(self, obj):
return isinstance(obj, _load_openai_cls())
def __repr__(self):
return "<lazy openai.OpenAI proxy>"
class _SafeWriter:
"""Transparent stdio wrapper that catches OSError/ValueError from broken pipes.
When hermes-agent runs as a systemd service, Docker container, or headless
daemon, the stdout/stderr pipe can become unavailable (idle timeout, buffer
exhaustion, socket reset). Any print() call then raises
``OSError: [Errno 5] Input/output error``, which can crash agent setup or
run_conversation() — especially via double-fault when an except handler
also tries to print.
Additionally, when subagents run in ThreadPoolExecutor threads, the shared
stdout handle can close between thread teardown and cleanup, raising
``ValueError: I/O operation on closed file`` instead of OSError.
This wrapper delegates all writes to the underlying stream and silently
catches both OSError and ValueError. It is transparent when the wrapped
stream is healthy.
"""
__slots__ = ("_inner",)
def __init__(self, inner):
object.__setattr__(self, "_inner", inner)
def write(self, data):
try:
return self._inner.write(data)
except (OSError, ValueError):
return len(data) if isinstance(data, str) else 0
def flush(self):
try:
self._inner.flush()
except (OSError, ValueError):
pass
def fileno(self):
return self._inner.fileno()
def isatty(self):
try:
return self._inner.isatty()
except (OSError, ValueError):
return False
def __getattr__(self, name):
return getattr(self._inner, name)
def _get_proxy_from_env() -> Optional[str]:
"""Read proxy URL from environment variables.
Checks HTTPS_PROXY, HTTP_PROXY, ALL_PROXY (and lowercase variants) in order.
Returns the first valid proxy URL found, or None if no proxy is configured.
"""
for key in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
"https_proxy", "http_proxy", "all_proxy"):
value = os.environ.get(key, "").strip()
if value:
return normalize_proxy_url(value)
return None
def _get_proxy_for_base_url(base_url: Optional[str]) -> Optional[str]:
"""Return an env-configured proxy unless NO_PROXY excludes this base URL."""
proxy = _get_proxy_from_env()
if not proxy or not base_url:
return proxy
host = base_url_hostname(base_url)
if not host:
return proxy
try:
if urllib.request.proxy_bypass_environment(host):
return None
except Exception:
pass
return proxy
def _install_safe_stdio() -> None:
"""Wrap stdout/stderr so best-effort console output cannot crash the agent."""
for stream_name in ("stdout", "stderr"):
stream = getattr(sys, stream_name, None)
if stream is not None and not isinstance(stream, _SafeWriter):
setattr(sys, stream_name, _SafeWriter(stream))
# Module-level proxy instance — drops in for ``openai.OpenAI``. Imported as
# ``from agent.process_bootstrap import OpenAI`` (or re-exported via
# ``run_agent`` for legacy tests).
OpenAI = _OpenAIProxy()
__all__ = [
"OpenAI",
"_OpenAIProxy",
"_load_openai_cls",
"_SafeWriter",
"_install_safe_stdio",
"_get_proxy_from_env",
"_get_proxy_for_base_url",
]

View File

@@ -7,6 +7,7 @@ assemble pieces, then combines them with memory and ephemeral prompts.
import json
import logging
import os
import re
import threading
from collections import OrderedDict
from pathlib import Path
@@ -28,30 +29,43 @@ from utils import atomic_json_write
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Context file scanning — detect prompt injection / promptware in AGENTS.md,
# .cursorrules, SOUL.md before they get injected into the system prompt.
#
# Patterns live in ``tools/threat_patterns.py`` — the single source of truth
# shared with the memory-tool scanner and the tool-result delimiter system.
# This module just chooses how to react when a match is found (block-with-
# placeholder; the actual content never reaches the system prompt).
# Context file scanning — detect prompt injection in AGENTS.md, .cursorrules,
# SOUL.md before they get injected into the system prompt.
# ---------------------------------------------------------------------------
from tools.threat_patterns import scan_for_threats as _scan_for_threats
_CONTEXT_THREAT_PATTERNS = [
(r'ignore\s+(previous|all|above|prior)\s+instructions', "prompt_injection"),
(r'do\s+not\s+tell\s+the\s+user', "deception_hide"),
(r'system\s+prompt\s+override', "sys_prompt_override"),
(r'disregard\s+(your|all|any)\s+(instructions|rules|guidelines)', "disregard_rules"),
(r'act\s+as\s+(if|though)\s+you\s+(have\s+no|don\'t\s+have)\s+(restrictions|limits|rules)', "bypass_restrictions"),
(r'<!--[^>]*(?:ignore|override|system|secret|hidden)[^>]*-->', "html_comment_injection"),
(r'<\s*div\s+style\s*=\s*["\'][\s\S]*?display\s*:\s*none', "hidden_div"),
(r'translate\s+.*\s+into\s+.*\s+and\s+(execute|run|eval)', "translate_execute"),
(r'curl\s+[^\n]*\$\{?\w*(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL|API)', "exfil_curl"),
(r'cat\s+[^\n]*(\.env|credentials|\.netrc|\.pgpass)', "read_secrets"),
]
_CONTEXT_INVISIBLE_CHARS = {
'\u200b', '\u200c', '\u200d', '\u2060', '\ufeff',
'\u202a', '\u202b', '\u202c', '\u202d', '\u202e',
}
def _scan_context_content(content: str, filename: str) -> str:
"""Scan context file content for injection. Returns sanitized content.
"""Scan context file content for injection. Returns sanitized content."""
findings = []
# Check invisible unicode
for char in _CONTEXT_INVISIBLE_CHARS:
if char in content:
findings.append(f"invisible unicode U+{ord(char):04X}")
# Check threat patterns
for pattern, pid in _CONTEXT_THREAT_PATTERNS:
if re.search(pattern, content, re.IGNORECASE):
findings.append(pid)
Uses the "context" scope from the shared threat-pattern library, which
covers classic injection + promptware/C2 patterns + role-play hijack.
Strict-scope patterns (SSH backdoor, persistence, exfil-URL) are NOT
applied here — those are too aggressive for a context file in a
cloned repo (security research, infra docs). Content matching is
BLOCKED at this layer because the file would otherwise enter the
system prompt verbatim and the user has no chance to intervene.
"""
findings = _scan_for_threats(content, scope="context")
if findings:
logger.warning("Context file %s blocked: %s", filename, ", ".join(findings))
return f"[BLOCKED: {filename} contained potential prompt injection ({', '.join(findings)}). Content not loaded.]"
@@ -192,12 +206,7 @@ KANBAN_GUIDANCE = (
"files outside it unless the task explicitly asks.\n"
"3. **Heartbeat on long operations.** Call `kanban_heartbeat(note=...)` "
"every few minutes during long subprocesses (training, encoding, crawling). "
"Skip heartbeats for short tasks. **If your task may run longer than 1 hour, "
"you MUST call `kanban_heartbeat` at least once an hour** — the dispatcher "
"reclaims tasks running past `kanban.dispatch_stale_timeout_seconds` "
"(default 4 hours) when no heartbeat has arrived in the last hour. A "
"reclaim re-queues the task as `ready` without penalty (no failure counter "
"tick), but you lose your current run's progress.\n"
"Skip heartbeats for short tasks.\n"
"4. **Block on genuine ambiguity.** If you need a human decision you cannot "
"infer (missing credentials, UX choice, paywalled source, peer output you "
"need first), call `kanban_block(reason=\"...\")` and stop. Don't guess. "
@@ -235,11 +244,6 @@ KANBAN_GUIDANCE = (
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "
"the `kanban_*` tools — they work across all terminal backends.\n"
"- Do not complete a task you didn't actually finish. Block it.\n"
"- Do not call `clarify` to ask questions. You are running headless — "
"there is no live user to answer. The call will time out and the task "
"will sit silently in `running` with no signal to the operator. Instead: "
"`kanban_comment` the context, then `kanban_block(reason=...)` so the "
"task surfaces on the board as needing input.\n"
"- Do not assign follow-up work to yourself. Assign it to the right "
"specialist profile.\n"
"- Do not call `delegate_task` as a board substitute. `delegate_task` is "
@@ -264,47 +268,12 @@ TOOL_USE_ENFORCEMENT_GUIDANCE = (
# Model name substrings that trigger tool-use enforcement guidance.
# Add new patterns here when a model family needs explicit steering.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok", "glm", "qwen", "deepseek")
# Universal "finish the job" guidance — applied to ALL models, not gated
# by model family. Addresses two cross-model failure modes:
# 1. Stopping after a stub: writing a tiny file or running one command
# and then ending the turn with a description of the plan instead
# of the finished artifact. (Observed on Opus during a real
# Sarasota real-estate build task: 3 API calls, 85-byte file,
# one terminal command, finish_reason=stop.)
# 2. Fabricating output when a real path is blocked. When `pip` or a
# tool fails, some models will synthesize plausible-looking results
# (fake addresses, fake JSON, fake numbers) instead of reporting
# the blocker. (Observed on DeepSeek v4-flash on the same task:
# pushed through PEP-668 wall, then returned fabricated listings.)
#
# Short on purpose. This block is shipped to every user, every session,
# in the cached system prompt — token cost is paid once at install and
# then amortised across all sessions via prefix caching. Keep it tight.
TASK_COMPLETION_GUIDANCE = (
"# Finishing the job\n"
"When the user asks you to build, run, or verify something, the deliverable is "
"a working artifact backed by real tool output — not a description of one. "
"Do not stop after writing a stub, a plan, or a single command. Keep working "
"until you have actually exercised the code or produced the requested result, "
"then report what real execution returned.\n"
"If a tool, install, or network call fails and blocks the real path, say so "
"directly and try an alternative (different package manager, different "
"approach, ask the user). NEVER substitute plausible-looking fabricated "
"output (made-up data, invented file contents, synthesised API responses) "
"for results you couldn't actually produce. Reporting a blocker honestly "
"is always better than inventing a result."
)
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok")
# OpenAI GPT/Codex-specific execution guidance. Addresses known failure modes
# where GPT models abandon work on partial results, skip prerequisite lookups,
# hallucinate instead of using tools, and declare "done" without verification.
# Inspired by patterns from OpenAI's GPT-5.4 prompting guide & OpenClaw PR #38953.
# Also applied to xAI Grok — same failure modes in practice (claims completion
# without tool calls, suggests workarounds instead of using existing tools,
# replies with plans/suggestions instead of executing). The body is
# family-agnostic; the OPENAI_ prefix reflects origin, not exclusivity.
OPENAI_MODEL_EXECUTION_GUIDANCE = (
"# Execution discipline\n"
"<tool_persistence>\n"
@@ -645,7 +614,7 @@ WSL_ENVIRONMENT_HINT = (
# misleading — the agent should only see the machine it can actually touch.
_REMOTE_TERMINAL_BACKENDS = frozenset({
"docker", "singularity", "modal", "daytona", "ssh",
"managed_modal",
"vercel_sandbox", "managed_modal",
})
@@ -659,6 +628,7 @@ _BACKEND_FALLBACK_DESCRIPTIONS: dict[str, str] = {
"modal": "a Modal sandbox (Linux)",
"managed_modal": "a managed Modal sandbox (Linux)",
"daytona": "a Daytona workspace (Linux)",
"vercel_sandbox": "a Vercel sandbox (Linux)",
"ssh": "a remote host reached over SSH (likely Linux)",
}
@@ -772,7 +742,7 @@ def build_environment_hints() -> str:
and a Windows-only note that `terminal` shells out to bash, not
PowerShell).
- For **remote / sandbox** terminal backends (docker, singularity,
modal, daytona, ssh): host info is **suppressed**
modal, daytona, ssh, vercel_sandbox): host info is **suppressed**
because the agent's tools can't touch the host — only the backend
matters. A live probe inside the backend reports its OS, user, $HOME,
and cwd. Falls back to a static summary if the probe fails.
@@ -848,27 +818,6 @@ def build_environment_hints() -> str:
if is_wsl():
hints.append(WSL_ENVIRONMENT_HINT)
# Embedder-supplied environment description. Lets a host that wraps Hermes
# (e.g. a sandbox runner / managed platform) explain the environment the
# agent is running in — proxy, credential handling, mount layout — without
# forking the identity slot (SOUL.md). Read once at prompt-build time, so
# it's part of the stable, cache-safe system prompt. The env var is the
# build-time/embedder mechanism (set in a container ENV); config.yaml
# ``agent.environment_hint`` is the user-facing surface. Env var wins.
extra = (os.getenv("HERMES_ENVIRONMENT_HINT") or "").strip()
if not extra:
try:
from hermes_cli.config import load_config
extra = str(
(load_config().get("agent", {}) or {}).get("environment_hint", "")
).strip()
except Exception as e:
logger.debug("Could not read agent.environment_hint from config: %s", e)
if extra:
hints.append(extra)
return "\n\n".join(hints)

View File

@@ -1,15 +1,25 @@
"""Anthropic prompt caching strategy.
"""Anthropic prompt caching strategies.
Single layout: ``system_and_3``. 4 cache_control breakpoints — system
prompt + last 3 non-system messages, all at the same TTL (5m or 1h).
Reduces input token costs by ~75% on multi-turn conversations within a
single session.
Two layouts:
* ``system_and_3`` (default, used everywhere except the long-lived path):
4 cache_control breakpoints — system prompt + last 3 non-system messages.
All at the same TTL (5m or 1h). Reduces input token costs by ~75% on
multi-turn conversations within a single session.
* ``prefix_and_2`` (Claude on Anthropic / OpenRouter / Nous Portal):
4 breakpoints split across two TTL tiers — tools[-1] (1h) +
stable system prefix (1h) + last 2 non-system messages (5m). The
long-lived prefix is byte-stable across sessions for a given user
config, so every fresh session reads the cached system+tools instead
of re-paying for them. Within-session rolling window shrinks from 3
messages to 2 to free the breakpoint budget.
Pure functions -- no class state, no AIAgent dependency.
"""
import copy
from typing import Any, Dict, List
from typing import Any, Dict, List, Optional
def _apply_cache_marker(msg: dict, cache_marker: dict, native_anthropic: bool = False) -> None:
@@ -77,3 +87,115 @@ def apply_anthropic_cache_control(
_apply_cache_marker(messages[idx], marker, native_anthropic=native_anthropic)
return messages
def _mark_system_stable_block(
messages: List[Dict[str, Any]],
long_lived_marker: Dict[str, str],
) -> bool:
"""Mark the *first* content block of the system message with the 1h marker.
The system message is expected to have been split into multiple content
blocks beforehand by the caller — block[0] is the cross-session-stable
prefix, subsequent blocks carry context files + volatile suffix.
Falls back to marking the whole system message as a single block when
the message hasn't been split (preserves correctness on the fallback path).
Returns True when a marker was placed.
"""
if not messages or messages[0].get("role") != "system":
return False
sys_msg = messages[0]
content = sys_msg.get("content")
# Already a list of blocks → mark the first block.
if isinstance(content, list) and content:
first = content[0]
if isinstance(first, dict):
first["cache_control"] = long_lived_marker
return True
return False
# String content (no split) → cannot place a stable-prefix breakpoint
# without changing the byte content. Caller is responsible for
# splitting; if they didn't, fall through to envelope marker so we still
# cache *something* for this turn.
if isinstance(content, str) and content:
sys_msg["content"] = [
{"type": "text", "text": content, "cache_control": long_lived_marker}
]
return True
return False
def apply_anthropic_cache_control_long_lived(
api_messages: List[Dict[str, Any]],
long_lived_ttl: str = "1h",
rolling_ttl: str = "5m",
native_anthropic: bool = False,
) -> List[Dict[str, Any]]:
"""Apply prefix_and_2 caching: long-lived stable prefix + rolling window.
Layout (4 breakpoints total):
* Stable system prefix (block[0]) → ``long_lived_ttl`` TTL
* Last 2 non-system messages → ``rolling_ttl`` TTL each
NOTE: this function does NOT mark the tools array. Tools cache_control
is attached separately (see ``mark_tools_for_long_lived_cache``) because
tools live outside the messages list in the API payload.
The caller MUST have split the system message into ordered content
blocks where block[0] is the cross-session-stable portion. If the system
message is still a single string, it is wrapped into a single block and
marked — this is correct, just less effective (the volatile suffix is
not isolated, so the prefix invalidates per-session).
Returns:
Deep copy of messages with cache_control breakpoints injected.
"""
messages = copy.deepcopy(api_messages)
if not messages:
return messages
long_marker = _build_marker(long_lived_ttl)
rolling_marker = _build_marker(rolling_ttl)
placed_prefix = _mark_system_stable_block(messages, long_marker)
# Reserve 1 breakpoint for the system prefix (when placed); spend the
# remaining 3 on the rolling tail. Anthropic max is 4 total —
# tools[-1] (when marked) consumes the 4th, so we cap rolling at 2 here.
rolling_budget = 2 if placed_prefix else 3
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-rolling_budget:]:
_apply_cache_marker(messages[idx], rolling_marker, native_anthropic=native_anthropic)
return messages
def mark_tools_for_long_lived_cache(
tools: Optional[List[Dict[str, Any]]],
long_lived_ttl: str = "1h",
) -> Optional[List[Dict[str, Any]]]:
"""Attach cache_control to the last tool in the OpenAI-format tools list.
Anthropic prefix-cache order is ``tools → system → messages``. Marking
the last tool dict caches the entire tools array (Anthropic's docs:
"the marker is placed on the last block you want included in the cached
prefix"). Marker is preserved across the OpenAI-wire boundary on
OpenRouter and Nous Portal (which proxies to OpenRouter); on native
Anthropic the marker is forwarded by ``convert_tools_to_anthropic``.
Returns a deep copy of the tools list with the marker attached, or the
input unchanged when tools is empty/None. Pure function — does not
mutate the input.
"""
if not tools:
return tools
out = copy.deepcopy(tools)
last = out[-1]
if isinstance(last, dict):
last["cache_control"] = _build_marker(long_lived_ttl)
return out

View File

@@ -103,7 +103,6 @@ _PREFIX_PATTERNS = [
r"hsk-[A-Za-z0-9]{10,}", # Hindsight API key
r"mem0_[A-Za-z0-9]{10,}", # Mem0 Platform API key
r"brv_[A-Za-z0-9]{10,}", # ByteRover API key
r"xai-[A-Za-z0-9]{30,}", # xAI (Grok) API key
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
@@ -176,15 +175,6 @@ _URL_USERINFO_RE = re.compile(
r"(https?|wss?|ftp)://([^/\s:@]+):([^/\s@]+)@",
)
# HTTP access logs often use a relative request target rather than a full URL:
# `"POST /webhook?password=... HTTP/1.1"`. The full-URL redactor above only
# sees strings containing `://`, so handle request-target query strings too.
_HTTP_REQUEST_TARGET_QUERY_RE = re.compile(
r"\b((?:GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS|TRACE|CONNECT)\s+[^ \t\r\n\"']*?)"
r"\?([^ \t\r\n\"']+)",
re.IGNORECASE,
)
# Form-urlencoded body detection: conservative — only applies when the entire
# text looks like a query string (k=v&k=v pattern with no newlines).
_FORM_BODY_RE = re.compile(
@@ -302,15 +292,6 @@ def _redact_url_userinfo(text: str) -> str:
)
def _redact_http_request_target_query_params(text: str) -> str:
"""Redact sensitive query params in HTTP access-log request targets."""
def _sub(m: re.Match) -> str:
prefix = m.group(1)
query = _redact_query_string(m.group(2))
return f"{prefix}?{query}"
return _HTTP_REQUEST_TARGET_QUERY_RE.sub(_sub, text)
def _redact_form_body(text: str) -> str:
"""Redact sensitive values in a form-urlencoded body.
@@ -339,15 +320,6 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
patterns when the text is known to be source code (e.g. MAX_TOKENS=***
constants, "apiKey": "test" fixtures). Prefix patterns, auth headers,
private keys, DB connstrings, JWTs, and URL secrets are still redacted.
Performance: each regex pattern is gated behind a cheap substring
pre-check (e.g. ``"=" in text`` for ENV assignments, ``"://" in text``
for URLs, ``"eyJ" in text`` for JWTs). On a typical hermes log line
(no secrets) this drops the 13-pattern scan from ~5.6us to ~1.8us per
record (-68%). The pre-checks are conservative — false positives
still run the full regex, which then doesn't match. False negatives
are impossible because every regex requires the gated substring to
match.
"""
if text is None:
return None
@@ -358,141 +330,68 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if not (force or _REDACT_ENABLED):
return text
# Known prefixes (sk-, ghp_, etc.) — gate on substring presence
if _has_known_prefix_substring(text):
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
# Known prefixes (sk-, ghp_, etc.)
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
# ENV assignments: OPENAI_API_KEY=*** (skip for code files — false positives)
if not code_file:
if "=" in text:
def _redact_env(m):
name, quote, value = m.group(1), m.group(2), m.group(3)
return f"{name}={quote}{_mask_token(value)}{quote}"
text = _ENV_ASSIGN_RE.sub(_redact_env, text)
def _redact_env(m):
name, quote, value = m.group(1), m.group(2), m.group(3)
return f"{name}={quote}{_mask_token(value)}{quote}"
text = _ENV_ASSIGN_RE.sub(_redact_env, text)
# JSON fields: "apiKey": "***" (skip for code files — false positives)
if ":" in text and '"' in text:
def _redact_json(m):
key, value = m.group(1), m.group(2)
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
def _redact_json(m):
key, value = m.group(1), m.group(2)
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
# Authorization headers — _AUTH_HEADER_RE is "Authorization: Bearer ..."
# case-insensitive, so "uthorization" is the cheapest substring gate that
# covers both "Authorization" and "authorization" without a casefold().
if "uthorization" in text or "UTHORIZATION" in text:
text = _AUTH_HEADER_RE.sub(
lambda m: m.group(1) + _mask_token(m.group(2)),
text,
)
# Authorization headers
text = _AUTH_HEADER_RE.sub(
lambda m: m.group(1) + _mask_token(m.group(2)),
text,
)
# Telegram bot tokens — pattern requires ":<token>" with digits prefix
if ":" in text:
def _redact_telegram(m):
prefix = m.group(1) or ""
digits = m.group(2)
return f"{prefix}{digits}:***"
text = _TELEGRAM_RE.sub(_redact_telegram, text)
# Telegram bot tokens
def _redact_telegram(m):
prefix = m.group(1) or ""
digits = m.group(2)
return f"{prefix}{digits}:***"
text = _TELEGRAM_RE.sub(_redact_telegram, text)
# Private key blocks
if "BEGIN" in text and "-----" in text:
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
# Database connection string passwords
if "://" in text:
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# JWT tokens (eyJ... — base64-encoded JSON headers)
if "eyJ" in text:
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
text = _JWT_RE.sub(lambda m: _mask_token(m.group(0)), text)
# NOTE: Web-URL redaction (query params + userinfo + HTTP access-log
# request targets) is intentionally OFF. Many legitimate workflows pass
# opaque tokens through query strings — magic-link checkouts, OAuth
# callbacks the agent is meant to follow, pre-signed share URLs — and
# blanket-redacting param values by name breaks those skills mid-flow.
# Known credential shapes (sk-, ghp_, JWTs, etc.) inside URLs are still
# caught by _PREFIX_RE and _JWT_RE above. DB connection-string passwords
# are still caught by _DB_CONNSTR_RE.
# URL userinfo (http(s)://user:pass@host) — redact for non-DB schemes.
# DB schemes are handled above by _DB_CONNSTR_RE.
text = _redact_url_userinfo(text)
# URL query params containing opaque tokens (?access_token=…&code=…)
text = _redact_url_query_params(text)
# Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
if "&" in text and "=" in text:
text = _redact_form_body(text)
text = _redact_form_body(text)
# Discord user/role mentions (<@snowflake_id>)
if "<@" in text:
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
# E.164 phone numbers (Signal, WhatsApp)
if "+" in text:
def _redact_phone(m):
phone = m.group(1)
if len(phone) <= 8:
return phone[:2] + "****" + phone[-2:]
return phone[:4] + "****" + phone[-4:]
text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
def _redact_phone(m):
phone = m.group(1)
if len(phone) <= 8:
return phone[:2] + "****" + phone[-2:]
return phone[:4] + "****" + phone[-4:]
text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
return text
# Substrings used to gate ``_PREFIX_RE`` execution. If none of these appear in
# the input string, the prefix regex cannot match anything, so we skip it.
# False positives are fine (they just run the regex, which then matches
# nothing) — the bound is "no false negatives" and that holds because every
# pattern in ``_PREFIX_PATTERNS`` has at least one of these as a literal
# substring of its leading characters.
#
# Derived automatically from ``_PREFIX_PATTERNS`` at module load time so a
# future PR that adds a new prefix to the regex list can't silently break
# the screen.
def _extract_literal_prefix(pattern: str) -> str:
"""Return the leading literal characters of a regex pattern.
Stops at the first regex metacharacter (``[``, ``(``, ``\\``, ``.``,
``?``, ``*``, ``+``, ``|``, ``{``, ``^``, ``$``). Returns the literal
that any match of the pattern MUST contain as a substring, so the
pre-screen never produces false negatives.
"""
meta = "[(\\.?*+|{^$"
for i, ch in enumerate(pattern):
if ch in meta:
return pattern[:i]
return pattern
_PREFIX_SUBSTRINGS = tuple(
_extract_literal_prefix(p) for p in _PREFIX_PATTERNS
)
def _has_known_prefix_substring(text: str) -> bool:
"""Return True if ``text`` contains any known credential prefix substring.
Used as a cheap pre-check before invoking the expensive ``_PREFIX_RE``.
"""
return any(p in text for p in _PREFIX_SUBSTRINGS)
_HTTP_METHOD_SUBSTRINGS = (
"GET ",
"POST ",
"PUT ",
"PATCH ",
"DELETE ",
"HEAD ",
"OPTIONS ",
"TRACE ",
"CONNECT ",
)
def _has_http_method_substring(text: str) -> bool:
"""Cheap pre-check before scanning for access-log request targets."""
upper = text.upper()
return any(method in upper for method in _HTTP_METHOD_SUBSTRINGS)
class RedactingFormatter(logging.Formatter):
"""Log formatter that redacts secrets from all log messages."""

View File

@@ -1,13 +0,0 @@
"""External secret source integrations.
A secret source is anything that can supply environment-variable-shaped
credentials at process startup, _after_ ~/.hermes/.env has loaded. By
default sources are non-destructive: they only set values for env vars
that aren't already present, so .env and shell exports continue to win.
Currently shipped:
- ``bitwarden`` — Bitwarden Secrets Manager (`bws` CLI). See
``agent.secret_sources.bitwarden`` for the integration and
``hermes_cli.secrets_cli`` for the user-facing setup wizard.
"""

View File

@@ -1,660 +0,0 @@
"""Bitwarden Secrets Manager (`bws` CLI) integration.
Hermes pulls API keys from Bitwarden Secrets Manager at process startup
so they don't have to live in plaintext in ``~/.hermes/.env``.
Design summary
--------------
* The ``bws`` binary is auto-installed into ``<hermes_home>/bin/bws`` on
first use. Hermes pins one version (``_BWS_VERSION``) and downloads
the matching asset from the official GitHub Releases page, verifying
the SHA-256 against the release's published checksum file.
* The access token is stored in ``~/.hermes/.env`` as
``BWS_ACCESS_TOKEN`` (or whatever name the user picked in
``secrets.bitwarden.access_token_env``). This is the one
bootstrap secret — every other provider key can live in Bitwarden.
* Pulling secrets is a single ``bws secret list <project_id>
--output json`` call. We cache the result in-process for
``cache_ttl_seconds`` so back-to-back ``hermes`` invocations don't
hammer the API.
* Failures NEVER block Hermes startup. Missing binary, no network,
expired token, etc. all emit a one-line warning and continue with
whatever credentials ``.env`` already had.
The module is intentionally subprocess-driven rather than going through
the ``bitwarden-sdk-secrets`` Python package: one cross-platform binary
is easier to lazy-install than a wheels-with-Rust-extension dependency.
"""
from __future__ import annotations
import hashlib
import json
import logging
import os
import platform
import shutil
import stat
import subprocess
import tempfile
import time
import urllib.error
import urllib.request
import zipfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Configuration constants
# ---------------------------------------------------------------------------
# Pinned upstream version. Bump in a follow-up PR — never auto-resolve
# "latest" because upstream release shape (asset names, CLI flags) is
# allowed to change between majors and we want updates to be deliberate.
_BWS_VERSION = "2.0.0"
_BWS_RELEASE_BASE = (
f"https://github.com/bitwarden/sdk-sm/releases/download/bws-v{_BWS_VERSION}"
)
_BWS_CHECKSUM_NAME = f"bws-sha256-checksums-{_BWS_VERSION}.txt"
# How long to wait for bws subprocesses and HTTP downloads, in seconds.
_BWS_DOWNLOAD_TIMEOUT = 60
_BWS_RUN_TIMEOUT = 30
# In-process cache so repeated load_hermes_dotenv() calls (CLI startup,
# gateway hot-reload, test suites) don't re-fetch from BSM.
_CacheKey = Tuple[str, str, str] # (access_token_fingerprint, project_id, server_url)
_CACHE: Dict[_CacheKey, "_CachedFetch"] = {}
# Disk-persisted cache so back-to-back CLI invocations (e.g. `hermes chat -q ...`
# called from scripts, cron, the gateway forking new agents) don't each pay the
# ~380ms `bws secret list` tax. The in-process _CACHE above only saves repeated
# fetches WITHIN one process; this saves repeated fetches ACROSS processes.
#
# Layout: one JSON object per cache key, written atomically with mode 0600 in
# <hermes_home>/cache/bws_cache.json. The file holds only the secret VALUES,
# never the access token. It's plaintext-equivalent to ~/.hermes/.env (which
# we already accept) but kept out of the .env file so users editing it won't
# accidentally commit BSM-sourced secrets.
_DISK_CACHE_BASENAME = "bws_cache.json"
def _disk_cache_path(home_path: Optional[Path] = None) -> Path:
"""Return the disk cache path under hermes_home/cache/.
`home_path` is what `load_hermes_dotenv()` already resolved; falling back
to `$HERMES_HOME` / `~/.hermes` keeps direct callers working too.
"""
if home_path is None:
home_path = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
return home_path / "cache" / _DISK_CACHE_BASENAME
def _cache_key_str(cache_key: _CacheKey) -> str:
"""Serialize a cache key to a stable string for JSON storage."""
token_fp, project_id, server_url = cache_key
return f"{token_fp}|{project_id}|{server_url}"
def _read_disk_cache(cache_key: _CacheKey, ttl_seconds: float,
home_path: Optional[Path] = None) -> Optional["_CachedFetch"]:
"""Return a cached entry from disk if fresh, else None.
Best-effort: any I/O or parse error returns None and we re-fetch.
"""
if ttl_seconds <= 0:
return None
path = _disk_cache_path(home_path)
try:
with open(path, "r", encoding="utf-8") as f:
payload = json.load(f)
except (OSError, json.JSONDecodeError):
return None
if not isinstance(payload, dict):
return None
if payload.get("key") != _cache_key_str(cache_key):
return None
secrets = payload.get("secrets")
fetched_at = payload.get("fetched_at")
if not isinstance(secrets, dict) or not isinstance(fetched_at, (int, float)):
return None
# Coerce all values to strings — JSON allows numbers but env vars need strings
typed_secrets: Dict[str, str] = {
k: v for k, v in secrets.items() if isinstance(k, str) and isinstance(v, str)
}
entry = _CachedFetch(secrets=typed_secrets, fetched_at=float(fetched_at))
if not entry.is_fresh(ttl_seconds):
return None
return entry
def _write_disk_cache(cache_key: _CacheKey, entry: "_CachedFetch",
home_path: Optional[Path] = None) -> None:
"""Persist a cache entry to disk atomically with mode 0600.
Best-effort: any I/O error is swallowed (the next invocation will just
re-fetch). We never want disk cache failures to break startup.
"""
path = _disk_cache_path(home_path)
try:
path.parent.mkdir(parents=True, exist_ok=True)
payload = {
"key": _cache_key_str(cache_key),
"secrets": entry.secrets,
"fetched_at": entry.fetched_at,
}
# Write to a temp file in the same directory and atomic-rename.
# tempfile honors os.umask, so we explicitly chmod 0600 before rename.
fd, tmp = tempfile.mkstemp(
prefix=".bws_cache_", suffix=".tmp", dir=str(path.parent)
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(payload, f)
os.chmod(tmp, 0o600)
os.replace(tmp, path)
except BaseException:
try:
os.unlink(tmp)
except OSError:
pass
raise
except OSError:
pass # best-effort — disk cache miss on next invocation is fine
@dataclass
class _CachedFetch:
secrets: Dict[str, str]
fetched_at: float
def is_fresh(self, ttl_seconds: float) -> bool:
if ttl_seconds <= 0:
return False
return (time.time() - self.fetched_at) < ttl_seconds
# ---------------------------------------------------------------------------
# Public dataclasses
# ---------------------------------------------------------------------------
@dataclass
class FetchResult:
"""Outcome of a single BSM pull."""
secrets: Dict[str, str] = field(default_factory=dict)
applied: List[str] = field(default_factory=list) # set into os.environ
skipped: List[str] = field(default_factory=list) # already set, not overridden
warnings: List[str] = field(default_factory=list) # non-fatal issues
error: Optional[str] = None # fatal: nothing was fetched
binary_path: Optional[Path] = None
@property
def ok(self) -> bool:
return self.error is None
# ---------------------------------------------------------------------------
# Binary discovery + lazy install
# ---------------------------------------------------------------------------
def _hermes_bin_dir() -> Path:
"""Where Hermes stores its managed binaries. Profile-aware."""
from hermes_constants import get_hermes_home
return get_hermes_home() / "bin"
def find_bws(*, install_if_missing: bool = False) -> Optional[Path]:
"""Return a path to a usable ``bws`` binary, or None.
Resolution order:
1. ``<hermes_home>/bin/bws`` (our managed copy — preferred)
2. ``shutil.which("bws")`` (system PATH)
When ``install_if_missing`` is True and neither resolves, this calls
:func:`install_bws` to download and verify the pinned version.
"""
managed = _hermes_bin_dir() / _platform_binary_name()
if managed.exists() and os.access(managed, os.X_OK):
return managed
system = shutil.which("bws")
if system:
return Path(system)
if install_if_missing:
try:
return install_bws()
except Exception as exc: # noqa: BLE001 — never block startup
logger.warning("bws auto-install failed: %s", exc)
return None
return None
def _platform_binary_name() -> str:
return "bws.exe" if platform.system() == "Windows" else "bws"
def _platform_asset_name() -> str:
"""Map (uname, arch, libc) → the upstream asset filename.
Asset names follow Rust's target triple convention. Linux defaults
to gnu (glibc); we switch to musl only if ldd --version says so.
"""
system = platform.system()
machine = platform.machine().lower()
if system == "Darwin":
# Universal binary works on both Intel and Apple Silicon — no
# need to pick a per-arch asset.
return f"bws-macos-universal-{_BWS_VERSION}.zip"
if system == "Windows":
arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
return f"bws-{arch}-pc-windows-msvc-{_BWS_VERSION}.zip"
if system == "Linux":
arch = "aarch64" if machine in ("arm64", "aarch64") else "x86_64"
libc = "gnu"
# ldd --version writes to stderr on glibc, stdout on musl. We
# don't need bullet-proof detection — getting it wrong falls
# back to a clear error from the binary loader, which we catch.
try:
res = subprocess.run(
["ldd", "--version"],
capture_output=True,
text=True,
timeout=2,
)
if "musl" in (res.stdout + res.stderr).lower():
libc = "musl"
except (OSError, subprocess.TimeoutExpired):
pass
return f"bws-{arch}-unknown-linux-{libc}-{_BWS_VERSION}.zip"
raise RuntimeError(
f"Unsupported platform for bws auto-install: {system} {machine}"
)
def install_bws(*, force: bool = False) -> Path:
"""Download, verify, and install the pinned ``bws`` binary.
Returns the path to the installed executable. Raises on any
failure (network, checksum, extraction) — callers in the auto-install
path catch these; the user-facing ``hermes secrets bitwarden setup``
surface lets them propagate so the wizard can show a clear error.
"""
bin_dir = _hermes_bin_dir()
bin_dir.mkdir(parents=True, exist_ok=True)
target = bin_dir / _platform_binary_name()
if target.exists() and not force:
return target
asset_name = _platform_asset_name()
asset_url = f"{_BWS_RELEASE_BASE}/{asset_name}"
checksum_url = f"{_BWS_RELEASE_BASE}/{_BWS_CHECKSUM_NAME}"
with tempfile.TemporaryDirectory(prefix="hermes-bws-") as tmpdir:
tmp = Path(tmpdir)
zip_path = tmp / asset_name
checksum_path = tmp / _BWS_CHECKSUM_NAME
logger.info("Downloading %s", asset_url)
_http_download(asset_url, zip_path)
_http_download(checksum_url, checksum_path)
expected = _expected_sha256(checksum_path, asset_name)
actual = _sha256_file(zip_path)
if expected.lower() != actual.lower():
raise RuntimeError(
f"Checksum mismatch for {asset_name}: "
f"expected {expected}, got {actual}"
)
with zipfile.ZipFile(zip_path) as zf:
member = _pick_zip_member(zf, _platform_binary_name())
zf.extract(member, tmp)
extracted = tmp / member
# Move into place atomically. We write to a sibling tempfile in
# the final directory so the rename can't cross filesystems.
fd, staged = tempfile.mkstemp(dir=str(bin_dir), prefix=".bws_")
os.close(fd)
shutil.copy2(extracted, staged)
os.chmod(
staged,
stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR
| stat.S_IRGRP | stat.S_IXGRP
| stat.S_IROTH | stat.S_IXOTH,
)
os.replace(staged, target)
logger.info("Installed bws %s at %s", _BWS_VERSION, target)
return target
def _http_download(url: str, dest: Path) -> None:
req = urllib.request.Request(url, headers={"User-Agent": "hermes-agent"})
try:
with urllib.request.urlopen(req, timeout=_BWS_DOWNLOAD_TIMEOUT) as resp: # noqa: S310
with open(dest, "wb") as f:
shutil.copyfileobj(resp, f)
except urllib.error.URLError as exc:
raise RuntimeError(f"Failed to download {url}: {exc}") from exc
def _expected_sha256(checksum_file: Path, asset_name: str) -> str:
"""Parse the upstream ``bws-sha256-checksums-X.Y.Z.txt`` file.
Format is the standard ``sha256sum`` output: ``<hex> <filename>``,
one per line.
"""
text = checksum_file.read_text(encoding="utf-8", errors="replace")
for line in text.splitlines():
parts = line.strip().split()
if len(parts) >= 2 and parts[-1] == asset_name:
return parts[0]
raise RuntimeError(
f"No checksum entry for {asset_name} in {checksum_file.name}"
)
def _sha256_file(path: Path) -> str:
h = hashlib.sha256()
with open(path, "rb") as f:
for chunk in iter(lambda: f.read(65536), b""):
h.update(chunk)
return h.hexdigest()
def _pick_zip_member(zf: zipfile.ZipFile, binary_name: str) -> str:
"""Find the binary inside the upstream zip.
Historically the archive has been flat (``bws`` at the root) but we
tolerate a top-level directory just in case upstream changes.
"""
candidates = [n for n in zf.namelist() if n.split("/")[-1] == binary_name]
if not candidates:
raise RuntimeError(
f"Could not find {binary_name} inside downloaded archive "
f"(members: {zf.namelist()[:5]}...)"
)
# Prefer the shortest path (i.e. root over nested) for determinism.
candidates.sort(key=len)
return candidates[0]
# ---------------------------------------------------------------------------
# Secret fetch + apply
# ---------------------------------------------------------------------------
def _token_fingerprint(token: str) -> str:
"""SHA-256 prefix used as a cache key — never logged, never displayed."""
return hashlib.sha256(token.encode("utf-8")).hexdigest()[:16]
def fetch_bitwarden_secrets(
*,
access_token: str,
project_id: str,
binary: Optional[Path] = None,
cache_ttl_seconds: float = 300,
use_cache: bool = True,
server_url: str = "",
home_path: Optional[Path] = None,
) -> Tuple[Dict[str, str], List[str]]:
"""Pull the secrets for ``project_id`` from Bitwarden Secrets Manager.
Returns ``(secrets_dict, warnings_list)``.
Set ``server_url`` to point at a non-default Bitwarden region or a
self-hosted instance — e.g. ``https://vault.bitwarden.eu`` for EU
Cloud accounts. When empty, ``bws`` uses its built-in default
(``https://vault.bitwarden.com``, US Cloud). This is plumbed into
the subprocess as ``BWS_SERVER_URL``.
Caching is a two-layer LRU: an in-process dict (for hot-reload paths
inside one process) and a disk-persisted JSON file under
``<hermes_home>/cache/bws_cache.json`` (for back-to-back CLI invocations).
Both share the same TTL. Pass ``home_path`` so disk cache lookups find
the right directory in tests / non-standard installs; otherwise we fall
back to ``$HERMES_HOME`` / ``~/.hermes``.
Raises :class:`RuntimeError` for fatal conditions (missing binary,
auth failure, unparseable output). Callers in the env_loader path
catch this and emit a single warning; callers in the user-facing
setup wizard let it propagate.
"""
if not access_token:
raise RuntimeError("Bitwarden access token is empty")
if not project_id:
raise RuntimeError("Bitwarden project_id is empty")
cache_key = (_token_fingerprint(access_token), project_id, server_url or "")
if use_cache:
cached = _CACHE.get(cache_key)
if cached and cached.is_fresh(cache_ttl_seconds):
return cached.secrets, []
# L2: disk cache. ~5ms on cache hit vs ~380ms for `bws secret list`.
disk_cached = _read_disk_cache(cache_key, cache_ttl_seconds, home_path)
if disk_cached is not None:
# Promote into in-process cache so subsequent fetches in the
# same process skip the disk read too.
_CACHE[cache_key] = disk_cached
return disk_cached.secrets, []
bws = binary or find_bws(install_if_missing=True)
if bws is None:
raise RuntimeError(
"bws binary not available — auto-install failed and `bws` is "
"not on PATH. Install manually from "
"https://github.com/bitwarden/sdk-sm/releases or re-run "
"`hermes secrets bitwarden setup`."
)
secrets, warnings = _run_bws_list(bws, access_token, project_id, server_url)
entry = _CachedFetch(secrets=secrets, fetched_at=time.time())
_CACHE[cache_key] = entry
if use_cache:
_write_disk_cache(cache_key, entry, home_path)
return secrets, warnings
def _run_bws_list(
bws: Path, access_token: str, project_id: str, server_url: str = ""
) -> Tuple[Dict[str, str], List[str]]:
cmd = [str(bws), "secret", "list", project_id, "--output", "json"]
env = os.environ.copy()
env["BWS_ACCESS_TOKEN"] = access_token
# Make sure we're not echoing telemetry / colour codes into json.
env.setdefault("NO_COLOR", "1")
# Region / self-hosted support. bws defaults to https://vault.bitwarden.com
# (US Cloud); EU Cloud users need https://vault.bitwarden.eu, and
# self-hosted users need their own URL. When unset, fall back to whatever
# BWS_SERVER_URL the caller already had in their shell env (preserved by
# the copy above) so manual overrides keep working too.
if server_url:
env["BWS_SERVER_URL"] = server_url
try:
proc = subprocess.run( # noqa: S603 — bws path is trusted
cmd,
env=env,
capture_output=True,
text=True,
timeout=_BWS_RUN_TIMEOUT,
)
except subprocess.TimeoutExpired as exc:
raise RuntimeError(
f"bws timed out after {_BWS_RUN_TIMEOUT}s fetching secrets"
) from exc
except OSError as exc:
raise RuntimeError(f"failed to invoke bws: {exc}") from exc
if proc.returncode != 0:
# bws writes auth/network errors to stderr in plain English.
# Strip ANSI just in case and surface the first 200 chars.
err = (proc.stderr or proc.stdout or "").strip().replace("\x1b", "")
raise RuntimeError(
f"bws exited {proc.returncode}: {err[:200]}"
)
raw = proc.stdout.strip()
if not raw:
return {}, ["bws returned no output (empty project?)"]
try:
payload = json.loads(raw)
except json.JSONDecodeError as exc:
raise RuntimeError(f"bws returned non-JSON output: {exc}") from exc
if not isinstance(payload, list):
raise RuntimeError(
f"bws returned unexpected shape: {type(payload).__name__}"
)
secrets: Dict[str, str] = {}
warnings: List[str] = []
for item in payload:
if not isinstance(item, dict):
continue
key = item.get("key")
value = item.get("value")
if not isinstance(key, str) or not isinstance(value, str):
continue
if not _is_valid_env_name(key):
warnings.append(
f"Skipping secret {key!r}: not a valid env-var name"
)
continue
secrets[key] = value
return secrets, warnings
def _is_valid_env_name(name: str) -> bool:
if not name:
return False
if not (name[0].isalpha() or name[0] == "_"):
return False
return all(c.isalnum() or c == "_" for c in name)
# ---------------------------------------------------------------------------
# Public entry point — called from hermes_cli.env_loader
# ---------------------------------------------------------------------------
def apply_bitwarden_secrets(
*,
enabled: bool,
access_token_env: str = "BWS_ACCESS_TOKEN",
project_id: str = "",
override_existing: bool = False,
cache_ttl_seconds: float = 300,
auto_install: bool = True,
server_url: str = "",
home_path: Optional[Path] = None,
) -> FetchResult:
"""Pull secrets from BSM and set them on ``os.environ``.
This is the function ``load_hermes_dotenv()`` calls after the .env
files have loaded. It is intentionally defensive — any failure
returns a :class:`FetchResult` with ``error`` set; it never raises.
``server_url`` selects the Bitwarden region or self-hosted endpoint
(e.g. ``https://vault.bitwarden.eu`` for EU Cloud). Empty string
means use ``bws``'s default (US Cloud).
Parameters mirror the ``secrets.bitwarden.*`` config keys so the
caller can just splat the dict in.
"""
result = FetchResult()
if not enabled:
return result
access_token = os.environ.get(access_token_env, "").strip()
if not access_token:
result.error = (
f"secrets.bitwarden.enabled is true but {access_token_env} is "
"not set. Run `hermes secrets bitwarden setup`."
)
return result
if not project_id:
result.error = (
"secrets.bitwarden.project_id is empty. "
"Run `hermes secrets bitwarden setup`."
)
return result
binary = find_bws(install_if_missing=auto_install)
result.binary_path = binary
if binary is None:
result.error = (
"bws binary not available and auto-install is disabled. "
"Run `hermes secrets bitwarden setup` to install."
)
return result
try:
secrets, warnings = fetch_bitwarden_secrets(
access_token=access_token,
project_id=project_id,
binary=binary,
cache_ttl_seconds=cache_ttl_seconds,
server_url=server_url,
home_path=home_path,
)
except RuntimeError as exc:
result.error = str(exc)
return result
result.secrets = secrets
result.warnings.extend(warnings)
for key, value in secrets.items():
if key == access_token_env:
# Don't let BSM clobber the very token we used to fetch
# itself — that would be a footgun if someone stored the
# token as a BSM secret too.
result.skipped.append(key)
continue
if not override_existing and os.environ.get(key):
result.skipped.append(key)
continue
os.environ[key] = value
result.applied.append(key)
return result
# ---------------------------------------------------------------------------
# Test hook — used by hermetic tests to flush the cache between cases.
# ---------------------------------------------------------------------------
def _reset_cache_for_tests(home_path: Optional[Path] = None) -> None:
"""Clear in-process AND disk caches.
Tests can pass ``home_path`` to scope the disk cleanup to a tmpdir.
Without it we fall back to the same default resolution as the cache
writer itself.
"""
_CACHE.clear()
try:
_disk_cache_path(home_path).unlink()
except (FileNotFoundError, OSError):
pass

View File

@@ -83,7 +83,6 @@ logger = logging.getLogger(__name__)
DEFAULT_TIMEOUT_SECONDS = 60
MAX_TIMEOUT_SECONDS = 300
ALLOWLIST_FILENAME = "shell-hooks-allowlist.json"
_DEFAULT_BLOCK_MESSAGE = "Blocked by shell hook."
# (event, matcher, command) triples that have been wired to the plugin
# manager in the current process. Matcher is part of the key because
@@ -482,17 +481,6 @@ def _serialize_payload(event: str, kwargs: Dict[str, Any]) -> str:
return json.dumps(payload, ensure_ascii=False, default=str)
def _block_message(primary: Any, secondary: Any) -> str:
"""Return a validated string block message, falling back to the default.
Accepts two candidate fields (primary wins over secondary) so callers
can express field-priority differences between the two hook wire formats
without duplicating the type-check logic.
"""
raw = primary or secondary
return raw if isinstance(raw, str) and raw else _DEFAULT_BLOCK_MESSAGE
def _parse_response(event: str, stdout: str) -> Optional[Dict[str, Any]]:
"""Translate stdout JSON into a Hermes wire-shape dict.
@@ -527,9 +515,13 @@ def _parse_response(event: str, stdout: str) -> Optional[Dict[str, Any]]:
if event == "pre_tool_call":
if data.get("action") == "block":
return {"action": "block", "message": _block_message(data.get("message"), data.get("reason"))}
message = data.get("message") or data.get("reason") or ""
if isinstance(message, str) and message:
return {"action": "block", "message": message}
if data.get("decision") == "block":
return {"action": "block", "message": _block_message(data.get("reason"), data.get("message"))}
message = data.get("reason") or data.get("message") or ""
if isinstance(message, str) and message:
return {"action": "block", "message": message}
return None
context = data.get("context")
@@ -632,10 +624,7 @@ def _locked_update_approvals() -> Iterator[Dict[str, Any]]:
yield data
save_allowlist(data)
finally:
try:
fcntl.flock(lock_fh.fileno(), fcntl.LOCK_UN)
except (OSError, IOError):
pass
fcntl.flock(lock_fh.fileno(), fcntl.LOCK_UN)
def _prompt_and_record(

Some files were not shown because too many files have changed in this diff Show More