Surfaces the usage_report()/provenance() data layer added in #36701 as a
user-facing CLI command. Unlike `hermes curator status` (scoped to
curator-managed agent-created candidates), `usage` lists every skill on disk
— bundled built-ins and hub-installed included — with per-skill use/view/patch
counts and an agent/bundled/hub provenance tag.
Flags: --sort {activity,recent,name}, --provenance {agent,bundled,hub} filter,
--json for machine-readable output.
A stream that drops mid-response after tokens are delivered (peer-closed
connection, stale-stream reconnect) is converted into a synthetic
finish_reason="length" stub. The conversation loop treated that network
stall as a max-output-tokens truncation: when the dropped content was a
tool call it retried exactly once, then hard-failed with "Response
truncated due to output length limit" — even on large-output models that
never hit any cap (e.g. Opus).
- Tool-call truncation now retries up to 3 times (was 1) with a
progressive max_tokens boost, and is stub-aware: a PARTIAL_STREAM_STUB_ID
stall prints "Stream interrupted mid tool-call — retrying (n/3)" instead
of the false "model hit max output tokens", and the give-up message
distinguishes a network drop from a real truncation.
- Length-continuation retries preserve the original request's output cap
as a floor, so a high provider/model default isn't silently downshifted
to 8K/12K on retry.
- Added _requested_output_cap_from_api_kwargs() helper.
Tests: stub-stall mid-tool-call recovery within 3 retries; continuation
preserves a large provider-default output cap.
Fixes#26425. Salvages the substance of #26427 (cap floor) and #9525
(retry bump), adapted to the post-refactor conversation_loop.py which
handles all three api_modes uniformly.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
* feat(dashboard): backend API for MCP, pairing, webhooks, credential pool, memory, gateway lifecycle
Adds REST endpoints so a remote admin can manage these without CLI access:
- MCP servers: list/add/remove/test (config.yaml parity with hermes mcp)
- Pairing: list/approve/revoke/clear-pending messaging codes
- Webhooks: list/subscribe/remove (hot-reloaded JSON store)
- Credential pool: list/add/remove rotation keys (via CredentialPool API)
- Memory provider: status/select/disable/reset
- Gateway lifecycle: start/stop (restart+update already existed)
Secrets redacted on read; usable values only reach the agent at session start.
All endpoints sit behind the existing dashboard auth gate.
* feat(dashboard): backend API for ops + skills hub
- Ops actions (spawned, log-tailed via /api/actions): doctor, security audit,
backup, import, checkpoints prune
- Ops reads (structured JSON): hooks list + allowlist status, checkpoints list
with per-session size
- Skills hub actions (spawned): install / uninstall / update
- Registers new action log files for all spawn-based endpoints
All gated by the existing dashboard auth middleware.
* feat(dashboard): admin pages for MCP, pairing, webhooks, and system ops
Adds four new dashboard pages + nav entries so a remote admin can manage
Hermes without CLI access:
- MCP: list/add/remove/test MCP servers
- Webhooks: list/create/delete subscriptions (one-time secret reveal)
- Pairing: approve/revoke/clear messaging pairing codes
- System: gateway start/stop/restart, memory provider + reset, credential
pool add/remove, ops (doctor/audit/backup/import/skills update) with a
live action-log viewer, checkpoints prune, shell-hooks status
api.ts: client methods + types for all new endpoints.
App.tsx: routes + sidebar nav (plain labels, no i18n key required).
Verified: tsc -b clean, production build succeeds, new pages lint clean,
zero new eslint errors in App.tsx.
* test(dashboard): cover admin API endpoints
20 tests across MCP, credential pool, memory, pairing, webhooks, ops, plus
an auth-gate parametrize that asserts every admin endpoint requires the
session token. Asserts request contract + CLI-config parity, not catalog
values (per the no-change-detector-tests rule).
* docs(dashboard): document MCP, Webhooks, Pairing, and System admin pages
Adds Pages sections for the four new admin tabs and an Admin-endpoints table
to the REST API reference. Updates the page description to reflect the
dashboard's expanded role as a full administration panel.
* feat(install): --no-skills flag for blank-slate default profile
Add an install-time --no-skills flag so the default ~/.hermes profile can
be created with zero bundled skills, matching what
`hermes profile create --no-skills` already does for named profiles.
The flag writes $HERMES_HOME/.no-bundled-skills and skips the install-time
seed. sync_skills() now honors that marker with an early return
(skipped_opt_out=True), so neither the installer, a later `hermes update`,
nor a direct sync re-injects bundled skills into a profile that opted out.
Previously the marker was only checked by seed_profile_skills() (named
profiles); the default profile had no opt-out and `hermes update` would
re-seed it every time.
Tests: TestNoBundledSkillsOptOut covers marker-present (no-op) and
marker-absent (normal seed) paths.
* feat(skills): hermes skills opt-out / opt-in for existing profiles
Adds an interactive counterpart to the install-time --no-skills flag so
an already-installed profile (default or named) can toggle the
.no-bundled-skills marker without reinstalling.
- `hermes skills opt-out` writes the marker (stop future seeding). Safe
by default: nothing on disk is touched.
- `hermes skills opt-out --remove` ALSO deletes already-present bundled
skills, but ONLY ones that are manifest-tracked AND byte-identical to
their origin hash. User-edited bundled skills, hub-installed skills, and
hand-written skills are never removed. Previews + confirms before
deleting (--yes to skip).
- `hermes skills opt-in [--sync]` removes the marker and optionally
re-seeds immediately.
Core logic lives in tools/skills_sync.py (set_bundled_skills_opt_out,
is_bundled_skills_opt_out, remove_pristine_bundled_skills) reusing the
existing manifest origin-hash machinery for the safety check.
Tests: TestOptOutToggleAndRemove covers marker toggle idempotency and
proves user-modified + non-bundled skills survive --remove.
* docs: blank-slate skills — install --no-skills + opt-out/opt-in
- features/skills.md: new 'Starting with a blank slate' section covering
the install flag, profile-create flag, and runtime opt-out/opt-in, with
a safe-by-default note.
- reference/cli-commands.md: document the new skills opt-out / opt-in
subcommands + examples.
- reference/profile-commands.md: fix the marker filename (was .no-skills,
actually .no-bundled-skills) and cross-link the runtime commands.
Validated with a full docusaurus build (exit 0); the three edited pages
compile clean with no new warnings.
Two related changes to the skill curator:
1. Built-in pruning. New curator.prune_builtins config (default on) lets the
curator archive bundled built-in skills after the inactivity period, not
just agent-created ones. A .curator_suppressed list tells the update-time
re-seeder (tools/skills_sync) to leave pruned built-ins archived, so the
prune is durable across `hermes update`. Built-ins are seeded with a
baseline record on first sight, so the inactivity clock starts at upgrade
time -- no mass-prune on the first run. Hub-installed skills are never
pruned regardless of the flag. Restoring a built-in clears its suppression.
2. Usage tracking for all skills. Telemetry (view/use/patch) was wrongly gated
behind curation-eligibility, so built-ins were tracked only when prunable
and hub skills never. Telemetry is observability and is now decoupled from
curation: every skill accrues usage counts regardless of provenance, while
lifecycle mutators (set_state/set_pinned/mark_agent_created) stay
curation-gated. New usage_report() + provenance() expose all skills with an
agent/bundled/hub tag.
Gateway /undo was wired into every platform but still ran the old
single-turn hard-truncate. Now it matches the CLI/TUI: /undo [N] backs
up N user turns (default 1, clamps to oldest), soft-deletes the
truncated rows on disk (active=0, kept for audit, hidden from re-prompts
and search) via SessionDB.rewind_to_message, evicts the cached agent so
the next turn rebuilds from the active-only transcript (the gateway's
equivalent of the CLI's in-place history surgery + memory invalidation),
and echoes the backed-up message text so the user can copy/edit and
resend — platforms have no editable composer to prefill.
- gateway/session.py: SessionStore.rewind_session(session_id, n) wraps
the soft-delete primitive; load_transcript already returns active-only
- gateway/run.py: _handle_undo_command parses [N], calls rewind_session,
evicts the agent, echoes target text; confirm-prompt detail is count-aware
- locales: undo.removed gains {turns}; new undo.invalid_count, all 16 langs
- tests: tests/gateway/test_undo_rewind_session.py (6 cases)
The skill security scanner blocked legitimate community skills on three
intrinsic false-positive patterns:
- read_secrets_file matched `cat > file.env <<` heredocs (writing the
user's own keys into their own local .env), not just `cat file.env`
reads. Exclude output redirections.
- allowed-tools frontmatter is REQUIRED by the agent-skill spec; every
compliant skill declares it. Drop from HIGH privilege_escalation to a
LOW informational finding so it no longer drives the verdict.
- python_os_environ flagged `os.environ.get("CONFIG_VAR")` config reads
as HIGH exfiltration. Exempt non-secret `.get()` reads; add a dedicated
CRITICAL python_environ_get_secret pattern so secret-named reads
(OPENAI_API_KEY etc.) are still caught.
Also: scan_skill() now honors a skill-provided .skillignore / .clawhubignore
(gitignore-style) so dev/docs artifacts shipped in a skill root are excluded
from both structural checks and pattern scanning. SKILL.md is never ignorable.
80 tests pass (64 existing + 16 new).
The setup-mode chooser showed two bare labels ('Quick Setup (Nous
Portal) — OAuth login, model & messaging' / 'Full setup — configure
everything') that didn't explain what Quick Setup actually is. Expand
both labels inline so each choice line carries a concise explanation:
Quick Setup (Nous Portal) — free OAuth login, no API keys, model + tools
Full setup — configure every provider, tool & option yourself (bring your own keys)
Single-file change to the choice labels; no new plumbing.
Two pre-existing failures on main, unrelated to each other:
- test_model_catalog: website/static/api/model-catalog.json was stale vs
_PROVIDER_MODELS — minimax/minimax-m2.7 was renamed to minimax/minimax-m3
without regenerating the committed manifest. Ran scripts/build_model_catalog.py.
- test_gui_command: the macOS relaunchable-signing fixup
(_desktop_macos_relaunchable_fixup) makes two subprocess.run calls (xattr +
codesign) on darwin before launch. The two darwin GUI tests set
sys.platform='darwin' and mock subprocess.run with a 2-element side_effect
(pack + launch), so the fixup's calls drained the iterator -> StopIteration.
Mock out the fixup in those two tests so the subprocess accounting stays
focused on pack/launch.
The on_session_switch fan-out passed rewound=rewound unconditionally,
injecting rewound=False into every provider's **kwargs on the common
/resume, /branch, /new, and compression paths. Providers that capture
extra kwargs into an 'extra' dict (and the exact-dict-equality tests
guarding them) broke. Forward rewound only when truthy; /undo sets it
explicitly, everyone else stays clean.
Extends the existing /undo command from a single in-memory exchange
removal into a full rewind: back up N user turns (default 1), soft-delete
the truncated rows in SessionDB (active=0, kept for audit, hidden from
re-prompts and search), notify memory providers, and prefill the composer
with the backed-up message text for editing — CLI and TUI.
Reuses the SessionDB rewind primitives, the on_session_switch(rewound=True)
memory hook, and the TUI command.dispatch prefill payload from SaguaroDev's
#21910 work, wired to /undo [N] instead of a separate /rewind picker.
- cli.py: undo_last(n, prefill) — in-memory truncate + SQLite soft-delete
+ agent surgery (system-prompt invalidate, flush-index reset) + memory
notify + editable buffer prefill; /undo dispatch parses optional count;
checkpoint-rollback caller passes prefill=False
- tui_gateway/server.py: command.dispatch undo branch (was rewind) parses
count, picks Nth-from-last user turn, clamps to oldest
- commands.py: /undo gains [N] args_hint
- tests: rename + expand TUI suite (multi-turn, clamp, invalid-count)
- release.py: AUTHOR_MAP entry for SaguaroDev
Co-authored-by: SaguaroDev <74339271+SaguaroDev@users.noreply.github.com>
Adds the TUI half of the /rewind feature so the Ink terminal UI gets
the same affordance as the prompt_toolkit CLI.
Python side (tui_gateway/server.py):
- /rewind added to _PENDING_INPUT_COMMANDS so slash.exec rejects it
and the TUI falls through to command.dispatch (the only path with
access to live session state + memory hooks).
- New command.dispatch branch for name == "rewind":
v1 auto-picks the most recent user turn (Claude-Code-style single-
step undo), calls SessionDB.rewind_to_message, refreshes the
in-memory history, fires _memory_manager.on_session_switch with
rewound=True, and returns the new "prefill" payload.
- A dedicated picker overlay (multi-step rewind) is tracked as a
follow-up to #21910.
TS side (ui-tui/src/):
- New "prefill" variant on CommandDispatchResponse + asCommandDispatch
validator. Mirrors "send" but does NOT auto-submit; the client drops
the message into the composer for editing.
- createSlashHandler renders the optional notice via sys() and calls
ctx.composer.setInput(d.message), letting the user edit-and-resubmit
the rewound turn — the core UX promised by the issue.
Tests:
- 7 new tui_gateway tests covering prefill payload shape, in-memory
history truncation, DB soft-delete, memory-provider notification
(rewound=True), busy-session refusal, missing-session error, and
registry placement in _PENDING_INPUT_COMMANDS.
- Extended asCommandDispatch vitest covering the new prefill variant
(with + without notice, and rejection of malformed payloads).
Out of scope for v1 (tracked as #21910 follow-up):
- Dedicated picker overlay in Ink (the multi-step rewind UI). v1 auto-
picks the most recent user turn, matching the most common case.
- Gateway platforms (Telegram, Discord, etc.) — issue scopes v1 to
CLI + TUI only.
Self-review of the code-block masking fix: the cleanup path ran
media_pattern.sub('') over the _mask_protected_spans() copy of the text and
assigned that back to 'cleaned', so whenever a real MEDIA: tag was delivered
(if media: branch), every fenced code block / inline code / blockquote in the
reply was blanked to whitespace in the user-visible text.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned' — masking is a locator,
not a text rewrite. Protected spans survive verbatim. Strengthens the existing
mixed-code test (it only asserted 'Done.' survived, not the code block) and
adds an inline-code-survives regression test. Both fail on the old sub-based
code and pass now.
extract_media() scanned the full response text without distinguishing
live delivery tags from example paths in fenced code blocks, inline code
spans, and blockquotes. This caused false positives where the agent's
explanation of MEDIA: syntax (or tool output containing example paths)
was stripped from user-visible text and the path was added to the media
delivery list.
Added _mask_protected_spans() helper that replaces protected regions
with equal-length whitespace before regex matching, preserving match
offsets. The helper skips backtick-quoted paths in MEDIA: tags to
maintain existing path extraction behavior.
Fixes#35695
Self-review of #34375 fix: the cleanup path ran media_pattern.sub('') over
the JSON-masked copy of the text, which baked the masking spaces into the
user-visible 'cleaned' string — a serialized tool result like
{"old":"MEDIA:/x.png"} came back as {"old":" "}.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned'. Real tags are stripped;
JSON-embedded MEDIA: text reads back verbatim. Masking 'cleaned' (not the
original 'content') keeps offsets valid after the [[audio_as_voice]] /
[[as_document]] directives are removed. Adds two cleaned-text regression tests.
Serialized tool results frequently embed a prior reply's text, e.g.
{"result": "MEDIA:/path/stale.png"}. The bare-path branch of
MEDIA_TAG_CLEANUP_RE matched these and re-delivered stale files (#34375).
Adds BasePlatformAdapter._mask_json_string_media, which blanks (offset-
preserving) only MEDIA:<bare-path> tokens that sit inside a JSON value-
context string (opened by : , { or [). Legitimate tags at line start,
after prose, indented, MEDIA:"quoted" form, and two-line TTS output are
all left untouched.
Reworked from the approach in #34388 (a line-start regex anchor), which
no longer applied to current main and regressed same-line/indented tags.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Adds nicsequenzy@gmail.com -> polnikale to AUTHOR_MAP so the
check-attribution gate passes for the Playwright headless_shell browser
discovery fix (#35717).
* feat(desktop): drop files anywhere in the chat area
File drops were only wired to the composer input. Add a reusable
useFileDropZone hook (enter/leave depth counting + capture-phase reset so
the affordance clears even when the composer claims the drop) and a
pointer-events-none ChatDropOverlay, wired onto the conversation viewport.
Drops funnel through the existing onAttachDroppedItems; composer drops keep
their own inline-ref behavior.
* fix(desktop): chat-area drops insert inline @file refs, not attachment cards
Match the composer-input drop behavior — funnel dropped paths through
droppedFileInlineRef + the composer insert bus so they render as inline
ref chips instead of attachment cards.
* fix(desktop): don't render bare file paths as tool images (404)
vision_analyze reports its input image as a local filesystem path, which
toolImageUrl handed straight to <img src>. In the renderer that resolves
against the dev-server origin and 404s. Restrict inline tool images to
fetchable sources (data: URLs and remote http(s)); bare paths now fall
back to the tool's codicon.
When an unauthenticated SPA fetch hit a gated /api/* endpoint (e.g.
GET /api/analytics/models?days=30 fired from ModelsPage on mount or
after a session expiry), the gated middleware stamped the request's
own path into next= on the 401 envelope's login_url. The SPA's global
401 handler in web/src/lib/api.ts full-page-navigated to that URL,
the PKCE cookie carried the encoded /api/* value through the OAuth
round trip to Portal, and /auth/callback's _validate_post_login_target
accepted it as same-origin and redirected the user to the raw JSON
endpoint instead of the dashboard.
Symptom Ben reported: after the OAuth screen he kept landing on
$DOMAIN/api/analytics/models?days=30 (raw JSON) rather than /models.
The bug was deterministic per page — whichever /api/* call ModelsPage,
AnalyticsPage, or SessionsPage fired first owned the redirect race.
Fix: both validators now reject /api/* targets in addition to the
existing /login, /auth/, /api/auth/ exclusions:
- _safe_next_target in middleware.py drops the value before it ever
enters login_url, so the SPA's 401 handler navigates to a bare
/login (which the SPA itself can return-from via its own
sessionStorage["hermes.lastLocation"] fallback that was already
saving the actual browser location).
- _validate_post_login_target in routes.py drops it as second-line
defence at the callback boundary, so a legacy cookie, a regressed
middleware, or an attacker-crafted /auth/login?next=/api/... value
can't smuggle the redirect through. Either layer alone is enough;
pairing them means a regression in one is caught by the other.
The match is anchored: ``decoded == "/api"`` or
``decoded.startswith("/api/")``. SPA route lookalikes like /apidocs
or /api-keys remain valid landing targets — tests pin that.
Test additions in test_dashboard_auth_401_reauth.py:
- TestApi401Envelope: rewrote test_login_url_carries_next_for_deep_
api_path (which asserted the pre-fix behaviour) as
test_login_url_drops_next_for_deep_api_path, plus added the
specific analytics-models repro case from Ben's report.
- TestNextSameOriginValidation: rejects-api-paths + does-not-reject-
api-prefix-lookalikes (covers /apidocs, /api-keys).
- TestAuthCallbackNext: end-to-end test_callback_with_api_next_
lands_at_root drives /auth/login?next=/api/... through to the
callback and asserts the user lands at "/", not the API URL.
- TestValidatePostLoginTarget: new class covering the callback-side
validator directly, including the URL-encoded ``%2Fapi%2F...``
form the PKCE cookie actually carries.
Mutation-tested: reverting both validators causes exactly the 5 new
or rewritten /api/*-related assertions to fail (each fix layer is
independently tested), while the 31 other assertions in the file
remain green. Full tests/hermes_cli/ suite (288 files, 5,938 tests)
passes with the fix applied.
The desktop rename dialog sent PATCH /api/sessions/{id}, but the backend
only defined GET and DELETE for that path — FastAPI returned 405 Method
Not Allowed, surfaced to the user as "Rename failed". Add the PATCH route
backed by SessionDB.set_session_title (handles sanitization, uniqueness,
and clearing the title when empty).
Also fix a misleading notification: any 405 was summarized as an unrelated
"does not support that audio endpoint" message. Make it a generic 405 hint.
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.
Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.
This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.
Co-authored-by: x1am1 <2663402852@qq.com>
Shiki's github-light-default colors comments #6e7781 (~4.2:1 on the code
card background), which is borderline unreadable at the 11px code font
size — and worst for shell snippets, where a single `#` turns the rest
of the line into one long comment span. Remap light-mode comments to
GitHub's darker muted gray (#57606a, ~6.4:1) via per-theme
colorReplacements. Dark mode (~6.1:1) reads fine and is left untouched.
s6-overlay images (e.g. hermes-agent:latest) use /init as PID 1 and exec
/run/s6/basedir/bin/init during stage0 startup. The Docker terminal backend
unconditionally added Docker --init and mounted /run as noexec, which broke
those images in two ways: --init created a second competing PID-1 init, and
the noexec /run made s6 stage0 fail with "exec: /run/s6/basedir/bin/init:
Permission denied" (exit 126), so the container died and terminal commands
reported a generic "container is not running" error.
Detect images whose entrypoint is /init via 'docker image inspect' and, for
those images only, skip Docker --init and mount /run with exec. All other
images keep the hardened --init + noexec defaults. Detection is best-effort:
any inspect failure falls back to the safe defaults.
#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup
with:
/usr/bin/tini: No such file or directory
The image moved from tini to s6-overlay as PID 1 (/init) earlier in
2026. Orchestration templates that still pin /usr/bin/tini as the
entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no
binary to exec and the container crashes immediately.
Hermes has no control over the Hostinger catalog template, but we can
make the image backward-compatible by symlinking /usr/bin/tini -> /init
during the s6-overlay install step. External wrappers that exec
/usr/bin/tini will land on the same s6-overlay reaper they would have
landed on if they'd used the canonical /init entrypoint.
The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim
is purely for legacy external wrappers, not for the image's own
runtime path. Once affected catalogs are updated, the symlink can be
removed.
Other issues #34192 raises that are NOT addressed by this PR:
* Problem #2 (UID 1024 vs 10000 mismatch): already fixed by #33148
(S6_KEEP_ENV=1) and #32412 (with-contenv shebangs). The Hostinger
template likely needs to update its env-var propagation.
* Problem #3 (incompatible session formats): RFC for pluggable
SessionDB is tracked in #23717.
* Problem #4 (Telegram polling conflict): an operations problem on
Hostinger's side, not in this codebase.
This PR is scoped to the one issue that can be fixed inside
Dockerfile: the missing /usr/bin/tini binary.
Tests (3 in test_dockerfile_tini_compat_shim.py):
- test_tini_compat_symlink_present
Guard: the symlink line must exist in Dockerfile.
- test_tini_compat_comment_explains_why
The #34192 anchor comment must be present so future readers know
why the shim is there (avoid accidental removal).
- test_entrypoint_still_init_not_tini
Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is
only for external wrappers.
Refs: #34192
Partial fix: addresses the immediate tini-binary crash. Catalog-side
fixes still needed by Hostinger for the UID and session-format
problems documented in the issue.
Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes#34107. When Hermes runs in Docker with HERMES_UID=1000 /
HERMES_GID=911, the entrypoint chowns the top-level HERMES_HOME once
at startup — but subdirectories created at runtime by
ensure_hermes_home() (especially for profile namespaces under
profiles/<name>/ spawned by kanban workers) were landing as root:root
and blocking subsequent uid-mapped worker invocations with:
PermissionError: [Errno 13] Permission denied:
'/opt/data/profiles/charles/logs/curator'
Fix: add _resolve_hermes_uid_gid + _chown_to_hermes_uid helpers that
read the env vars and apply chown after mkdir. Invoke from _secure_dir
which already runs after every directory creation in the home-init path,
so all newly-created subdirs (including the profile namespaces) get the
right ownership.
Safety properties:
- No-op when HERMES_UID/HERMES_GID unset (the dominant non-Docker path)
- No-op on Windows (os.chown doesn't exist; AttributeError swallowed)
- No-op when running as non-root (EPERM swallowed — the entrypoint's
startup chown -R picks it up on next restart, and in most cases the
dir was already correctly-owned by the calling user)
- Uses -1 sentinel for missing field so only the set value applies
- Empty-string env vars treated as unset
Adds 14 tests across:
- TestResolveHermesUidGid (7) — env-var parsing
- TestChownToHermesUid (5) — chown helper invariants
- TestSecureDirChown (2) — end-to-end through _secure_dir
Co-authored-by: Cursor <cursoragent@cursor.com>
Add MiniMax-M3 to the minimax, minimax-oauth, and minimax-cn curated
lists (these are hardcoded — the native Anthropic-format endpoint has no
/v1/models listing and the providers aren't in _MODELS_DEV_PREFERRED, so
new models don't auto-pull). Add a DEFAULT_CONTEXT_LENGTHS key
'minimax-m3' -> 1,000,000 so M3 resolves to its 1M context on every
surface (native ID + OpenRouter/Nous slug) via longest-key-first
substring match, while the M2.x series stays at 204,800.
On macOS the desktop app is built locally and ad-hoc signed (no Developer ID
on the user's machine). An ad-hoc bundle has no stable Designated Requirement,
so when the self-updater rebuilds it in place with a fresh build (new cdhash)
— plus the com.apple.quarantine flag inherited from the downloaded installer
process chain — Gatekeeper/LaunchServices treats the changed code as tampering
and macOS reports "Hermes is damaged and can't be opened," and the app fails to
relaunch. First launch works (fresh registration); the in-place update relaunch
is what breaks.
Fix: after building the desktop app locally, strip quarantine xattrs and
re-apply a clean deep ad-hoc signature (omitting the hardened-runtime flag,
which an ad-hoc build can't satisfy). Applied in both build entry points:
- hermes_cli/main.py cmd_gui (the `hermes desktop --build-only` path the
updater drives) — so the fix ships via `hermes update` (git), no installer
re-download needed.
- scripts/install.sh install_desktop (first install) for parity.
Both are no-ops on non-macOS and when a real signing identity (CSC_LINK /
APPLE_SIGNING_IDENTITY) is configured, so signed/notarized builds are untouched.
The docker_forward_env build loop only consulted the ~/.hermes/.env disk
fallback when a key was unset (value is None), not when it was present
but empty (""). A transient empty value in os.environ was therefore
forwarded into the sandbox container as `-e KEY=`, clobbering the correct
value on disk. Sandboxed workloads then read a zero-length secret and
failed auth (observed as intermittent Linear API 401s) with no gateway
restart and no .env rewrite.
Treat empty-string like unset (`if not value:` on the fallback) and never
forward a blank secret (`if value:` on the guard).
Fixes#35580
Adds me@simontaggart.com → SiTaggart to AUTHOR_MAP so the
check-attribution gate passes for the docker_forward_env empty-secret
fix (#35583, fixes#35580).
Read the Portal's tool_access claim (JWT + /api/oauth/account) into NousToolAccessInfo and gate managed Tool Gateway access on it: tool_gateway_entitled (paid OR live pool) and per-category tool_gateway_entitled_for(). The pool funds web/image/tts/browser but not video, so per-backend availability, the charge picker (ensure_nous_portal_access coverage_category), and managed defaults all respect coverage.
Setup: rebuild prompt_enable_tool_gateway as a per-tool checklist that renders whenever the pool is enabled, lists only pool-covered tools (video excluded for free-pool users), and is framed as the free tool pool for $0 subscribers rather than a paid subscription. get_gateway_eligible_tools now gates and filters off the entitlement snapshot.
The thin installer (apps/bootstrap-installer) drives install.sh stage-by-stage,
each in its own process. The `desktop` stage never called check_node, so the
Hermes-managed Node provisioned earlier (at $HERMES_HOME/node/bin) wasn't on
PATH. install_desktop's `command -v npm` check then failed and the build was
skipped — yet the stage still reported {"ok":true,"skipped":false}, so the
installer showed "Installation Complete" and only failed at the end with
"Couldn't find a built Hermes desktop ... the desktop build step may have been
skipped or failed."
Fix:
- Call check_node in the `desktop` stage (mirrors every other Node-dependent
stage) so the managed Node is on PATH (or installed).
- Make install_desktop self-provision via check_node and hard-fail (return 1)
if npm is still unavailable, instead of a silent `return 0`. The desktop
stage only runs when a build is explicitly requested (--include-desktop), so
an unavailable toolchain is a real failure, not graceful degradation.
Verified on macOS arm64: the `desktop` stage now builds
release/mac-arm64/Hermes.app, which matches resolve_hermes_desktop_exe, so the
installer's "Launch Hermes" succeeds.
The lazy session.create path hand-builds a partial info dict that omitted
desktop_contract. The desktop GUI reads a missing contract as undefined and
treats it as an out-of-date backend, so it surfaced a "Backend out of date"
toast on every launch even against a current backend. Carry the contract in
the lazy payload like _session_info already does for resume/branch.
The desktop self-update branch defaulted to bb/gui, the pre-merge feature
branch. Now that the desktop app is on main, flip DEFAULT_UPDATE_BRANCH to
main so freshly built apps check for updates against the right branch
instead of relying on the runtime self-heal fallback.
* feat: better composer etc
* docs: add desktop and dashboard run instructions
* fix(desktop): address security scan findings
* fix(dashboard): resolve @nous-research/ui path under npm workspaces
The sync-assets prebuild step shelled out to 'cp -r
node_modules/@nous-research/ui/dist/fonts ...' with a path relative
to apps/dashboard/. That works only when the dep is installed
locally in the dashboard workspace, but 'npm install' at the repo
root (the documented setup — see apps/desktop/README.md) hoists
shared deps to the root node_modules under npm workspaces. The
relative cp then fails with 'No such file or directory', sync-assets
exits 1, the Vite build aborts, and 'hermes dashboard' surfaces a
generic 'Web UI build failed' message.
Replace the shell one-liner with scripts/sync-assets.cjs, which
walks up from the dashboard directory looking for node_modules/
@nous-research/ui — working in both the hoisted (workspaces) and
co-located (standalone) layouts. Also guards against a missing
dist/fonts or dist/assets with a clearer error pointing at a
rebuild of the UI package rather than silently copying nothing.
* feat(desktop): support connecting to a remote Hermes backend
Add HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN env
vars that, when set, short-circuit the local-child spawn in
startHermes() and connect the Electron renderer to an already-
running 'hermes dashboard' server reachable over the network.
Motivating use case: WSL2 users who want to run the Hermes core
(agent loop, tools, filesystem access) inside their WSL
distribution while rendering the Electron GUI on native Windows.
Before this change, the desktop app always spawned a local Python
child on the same host as the renderer, which doesn't cross the
WSL/Windows boundary.
The remote path reuses waitForHermes() as a liveness probe
(/api/status is in the backend's public endpoint allowlist), so
the connection is only returned once the backend is actually
ready. WebSocket URL derivation picks ws:// or wss:// based on
the input scheme. URL validation rejects non-http(s) schemes and
requires both env vars together to avoid a half-configured
connection that would silently fall through to the spawn path.
No behaviour change when the env vars are unset — the default
local-spawn flow is untouched.
Typical usage:
# in WSL2
hermes dashboard --tui --no-open --host 0.0.0.0 --port 9119 --insecure
# on Windows
set HERMES_DESKTOP_REMOTE_URL=http://localhost:9119
set HERMES_DESKTOP_REMOTE_TOKEN=<session token>
set HERMES_DESKTOP_IGNORE_EXISTING=1
(launch Hermes desktop)
* ci(desktop): automate desktop releases
Add GitHub Actions release channels for signed desktop installers and document the stable/nightly download paths.
* feat: file tabs
* refactor(desktop): tighten right-rail tab close API
Promote closeRightRailTab/closeActiveRightRailTab as the single
public entry point. Drops the activeTabRef + handleCloseDocument
indirection in ChatPreviewRail, the unused $rightRailHasContent
atom, and the legacy dismissFilePreviewTarget alias. -70 LOC.
* feat(desktop): polish composer pill toward reference look
Solid foreground-on-background send/voice-conversation circle (black-on-white
in light, white-on-black in dark) anchors the right edge as the primary CTA
instead of the orange theme primary. Bumps the primary control to 2.125rem so
it visually outranks the ghost mic/plus controls. Opens up the surface padding
(0.625rem x / 0.5rem y) so the input row breathes around its controls, and
nudges the corner radius from 20 to 24px for a slightly pill-ier silhouette.
LiquidGlass distortion is preserved.
* feat(desktop): add startup and onboarding flow
Add phase-based desktop boot progress, fresh-install sandbox testing, and first-run provider credential onboarding so packaged installs can start cleanly without manual settings detours.
* fix(desktop): gate prompts on provider setup
Show the desktop provider onboarding flow before prompt submission when no inference provider is configured, preventing fresh installs from falling through to backend credential errors.
* fix(desktop): surface provider onboarding from session warnings
Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors.
* fix(desktop): route gateway provider errors to onboarding
The "No inference provider configured" auth error reaches the renderer through gateway error events, not the prompt.submit promise; the previous patch only caught the latter, so the error toast still surfaced and onboarding never opened.
Also strip credential-shaped env vars from the test:desktop:fresh sandbox so the packaged backend can't see provider keys leaking from the launching shell.
* fix(desktop): use strict runtime check to drive onboarding
setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding.
Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it.
* feat(desktop): OAuth-first onboarding using existing dashboard provider API
Replace the engineer-flavored API key form with a Sign-in-first onboarding overlay that uses the dashboard's existing /api/providers/oauth catalog and PKCE/device-code endpoints (Anthropic, Nous, OpenAI Codex, etc.). API key entry is now a fallback tab with friendly provider names instead of env var prefixes, and the loud raw resolver error is gone in favor of a one-line welcome message.
* fix(desktop): polish onboarding provider list
Reorder OAuth providers so Nous Portal is first, give the segmented Sign in / API key control equal column widths, and replace the engineer-flavored backend names like "Anthropic (Claude API)" / "MiniMax (OAuth)" with friendlier in-app titles. External-CLI providers now show a softer subtitle and an external-link icon instead of a chevron.
* refactor(desktop): split onboarding overlay into store + view
Move the OAuth state machine, runtime check, copy-to-clipboard, and api-key save into store/onboarding.ts (matching the boot.ts pattern), leaving the overlay as a presentation layer that subscribes via useStore. Tabs are now table-driven, child panels read flow from the store instead of prop-drilling, and the polling/PKCE/error/success branches share a small Status atom.
* fix(desktop): external CLI providers + center mode tabs
External-CLI providers (Claude Code, Qwen Code) now open an in-overlay panel with the CLI command, copy button, and an "I've signed in" recheck instead of firing an invisible toast. Center the Sign in / API key tab control so it sits under the heading instead of hugging the left edge.
* fix(desktop): drop onboarding tabs for an inline link, group device-code waiting state
Replace the Sign in / API key tab pair with an "I have an API key" footer link under the OAuth provider list, with a "Back to sign in" affordance inside the API key form. Group the device-code "Waiting for you to authorize..." status next to the Cancel button so the alignment matches the action.
* refactor(desktop): tighten onboarding store + overlay
Drop the dead isOnboardingBusy/BUSY set, factor the catch-fallback dance into safeReq, and share a single reloadAndConnect helper between PKCE submit, device-code success, external recheck, and api-key save.
In the overlay, extract Step / CodeBlock / FlowFooter / CancelBtn / DocsLink atoms so the four sign-in panels share the same chrome instead of repeating it inline. Net effect: fewer literal divs, one place to touch the spacing, and the code-block + footer rows are reusable across future flows.
* fix(desktop): mount onboarding from frame 1 to kill the FOUT
Default onboarding.configured to null (unknown until the runtime check resolves) and have the onboarding overlay render whenever it's not yet confirmed true. The boot overlay now yields to it, so the very first paint is the Welcome card with a "While we get you set up..." progress strip instead of a flash of the chat shell between boot dismiss and onboarding mount.
The picker swaps in cleanly once the gateway opens and the runtime check confirms the user is not configured. Already-configured users see the same prep card briefly while their existing runtime warms up, then the overlay dismisses without touching the chat shell.
* fix(desktop): top-align empty sessions placeholder
The "Start a chat to build your history." empty state used a min-h-35 grid place-items-center container, which floated the text in a tall dead zone. Render it as a flat paragraph that sits right under the section header like the empty pinned state does.
* refactor(desktop): drop dead boot overlay
Onboarding overlay subsumes the boot card now that it mounts from frame 1 and renders boot progress inline. The standalone DesktopBootOverlay is unreachable in every flow (yields whenever onboarding has not confirmed configured, dismisses once it has).
* fix(desktop): hide pinned/recents sections until first session
A fresh sidebar showed the Pinned and Recent chats headers with floating empty-state copy underneath. Drop both sections (and the now-orphan SidebarEmptySessionState) when there are no sessions yet — they reappear after the first chat. Skeletons during initial load are unchanged.
* feat(gui): route embedded TUI through dashboard gateway (#21979)
Inject HERMES_TUI_GATEWAY_URL into dashboard PTY sessions so embedded ui-tui instances attach to the in-process websocket gateway, with coverage for the new env wiring.
* Add desktop remote gateway settings
Make the desktop gateway connection configurable from settings so local remains the default while remote backends can be saved, tested, and applied without environment variables.
* feat(gui): first-class Messaging page + gateway menu redesign
- Add Messaging page to the desktop app with per-platform setup,
status, and inline guidance. Catalog derives from gateway.config
Platform enum + plugin registry, so every messaging adapter the CLI
supports (Telegram, Discord, Slack, Mattermost, Matrix, WhatsApp,
Signal, BlueBubbles, Home Assistant, Email, SMS, DingTalk, Feishu,
WeCom, Weixin, QQ, Yuanbao, API server, Webhooks, plugins) shows up
without per-platform code.
- New REST endpoints: GET /api/messaging/platforms, PUT and POST
/test on the same path. Secrets go through the existing .env
pipeline; enable/disable writes config.yaml.
- Replace gateway statusbar dropdown with a richer panel: status row,
icon-only restart + system-panel actions, recent activity (with
timestamps trimmed in display, full text on hover), platform list.
- Auto-poll the messaging page every 6s (paused when hidden) so
status updates without a manual check.
- Drop Settings / Command Center from the sidebar nav (still
reachable via shortcuts and the titlebar cog).
- Flatten top corners on Messaging/Skills/Artifacts/Chat panes.
- Share new StatusDot component across messaging + gateway menu.
- Fix gateway/config.py so an explicit platforms.<name>.enabled=false
in config.yaml is honored when env tokens are present.
- pb-9 on the chat content area for breathing room above the composer.
* Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* pin electron version
* hide application menu on non-mac systems
* interpret compactPreview for non-string vlaues as JSON or an empty string
* fix(desktop): keep composer contenteditable mounted across stacked toggle
The composer rendered {input} inside two different parent fragments
depending on `stacked`. When auto-expand flipped `stacked` (e.g. the
moment typed text wrapped past two lines), React reconciled the two
branches as different positions and unmounted/remounted the
contenteditable. The fresh mount started empty, so any in-flight
characters — most reliably reproduced by holding a key — were lost.
Replace the conditional with a single CSS Grid whose template-areas
swap on `stacked`. The three children (menu, input, controls) keep
stable identities across the toggle; only their grid placement
changes, which the browser handles without React tearing down the
editor.
* refactor(desktop): align install layout with install.ps1 / install.sh
Make the desktop app's runtime layout match what scripts/install.ps1 and
scripts/install.sh produce, so a desktop-only user and a CLI-only user end
up with the same files in the same places and can share one install.
Layout
- ACTIVE_HERMES_ROOT = HERMES_HOME/hermes-agent (was: process.resourcesPath/hermes-agent, read-only)
- VENV_ROOT = HERMES_HOME/hermes-agent/venv (was: userData/hermes-runtime)
- desktop.log = HERMES_HOME/logs/desktop.log (was: userData/desktop.log)
- HERMES_HOME default: %LOCALAPPDATA%\hermes on Windows, ~/.hermes elsewhere
The packaged .app/.exe still ships a read-only payload at
process.resourcesPath/hermes-agent (FACTORY_HERMES_ROOT). On first launch
or after an installer-driven upgrade we sync factory -> active, then
provision the venv and run pip install -e . against the active root.
Key behaviors
- Pin HERMES_HOME in the spawned Python's env so get_hermes_home() resolves
to the same path resolveHermesHome() picked. Without this, Python falls
back to ~/.hermes on every platform - fine on mac/linux, a split-state
bug on Windows where our default is %LOCALAPPDATA%\hermes.
- Detect developer installs by .git presence at ACTIVE; never overwrite
a user's checkout via factory sync.
- Marker at ACTIVE/.hermes-desktop-runtime.json (schema v4) tracks
pyproject hash + factory version + runtime schema version. depsFresh
fast-paths when nothing changed.
- Dev (npm run dev) prefers SOURCE_REPO_ROOT over ACTIVE so devs run
their local edits, not whatever's under HERMES_HOME.
- Better error messages distinguish "no payload" from "no Python".
- Preserve a legacy ~/.hermes on Windows when no %LOCALAPPDATA%\hermes
exists, so users with prior pip/manual installs aren't orphaned.
pyproject.toml
- Promote fastapi, uvicorn[standard], ptyprocess (non-Windows), and
pywinpty (Windows) to main dependencies. The dashboard backend
(hermes dashboard) needs them at runtime; the previous lazy-import
fallback was a footgun for fresh installs.
- Empty the [pty] optional-extra; kept as a no-op back-compat alias for
any existing pip install hermes-agent[pty] invocations.
Drops the hardcoded BUNDLED_RUNTIME_REQUIREMENTS list in main.cjs - the
desktop now installs whatever pyproject.toml says, single source of truth.
Files
- apps/desktop/electron/main.cjs: runtime layout, HERMES_HOME pin,
factory->active sync, marker v4
- apps/desktop/scripts/test-desktop.mjs: track new venv location
- apps/desktop/README.md: new Setup, Runtime Bootstrap, and
Debugging sections
- pyproject.toml: fastapi/uvicorn/pty backends in main
dependencies; [pty] extra emptied
Tested locally on Windows: npm run dev boots cleanly, sessions land at
the new location, type-check + lint + test:desktop:platforms all pass.
Verified end-to-end on a fresh Win11 VM via dist:win installer.
Known gaps (filed as follow-ups, not in this PR):
- Skills not seeded on packaged installs (sync_skills only runs in
cmd_chat, not cmd_dashboard). Need to move to shared pre-dispatch.
- Git Bash not bundled or detected; agent's terminal tool errors out
with a useful message but desktop bootstrapper should pre-flight it.
- install.ps1 / install.sh should be decomposed into composable phase
libraries so the desktop bootstrapper can reuse them as a single
source of truth across all install surfaces.
* feat(desktop): theme polish, prose chat typography, composer chrome
- DS tokens/midground, Backdrop, scoped scrollbars, typography plugin + prose
- Composer liquid/radius utilities, thread font parity, tool/thinking cues
- File tree label scale, preview flex, thread retry loading + streaming tests
* feat(desktop): NSIS prereq detection page + auto-install via winget
The packaged Windows installer now detects Python 3.11+ and Git for Windows
at install time and offers to install missing prereqs via winget. Mirrors
the prereq logic scripts/install.ps1 already runs for CLI installs, so
desktop installer users get the same out-of-the-box experience as
install.ps1 users.
Why
- Hermes' terminal tool calls bash.exe directly (tools/environments/
local.py); on Windows that's Git Bash from Git for Windows. Without it,
the agent fails on the first terminal() call.
- Hermes' Python runtime needs 3.11+. Without it, the desktop bootstrapper
errors out at venv creation.
- Both gaps surfaced on a fresh Windows 11 VM smoke test: VM had Python
pre-installed but no Git, so the agent's first terminal call failed
with "Git Bash isn't installed."
- install.ps1 has had Install-Git + Install-Uv functions for ages. The
desktop installer was the asymmetric outlier.
How — NSIS prereq page
- New file: apps/desktop/installer/prereq-check.nsh (plugged into
electron-builder via build.nsis.include)
- Real Wizard page using nsDialogs, inserted via customPageAfterChangeDir
hook (between the Directory page and InstFiles).
- Group boxes for Python and Git, each showing detection status.
- Pre-checked install checkboxes when winget is available.
- Auto-skips silently if both prereqs are already installed.
- Falls back to manual download URLs when winget itself is missing.
- Detection:
- Python: probes `py -3.11`/`-3.12`/`-3.13`/`-3.14` via the Python
launcher. Microsoft Store "Python stub" (no py.exe) is correctly
classified as not-installed.
- Git: `where git`.
- winget: `where winget` (Win10 1809+ / Win11 with App Installer).
- Install execution (in customInstall macro):
- Python: nsExec::ExecToLog with `--scope user --silent`. Per-user
install, no UAC prompt, output streams to install log.
- Git: ExecShellWait via Windows ShellExecute. Critical because Git
always installs per-machine and triggers UAC; ShellExecute preserves
the foreground focus chain across non-elevated → elevated process
spawns, so UAC actually comes to the foreground. nsExec::ExecToLog
breaks the chain because winget runs hidden.
- Both pass `--disable-interactivity --accept-package-agreements
--accept-source-agreements` to suppress winget's own dialogs.
- Verification: probes Git's standard install locations via FileExists
rather than `where git`. NSIS's process inherits PATH at startup, so
a freshly-installed Git won't be visible to `where` until restart.
- Silent installs (/S) skip the prompts; managed deploys handle prereqs
out-of-band via Group Policy / Intune.
How — Electron-side safety net
- New findGitBash() in main.cjs, parallel to findSystemPython(). Probes
the same locations as tools/environments/local.py:_find_bash() so a
positive result here means the agent's terminal tool will work.
- ensureRuntime now throws a clear, actionable error on Windows when Git
Bash isn't found, matching the existing "Python 3.11+ is required"
error path.
- Catches users the NSIS page doesn't: .msi installer users (NSIS prereq
page doesn't run for MSI), `npm run dev` users, manual installers,
anyone who unchecked the install boxes on the NSIS prereq page.
- All gated on `IS_WINDOWS`; macOS / Linux unaffected.
NSIS build issue (resolved)
- electron-builder defaults to `-WX` (warnings as errors). NSIS optimizer
emits "warning 6010: function not referenced" for our page functions
because Page custom directives don't count as references in its
static-analysis pass. The functions ARE called at runtime when NSIS
invokes the page; the optimizer just can't see it statically.
- Set `build.nsis.warningsAsErrors=false` in package.json so this
spurious warning doesn't fail the build. (Documented option from
electron-builder's nsisOptions.)
Out of scope (filed for future work)
- MSI prereq detection: Windows Installer custom actions are a different
mechanism. Enterprise deploys typically handle prereqs via GP/Intune.
- Bundle PortableGit + python-build-standalone in extraResources for
zero-network installs. ~80MB increase.
- Mac / Linux GUI prereq flows (different installer formats; Xcode CLT
covers most macOS prereqs already; Linux is per-distro hard).
Files
- apps/desktop/installer/prereq-check.nsh (new, ~290 lines NSIS)
- apps/desktop/package.json (build.nsis.include +
warningsAsErrors)
- apps/desktop/electron/main.cjs (findGitBash + preflight)
- apps/desktop/README.md (Runtime prerequisites
section)
Cross-platform impact
- macOS / Linux builds (dist:mac, dist:mac:dmg, dist:mac:zip): nsis
config is ignored entirely; .nsh is dormant.
- npm run dev: .nsh dormant; main.cjs preflight gated on IS_WINDOWS.
- scripts/install.ps1, scripts/install.sh: no reference to any new
files; CLI install paths untouched.
- Hermes CLI / dashboard / gateway: no reference; runtime untouched.
- All checks: node --check on main.cjs and test-desktop.mjs pass;
npm run test:desktop:platforms 4/4 passing; node --test green.
Tested
- npm run dist:win produces signed .exe and .msi without errors.
- Fresh Win11 VM (Python pre-installed, no Git): prereq page renders,
Python check shows detected, Git checkbox pre-checked. Click Next →
Git installs via winget with UAC prompt in foreground.
- After install completes, Hermes launches and the agent's terminal
tool can run bash commands. Verified Git Bash is detected at
`C:\Program Files\Git\bin\bash.exe` by ensureRuntime's preflight.
* feat: theme changes, composer tweaks, in app update ux, finesse
* fix(cli): seed bundled skills on dashboard + gateway entrypoints
`sync_skills(quiet=True)` was only being called from inside `cmd_chat`,
which meant `hermes dashboard` (the desktop GUI's backend) and `hermes
gateway` (Telegram/Discord/Slack/etc daemons) never seeded the bundled
skill library into ~/.hermes/skills/.
This surfaced as "No skills found" in the desktop GUI's skills panel on
fresh installs, despite the agent having access to the full bundled
library when invoked via `hermes chat`. scripts/install.ps1 worked
around it by running skills_sync.py as part of Copy-ConfigTemplates,
but that's not part of the desktop installer's bootstrap chain.
Fix
- Extract the skills-sync block from cmd_chat into a module-level
`_sync_bundled_skills_quietly()` helper.
- Call the helper from cmd_chat (preserving existing behavior),
cmd_dashboard (after the --status/--stop early-return paths and
fastapi import check, so we don't run skills_sync on management
commands or when deps aren't installed), and cmd_gateway.
Why these three entrypoints
- cmd_chat: the user's primary CLI entrypoint
- cmd_dashboard: the desktop GUI's backend; this is what `hermes
dashboard --tui` invokes when the desktop bootstrapper spawns Hermes
- cmd_gateway: long-running daemons where the user expects the agent
to have full skill access
Other entrypoints (cmd_config, cmd_doctor, cmd_login, cmd_status,
etc.) are management commands that don't need skill discovery and were
never running skills_sync in the first place — leaving them alone.
Idempotence
- tools/skills_sync.py is manifest-based: skipped skills cost
milliseconds. Calling it from multiple entrypoints adds no real
cost, and users running `hermes chat` then `hermes dashboard` get
two fast no-ops on the second call.
Failure handling
- Helper wraps skills_sync in try/except. Skills are an enhancement,
not a hard dependency — Hermes runs fine with an empty skills/ dir.
Files
- hermes_cli/main.py:
+ new helper `_sync_bundled_skills_quietly()` at module level
+ cmd_chat: replace inline block with helper call
+ cmd_dashboard: add helper call after fastapi import succeeds
+ cmd_gateway: add helper call before delegating to gateway_command
* feat(desktop): hoisted todo widget, JSON tool summaries, history grouping & timer fixes
- Hoist todo to first-class widget (shadcn checkboxes, brand colors, no
tool-accordion). Header derives label from active task; non-active rows fade.
- Replace raw JSON dumps with structured key/value summaries via
formatToolResultSummary; nested error extraction for clearer failures.
- Fix loaded-session grouping: stitch interleaved assistant/tool iterations
into one bubble instead of orphaned synthetic messages.
- Stable tool/thinking timers via keyed registry so unmount/scroll doesn't
reset elapsed counts; gate "running" on real live thread state.
- Reorganize chat-only assistant-ui components under components/chat/.
* fix(desktop): address CodeQL alerts on PR #20059
- settings/helpers.ts: harden setNested against prototype pollution.
POLLUTING_PATH_PARTS check is now applied at every assignment site
(loop + leaf) and uses Object.defineProperty so CodeQL can see the
guard inline rather than via a helper function call.
- lib/markdown-preprocess.ts: rebuild the dangling-fence close regex
from a fence-char + length instead of marker.replace(...). The marker
is captured by `(`{3,}|~{3,})` so it can only be backticks or tildes,
but CodeQL was tracing tainted input text into the RegExp source and
flagging hostname dots from input as part of the pattern (false
positive js/incomplete-hostname-regexp on the test fixture URLs).
Reconstructing from a literal char breaks the dataflow.
- scripts/notarize-artifact.cjs: drop args from the run() rejection
message. Args carry --key-id / --issuer / key file path; the existing
outer catch already squashes errors to a generic line, but CodeQL was
flagging the args.join(' ') as clear-text logging of APPLE_API_KEY_ID.
Composer DOM-text-as-HTML alerts (composer/index.tsx:379, :547) are
already addressed in 4dd9732a9 — innerHTML assignment was replaced with
renderComposerContents which builds DOM via replaceChildren / append
text nodes (no HTML interpretation).
* fix(desktop): inline prototype-pollution guard so CodeQL sees it
CodeQL's dataflow doesn't follow the helper-function guard inside
`safeSet`, so it kept flagging Object.defineProperty as prototype-
polluting. Inline the literal `__proto__`/`constructor`/`prototype`
check at the assignment site to break the dataflow.
Behavior unchanged — same set of disallowed keys, same throw.
* feat(ui-tui): resolve links to readable page titles
Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.
* fix(desktop): drop RegExp from dangling-fence close detection
Previous attempt tried to break the dataflow by reconstructing the
close-fence regex from a literal char + marker.length, but CodeQL still
traced marker.length back to input and kept flagging the test-fixture
URLs as hostname-regex sources (js/incomplete-hostname-regexp).
Replace `new RegExp(...)` + `closeRe.test(body)` with a string-only
hasCloseFenceLine() helper that splits on '\n' and uses ===. No regex
on this path now, so input data can no longer reach a RegExp source.
Behavior preserved: matches lines that are (whitespace + marker +
whitespace), which is what the original `\n[ \t]*${marker}[ \t]*(?=\n|$)`
matched. All 12 markdown-text tests still pass.
* fix(process-registry): suppress windows-footgun false positive on guarded killpg
Keep the existing POSIX-only process-group teardown path, but make the
signal selection explicit via getattr and add an inline windows-footgun
suppression marker on the guarded os.killpg line so the Windows footgun
check no longer blocks CI on this intentionally platform-gated code.
* feat(desktop): reconcile live tool events, polish thread chrome, harden boot
- chat-messages: match tool rows by overlapping query/context/preview values
so preview-first `tool.progress` rows reliably adopt later stable-id
`tool.start` payloads instead of spawning ghost rows or mis-merging
parallel same-name calls; preserve prior args/result across phases.
- tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`,
drop redundant `tool.started` re-emit from `tool.progress`.
- electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so
local backend edits actually run; split hardening helpers into
`electron/hardening.cjs` with tests.
- thread/tool UI: one-shot enter animation keyed by stable ids, braille
spinner for running rows, Cursor-like disclosure rows, drill-down +
duration/count formatting via new tool-fallback-model.
- composer: extract `text-utils`, drop liquid-glass overrides.
- right-rail: split preview-pane into preview-console / preview-file.
- runtime: incremental external-store runtime + runtime-readiness gate;
onboarding store + tests; route-resume hook test.
- regression tests for live tool reconciliation (parallel tools, id-less
progress, preview-first rows, structured args/results).
* feat(desktop): add ripgrep to NSIS prereq page + polish layout
Add ripgrep as a third (recommended) prereq alongside Python and Git in
the NSIS prereq detection page, and clean up the page layout based on
on-VM testing.
Why ripgrep
- Hermes' search_files tool calls `rg` directly for content + filename
search (tools/file_operations.py:1382). Falls back to grep/find from
Git Bash when missing — works but slower and noisier (no .gitignore
awareness).
- ~5MB winget install via `BurntSushi.ripgrep.MSVC --scope user` — no
UAC prompt, parallel to how Python installs.
- scripts/install.ps1 already installs ripgrep as part of
Install-SystemPackages; this brings the desktop installer to parity.
Why "recommended" not "required"
- Python and Git are hard requirements: without them the agent runtime
or terminal tool refuses to start. The bootstrapper preflight throws.
- ripgrep is a performance enhancement: missing it just means slower
searches. Page wording reflects this; failure to install is logged
but doesn't show a MessageBox or block.
Layout polish (response to on-VM screenshot review)
- Wizard header now correctly reads "System Requirements" instead of
the leftover "Choose Install Location" from the previous page. Set
via `GetDlgItem $HWNDPARENT 1037/1038` + WM_SETTEXT — the standard
NSIS pattern for overriding the page header on a custom Page.
- Removed redundant in-body title + verbose intro paragraph; the
wizard header IS the title now. Body has one short intro line.
- Group boxes tightened to 26u with content positioned just below the
groupbox title (not top-anchored status + bottom-anchored checkbox
with empty space in the middle). All three panels + footer fit
comfortably in 126u, well under the 140u page limit.
- Checkbox labels simplified: dropped "(per-user, no admin prompt)"
and "(administrator approval required)" suffixes. The footer note
still calls out UAC for Git when relevant.
- Footer text trimmed to fit cleanly without clipping.
Install order (in customInstall macro)
- Python → ripgrep → Git
- Python and ripgrep are silent and run first; Git's UAC prompt comes
last so the user's approval interaction isn't interrupted by silent
activity afterwards.
Skip behavior unchanged
- All three detected → page auto-skips via Abort
- Silent install (/S) → customInstall winget block skips
- User unchecks all → page advances without running winget
Files
- apps/desktop/installer/prereq-check.nsh: ripgrep detection block,
ripgrep page panel + checkbox, ripgrep customInstall block,
GetDlgItem header override, layout reflow
- apps/desktop/README.md: Runtime prerequisites section updated to
list ripgrep as recommended, with manual winget command
* feat(desktop): add model-confirmation step to onboarding
After OAuth/API-key login completes, onboarding now shows a confirmation
card with the curated default model and a Change button before dropping
the user into chat. Closes the gap where the desktop's `model.default`
was empty after first launch and the agent had to fall back to whatever
heuristic happened to fire — leaving users wondering "why am I getting
sonnet-4 when I logged into Nous Portal?"
Why
- Desktop onboarding only persisted credentials, never `model.default`.
The CLI's `hermes model` command pairs provider + model selection,
but the desktop's onboarding skipped the model step entirely.
- Result: users saw whichever model the agent's auto-fallback picked,
unpredictably and undocumented.
- For the BUILD demo we want users to land on the model they expect
for their provider, with a clear "this is what you're getting" UI
and a one-click path to change it before chatting.
How
- New `confirming_model` flow status carries the just-authenticated
provider slug, current default model, label, and a saving flag.
- `completeWithModelConfirm()` runs after credentials succeed: reloads
env, verifies runtime, fetches /api/model/options to find the curated
first-model for the provider, persists it via /api/model/set, then
transitions into `confirming_model`.
- If anything fails (no providers returned, network error), falls
through to the previous behaviour — onboarding completes without
the confirm step. Polish, not a hard requirement.
- All four credential paths (device_code OAuth, PKCE OAuth, external
CLI flow, API key) now use completeWithModelConfirm instead of
reloadAndConnect.
UI
- `ConfirmingModelPanel` shows: green "<provider> connected" banner,
card with "Default model: <name>" + Change button, and a "Start
chatting" CTA that finalises onboarding.
- Reuses the existing `ModelPickerDialog` (the same picker available
from the chat shell) for the change-model UX. Search, filtering,
multi-provider listing — all already built.
- Stacking: ModelPickerDialog defaults to z-130, which renders UNDER
the onboarding overlay (z-1300) and breaks pointer events. Added
optional `contentClassName` prop to ModelPickerDialog so callers
can override; onboarding passes `z-[1310]`.
Provider-slug matching
- For OAuth flows: pass `provider.id` directly as the preferred slug.
- For API-key flows: `OPENROUTER_API_KEY` → "openrouter" via env-key
prefix strip. Also includes the user-visible label as a fallback
candidate.
- fetchProviderDefaultModel falls back to the first authenticated
provider in the response if no preferred slug matches — so even a
miss still surfaces a reasonable default.
Files
- apps/desktop/src/store/onboarding.ts:
+ new `confirming_model` flow variant
+ fetchProviderDefaultModel + completeWithModelConfirm helpers
+ setOnboardingModel (optimistic update + revert on failure)
+ confirmOnboardingModel (finalises onboarding from the card)
- reloadAndConnect (replaced; the four call sites now go through
completeWithModelConfirm)
- apps/desktop/src/components/desktop-onboarding-overlay.tsx:
+ ConfirmingModelPanel component
+ new branch in FlowPanel for status `confirming_model`
+ ModelPickerDialog usage with z-[1310] content class
- apps/desktop/src/components/model-picker.tsx:
+ optional `contentClassName` prop on ModelPickerDialog so the
dialog can be stacked on top of other fixed overlays
Tested
- `npm run type-check` passes
- `npx eslint` clean on touched files
- Live test in `npm run dev`: cleared onboarding cache, walked
through Nous device-code flow, saw confirm card with curated
default, clicked Change → ModelPickerDialog rendered above the
onboarding overlay with working pointer events, picked a different
model, "Start chatting" persisted to ~/.hermes/config.yaml.
* fix(desktop): suppress generic provider warning in onboarding
Hide the red setup notice when the message is the generic missing-provider guidance, since onboarding already presents provider auth actions. Centralize provider-setup matching across desktop hooks and add coverage for the matcher.
* fix(desktop): add 2u clearance below prereq checkboxes
Group box bottom border was clipping the checkboxes by 1-2px.
Bumped each box height 26u→30u; checkboxes now sit 2u above the bottom border.
* fix(nix): refresh dashboard lockfile hash
Update the web npm deps hash in nix/web.nix to match the committed apps/dashboard/package-lock.json so bb/gui passes the nix lockfile check.
* fix(desktop): install TUI deps in release workflow
Ensure desktop release builds install the standalone ui-tui package before bundling the TUI payload.
* fix(desktop): run release builder from app package
Invoke the desktop builder through the package script so electron-builder uses apps/desktop/package.json.
* fix(desktop): expand release artifact names safely
Build desktop artifact names from workflow version/channel while preserving electron-builder platform macros.
* fix(desktop): use package artifact naming in release workflow
Let electron-builder's desktop package config provide platform-specific artifact extensions while the workflow injects the release version/channel metadata.
* fix(nix): fetch dashboard npm deps from package root
Point the dashboard npm dependency fetch at apps/dashboard so Nix can find the package lockfile after the dashboard move.
* fix(nix): build dashboard from package directory
Set the web package source root to apps/dashboard so npm patch/build phases run beside the dashboard lockfile while keeping apps/shared available as a sibling.
* feat(desktop): render LaTeX math via KaTeX after streaming completes
Add @streamdown/math plugin to the chat markdown renderer.
Inline ($x^2$) and block ($$...$$) math both supported with
singleDollarTextMath enabled. Plugin is gated to non-streaming state
to match the existing pattern for syntax highlighting — math renders
when the message completes, avoiding KaTeX re-render churn during
streaming. KaTeX CSS is imported in styles.css; ~30KB CSS + ~430KB
JS added to the bundle. Smoothness improvements during streaming
deferred to a follow-up.
* perf(desktop): memoize KaTeX renders so math streams without re-rendering
Wrap rehype-katex with a per-equation LRU cache (keyed by
displayMode + source text) and re-enable math during streaming.
Stock @streamdown/math runs rehype-katex on every markdown commit,
so each new token re-katexes every equation in the message. For
math-heavy responses (an equation derived step-by-step) that's
hundreds of ms of wasted work per token and the streaming UI
chokes. With memoization, each equation pays katex.renderToString
exactly once; subsequent tokens re-walk the tree but hit cache for
unchanged equations.
The wrapper mirrors rehype-katex's semantics exactly: same class
detection (language-math, math-inline, math-display), same
<pre>-walk-up for fenced math blocks, same parent.children.splice
replacement, same SKIP traversal, same strict-then-lenient render
strategy with VFile message reporting.
Cached children are structuredCloned on each splice so downstream
rehype plugins or toJsxRuntime can't mutate the cache.
* fix(desktop): declare katex-memo deps directly + drop per-app lockfile
katex-memo.ts (added in 112cad59b) imports hast-util-from-html-isomorphic,
hast-util-to-text, remark-math, katex, and unist-util-visit-parents but
those were never added to apps/desktop/package.json. They were silently
resolving via @streamdown/math at the workspace root, which broke the
moment `npm i --prefix apps/desktop` ran with the per-workspace lockfile
because that install only consults apps/desktop/package.json. Add them
as direct deps, plus unified/vfile/@types/hast for the type imports.
Also delete apps/desktop/package-lock.json — root package.json declares
workspaces: ["apps/*"], so npm manages all lockfile state at the root.
The stale per-app lockfile is what made `npm i --prefix apps/desktop`
diverge from the workspace install in the first place and left an empty
apps/desktop/node_modules/@assistant-ui/ stub that Vite's dep optimizer
then tried (and failed) to open at @assistant-ui/core/dist/internal.js.
* feat(desktop): disable Backdrop noise overlay by default
The noise overlay defaulted to on, which adds a busy speckle layer over
the whole window for every new user. Flip the Leva default to off; the
toggle stays in Backdrop / Noise for anyone who wants it back.
* fix(desktop): polish LaTeX rendering — currency, code blocks, brackets
Five distinct bugs surfaced from a math-heavy stress test:
1. Adjacent code fences glued together. scrubBacktickNoise's
second-pass regex /``\s*``/g matched the LAST 2 backticks of
one fence + whitespace + FIRST 2 backticks of the next, collapsing
two blocks into one. Fixed with lookbehind/lookahead so we only
match exactly 2 backticks not part of a longer run.
2. Whitespace eaten between fences and following content.
stripPreviewTargets internally calls .trim() which strips leading/
trailing whitespace from each split-segment. For segments between
two fences this collapsed \n\n to '', gluing fence close to next
block. Fixed by capturing leading/trailing whitespace at the call
site and restoring it after the transform.
3. Currency dollar signs eaten as math. With singleDollarTextMath:true
remark-math greedy-matched any pair of $, so '$5 ... $10' became
one inline math span. Added escapeCurrencyDollars to escape $<digit>
patterns to \$<digit> in prose segments (not in code). Trade-off:
math expressions starting with a digit (rare — '$5x = 10$') get
escaped too. Mirrors the convention in ChatGPT/Claude's UIs.
4. \(...\) and \[...\] LaTeX brackets unsupported. Models often
emit these instead of $...$ / $$...$$. Added
rewriteLatexBracketDelimiters preprocessor pass.
5. ```latex / ```tex blocks were being routed to KaTeX via a
rewrite to ```math. Aligns with GitHub markdown convention:
```math = render as math; ```latex / ```tex = LaTeX/TeX
source code (syntax highlighted, not rendered). Conflating them
broke teaching/showing-source use cases. MATH_FENCE_LANGUAGES
pruned to {'math'} only.
Also flipped parseIncompleteMarkdown to true (was !isStreaming) so
the math parser can't see $ inside streaming-but-not-yet-closed code
fences. Shiki was already deferred via defer={isStreaming} so this
doesn't introduce new tokenization cost.
Test: 18/18 existing tests still pass; one test updated to expect
escaped \$ in currency-prose-with-URL case.
* fix(desktop): detect Python via registry/filesystem; pin to 3.11–3.13
Two related fixes for Python detection on Windows:
1. py.exe (Python launcher) is missing from per-user installs that
didn't check the launcher option, so 'py -3.X --version' alone
misses real Python installs. User-reported case: clean Win11 +
official Python.org 3.14 install -> 'where py' returned nothing,
our installer offered to install Python again. Both NSIS prereq
page and main.cjs now probe in this order:
1. py.exe launcher (when present)
2. PEP 514 registry: HKLM/HKCU\SOFTWARE\Python\PythonCore\<v>\InstallPath
3. Filesystem: %ProgramFiles%\Python<v>, %LocalAppData%\Programs\Python\Python<v>
Crucially, we never fall back to running 'python.exe' from PATH
on Windows — the WindowsApps stub at %LOCALAPPDATA%\Microsoft\
WindowsApps\python.exe is a redirector that opens the Microsoft
Store window if no Store Python is installed. Triggering that
during boot would be terrible UX. Registry/filesystem probes
never execute the binary.
2. Drop 3.14 from the supported version set. Several Hermes deps
(notably pywinpty, which carries Rust crates like
windows_x86_64_msvc) don't yet publish 3.14 wheels. With wheels
missing, 'pip install -e .' falls back to building from sdist,
which needs a Rust toolchain — users see 'could not compile
windows_x86_64_msvc build script' on first run. install.ps1
sidesteps this by pinning to 3.11 via uv; the desktop installer
doesn't yet have the same uv-managed-Python pathway, so for now
we accept 3.11/3.12/3.13 and tell winget to install 3.11 if
none of those are present. Revisit when the wheel ecosystem
catches up to 3.14 (~early 2026).
* feat(desktop): Cron, Profiles, usage analytics, and titlebar fixes
- Add Cron and Profiles sidebar routes with full CRUD-style flows and API wiring.
- Extend Command Center with auxiliary task overrides and a Usage panel (7d/30d/90d).
- Fix titlebar geometry for WSL/Windows (native overlay width, tool spacing).
- Remove stray merge conflict markers from pyproject.toml optional deps.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(title-bar): position sidebar toggle button
* feat(desktop): composer queue — queue many, edit/delete/cancel-edit, Cursor-style
Press Enter while busy with a draft to queue it; with no draft to interrupt
and send the next queued turn. Auto-drains one queued turn each time the
session settles, same as Cursor. Queue persists across reloads so an
interrupted-and-queued turn isn't lost on refresh.
Each queued row supports edit-in-composer (with explicit Save/Cancel),
send-now (↑), and delete. Drain skips only the entry currently being
edited so the rest of the queue keeps flowing.
Queue dequeue is transactional — an entry only leaves the queue after
`prompt.submit` is accepted, so a rejected submit doesn't drop the turn.
Also shrinks the `[interrupted]` marker to a muted one-liner and drops
its assistant footer so it stops looking like a real reply.
* fix(desktop): handle empty usage analytics totals
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): address PR review titlebar and usage races
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(desktop): add MCP settings and live subagent tree
Surface configured MCP servers in Settings with JSON edit/save and a gateway-backed reload action so users can manage tool servers without falling back to slash commands.
Track live subagent gateway events in a desktop store, show active subagent counts in the Agents statusbar item, and replace the Agents overlay stub with a live spawn tree for the active session.
* fix(desktop): move power-user views out of sidebar
Keep Cron and Profiles available through lower-prominence chrome entry points so the workspace sidebar stays focused on core chat navigation.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(desktop): subagent overlay reads like a live transcript, not a dashboard
Strip the card chrome and rewire /agents to feel like peeking into the
child agent's stream:
- subagents store: single `stream` of typed entries (thinking/tool/progress/
summary) replaces the parallel notes/thinking/tools arrays. Drop unused
fields (toolsets, depth, apiCalls, reasoningTokens, sessionId).
- agents view: no OverlayCards, no boxed stream, no per-row borders. Goal +
status pill + indented stream lines, full row width.
- Group root spawns into "Delegation N" sections when batch shape + spawn
time match — hides task-index interleaving and makes hierarchy obvious.
- Sort tree by spawn time, then task_index. Step indicator is one colored
pill (primary while running, emerald when done) inside the row, not a
trailing pill that wrapped under the chevron.
- Tree picks up `subagent.start` (not only `spawn_requested`) and prunes
delegate-tool fallback rows once native subagent events land for the
session — fixes duplicate "Delegated task" rows alongside the real ones.
* feat(desktop): Esc closes every OverlayView-based overlay
Lift the keyboard handler into the shared OverlayView so Agents, Settings,
Command Center — and anything we build on top of it later — all dismiss on
Esc by default. Nested Radix dialogs stop propagation themselves, so a
modal opened inside an overlay (e.g. model picker inside Settings) still
closes the modal first, not the overlay underneath.
Drop the now-redundant Esc handlers in Settings (kept Cmd/Ctrl+P) and
Command Center.
* fix(desktop): drop numbered step pill on subagent rows
The pill was getting clipped at the overlay edge anyway. Just use the
status glyph (●/✓/✗/■/○) — the delegation header already conveys
"3 workers, 3 active", and order in the list implies which step you're
looking at.
* fix(desktop): drop noisy "returned N items / empty object" stub strings
When a tool returns nothing useful, the row should be silent — the title
("Search Files", etc.) already tells the user what happened. Counting the
fields in an opaque payload is engineer-noise.
`formatToolResultSummary` and `minimalValueSummary` now return '' for
empty arrays / records / unrecognized values; tool-fallback already hides
the detail section when its body is empty.
* refactor(desktop): subagent rows borrow chat tool patterns (fade-in, lucide glyphs, shimmer)
Pull the agents view closer to how chat tool blocks render:
- statusGlyph() returns the same lucide BrailleSpinner / CheckCircle2 /
AlertCircle vocabulary as tool-fallback's statusGlyph
- Stream lines fade-in via useEnterAnimation (one-shot WAAPI), keyed per
entry so streamed deltas settle in instead of popping
- Subagent rows fade in too, and pick up the existing data-slot=tool-block
spacing rules between blocks
- Active stream line trails a BrailleSpinner instead of a hand-rolled
pulsing rectangle
- Goal text drops FadeText (which forces nowrap); keep FadeText only for
the single-line meta subtitle
- Running rows shimmer the title — same affordance the chat thinking row
uses
* refactor(desktop): make /agents subagent-only, drop sidebar + dead sections
Activity rail and History stub were both noise. Strip the split layout,
sidebar, route enum, and the rail/stub helpers — the overlay is now just
the spawn tree, centered in a max-w-3xl column so it stops claiming the
whole screen for one section's worth of content.
* feat: update cron modals
* Add dedicated GUI log stream for dashboard debugging.
Capture dashboard and PTY websocket lifecycle failures in gui.log and expose it via hermes logs.
* Improve desktop runtime UX by surfacing inference readiness in gateway status and hardening WSL link opening.
This also stabilizes markdown code/table block spacing and adds root-install guards so desktop dev runs use a healthy workspace dependency tree.
* Log detailed GUI websocket failure metadata.
Capture richer reject/disconnect/send/parse context for dashboard gateway websocket flows so GUI connection failures are diagnosable from logs.
* Default dashboard startup logging to GUI mode.
Detect the dashboard subcommand during early CLI bootstrap so gui.log is attached from process start and GUI startup failures are always captured.
* Clean up gateway status conditionals and logging bootstrap mode detection.
Simplify nested dashboard gateway status branches for readability and use a concise first-subcommand check when selecting early GUI logging mode.
* add logging to nsis installer
* feat: glass ui pass
* fix(desktop): persist inline assistant errors across hydrate/resume
- Detect provider failure text arriving via message.complete
(HTTP 4xx, "API call failed after N retries", Provider/Gateway
error: ...) and persist as an inline assistant error instead of
regular completion text, blocking the hydrate that was wiping it.
- preserveLocalAssistantErrors: merge by id so same-id hydrated
messages keep their local error, and preserve the optimistic
user+error pair as a unit (with tail-user dedupe).
- Hook all hydrate/resume writers (use-session-actions resume +
fallback, hydrateFromStoredSession, syncSessionStateToView) into
the merge so stale snapshots can't clobber a failed turn.
- Add error to chatMessagesEquivalent so the resume diff actually
sees error-only changes and paints them.
- editMessage on a failed turn now submits a plain resend (no
truncate_before_user_ordinal) and retries plainly on the
"no longer in session history" race.
Style polish on touched files:
- Inline error: text-only treatment (no card).
- User stop / edit-composer send: shared Tabler IconPlayerStopFilled
glyph + shared icon-button class slot for parity.
* feat(desktop): theme xterm with active light/dark mode
The right-sidebar terminal hardcoded a light palette, which read poorly
on the dark glass surface. Subscribe to `useTheme().resolvedMode` and
hot-swap `term.options.theme` so Shift+X (and any other mode change)
updates the terminal in place without tearing down the PTY session.
Dark mode uses xterm's built-in defaults (white fg/cursor + vivid ANSI
16) with just a transparent background so the glass shows through;
light mode keeps the existing hand-tuned overrides for legibility on a
bright surface.
* feat(sidebar): right-click + drag-reorder sessions and workspaces
- Wire right-click on session rows to open the same actions menu;
suppresses the OS-native context menu so Windows stops looking awful.
- Share dropdown + context menu items via useSessionActions() driving
a single declarative ItemSpec[]; render polymorphic over MenuItem.
- New shadcn ContextMenu primitive mirroring DropdownMenu styling.
- Restore drag-and-drop reordering for Agents (lost during the cwd
cleanup) and add reordering of workspace groups via a right-side
grab handle. Pinned reorder unchanged.
- Generic orderByIds<T> replaces the duplicated session/group orderers;
useSortableBindings() hook collapses the two Sortable wrappers.
- cursor-pointer on every actionable element; cursor-grab on handles.
- KISS pass: baseName() helper, AGE_TICKS table, single WORKSPACE_PAGE
constant, flatter SidebarSessionsSection render.
* feat(desktop): solarize the xterm palette in both light & dark
xterm's default ANSI 16 is tuned for dark and reads candy-bright on the
light glass surface (vivid cyans/greens). Ship the canonical Solarized
palette (Schoonover) for both modes — same 16 accents either way, only
fg/cursor swap between `base00/01` (light) and `base0/1` (dark), so a
prompt's colors look uniform across a Shift+X toggle.
Background stays transparent in both modes — Solarized's cream/slate
backgrounds would fight the glass.
* feat(desktop): virtualize chat thread + sidebar via TanStack Virtual
Replaces `use-stick-to-bottom` and per-row session rendering with
`@tanstack/react-virtual`, matching what Cursor uses.
Chat thread (`thread-virtualizer.tsx`):
- Natural-flow virtualization (padding spacers, not absolute items) so
`position: sticky` on the human bubble still resolves cleanly against
the scroller.
- Custom at-bottom anchor: pins when armed, disarms on user-driven
upward scroll, re-arms at bottom, jumps on session switch +
`thread.runStart`.
- Loading indicator and `--thread-last-message-clearance` move to a
real `[data-slot=aui_composer-clearance]` node; drops the brittle
`:nth-last-child(1 of …)` rule that can't fire reliably under
virtualization.
Sidebar (`virtual-session-list.tsx`):
- Flat agents list virtualizes at >=25 rows; pinned and
workspace-grouped paths stay direct-render.
- `SortableContext` keeps all IDs; only the window mounts; dnd-kit's
`setNodeRef` is merged with `virtualizer.measureElement` so rows
participate in both DnD hit-testing and TanStack measurement.
Drops `use-stick-to-bottom`. Streaming test gets a global
`offsetWidth/offsetHeight` stub so the virtualizer's viewport sizing
works in jsdom; the scroll-up-doesn't-pull-back invariant still passes.
* feat: more ui qa
* fix(desktop): trim sidebar terminal startup spacer
Drop zsh's initial spacer row before writing the first terminal prompt so new sidebar terminal sessions do not open with a selectable blank line.
* chore: uptick
* feat(desktop): thin installer + first-launch install.ps1 bootstrap
Converges the Windows packaged desktop installer onto a single canonical
install topology: drop the Electron shell only (~80MB instead of ~500MB),
clone Hermes Agent at a build-time-pinned commit on first launch via
install.ps1's stage protocol, and treat the resulting git checkout at
%LOCALAPPDATA%\hermes\hermes-agent\ as the canonical install location
(same path the CLI installer uses). Future updates flow through the
existing applyUpdates() git-pull path.
Replaces the previous fat-installer architecture where the .exe bundled
a pre-staged hermes-agent source tree under resources/hermes-agent/ that
was then sync'd into ACTIVE_HERMES_ROOT at launch -- a complicated
factory-vs-active dance with several footguns (FACTORY_HERMES_ROOT
mismatch on path resolve, isGitCheckout guard regressions, pyproject
hash drift detection inside the sync loop).
Architecture overview
---------------------
Build time
apps/desktop/scripts/write-build-stamp.cjs writes
apps/desktop/build/install-stamp.json with {commit, branch, builtAt,
dirty}. Honours $GITHUB_SHA / $GITHUB_REF_NAME in CI, falls back to
`git rev-parse HEAD` locally.
apps/desktop/scripts/stage-native-deps.cjs copies the runtime subset
of @homebridge/node-pty-prebuilt-multiarch from the workspace-root
node_modules into apps/desktop/build/native-deps/. Workspace dedup
hoists this dep to the root, out of reach of electron-builder's
`files:`-restricted collector; staging gives us a deterministic
path to extraResources.
electron-builder ships both into resources/install-stamp.json and
resources/native-deps/ respectively.
Boot resolver (electron/main.cjs)
Resolver order:
1. HERMES_DESKTOP_HERMES_ROOT override
2. SOURCE_REPO_ROOT (dev mode)
3. ACTIVE_HERMES_ROOT git checkout WITH .hermes-bootstrap-complete
marker -- the post-install fast path
4. `hermes` on PATH (CLI-installed user adding the desktop)
5. pip-installed hermes_cli via system Python
6. bootstrap-needed sentinel -> hand off to runBootstrap
Deletes the entire FACTORY_HERMES_ROOT / RUNTIME_MARKER /
syncTreeExcludingVenv machinery (-200 lines). The isGitCheckout
guard that bit us in the install.ps1 PR is gone.
First-launch bootstrap (electron/bootstrap-runner.cjs)
1. Resolve install.ps1: prefer SOURCE_REPO_ROOT/scripts (dev), else
download from GitHub raw at INSTALL_STAMP.commit (cached at
HERMES_HOME\bootstrap-cache\install-<sha>.ps1).
2. Fetch the stage manifest via install.ps1 -Manifest -Commit X
-Branch Y.
3. Iterate stages: install.ps1 -Stage <name> -NonInteractive -Json
-Commit X -Branch Y per stage.
4. On all stages green: write the .hermes-bootstrap-complete
marker with {schemaVersion, pinnedCommit, pinnedBranch,
completedAt, desktopVersion}.
Per-run log to HERMES_HOME\logs\bootstrap-<ts>.log. Cancellation
via AbortSignal. Manifest cache so retries don't re-download.
Install overlay (src/components/desktop-install-overlay.tsx)
Mounted alongside the existing onboarding overlay; flexbox card
with header (static) + middle (scrollable) + footer (failure-only,
static). Subscribes to hermes:bootstrap:event IPC + resyncs from
hermes:bootstrap:get on mount/reload. Renders:
- 14-stage checklist with per-stage state icons
- Overall progress bar + current-stage spotlight
- Auto-expanded installer-output panel on failure
- "Copy output" button (full ring buffer + error to clipboard)
- "Reload and retry" wired through hermes:bootstrap:reset to
clear main.cjs's latched failure
Synthetic empty-manifest event from main.cjs flips the overlay to
'active' immediately so the slow install.ps1 download doesn't
leave the user staring at the generic Preparing splash.
Failure latching (main.cjs)
bootstrapFailure module-scope variable holds the rejection after
install.ps1 fails. startHermes() throws the latched error
immediately when set, bypassing the entire ensureRuntime +
runBootstrap chain. Without this, the renderer's ensureGatewayOpen
retries would re-run install.ps1 in a 5-10 min hot loop while the
user was still reading the failure overlay. Cleared via
hermes:bootstrap:reset on user-driven retry.
Unsupported-platform overlay (1F)
macOS / Linux packaged builds (no install.sh stage protocol yet)
emit an unsupported-platform event with a copy-pasteable install
command + docs URL. Dedicated overlay branch with "Copy command"
+ "I've run it -- retry" buttons.
install.ps1 additions (Phase 1F.3 + 1F.5)
-----------------------------------------
New -Commit and -Tag string params. Precedence Commit > Tag >
Branch. Honoured by all three code paths (update / fresh clone /
ZIP fallback), with archive URL selection that handles each
ref-type variant. Detached-HEAD checkouts intentionally -- they're
pins, not branches the user pulls into.
EAP=Continue wrap around the new pin-step git invocations. `git
fetch origin <commit>` writes the routine 'From <url>' info line to
stderr; under the script's global EAP=Stop that terminates the
script even though fetch+checkout succeed. Matches the established
pattern in Install-Uv, Test-Python, _Run-NpmInstall.
Backend fix (hermes_cli/web_server.py)
--------------------------------------
CORS allow_origin_regex now accepts Origin: 'null'. Packaged
Electron loads index.html via file://; Chromium sets the WebSocket
upgrade Origin header to the opaque origin 'null', which the old
regex rejected with HTTP 403 before gateway_ws() ever ran. This
failure mode was masked in the older FACTORY_HERMES_ROOT
architecture because the resolver often found an existing hermes
on PATH with different binding behavior.
Security maintained: localhost-only bind keeps cross-machine pages
out; per-process session token still gates every authenticated
/api/ endpoint regardless of Origin.
Desktop QoL
-----------
DevTools is now enabled in packaged builds (F12 / Cmd+Opt+I).
Field-debugging trade-off: tiny attack surface increase versus
a much better support story when CSP / WS / theme issues surface.
NSIS prereq-check page deleted (-767 lines). The standard
Welcome -> License -> Directory -> InstallFiles -> Finish wizard
now installs without custom Python/Git/ripgrep detection -- those
prereqs are install.ps1's job at first launch.
Test infrastructure (Phase 1G)
------------------------------
apps/desktop/scripts/test-desktop.mjs rewritten as a cross-platform
bundle validator (was darwin-only and asserted on dead factory-
payload paths):
NEGATIVE: hermes_cli/main.py is NOT shipped (regression guard)
POSITIVE: install-stamp.json carries a real commit + branch
POSITIVE: node-pty native deps shipped under resources/native-deps
POSITIVE: renderer dist/index.html reachable (asar or unpacked)
New nsis mode and npm run test:desktop:nsis script.
Validated end-to-end on clean Win10 VM
--------------------------------------
Confirmed: NSIS installer drops Electron shell, app launches,
install overlay shows progress, install.ps1 clones the pinned
commit, 14 stages run to completion, marker written, backend
spawns, WebSocket connects, onboarding overlay asks for API key,
main UI loads, integrated terminal works.
Failures handled: bootstrap stays failed (no hot-loop retry),
"Copy output" gives actionable transcript, "Reload and retry"
explicitly re-runs install.ps1.
What's deferred
---------------
- MSIX wrapping (Phase 2): same Electron .exe under MSIX manifest
with runFullTrust, signed and submitted to Microsoft Store.
- install.sh stage protocol parity (Phase 2): once shipped, the
unsupported-platform overlay becomes drive-it-yourself and
macOS/Linux packaged installers gain feature parity with Windows.
* feat(desktop): persistent terminal pane + fullscreen takeover
Adds a VSCode-style "focus terminal" toggle to the right sidebar's Terminal
tab that takes over the chat pane area without unmounting the shell. The
xterm host is mounted once at the layout root and CSS-overlayed onto
whichever <TerminalSlot /> is currently active, so the PTY session,
scrollback, selection, focus, and WebGL renderer survive every toggle.
Also:
- WebGL renderer (matching dashboard ChatPage) so Hermes' TUI skins paint
faithfully instead of muting through xterm's default DOM renderer
- File drag/drop from the project tree or OS into xterm — paths are
shell-quoted (zsh/bash/pwsh/cmd) and written straight into the PTY
- Solarized dark canvas with brights promoted to real accent variants
(Schoonover's UI-gray brights washed out every TUI accent)
- Strip NO_COLOR/FORCE_COLOR/COLORFGBG/TERM=dumb leaking from non-tty
parents (CI runners, Cursor's agent shell) so the embedded shell gets
truecolor regardless of how Electron was launched
- rAF-debounced ResizeObserver — running fit.fit() synchronously during
sibling pane transitions crashed the WebGL texture-atlas rebuild
* fix(install.ps1): strip UTF-8 BOM regression that broke 'irm | iex'
The canonical install flow
irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex
fails on PowerShell 5.1 with a cascade of 'The assignment expression
is not valid' errors at every param() default value:
[string]$Branch = 'main',
~~~~~~
The assignment expression is not valid. The input to an assignment
operator must be an object that is able to accept assignments...
Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF)
as its first three bytes. 'irm' returns the response body as a string;
on PS 5.1 the BOM survives into that string as a leading \ufeff
character. 'iex' then evaluates the string and PS's parser chokes
on the invisible character before param() -- error recovery proceeds
into the body but every assignment is reported as broken.
This was the exact failure mode the install.ps1 hardening pass (PR
#27224) deliberately fixed by stripping the BOM and ensuring the
file body is pure ASCII. Commit 4279da4db ('fix(windows): make
PowerShell installer parse in 5.1') re-introduced the BOM later,
unintentionally undoing the irm|iex compatibility fix; the merge
that brought it into bb/gui carried it forward.
Fix: strip the three BOM bytes. File body is verified pure ASCII
(any-byte > 127 returns false), so PS 5.1 with no BOM falls back to
Windows-1252 decoding which is identical to ASCII for our content.
Both install paths now work:
- 'irm ... | iex' (canonical CLI)
- 'powershell -File install.ps1' (programmatic / desktop bootstrap)
* install.ps1: detect ARM64 Windows reliably for Node and Git stages
Add a Get-WindowsArch helper that reads Win32_Processor.Architecture
via CIM (invariant to PowerShell host bitness) with PROCESSOR_ARCHITEW6432
fallback. Use it in:
- Install-Git: previously only triggered the arm64 PortableGit asset
when invoked from a native-ARM64 PowerShell host. WoW64 / emulated
x64 hosts (the default powershell.exe on Windows-on-ARM) saw
PROCESSOR_ARCHITECTURE=AMD64 and fell through to the x64 PortableGit
build, leaving ARM64 users on emulated Git for Windows.
- Test-Node: previously hardcoded the Node download to win-x64 on any
64-bit OS, so ARM64 users always got x64 Node under Prism emulation
even though Node ships an arm64 build for Windows. The winget
fallback now also passes --architecture arm64 on ARM64.
Python remains x86_64 by design: uv intentionally prefers
windows-x86_64 cpython on ARM64 hosts for ecosystem (wheel)
compatibility (see astral-sh/uv#19015).
* install.ps1: harden Install-SystemPackages against winget msstore failures
The previous winget invocation discarded stdout/stderr and trusted no
signal at all -- not the exit code (winget exits 0 even when it bails
"please specify --source"), not output (sent to Out-Null), not the
catch handler (winget returning 0 means no exception fires). The only
trust signal was a post-install Get-Command rg / Get-Command ffmpeg
check, which would also miss the package because %LOCALAPPDATA%\
Microsoft\WinGet\Links (where winget puts command aliases) is added to
PATH by AppExecutionAlias machinery only in fresh shells. End result on
machines where the msstore source has a cert problem (0x8a15005e --
common on Windows-on-ARM and some corporate networks): silent failure,
no log, no breadcrumb, and the user is told the install succeeded.
Specifically:
- Pin --source winget on every winget install call. Defeats the broken-
msstore-source path. We ship nothing from msstore so this is safe and
forward-compatible.
- Add --exact --id for a tighter package match.
- Capture each winget invocation's combined stdout/stderr + exit code to
%TEMP%\hermes-winget-<pkg>-<n>.log instead of Out-Null. On the happy
path the log is deleted after the post-install check confirms the
binary is on PATH; on failure the log is kept and its path is named in
a Write-Warn so the user has something to grep.
- Refresh PATH to include %LOCALAPPDATA%\Microsoft\WinGet\Links in
addition to the User/Machine env-var hives, so Get-Command sees newly-
installed winget aliases in the same process.
- No behavior change on the happy path. Same Write-Info/Success/Warn
cadence, same fallback order (winget -> choco -> scoop -> manual),
same $script:HasRipgrep / $script:HasFfmpeg outputs.
Verified end-to-end on a real Snapdragon ARM64 Windows host: ripgrep
uninstalled, stage re-run, [OK] ripgrep installed in 1.4s, ok:true.
* desktop: swap node-pty fork for upstream microsoft/node-pty 1.1.0
The previous dependency, @homebridge/node-pty-prebuilt-multiarch@0.13.1,
publishes no win32-arm64 prebuilds on its v0.13.x line, and its v0.14.x
betas (which do add an arm64 Windows build) ship no electron-vXXX-win32-
arm64 prebuilds at all -- so packaged Electron 40 builds (NMV 143) would
fail at runtime even on a successful npm install. Net effect: the
desktop's integrated terminal was unbuildable on Windows-on-ARM, in
both dev (npm install fails: 404 fetching the node-vXXX-win32-arm64
prebuilt) and packaged builds (no Electron-ABI prebuilt exists).
The homebridge fork was originally created because upstream node-pty
shipped no prebuilds at all. That hasn't been true since node-pty@1.0
(April 2024), which:
- bundles prebuilts for mac (arm64+x64) and Windows (arm64+x64) directly
inside the npm tarball -- no GitHub-Releases fetch, no missing-binary
failure mode
- uses N-API (node-addon-api) for ABI stability across Node and Electron
major versions, so the same pty.node binary loads under Node 22 (dev)
and Electron 40+ (packaged) without per-ABI rebuilds
- is what VS Code, Hyper, and Theia actually ship
API surface is identical (spawn / onData / onExit / write / resize /
kill) -- no call-site changes needed.
Specifically:
- apps/desktop/package.json: replace the @homebridge fork with
node-pty@1.1.0 (exact pin). Widen `asarUnpack` from `["**/*.node"]`
to also unpack `**/prebuilds/**`, because node-pty ships runtime-
execed helpers alongside its .node files (darwin spawn-helper has no
extension and would not be matched by `**/*.node`; conpty.dll,
OpenConsole.exe, winpty.dll, winpty-agent.exe on Windows are also
exec'd at runtime and cannot live inside asar).
- apps/desktop/electron/main.cjs: update both require() strings to
match the new package name and the new staged path under
resources/native-deps/node-pty/.
- apps/desktop/scripts/stage-native-deps.cjs: point at node_modules/
node-pty. node-pty's prebuilts live under prebuilds/<plat>-<arch>/
(not build/Release/), so update the include glob to copy that dir.
Per-arch staging keeps the resource bundle small (target arch comes
from npm_config_arch when electron-builder cross-builds, else
process.arch). Explicitly enumerate file types in the prebuilds glob
so the ~25 MB of .pdb debug symbols that prebuild-install bundles
for Windows crash analysis don't bloat the installer (29 MB -> 2.6 MB
staged on win32-arm64). Re-assert +x on the darwin spawn-helper
defensively, since a stripped mode bit would manifest as a silent
ENOENT at first pty.spawn().
- apps/desktop/scripts/test-desktop.mjs: update expectedNativeDepPaths()
and its assertion site to look at prebuilds/<plat>-<arch>/ instead of
build/Release/. Add an explicit spawn-helper-exists check on darwin
so a regression in the asarUnpack glob would fail loudly in CI rather
than at first PTY spawn.
Trade-off: Linux end-users lose prebuilts and fall back to building
node-pty from source on `npm install`. Acceptable because Hermes
ships no Linux desktop builds (desktop-release.yml matrix is mac + win
only, package.json declares no `linux` target), and Linux developers
hacking on the desktop already need a C++ toolchain for the rest of
the stack.
Verified on Windows 11 ARM64 (Snapdragon):
npm install -> exit 0
node -e "require('node-pty').spawn(...)" round-trip -> OK
stage-native-deps -> 27 files, 2.6 MB
load from staged tree (simulates packaged fallback) -> ConPTY
round-trip OK
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe (#28873)
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe
Fix Slack Socket Mode reliability by adding a watchdog/reconnect path so silent socket task drops no longer leave the adapter stuck. Harden Windows gateway lifecycle by avoiding desktop-binary path collisions, making gateway PID scans case/extension tolerant, and reusing in-flight restart actions to prevent duplicate gateway spawns.
* test(slack): add Socket Mode watchdog/reconnect behavioural coverage
Drive the new Slack Socket Mode self-healing logic through a fake AsyncSocketModeHandler so we can simulate the P0 silent-hang failure mode (task exit, transport disconnected, intentional shutdown, concurrent reconnect attempts) without touching real Slack.
* fix(slack,desktop): address Copilot review on watchdog races and path normalization
- connect(): explicitly cancel + await the prior socket watchdog before flipping _running, so an old monitor cannot exit between teardown and respawn (Copilot #1)
- _socket_watchdog_loop: wrap the body in try/except + add a done-callback that respawns on unexpected crash, so a transient bug cannot permanently disable self-healing (Copilot #2)
- normalizeExecutablePathForCompare: use the resolved path for realpathSync so non-string inputs cannot leak through (Copilot #3)
- Add tests for crash-recovery and atomic watchdog replacement across reconnects
* fix(slack): tighten connect() error path and clarify watchdog test intent
Address Copilot review round 2.
- connect(): wrap _start_socket_mode_handler/_ensure_socket_watchdog in a focused try/except so any failure rolls back partially-started handler/task state and leaves _running=False, ensuring the platform lock is always released by the outer finally
- Defer _running=True until after the handler is actually started so the watchdog observes a live socket task immediately and never spins against a half-built adapter
- Rename test_watchdog_self_restarts_after_unexpected_crash to test_watchdog_cancellation_does_not_respawn (matches what it actually asserts) and add test_watchdog_unexpected_exit_respawns_via_done_callback that drives a real RuntimeError through _on_socket_watchdog_done and verifies a fresh task replaces the crashed one
* fix(web_server): serialize action spawn check+store under a threading lock
Address Copilot review round 3.
FastAPI runs sync handlers on its threadpool, so two near-simultaneous /api/gateway/restart (or /api/hermes/update) requests could both observe "no live process" in _spawn_hermes_action's poll-based dedupe and double-spawn. Add a module-level _ACTION_SPAWN_LOCK around the entire check + Popen + _ACTION_PROCS store sequence so the dedupe is atomic across threads.
* fix: address Copilot review round 4
- slack.disconnect(): mirror connect()'s defensive cleanup — catch the broad Exception path on watchdog await so handler shutdown and lock release still run if the watchdog raised before cancellation took effect
- web_server._spawn_hermes_action: wrap subprocess.Popen in try/except so a missing executable / permission error closes the log file handle, writes a failure marker, and re-raises instead of leaking a file descriptor
- gateway._scan_gateway_pids: drop the over-broad "hermes.exe --profile" / "hermes.exe -p" patterns that would match any Hermes CLI subcommand using a profile flag (e.g. `hermes.exe --profile foo dashboard`); rely on the "hermes.exe gateway" + "hermes-gateway.exe" tokens instead
- tests: tighten _fake_create_task to assert coroutine input and return a real asyncio.Task that stays pending until pytest teardown, and update the three callsites whose mocked AsyncSocketModeHandler.start_async returned a non-coroutine value
* fix(slack): reset multi-workspace state on reconnect
Address Copilot review round 5.
connect() is reentrant (gateway restart, in-process reconnect), but it was leaving _bot_user_id / _team_clients / _team_bot_user_ids populated from the previous session. A reconnect that rotated the primary token or dropped a workspace would silently keep the stale bot user id and stale workspace client maps, leading to dispatch against gone workspaces.
Clear these three pieces of state right after _stop_socket_mode_handler() and before the auth_test loop, then let the loop repopulate from the current tokens. Add test_reconnect_refreshes_multi_workspace_state to lock it in.
* nix: package apps/desktop as .#desktop (#28964)
Adds nix/desktop.nix building the Electron renderer with buildNpmPackage
and wrapping nixpkgs' electron binary. Reuses .#default by setting
HERMES_DESKTOP_HERMES to its hermes binary, so the desktop's resolver
picks up the fully-wired nix hermes (venv, bundled skills/plugins,
runtime PATH) without reimplementing agent resolution.
- nix/desktop.nix: renderer + electron wrapper
- nix/hermes-agent.nix: finalAttrs form, exposes hermesDesktop in passthru
- nix/packages.nix: exposes .#desktop + adds to fix-lockfiles
- apps/desktop/package-lock.json: standalone hermetic lockfile
nix build .#desktop && nix run .#desktop both clean.
* fix(desktop): probe steps 4 & 5 of resolveHermesBackend before trusting
A user-reported failure on Windows-on-ARM: a pre-installed Python 3.13
on PATH makes findSystemPython() succeed, so resolveHermesBackend
returns a backend pointing at it -- but hermes_cli isn't in that
interpreter's site-packages. The spawn dies with ModuleNotFoundError
and the user sees a dead GUI instead of the first-launch installer.
Same shape can hit step 4 (existing `hermes` on PATH) when a stale
shim survives a partial uninstall.
Add cheap exit-code probes -- `python -c "import hermes_cli"` for
step 5, `<hermes> --version` for step 4 -- and fall through to step 6
(bootstrap-needed) on failure. install.ps1 then runs as if on a clean
box and the venv gets built.
Probes live in a standalone electron/backend-probes.cjs module so they
can be unit-tested with node --test, same pattern as bootstrap-platform.cjs
and hardening.cjs. New test file wired into test:desktop:platforms.
* test(desktop): allow `node-pty` bare-require in packaged entrypoints
Pre-existing failure on bb/gui since c858484b4 swapped the node-pty
fork for upstream microsoft/node-pty 1.1.0. main.cjs intentionally
bare-requires node-pty (it's hoisted by workspace dedup in dev, and
staged to resources/native-deps via scripts/stage-native-deps.cjs +
extraResources for packaged builds, with a try/catch fallback at
line ~38). The allowlist hadn't been updated to match -- same shape
as `electron`, which was already allowed.
* chore(deps): refresh root lockfile for dashboard @nous-research/ui 0.14.0
apps/dashboard/package.json was bumped to @nous-research/ui 0.14.0 (+
flag-icons ^7.5.0, motion ^12.38.0) but the root package-lock.json was
never refreshed. Running `npm install` from the repo root now
materialises 0.14.0's transitive closure (launder, bumps for
@nanostores/react, nanostores, sanitize-html, tailwind-merge).
No code changes; purely a lockfile catch-up so fresh checkouts on bb/gui
get a working dashboard install.
* chore(desktop): bump version to 0.0.1
First non-placeholder version so electron-builder's artifactName template
produces `Hermes-0.0.1-win-x64.exe` instead of the obviously-unreleased
`Hermes-0.0.0-...`. No release process yet; this just stops the artifact
filename from telling users "you got a debug build."
Bumped in three slots that all carry the desktop app's version:
- apps/desktop/package.json (source of truth)
- apps/desktop/package-lock.json (per-app lockfile, kept for CI parity)
- root package-lock.json's apps/desktop workspace entry
Identity-of-build for first-launch bootstrap continues to come from
build/install-stamp.json (commit SHA + builtAt), unchanged.
* fix: fs icon color
* perf(desktop): cut per-keystroke layout + listener churn in chat composer
Empirical work via CDP harnesses under apps/desktop/scripts/ (see
profile-typing-lag.md):
jsListeners growth (per round of 200 chars + GC):
before: +35 (verified leak — listeners stuck after 1st trigger popover use)
after: +0
Four narrow edits in src/app/chat/composer/index.tsx:
1. Drop the per-keystroke `editorRef.current.scrollHeight` read used to
decide composer expansion. Replace with `draft.length > 60` heuristic;
the existing ResizeObserver still catches edge cases. `scrollHeight`
is a forced-layout call and was firing on every char until the first
wrap.
2. Bucket measured composer height to 8px before writing
`--composer-measured-height` / `--composer-surface-measured-height`
on `documentElement`. Without this, the editor grows ~1px per char,
setProperty fires every keystroke, computed style is invalidated tree-
wide.
3. Remove the dead `$composerDraft` two-way sync. Nothing outside the
composer subscribed to that atom (verified via grep). Two useEffects
on `[draft]` were pushing draft→atom and atom→aui per keystroke for
no consumer. Also drop the per-keystroke
`reconcileComposerTerminalSelections` call; it was pruning stale
labels for `terminalContextBlocksFromDraft`, but that helper already
ignores labels not in the current submitted text, so pruning per
keystroke was just bookkeeping.
4. `refreshTrigger` fast-bails when the draft contains neither `@` nor
`/`. Previously `textBeforeCaret(editor)` ran on every input/keyup
regardless; `range.toString()` inside is O(n) over draft length.
Synthetic typing latency p50/p90/p99 is similar before vs after on a
freshly-loaded session (Blink can already handle ~30cps typing into a
contentEditable on its own); the real win is the listener leak being
gone and the global computed-style invalidations dropping ~8× when the
composer is sitting at a fixed height row.
The `Enter → stall` follow-up (see profile-typing-lag.md §"Submit /
TTFT stall") is unmeasured here — needs a throwaway session because
the harness fires a real prompt. Not blocking this commit.
* perf(desktop): cut FadeText forced layouts during streaming
The slowest user-felt path is typing into the composer while the
assistant is streaming. Profile (scripts/profile-under-stream.mjs):
FadeText measureOverflow self time: 35.8 ms → 18.1 ms (-50%)
total active CPU during 7s window: ~150 ms → ~50 ms
Two changes in src/components/ui/fade-text.tsx:
1. Drop the `useEffect([children])` that re-ran `measureOverflow`
(reads scrollWidth + clientWidth — forced layout) on every parent
re-render. `useResizeObserver` already fires the same callback on
mount and whenever the host span's box size changes; that covers
the only case where overflow state can legitimately change. The
previous explicit useEffect was a forced-layout flush on every
parent render, which during streaming meant every token tick.
2. Wrap the component in `memo` with a custom comparator that
short-circuits the entire render when scalar string `children` and
the className/fadeWidth/style props are unchanged. The hot path
was tool-fallback's title chips being re-rendered by parent
streaming updates even though their text was stable; memo+
comparator skips that.
Also adds two harness scripts under apps/desktop/scripts/:
- latency-under-stream.mjs (key→paint latency while a turn streams)
- profile-under-stream.mjs (CPU profile while a turn streams)
Updates profile-typing-lag.md with the streaming numbers and confirms
the Enter→paint submit path is already fast (≤320ms on the populated
session; the 2s "stall after Enter" the user noticed once was a
one-time cold-start, not reproducible at the UI layer).
I'd guess the felt jank in real use is fast-burst typing during a
long-form streaming reply (code blocks + markdown lists multiply the
per-token render cost). The CPU savings here scale linearly with
token volume.
* chore(desktop): drop diag scratch scripts no longer needed
* docs(desktop): correct leak-typing numbers on a real session
Re-ran the leak harness on a populated session (Phaser thread) for both
unpatched and patched builds. The original 'listener leak' was transient
warm-up cost, not a steady-state leak — both versions show 0 listener
growth/round in steady state.
The load-bearing number is forced layouts per character:
unpatched (HEAD~2): 7.02 layouts/char
patched (HEAD): 2.35 layouts/char (3× fewer)
The patches reduce per-char forced-layout work to Blink's natural floor.
Document node count and heap are flat in both builds.
* perf(desktop): fix "Enter jumps up" on long threads
User reported: after pressing Enter on a long thread, the view jumps up
— the just-submitted message disappears below the fold. Confirmed via
apps/desktop/scripts/measure-jump.mjs:
before: distFromBottom 0 → 49.5px, sticks there permanently
after: distFromBottom 0 → ~0 (worst case 4px for one frame)
Root cause in useThreadScrollAnchor (thread-virtualizer.tsx):
1. The sticky-bottom logic disarmed on any scroll event where
`scrollTop < lastTopRef.current`. That check can't distinguish a
user scrolling up from a programmatic `pinToBottom` write that
the browser clamped short of bottom (because content also grew in
the same frame, so `scrollTop = scrollHeight` lands at
`scrollHeight - clientHeight` for the OLD scrollHeight, which is
now below the NEW scrollHeight). Result: sticky-bottom disarmed
permanently on the user's first submit.
2. There was no synchronous pin tied to React's commit phase. By the
time the ResizeObserver fired and re-pinned, the user had already
seen ~50ms of "message below the fold" — visually that reads as the
view jumping up.
Fix:
- `programmaticScrollPendingRef` counter tracks scroll events we
expect to be ours (one per `pinToBottom` write). The scroll handler
skips the disarm check when consuming a pending tick, keeps the
arm bit true, and re-pins synchronously if the browser clamped us
short of bottom. A depth cap (8) breaks runaway loops in
pathological streaming-burst layouts.
- `useLayoutEffect` on `groupCount` increase pins BEFORE the browser
paints, eliminating the visible ~50ms window between optimistic
user-message insert and the RO/scroll-event chain firing.
Verified on the long Cloud Shadows thread (7-8 turns, ~11k px tall):
all three repro runs now hold within 0–4 px of bottom across the
post-Enter transition. Submit latency unchanged (paint 77–107 ms),
streaming-typing latency unchanged.
Also adds three debug harnesses:
- measure-jump.mjs — sample thread scroll across Enter
- probe-thread.mjs — dump current thread / scroll state
- diag-jump.mjs — intercept scrollTop + RO + mutations across Enter
* perf(desktop): rate-limit thread auto-pin during streaming
Follow-up to the Enter-jump fix. The first version did a synchronous
re-pin loop inside the on-scroll handler when the browser clamped our
`scrollTop = scrollHeight` write short of the new bottom; that gave a
tight 4 px visible jump on Enter, but during streaming the
ResizeObserver fires many times per second as content grows, and each
RO callback re-entered the pin loop. CPU profile showed
`Virtualizer.getMaxScrollOffset` climbing to 22 ms self over a typing-
during-streaming window — the sync re-pin path was paying tanstack-
virtual's recompute cost ~3× per token.
Re-architect:
- RO callback coalesces to one pin per animation frame. Streaming-rate
RO bursts now cost the same as a single per-frame pin.
- The on-scroll programmatic-counter guard remains (it's what prevents
the false-disarm bug when the browser clamps a write). It no longer
does sync re-pins; the next RO/rAF will catch up.
- The useLayoutEffect on groupCount (the path that fires on user
submit / new turn arrival) ALSO schedules one rAF pin in addition to
the synchronous pin. This catches the case where React mounts the
new message in a second commit (after our layout effect ran), which
grows scrollHeight again. Two pins instead of a tight loop, paid only
once per turn change.
Net effect on the Cloud Shadows long thread:
enter-jump transient: 12–20 px for 1 frame (was 49 px permanent)
CPU during stream+type: `getMaxScrollOffset` dropped out of top-5
self-time list
typing-during-stream: p50 ~10 ms paint, p99 ~20 ms (1 frame),
occasional 40 ms+ outliers during burst
token arrivals
Also adds scripts/profile-long-stream.mjs: 20-second streaming profile
with per-500ms FPS histogram + content-length tracking, so we can see
whether streaming render cost grows with message length (it doesn't —
sustained 60 fps).
* perf(desktop): use textContent for trigger precondition
Replace composerPlainText() call inside refreshTrigger's no-trigger
fast-bail with a textContent check. textContent is a browser-native
flat traversal; composerPlainText walks recursively with chip-aware
logic. We only need to know if @ or / appears; either way the trigger
char will be in textContent because chips contain @ in their refText.
Profile shows composerPlainText was ~18ms self over a 12s typing-during-
stream window, called from refreshTrigger on every keystroke. Most of
that was the precondition check (the trigger detection path is the
slow path but only runs when a trigger char is present).
* Revert "perf(desktop): use textContent for trigger precondition"
This reverts commit a6a78ff08a.
* Revert "perf(desktop): cut FadeText forced layouts during streaming"
This reverts commit 88e7d7537c.
* Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer"
This reverts commit bff1b3261d.
* Revert "Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer""
This reverts commit b7b378e3a4.
* Revert "Revert "perf(desktop): use textContent for trigger precondition""
This reverts commit 0739588f48.
* chore(desktop): synthetic-stream perf harness + scripts
Drops the React `<Profiler>` approach (no-op because Vite is currently
serving the production React build) in favor of an externally-observable
measurement stack: rAF frame intervals, `PerformanceObserver({entryTypes:
['longtask']})`, and a `MutationObserver` on the live streaming message.
Adds a synthetic stream driver — `window.__PERF_DRIVE__.stream({...})` —
that pushes tokens through the live `$messages` atom at a controlled rate,
so the assistant-ui runtime, incremental repository, and Streamdown
markdown pipeline see the same workload they'd see during a real LLM
stream, without the LLM cost.
The driver lives in `src/app/chat/perf-probe.tsx`; `main.tsx` side-imports
it under `import.meta.env.MODE !== 'production'` so it tree-shakes out of
prod builds. (Using `MODE` rather than `DEV` because our Vite setup
currently reports `DEV=false` even under `vite dev` — see the dev-build
note in `profile-typing-lag.md`.)
Scripts:
- measure-synthetic-stream.mjs drive synthetic + record frame/longtask/mutation
- profile-synth-stream.mjs CPU profile + top self-time during synthetic
- measure-real-stream.mjs same harness, real LLM stream
- profile-real-stream.mjs CPU profile bracketing the real stream window
- eval.mjs / reload.mjs small CDP helpers
A real-LLM measurement on Cloud Shadows (gpt-4o-mini, 39 s window) showed
12 longtasks in the same 75-127 ms range the synthetic predicted, so the
synthetic is a faithful proxy.
* perf(desktop): memo FadeText so it skips re-renders when text unchanged
FadeText is used 110+ times inside `tool-fallback.tsx` on a tool-heavy
thread. During streaming each parent re-render previously triggered the
component's `useEffect([children])`, which forced a `scrollWidth` layout
read even when the title text was unchanged. The `useResizeObserver` was
already covering the genuine resize case, so that effect was strictly
redundant work.
Drops the effect and wraps the component in `React.memo` with a custom
comparator that field-compares `className`, `fadeWidth`, and `style`,
plus identity-compares `children` (scalar fast-path; correct for JSX
nodes too since a new node should force a re-render).
Verified via temporary render counter on the 34 MB
`session_20260514_215353_fe0ac8` thread (110 FadeText instances): a
2 s synthetic stream went from ~11k FadeText render calls to 122 —
roughly one render per truly-new instance instead of one per parent
commit per instance.
Doesn't move the longtask needle on its own (Streamdown's markdown
re-parse dwarfs it) but eliminates a steady CPU floor and a class of
forced layouts during streaming. Profile-typing-lag.md documents the
full investigation, including the remaining Streamdown cost as the
real source of the perceived "5 fps moment" hitches.
* perf(desktop): memoize MarkdownText plugins to stop churning Streamdown
The inline `plugins={{ math: mathPlugin, ...(isStreaming ? {} : { code }) }}`
on `<StreamdownTextPrimitive>` constructed a new object literal on every
parent render. That broke `<Streamdown>`'s outer memo and forced its
internal `rehypePlugins` / `remarkPlugins` array useMemos to rebuild,
which propagates a new identity into every `<Block>` and defeats Block's
memoization for stable historical blocks.
After memoizing on `[isStreaming]` (the only real dimension of variance),
CPU profile during a 5 s synthetic stream on the 34 MB session shows
`parser` self-time dropping out of the top 10, `compile` cut roughly in
half, and `bn$1` / `m$1` (micromark internals) leaving the top entries.
Doesn't move the visible longtask count on its own — Streamdown's
per-Block parse cost still dominates whenever the last block's content
changes — but it removes a class of unnecessary re-parses for historical
blocks during streaming. See `scripts/profile-typing-lag.md` for the
full investigation.
* perf(desktop): floor assistant-text flush gap to 33ms for predictable batching
`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame`
only. The "at most one flush per frame" guarantee that gives you is fine
for fast streams (>~80 tok/sec) where multiple tokens arrive within a
single frame, but breaks down at typical LLM token rates (30-80 tok/sec)
where each token arrives slower than the rAF cadence and triggers its
own React commit + Streamdown markdown re-parse.
Track `lastFlushAt` and require at least 33 ms between two flushes.
React 18+ auto-batching probabilistically already collapsed some of
these, but the floor makes it deterministic.
A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):
| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |
`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms,
which is the expected signature of a deterministic floor. Doesn't fully
solve the user's perceived hitches — Streamdown's per-Block parse cost
when the last block grows past ~2 k chars is still the elephant — but
it consistently shaves the worst-case longtask and makes the streaming
cadence visibly steadier.
Also threads a matching `flushMinMs` option through the synthetic
stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs`
so the harness can A/B both regimes without spending LLM credits.
See `scripts/profile-typing-lag.md` for the full investigation.
* perf(desktop): useDeferredValue for streaming markdown so parses don't block input
Streamdown's per-Block parse cost grows with the live tail's length and
is unavoidable inside the block-memo pattern (industry standard, see
findings doc). The fix is to stop having that work block the main thread.
`<DeferStreamingText>` is a 12-line wrapper that reads message-part state
via `useMessagePartText`, runs it through `useDeferredValue`, and
re-publishes via assistant-ui's `<TextMessagePartProvider>`. The inner
`<StreamdownTextPrimitive>` reads the deferred value through the normal
`useMessagePartText` hook — no fork, no internal-path imports, fully on
assistant-ui's public API. React's concurrent scheduler then:
- abandons in-flight deferred renders when a newer token arrives, so
intermediate states get skipped under fast streams
- deprioritises the markdown render when the main thread has urgent
work (typing, scroll), so input stays responsive even while a
100ms parse is queued
Streamdown already uses `useTransition` for its block-array setState;
this lifts the deferral up to the consumer boundary so it covers the
whole pipeline (preprocess → split → repair → parse → render).
A/B on the 34 MB session, 300 tokens at 50 tok/sec, markdown chunks
(four trials each, with the 33ms flush throttle on for both):
| | avgFps | p99 frame | LTs/5s | max LT | typing-while-stream p95 |
|---|---|---|---|---|---|
| pre | 54.3 | 41 ms | 1.7 | 110 ms | ~17 ms |
| post | 58.5 | 31 ms | 2.0 | 117 ms | 14-18 ms |
Longtask count + max LT unchanged — useDeferredValue doesn't reduce
CPU, only its priority. The avgFps lift and p99 frame drop are the
proof that the existing CPU is no longer blocking 60 fps cadence. One
clean run logged MUTATIONS=0 — React skipped every intermediate text
state and only committed the final one (textbook deferred-value
behaviour).
The actually-reduce-CPU path is replacing the parser with a state
machine like Flowdown — left for a future PR; see
`apps/desktop/scripts/profile-typing-lag.md` for the full investigation.
* feat(desktop): add hermes gui launcher
* feat(desktop): launch packaged gui builds by default
* bump gui version to 0.0.2
* fix(dashboard): allow file:// origin on loopback WS + diagnostic logging
Upstream commit 2e66eefbc ("fix(dashboard): validate WebSocket Host
and Origin") added a WebSocket Host/Origin guard to block DNS
rebinding against the dashboard. The guard rejects any Origin whose
scheme is not http/https or whose netloc is empty — which includes
Electron's renderer Origin: file:// when the desktop app loads its
bundle from disk in production mode.
That makes the bb/gui Electron desktop unable to open the gateway
WebSocket against the embedded backend on Windows / macOS prod
builds. The renderer reports "Desktop boot failed" and the backend
logs:
WARNING hermes_cli.web_server: gateway-ws reject
peer=127.0.0.1:NNNN reason=non_loopback_or_bad_origin
bound_host=127.0.0.1 close_code=4403
DNS-rebinding requires a DNS-resolvable hostname; file:// has no
host component and therefore cannot be the attack vector this guard
exists to block. When bound to a loopback interface (127.0.0.1 /
::1 / localhost), accept file:// origins so desktop wrappers can
attach. Non-loopback binds (operator opted into network exposure)
keep rejecting file:// — the loose policy doesn't apply.
Also adds per-reason diagnostic logging in
_ws_host_origin_is_allowed, so future ws-guard rejections name the
specific clause that fired (bad_host / bad_origin_scheme /
origin_host_mismatch) instead of the opaque
"non_loopback_or_bad_origin" surfaced at the call site.
Verified against tests/hermes_cli/test_web_server_host_header.py
(all 11 upstream tests still pass) and hand-tested by opening the
bb/gui Electron desktop dev build against the patched backend.
* fix(tui_gateway): restore _content_display_text helper
Bb/gui had dropped the helper but the orchestrator code merged from main
still calls it (_inflight_text, _message_preview). Re-add the definition
verbatim from main so session.create / _start_inflight_turn don't crash
with NameError on first prompt submit.
* fix(tui-gateway): restore _content_display_text helper lost in main merge
The May 27 merge of origin/main into bb/gui re-introduced two callers of
_content_display_text (in _inflight_text and _history_to_messages) but
dropped the helper definition itself, leaving an unresolved reference.
NameError fires on every user message via _start_inflight_turn ->
_inflight_text, taking down both the TUI and the desktop (which share
this gateway backend) the moment input is dispatched.
Restores the helper verbatim from main (commit 36c99af37) -- pure
structured-content text extractor, no other dependencies.
* fix(telegram): import Set for _dm_topic_chat_ids annotation
self._dm_topic_chat_ids: Set[str] = {...} at line 460 references Set
but only Dict, List, Optional, Any are imported from typing. The file
has no 'from __future__ import annotations', so the annotation is
evaluated at runtime and raises NameError on TelegramAdapter
construction.
* fix(setup): drop shadowing inner importlib.util re-imports
_print_setup_summary and _setup_tts_provider each had 'import
importlib.util' inside a try: block nested deeper in the function
body. Python flips importlib to function-local for the whole scope,
so earlier references in the same function (the neutts branches at
lines 493 / 1109) hit UnboundLocalError before the late import can
run.
The top-of-module 'import importlib.util' at line 14 already covers
both call sites, so dropping the redundant inner imports restores
the intended behavior.
* feat(install.ps1): add -IncludeDesktop switch + Stage-Desktop
The new Hermes-Setup.exe (Tauri bootstrap installer) passes -IncludeDesktop
so users who install via the GUI end up with a launchable Hermes.exe at
apps/desktop/release/<os>-unpacked/. Existing flows are unchanged:
* The 'irm install.ps1 | iex' CLI one-liner omits the flag — terminal
users don't need a prebuilt desktop binary; 'hermes desktop' builds
on demand.
* The Electron desktop's bootstrap-runner.cjs also omits the flag —
rebuilding apps/desktop from inside a running Hermes.exe would try
to overwrite the live binary on disk and fail.
Stage-Desktop runs after Stage-NodeDeps so workspace npm is already
installed when electron-builder fires. It does:
1. 'npm install' at repo root so apps/* workspaces resolve their deps
(Electron itself arrives via npm here, ~150MB)
2. 'npm run pack' in apps/desktop (tsc + vite + electron-builder --dir)
3. Probes apps/desktop/release/{win-unpacked,win-arm64-unpacked}/Hermes.exe
The --dir mode produces an unpacked launchable binary without an NSIS/MSI
installer artifact — we don't need one because Hermes-Setup.exe spawns the
unpacked binary directly via launch_hermes_desktop.
* feat(installer): Tauri bootstrap installer for first-time onboarding
Hermes-Setup.exe is a small signed Rust+Tauri binary that drives
scripts/install.ps1 stage-by-stage with a native UI matching the
desktop's design language. Replaces the chicken-and-egg pattern of
shipping a 200MB Electron app whose first launch existed only to
run install.ps1.
The architecture:
Rust backend (src-tauri/):
bootstrap.rs orchestrator -- Tauri commands, stage iteration
install_script.rs resolve install.ps1 (dev checkout, cache, GitHub raw)
powershell.rs spawn powershell, line-stream stdout/stderr, parse JSON
events.rs BootstrapEvent types -- mirror bootstrap-runner.cjs
paths.rs HERMES_HOME resolution + tracing log setup
build.rs bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from
'git rev-parse HEAD' at compile time
React frontend (src/):
Tauri webview rendering 4 screens (welcome / progress / success /
failure), driven by nanostores subscribing to the Rust event stream.
Visual layer reuses the desktop's styles.css wholesale via @import
so the installer and desktop never drift visually.
Distribution:
targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The
raw target/release/Hermes-Setup.exe IS the artifact on Windows;
.dmg + .app on macOS; AppImage on Linux. One file, double-click,
no installer-installing-an-installer pattern.
Compile-time pinning:
build.rs reads 'git rev-parse HEAD' and emits
cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>.
bootstrap.rs's option_env!() picks these up so the binary fetches
install.ps1 from the exact SHA it was tested against. CI / release
builds can override via HERMES_BUILD_PIN_COMMIT env var.
Windows manifest:
hermes-setup.manifest declares level='asInvoker' so the
productName 'Hermes Setup' doesn't trip Windows's installer-
detection heuristic and refuse to launch without elevation.
Also declares PerMonitorV2 DPI + UTF-8 active code page + Common
Controls v6.
Limitations of this initial version:
* No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe
('More info -> Run anyway'). The downstream binaries it produces
(Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and
therefore don't carry MOTW, so they launch without SmartScreen
intervention. Cert procurement tracked separately.
* macOS and Linux build paths defined but untested -- Windows-only V1.
* fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop
Three bugs found in the first VM end-to-end test:
1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the
manifest came back with the 14-stage list (no desktop stage), the
UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to
both the manifest fetch and the per-stage runs — install.ps1 gates
the desktop stage's inclusion on the flag.
2. The Success screen's Launch button silently swallowed the Tauri
error when no Hermes.exe existed (e.g. Stage-Desktop was skipped).
Wire the error through to inline UI with an alert callout, so the
user gets actionable text ('Hermes.exe missing, run hermes desktop
from a terminal') instead of an unresponsive button.
3. The Success screen tells users to run 'hermes desktop' from a
terminal but the CLI only accepted 'hermes gui' — invalid choice
for 'desktop'. Rename the subcommand canonically to 'desktop' with
'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets
used by session-flag arg parsing + logging-mode probe so both names
route to the same logic.
* fix(install.ps1): pre-warm electron-builder winCodeSign cache + fix Stage-Desktop $HasNode false-skip
Two bugs caught in the second VM end-to-end run:
1. electron-builder's winCodeSign extraction fails on grandma-class
Windows boxes because the .7z archive contains macOS symlinks
(darwin/10.12/lib/libcrypto.dylib and libssl.dylib pointing at
versioned siblings). Creating symlinks on Windows requires
SeCreateSymbolicLinkPrivilege, a per-user right that non-admin
accounts don't have on stock Windows. Result: every fresh install
on a non-admin user fails Stage-Desktop with a 7-Zip 'cannot create
symbolic link' error, retried four times, then bails.
Fix: Initialize-ElectronBuilderCache pre-extracts winCodeSign-2.6.0.7z
ourselves with -snl (don't preserve symlinks, store as resolved file
content) AND -x!darwin (skip the entire macOS subtree — irrelevant
on Windows). Writes to electron-builder's expected cache dir before
electron-builder gets a chance to try its own broken extraction.
Idempotent — fast-paths via signtool.exe sentinel check.
2. Install-Desktop's first guard was 'if (-not $HasNode) skip'.
$HasNode is set by Stage-Node into $script:HasNode, but in
cross-process driver mode (each -Stage NAME is a fresh powershell.exe
spawned by Hermes-Setup.exe), that script-scope variable from the
PREVIOUS process is invisible — so the guard always fired and
Install-Desktop returned in 900ms with a misleading
'Node.js not available' reason. The real npm probe below it never
got to run. Fix: re-probe npm directly via Get-Command when $HasNode
is empty/false, since by that point Stage-Node has already verified
Node is installed and the only question is whether *this* process
can see it on PATH (it can — installer-wide PATH update from Stage-Node).
* fix(install.ps1): tell electron-builder we're NOT signing instead of pre-extracting winCodeSign
The previous commit (c7e46f9f3) worked around the winCodeSign-symlinks-
on-Windows extraction crash by pre-extracting the archive ourselves with
-snl + -x!darwin. That fix was correct but addressed the wrong layer.
The deeper question: why was electron-builder fetching winCodeSign at all
when we have no signing cert configured? Answer: electron-builder
unconditionally pre-warms the toolchain assuming any build MIGHT sign.
The cert auto-discovery never finds anything (we never set CSC_LINK
or anything else), so the signing never happens — but the 100MB fetch
of winCodeSign and its broken-on-Windows symlink extraction does.
Set CSC_IDENTITY_AUTO_DISCOVERY=false (with WIN_CSC_LINK and
WIN_CSC_KEY_PASSWORD also explicitly cleared as belt-and-suspenders)
before invoking npm run pack, and electron-builder skips the entire
winCodeSign apparatus. No download, no extraction, no privilege check.
Env vars are saved/restored around the invocation so we don't leak
the override into Stage-PlatformSdks etc.
Net: removes the 100-line Initialize-ElectronBuilderCache helper that
manually downloaded + extracted winCodeSign-2.6.0.7z. Replaced with
3 env-var assignments. The produced Hermes.exe is functionally
identical — just no longer carries a code-signing-machinery dependency
we never used.
* fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line
Diagnosing the second VM failure was impossible because bootstrap-installer.log
contained only the 'starting' banner. Two causes:
1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the
floor under the default INFO env-filter.
2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only
emitted Tauri events to the frontend; they never tee'd to the log file
at all. When the failure route mounts, the Tauri event stream is the
only place the script output lived, and it gets discarded.
3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event()
were also Tauri-only — so even the 'which stage failed' frame never
reached the log.
Fixes:
* emit_log() → tracing::info!
* Sink callbacks tee stdout to info!, stderr to warn!, with stage label
as a structured field for grep'ability
* emit_event() now matches on the variant and logs each lifecycle frame
at the right level: Failed → tracing::error!, others → info!
Result: a failing install leaves a complete forensic trail in
bootstrap-installer.log — manifest stage list, every install.ps1
stdout/stderr line tagged by stage, the stage transitions, and the
final error. Same path as before so nothing the user does changes.
* fix(install.ps1): Stage-NodeDeps cross-process $HasNode + stream npm install output to bootstrap log
VM run 3 diagnosis: node-deps stage skipped on the VM (logged
'Skipping Node.js dependencies (Node not installed)') and then
desktop's npm install failed with exit 1 and zero diagnostic detail.
Two root causes:
1. $HasNode false-skip in Stage-NodeDeps — same cross-process bug
pattern we fixed for Stage-Desktop in c7e46f9f3. Stage-Node ran
in process A and set $script:HasNode = $true, then exited. Stage-
NodeDeps ran in fresh process B (Hermes-Setup.exe -Stage NAME
spawns each stage independently), where that variable doesn't
exist. Re-probe via Get-Command npm instead of trusting the
stale script-scope global. The previous stage already verified
Node so the re-probe succeeds.
2. npm install --silent + Tee to TEMP file hid the real error.
When the workspace install failed on the VM, the actual reason
was buffered in $env:TEMP\hermes-npm-desktop-install-*.log and
the user saw only 'exit 1'. Drop --silent so npm streams its
full output, drop the TEMP-file dance — the Tauri installer's
streaming sink already tees every stdout/stderr line to the
rolling bootstrap-installer.log, so a side log file is dead
weight that hides the very error we need.
After this, the bootstrap log on a failure will contain npm's full
output (deprecation warnings, ETARGET, native-module compile errors,
whatever) tagged with stage=desktop, making the actual cause
diagnosable instead of an opaque exit code.
* fix(install.ps1): restore Initialize-ElectronBuilderCache (CSC env vars alone aren't enough)
VM run 4 diagnosis: even with CSC_IDENTITY_AUTO_DISCOVERY=false set,
electron-builder still fetches winCodeSign and signs bundled binaries.
The log shows the signing happens BEFORE the cache extraction:
• signing with signtool.exe ...\winpty-agent.exe
• signing with signtool.exe ...\OpenConsole.exe
• downloading winCodeSign-2.6.0.7z
• <symlink privilege error>
Cause: node-pty's bundled prebuilds are listed in apps/desktop's
asarUnpack ['**/*.node', '**/prebuilds/**']. electron-builder
re-signs anything unpacked from asar, regardless of whether OUR
binary gets signed. The signtool invocation needs winCodeSign on
disk, which needs the .7z extracted, which hits the macOS-symlink
crash on non-admin Windows.
The CSC env vars I added in d5fe46727 only kill IDENTITY DISCOVERY
(so OUR Hermes.exe stays unsigned, which is fine — we have no cert).
They don't prevent the toolchain fetch for the bundled-prebuild
re-sign. I removed the pre-extract in d5fe46727 thinking the env
vars subsumed it; that was wrong. Both are needed.
Restoring Initialize-ElectronBuilderCache verbatim from c7e46f9f3
and keeping the CSC env vars. Wrote a clearer doc-comment at the
call site explaining the two-knob interaction so future maintainers
don't drop one half again.
* fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract
VM run 5 diagnosis: the pre-extract from 3b29e65c1 ran (extracted 83
files, 24MB) but produced ZERO files at the expected sentinel path
'/winCodeSign-2.6.0/windows-10/x64/signtool.exe'.
Cause: the .7z archive's root entries are 'windows-10/', 'darwin/',
'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with
'-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at
$cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory
nesting wrong from the start.
And then we observed: electron-builder downloads winCodeSign-2.6.0.7z
under a random numeric filename ('384387955.7z') regardless of what's
already extracted in the parent dir. The cache key isn't the dirname;
it's content-addressed. So the pre-extract approach was doomed even
if the path nesting had been right.
Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's
win build config. electron-builder honors this and skips the bundled-
prebuild signing entirely — no signtool invocation, no winCodeSign
fetch, no symlink-privilege crash. The previous failures all stemmed
from electron-builder pre-signing node-pty's bundled .exes
(winpty-agent.exe, OpenConsole.exe) which are already author-signed
upstream; re-signing with our nonexistent cert was overwriting good
sigs with nothing useful anyway.
Cost: when we DO get a real cert later, we'll add it back with the
sign function pointing at the cert chain. Until then, all-null is
the correct config and unblocks every non-admin Windows user.
Removed Initialize-ElectronBuilderCache (the dead pre-extract).
Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env
vars as belt-and-suspenders against a future electron-builder
change that might revive cert auto-discovery.
* fix(desktop): use no-op sign function instead of sign=null
VM run 6 still hit the symlink crash even with signtoolOptions.sign=null.
electron-builder 26.8.1 treats null as 'use the default signtool path'
rather than 'skip signing', so the winCodeSign fetch + extraction still
fired for the bundled prebuild re-sign.
The Electron docs (electronjs.org/docs/latest/tutorial/code-signing)
make it clear signing is OPTIONAL and unsigned apps work fine — users
just see SmartScreen on first launch. The electron-builder mechanism
for 'don't actually sign anything' is to supply a custom sign function
(via signtoolOptions.sign: '<path-to-cjs-module>') that resolves
without invoking signtool.
build-noop-sign.cjs is that module — a 5-line async function that
returns undefined. electron-builder calls it for every binary it would
have signed, gets back a resolved promise, and considers each binary
'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash.
When Nous's cert arrives, replace this file with a real signing hook
(@electron/windows-sign-based or a direct signtool invocation). The
architecture's signing-ready and the cutover is a one-file edit.
* fix(desktop): signAndEditExecutable=false to skip signtool path entirely
After reading app-builder-lib/winPackager.js line 216 + 231 directly:
signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits
both signApp() (which signs Hermes.exe + every shouldSignFile match
including bundled prebuilds) AND createTransformerForExtraFiles().
None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the
winCodeSign download — that happens before they're consulted.
What we lose: rcedit also runs through signAndEditResources, so
disabling this drops PE metadata (file properties showing 'Hermes' /
'Nous Research' / file description). Cost is real but bounded:
* Hermes.exe filename, icon, asar contents, app identity intact
* Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE
description) — minor downgrade
* Start menu, taskbar, window title all work normally
* SmartScreen will warn once (unsigned, same as before)
When the cert lands, flip signAndEditExecutable back to default true,
both signing AND rcedit return, PE metadata is restored.
Removes the no-op sign function (build-noop-sign.cjs) since
signAndEditExecutable=false prevents signtool from being invoked at
all — the custom hook never gets called either.
* feat(install.ps1): write .hermes-bootstrap-complete marker at end of install
The desktop app's main.cjs resolver ladder has a 'bootstrap-needed' rung
that fires when .hermes-bootstrap-complete is missing from
ACTIVE_HERMES_ROOT. Pre-Hermes-Setup, this marker was written by the
packaged-desktop's own bootstrap-runner.cjs at the end of its install
flow. Now that Hermes-Setup.exe runs install.ps1 directly, install.ps1
needs to own the marker — otherwise the desktop sees no marker on first
launch and triggers its legacy first-launch bootstrap (re-running
install.ps1 from inside Electron, the exact recursion Hermes-Setup.exe
was supposed to obviate).
Implementation:
* New Stage-BootstrapMarker (worker) → Write-BootstrapMarker (helper)
* Slotted in the manifest right after platform-sdks, before the
interactive configure/gateway stages, so it runs unconditionally
when the install reaches the finalize phase
* Schema mirrors apps/desktop/electron/main.cjs writeBootstrapMarker /
isBootstrapComplete EXACTLY: {schemaVersion: 1, pinnedCommit,
pinnedBranch, completedAt}. Schema version stays at 1 so old
desktops that read marker files written by future install.ps1s
can still parse them.
* pinnedCommit comes from -Commit flag (Hermes-Setup.exe passes it)
or falls back to 'git rev-parse HEAD' in InstallDir
* pinnedBranch from -Branch flag, defaults to 'main' matching
install.ps1's own param default
Two PS-5.1 gotchas baked into comments:
* The ?. null-conditional operator doesn't exist pre-PS7; use
explicit if-checks on Get-Command results
* Set-Content -Encoding UTF8 emits a BOM in 5.1 and Node's plain
JSON.parse rejects BOM — write via .NET's UTF8Encoding(false)
to produce BOM-less JSON the desktop's readJson() can parse
* feat(installer): drive in-app updates through the Tauri installer
Converge update on the same principle as bootstrap: one driver owns all
repo mutation. The desktop becomes a pure consumer that hands off to
Hermes-Setup.exe --update instead of re-implementing git/pip in Electron.
- hermes desktop --build-only: build without launching, so the installer
owns the post-update launch (CLI keeps build logic single-sourced).
- Installer AppMode {Install,Update} from argv; get_mode exposed to the UI.
- Installer self-copies to HERMES_HOME/hermes-setup.exe on install success
(no-op guard during --update re-invocation to avoid the locked-exe copy).
- Installer --update flow (update.rs): wait for the desktop to release the
venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other),
then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses
the bootstrap event channel + progress UI via a synthetic two-stage manifest.
- Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip
removed) -> thin handoff: spawn updater, app.quit() to free the shim.
Detection (checkUpdates, commit changelog, behind-count) kept intact.
- install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe
(never bare 'hermes desktop', which would rebuild every launch).
* test update
* fix(installer): pass --branch to hermes update in the --update flow
The install is a detached-HEAD checkout of a pinned commit. Without
--branch, 'hermes update' fell back to its default (main) and switched
the checkout to main — a divergent branch that lacks the desktop CLI
command — so the update targeted the wrong branch and the rebuild stage
failed with 'invalid choice: desktop'.
Thread BUILD_PIN_BRANCH (the branch this installer was built against,
and the same branch the desktop detected the update on) into
'hermes update --branch <b>' so update + rebuild stay on-branch.
* test update
* fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign)
The unpacked Hermes.exe showed the stock Electron icon + name in the
taskbar because build.win.signAndEditExecutable=false disables BOTH
electron-builder's signing AND its rcedit metadata/icon stamping. That
flag is load-bearing: enabling it re-triggers signtool -> winCodeSign,
whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end).
Decouple identity-stamping from signing entirely: after npm run pack,
run rcedit ourselves on the produced exe.
- Add rcedit as a direct devDependency of apps/desktop (the transitive
electron-winstaller copy is fragile).
- apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls
rcedit's named export to set icon + ProductName/FileDescription/
CompanyName. Node builds argv natively — avoids the PowerShell->exe
->JSON double-escaping that broke the app-builder rcedit path.
- install.ps1 Set-DesktopExeIdentity invokes the script after the build,
before shortcuts. Best-effort: failure keeps the stock icon, never
fails the install. rcedit is a pure PE editor — no signtool, no
winCodeSign, no symlinks.
Verified locally: stamping a copy of the built Hermes.exe embeds the
32x32 icon and sets ProductName=Hermes.
Also fix update-path success-screen flash: in update mode the installer
hands off + exits in ~600ms, so don't route to the 'launch Hermes'
success view (it flashed before the window closed).
* update test
* fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error
A user who installed via the CLI (irm|iex / install.sh) then ran
`hermes desktop` has no staged hermes-setup.exe, so clicking Update
in-app hit resolveUpdaterBinary()=null and showed a misleading error
('re-run the Hermes installer') with a Try-again button that could
never succeed — a dead loop for a perfectly valid install.
Treat the no-updater case as an intentional outcome, not a failure:
- main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' }
(no throw, no 'error' stage) when no updater binary exists.
- New 'manual' update stage + apply-state.command thread the command to the UI.
- updates-overlay ManualView: a polished terminal-native card with the
exact command and a copy button, framed as the correct path for a CLI
user rather than an error.
GUI-installer users are unaffected — hermes-setup.exe present => seamless
auto-update runs as before. Zero new process orchestration; can't fail
the update demo.
* update test
* fix(gui): pin /api/hermes/update to the current branch
The desktop command-center 'update' action hits POST /api/hermes/update,
which spawned bare `hermes update` with no --branch. cmd_update then
falls back to its default (main) and checks the working tree OUT of the
tracked branch — a bb/gui install silently jumped to main and lost the
desktop CLI.
Resolve the checkout's current branch and pass --branch <current> from
this endpoint only. The engine default (main) is DELIBERATELY unchanged:
bare `hermes update` from a terminal, the gateway /update bot command,
and the CLI/TUI relaunch path all keep their long-standing 'update against
main' contract for the existing user base. Only the GUI button is scoped
to update-the-branch-you're-on. Detached HEAD / git failure falls back to
the bare default.
* update test
* fix(desktop): branch-pin the CLI manual-update command card
The 'Update from your terminal' card (shown to CLI installs with no staged
updater) hardcoded bare `hermes update` — which defaults to main and would
switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for
the GUI button, leaked into the card's copy text.
Resolve the checkout's current branch and show `hermes update --branch
<current>` for non-main checkouts; keep it bare for main so the card stays
clean. Best-effort: bare fallback if branch detection fails. Matches the
GUI button + installer --update contract; bare terminal/bot/TUI update
paths still default to main, unchanged.
* docs: phragg was here
* feat(desktop): lead onboarding with Nous Portal + fix fresh-install detection (#34970)
- Feature Nous Portal as the primary onboarding card (Recommended tag,
app logo, single pitch line); collapse other OAuth providers behind an
"Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
an empty credential or only implicit Bedrock/IAM, so fresh installs never
saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
via the /proc kernel marker.
* feat(desktop): live elapsed timer on install bootstrap steps
The first-launch install overlay showed a static "Installing" with no
motion, so long steps (notably the repo clone) looked frozen. Stamp each
stage's start time on the running transition and tick once a second so the
active step shows live elapsed (e.g. "Installing · 1:23"), plus elapsed on
the overall current-step line. Completed steps keep their final duration.
* fix(desktop): resolve PortableGit for update checks + reserve titlebar tools space
- runGit() hardcoded spawn('git'), which ENOENTs on fresh installer-driven
Windows installs (git is PortableGit under %LOCALAPPDATA%\hermes\git, never
on PATH) — so "Check for updates" failed with "Couldn't check for updates".
Add resolveGitBinary() mirroring findGitBash (PortableGit → Git-for-Windows
→ PATH) and use it in runGit.
- PageSearchShell rendered a full-width search input in the titlebar row, so
on Windows its right edge slid under the fixed top-right tools + native
window controls. Reserve that footprint via --titlebar-tools-* vars.
* fix(desktop): stop streaming caret from shifting layout on completion
The streaming caret (::after on the running message's last child) was an
in-flow inline-block adding ~0.78em of inline width, which could wrap the
last line mid-stream; when the caret is removed on completion the line
un-wraps and reflows — the visible post-response layout shift. Net-zero its
inline advance with a compensating negative margin so it paints at the text
end without consuming layout width.
* fix(desktop): stop completed-message layout shift while streaming
The assistant message action bar used `hideWhenRunning`, which unmounts it
whenever the thread is streaming. Since the bar reserves vertical space in
each completed assistant message's footer (it's invisible-until-hover via
opacity, not via mount), unmounting it collapsed every prior turn by the
bar's height — then remounting on resolve grew them back, shifting the whole
conversation (visible as "padding appears above the last user message").
Drop hideWhenRunning so the footer height is constant; the bar stays
invisible during streaming via its existing opacity/pointer-events gating.
* fix(merge): keep windows-footgun suppressions inline
* fix(merge): keep remaining gateway footgun suppressions inline
* fix(merge): restore contracts caught by main-target CI
* fix(dashboard): honor injected HERMES_DASHBOARD_SESSION_TOKEN
The desktop shell mints a session token and signs its /api + /api/ws
calls with it via HERMES_DASHBOARD_SESSION_TOKEN, but the main-merge
restored a web_server.py that ignored the env var and minted its own
random _SESSION_TOKEN -- so every desktop request 401'd and the UI
reported "gateway offline". Read the injected token (fall back to a
fresh random one) so loopback HTTP + WS auth line up.
Adds a regression test so a future merge can't silently drop the read.
* fix(desktop): align fresh-install home so upgraders don't brick
Two related first-launch bugs on machines with a legacy ~/.hermes:
- install.ps1 hardcoded $HermesHome/$InstallDir to %LOCALAPPDATA%\hermes
and ignored the HERMES_HOME the desktop passes through. The desktop
freezes HERMES_HOME at module load and prefers a legacy ~/.hermes when
%LOCALAPPDATA%\hermes is absent, so the installer wrote to a different
home than the shell read -> "Could not connect to Hermes gateway". Honor
$env:HERMES_HOME in the param defaults.
- isBootstrapComplete() trusted the marker + checkout without verifying a
runnable venv, so an interrupted/split install spawned a dead backend
instead of re-bootstrapping. Also require the venv python to exist.
* fix(dashboard): allow packaged desktop file:// origin on loopback WS
The packaged Electron desktop loads its renderer over file://, so its
/api/ws handshake carries Origin: file:// (or null). The DNS-rebinding
WebSocket Origin guard only accepted http(s) origins matching the bound
host, so it rejected the desktop's own renderer with 4403 -> "Could not
connect to Hermes gateway" on macOS.
A browser DNS-rebinding attacker can only ever present an http(s) origin
(the site hosting the malicious page); it cannot forge file://, null, or
a custom app scheme AND hold the loopback session token. So on loopback
binds we now trust non-web origins -- the token in _ws_auth_ok remains
the real authenticator. Public/gated binds still reject them, and
cross-site http(s) origins are still rejected everywhere.
* fix(desktop): resolve renderer assets relative to BASE_URL
Absolute public asset paths (/apple-touch-icon.png, /ds-assets/...) work
under the dev server but break in the packaged app, where the renderer is
loaded from file://.../index.html and a leading slash resolves to the
filesystem root -> broken onboarding provider icon and backdrop image on
macOS. Prefix these with import.meta.env.BASE_URL so they resolve next to
the bundled index.html in both dev and packaged builds.
* feat(desktop): automate first-launch bootstrap on macOS/Linux
Previously a packaged macOS/Linux app with no Hermes install hit a
dead-end ("first-launch install is not yet automated -- run install.sh
manually") because install.sh lacked the staged protocol install.ps1
exposes. Now both platforms bootstrap on first launch with the same
structured, per-step progress UI as Windows.
- install.sh: add --manifest / --stage / --json / --non-interactive plus
a stage dispatcher (prerequisites, repository, venv, python-deps,
node-deps, path, config, setup, gateway, complete). User-input stages
(setup, gateway) are skipped under --non-interactive; the in-app
onboarding overlay owns API keys/model, matching the Windows flow.
Each stage runs inside the install dir (its own process) and a new
--commit flag pins the checkout to the build-stamp SHA.
- bootstrap-runner.cjs: drive the staged manifest/stage/JSON protocol for
both install.ps1 (PowerShell) and install.sh (bash), selected by
installer kind; removed the single-blob POSIX shim.
- main.cjs: drop the macOS/Linux unsupported-platform dead-end so the
bootstrap-needed path runs the installer on every platform.
* fix(dashboard): return 404 JSON for unmatched /api paths instead of SPA HTML
The SPA catch-all (serve_spa) served index.html for any unmatched GET,
including unregistered /api/* endpoints. A missing API route therefore
came back as <!doctype html> with status 200, and JSON clients (the
desktop app's fetchJson) crashed with an opaque
'SyntaxError: Unexpected token <' instead of a clear error.
- web_server.py: unmatched /api or /api/... now returns 404 JSON
('No such API endpoint'); non-api paths still serve the SPA for
client-side routing.
- main.cjs fetchJson: detect an HTML body / text/html content-type on a
2xx response and reject with a clear message naming the URL, rather
than a raw JSON.parse SyntaxError. Empty bodies resolve to null;
malformed JSON reports the URL plus a snippet.
* say 'OS appearance' instead of 'macOS appearance'
* feat(install): add --include-desktop stage + PowerShell-style flags to install.sh
Brings install.sh to parity with install.ps1's bootstrap surface so the
shared Rust/Tauri bootstrapper (apps/bootstrap-installer) can drive a
macOS/Linux install the same way it drives Windows.
- Accept the PowerShell-style aliases the bootstrapper emits to both
installers: -Commit / -Branch (alongside existing -Manifest / -Stage /
-Json / -NonInteractive).
- Add --include-desktop / -IncludeDesktop. When set, the manifest gains a
'desktop' stage (immediately before 'complete'), and a new install_desktop
runs a root workspace `npm install` + `npm run pack` (electron-builder
--dir, signing auto-discovery disabled) to produce release/mac*/Hermes.app
-- mirroring install.ps1's Install-Desktop / Stage-Desktop.
- The flag is opt-in, exactly like Windows: the signed bootstrap installer
passes it; the Electron app's own first-launch bootstrap and the CLI
one-liner omit it (building the desktop from inside the running app would
clobber it).
* fix: tts endpoints
* macOS desktop: install + in-app self-update (#35607)
* fix(installer): align macOS HERMES_HOME with the rest of the stack
paths.rs computed the macOS Hermes home as ~/Library/Application Support/
hermes, but nothing else does: hermes_constants.get_hermes_home() (Python),
scripts/install.sh, and the Electron desktop's resolveHermesHome() all use
~/.hermes on macOS. The drift meant the Tauri installer wrote the install to
one directory and the desktop looked for it in another, so a fresh GUI
install never found its backend (the file's own comment warned this exact
drift would break things). Use ~/.hermes on macOS to match.
* fix(install.sh): always emit a stage result frame on failure
Stage helpers (clone_repo, install_deps, check_python, …) were written for
the monolithic flow and call `exit 1` on failure. Under `--stage`, that
terminated the process before the JSON result frame was printed, so the
installer's parse_stage_result saw "no frame" instead of a clean
{ok:false,...} contract response. Run the stage body in a subshell so an
`exit` only unwinds the subshell and the parent still emits the frame.
* feat(install.sh): auto-provision git on macOS/Linux (parity with install.ps1)
install.ps1 downloads PortableGit on Windows, but install.sh just printed a
"please install git" hint and exited — so a fresh Mac with no developer tools
(no Xcode CLT → no git) couldn't get past the clone step. check_git now tries
to install git before bailing:
- macOS: Homebrew if present (headless), else `xcode-select --install`
(the CLT prompt also provides the compiler some wheels need), polling for
git to appear.
- Linux: apt/dnf/pacman via sudo when available.
Falls back to the manual instructions only if auto-provision fails.
* feat(desktop): in-app GUI+backend self-update on macOS/Linux
On Windows the staged Hermes-Setup binary drives updates (quit → hermes
update → hermes desktop --build-only → relaunch). The mac drag-install has no
such binary, so "Update now" previously just printed `hermes update`.
Since there's no venv-shim file lock on POSIX, the desktop can drive the whole
update itself. applyUpdates now, when no staged updater exists on mac/linux:
1. runs `hermes update --yes [--branch <current>]` (backend git pull + deps),
2. runs `hermes desktop --build-only` (OS-aware GUI rebuild) with the
Hermes-managed Node + venv on PATH,
3. spawns a detached swapper that waits for this process to exit, dittos the
freshly built Hermes.app over the running bundle, clears quarantine, and
relaunches.
Degrades to "backend updated — restart to load the new GUI" if the rebuild
fails or there's no .app bundle to swap (dev run, Linux AppImage).
* chore: uptick
* chore: uptick
* chore: linux build
* fix(install): detect xcode-select git stub on fresh macOS
* chore: bump
* fix(desktop): repair voice dictation on Windows
Voice dictation was broken on Windows in two ways:
1. Mic access was denied. The Electron permission request handler only
granted 'media' requests whose details.mediaTypes included 'audio',
but Chromium on Windows frequently fires the mic request with an empty
mediaTypes array, so getUserMedia threw NotAllowedError. The handler
now grants audio-capture when mediaTypes includes 'audio' OR is
empty/absent, handles the 'audioCapture' permission name, and adds a
setPermissionCheckHandler (the synchronous path Chromium also consults
for getUserMedia on Windows). Video is still denied.
2. Transcripts went nowhere. The composer's insertText handler (used by
dictation and other inserts) only updated the assistant-ui composer
store via setText, never the contentEditable editor DOM. The
draft->editor sync effect only re-renders the editor when it is NOT
focused, and dictation runs while the editor has/regains focus, so the
transcript was stored but never shown and could not be sent. insertText
now renders into the editor DOM and places the caret, mirroring
appendExternalText.
Also hardens fetchJson: a 2xx response with an HTML body (or text/html
content-type) now rejects with a clear message naming the URL instead of
an opaque JSON.parse 'Unexpected token <' error.
* feat(desktop): route Nous subscribers onto the Tool Gateway from the GUI
When the GUI sets the main provider to Nous via POST /api/model/set, call
the same apply_nous_managed_defaults the CLI uses after model selection, so
GUI/onboarding users land on the Nous Tool Gateway the same way CLI users do
— no separate prompt, no duplicated logic.
Purely additive: apply_nous_managed_defaults skips any tool where the user
has a direct key (FIRECRAWL_API_KEY, FAL_KEY, etc.) or explicit config, so it
never overwrites a user's own setup. Only unconfigured tools get routed.
- web_server.py: in set_model_assignment (scope=main, provider=nous), resolve
enabled toolsets and apply managed defaults; guarded so a Portal hiccup never
blocks saving the model. Returns routed tools as gateway_tools.
- onboarding.ts: surface a 'Tool Gateway enabled' toast listing routed tools.
- types/hermes.ts: add gateway_tools to ModelAssignmentResponse.
- tests: cover nous-applies, non-nous-skips, and failure-doesnt-block-save.
* feat(desktop): mirror hermes model free/paid curation in GUI onboarding
GUI onboarding picked models[0] from /api/model/options, which ignores the
Nous free/paid tier — a free user could land on a paid default (e.g.
anthropic/claude-opus-4). Now the recommended default mirrors what `hermes
model` does.
- web_server.py: new GET /api/model/recommended-default?provider=<slug>. For
Nous it runs the same curation as the CLI (get_curated_nous_model_ids +
pricing + check_nous_free_tier + union_with_portal_{free,paid}_recommendations
+ partition_nous_models_by_tier) so free users get a free model and paid users
get the curated default. Other providers fall back to the first curated model.
Never 500s — returns empty model on error so onboarding degrades gracefully.
- hermes.ts: getRecommendedDefaultModel client + RecommendedDefaultModel type.
- onboarding.ts: fetchProviderDefaultModel prefers the recommended endpoint,
falls back to models[0] when unavailable.
- tests: free-tier picks free model, paid-tier picks curated default, failure
returns empty without 500.
* feat(desktop): show model pricing + free/paid tier gating in GUI picker
The CLI `hermes model` picker shows per-model $/Mtok pricing and gates paid
models on free Nous accounts. The GUI picker showed bare model names. Bring it
to parity across both the model-picker dialog and onboarding confirm card.
Backend:
- inventory.build_models_payload gains a pricing=True flag → _apply_pricing
enriches each provider row with formatted per-model pricing
({input,output,cache,free}) via the same _format_price_per_mtok the CLI uses,
and for Nous adds free_tier + unavailable_models (paid models a free user
can't select) via check_nous_free_tier + partition_nous_models_by_tier.
Best-effort: any pricing/tier failure is swallowed and fails open (no gating).
- /api/model/options and TUI model.options now pass pricing=True so the
global picker and in-session picker both carry pricing.
Frontend:
- ModelOptionProvider gains pricing/free_tier/unavailable_models; new
ModelPricing type.
- model-picker dialog renders In/Out $/Mtok (or a Free pill) per model, a
Free tier/Pro badge on the Nous heading, and disables + grays unavailable
paid models for free users with a 'Pro models need a paid subscription' note.
- onboarding confirm card shows the chosen model's price + tier badge.
Tests: test_inventory_pricing covers price formatting, free-tier gating,
paid no-gating, providers without pricing, and swallowed failures.
* fix(desktop): GUI model picker shows curated Nous list in curated order
Two bugs made the GUI Nous model list diverge from the `hermes model` CLI picker:
1. Backend (model_switch.py): the Nous row in list_authenticated_providers
fell through to cached_provider_model_ids("nous"), dumping the full live
/v1/models catalog (~50 vendor-prefixed models, alphabetical). Now it uses
the curated list AND applies the Portal free/paid recommendation union —
exactly like _model_flow_nous in main.py — so newly-launched models such as
stepfun/step-3.7-flash:free surface in curated order. Best-effort: falls
back to the curated list alone if the Portal fetch fails.
2. Frontend (model-picker.tsx): cmdk's Command had shouldFilter on (default),
which re-sorts items by fuzzy-match score (≈alphabetical) and ignores array
order. Set shouldFilter={false} + own the search term and do an
order-preserving substring filter, so the backend's curated order is shown
verbatim.
* feat(desktop): add/switch providers from the model picker via onboarding reuse
The model picker could only select models from already-authenticated
providers. Switching to a new provider had no in-app path. Rather than
duplicate provider UI, reuse the existing onboarding provider selector
(featured Nous + other providers + API-key form + device-code/PKCE flow +
model-confirm with pricing/tier).
- onboarding store: add a 'manual' flag with startManualOnboarding() /
closeManualOnboarding(). Manual mode forces the onboarding overlay to show
even when configured===true and refreshOnboarding no longer auto-dismisses
on runtime-ready (the app is already working — the user is just adding or
switching a provider).
- onboarding overlay: render when manual even if configured; show a Close
button (the first-run flow has none since the app can't run yet).
- model picker: 'Add provider' footer button opens the onboarding selector;
ModelResults lists only configured (model-bearing) providers.
* feat(desktop): add PUT /api/tools/toolsets/{name} enable/disable endpoint
* feat(desktop): add toggleToolset RPC binding
* feat(desktop): toolset enable/disable switch in Tools settings
* feat(desktop): tool configuration parity in GUI Tools settings
Bring the desktop GUI Tools settings to parity with the CLI `hermes tools`
for provider selection and API-key configuration.
Backend (hermes_cli/web_server.py):
- GET /api/tools/toolsets/{name}/config - provider matrix + key status
- PUT /api/tools/toolsets/{name}/provider - persist provider selection
Shared core (hermes_cli/tools_config.py):
- Extract apply_provider_selection / _write_provider_config from the
interactive _configure_provider so the CLI and GUI write identical
config keys (web.backend, tts.provider, browser.cloud_provider, plugin
image/video providers, use_gateway flags) through one code path.
Desktop UI:
- ToolsetConfigPanel: provider list with select, per-provider API-key
entry (set/replace/clear/reveal via the shared env RPCs), Ready/Needs
keys state, guidance for Nous-auth and post-setup providers.
- Wire the Configured/Needs keys pill to expand the panel inline; refresh
the toolset list after key changes so the pill updates live.
- Add getToolsetConfig / selectToolsetProvider RPC bindings + types.
Post-setup (OAuth/install) flows still defer to the CLI; see
docs spike findings for the planned /api/tools/setup/* endpoint family.
Tests: backend round-trip + 400 cases for the new endpoints and
apply_provider_selection; desktop vitest coverage for the config panel
(provider render, select, key save). No change-detector tests.
Also removes three stale completed plan docs.
* fix(desktop): show real Hermes version + sync package.json on release
The desktop app version was disconnected from the Hermes version: the
release script bumped pyproject.toml + hermes_cli/__init__.py but never
touched apps/desktop/package.json, which sat stale at 0.0.2 (lockfile at
0.0.1).
- main.cjs: hermes:version IPC now resolves __version__ from
hermes_cli/__init__.py (the canonical source release.py bumps) via a new
resolveHermesVersion() helper, falling back to app.getVersion() when the
source tree isn't readable. The About panel now always shows the live
Hermes version and can't drift.
- release.py: update_version_files() also bumps apps/desktop/package.json
in lockstep with pyproject (top-level version only; dep specs untouched).
- One-time catch-up: package.json 0.0.2 -> 0.15.1 and the lockfile root
mirrors 0.0.1 -> 0.15.1.
* fix(desktop): stamp exe identity in afterPack hook so updates stay branded
The packed Hermes.exe reverted to the stock Electron icon + "Electron" name
after an in-app update. The icon/identity stamp (rcedit) lived only in
install.ps1, but the installer's --update path rebuilds the desktop via
`hermes desktop --build-only` -> `npm run pack`, which never ran install.ps1
and so never stamped the rebuilt exe.
Move the stamp into an electron-builder afterPack hook so it runs for EVERY
packed build regardless of caller (first install, hermes desktop, the update
rebuild, or a manual npm run pack):
- set-exe-identity.cjs: refactor to export stampExeIdentity(exe, desktopRoot);
still runnable as a standalone CLI.
- after-pack.cjs (new): afterPack hook calling stampExeIdentity. Windows-only
guard; best-effort (logs + resolves on failure, never fails the build).
- package.json: register build.afterPack.
- install.ps1: remove the now-redundant Set-DesktopExeIdentity function + call;
the hook handles it during npm run pack.
electron-builder's own rcedit step stays disabled (signAndEditExecutable=false)
to avoid the signtool -> winCodeSign -> 7-Zip macOS-symlink crash on non-admin
Windows; the hook runs rcedit directly (pure PE resource edit, no signing).
* fix(desktop): export afterPack hook as exports.default so electron-builder runs it
The afterPack hook used `module.exports = fn`, which electron-builder's hook
loader doesn't pick up — it expects the function as the module's default
export (the same shape afterSign/notarize.cjs uses). The hook silently never
ran, so even first install shipped the stock "Electron" exe.
Switch to `exports.default = async function afterPack(...)`. Verified with a
real `npm run pack`: electron-builder now invokes the hook and the produced
release/win-unpacked/Hermes.exe carries ProductName/FileDescription=Hermes.
* chore(desktop): drop auto-build release CI in favor of manual build + upload
Remove desktop-release.yml (nightly-on-main + stable publish). Installers
are now built locally per platform and uploaded to a GitHub Release by hand;
the website points at them via NEXT_PUBLIC_HERMES_DL_* env. Update README +
docs and drop the dead desktop-nightly channel links.
* fix(desktop): stable shortcut icon + bust icon cache so updates repaint
Symptom on a freshly-installed laptop: Hermes.exe itself shows the correct
Hermes icon (Explorer reads the live exe's stamped PE resource), but the
desktop shortcut still draws the stock Electron icon.
Cause: New-DesktopShortcuts set IconLocation to "<exe>,0", so Windows cached
the icon it extracted from the exe at shortcut-creation time. On an update the
exe gets re-stamped, but the shortcut keeps rendering the stale cached bitmap.
- package.json: ship assets/icon.ico beside the exe via extraResources
(-> resources/icon.ico). Verified with a real npm run pack.
- install.ps1 New-DesktopShortcuts: point IconLocation at resources/icon.ico
(fallback to <exe>,0 if absent) — a dedicated .ico is cache-stable and skips
the per-exe extraction that goes stale. Then run `ie4uinit.exe -show` to bust
the shell icon cache so the shortcut repaints immediately instead of showing
the old Electron icon until reboot.
Both best-effort; never fail an otherwise-good install.
* dummy update
* feat(desktop): self-heal update branch + backend contract guard
Two fixes for the bb/gui→main transition:
- Self-update self-heals: if the tracked branch (e.g. bb/gui) no longer
exists on origin (merged + deleted), the desktop updater falls back to
main and persists it. Read-only ls-remote probe that only flips on a
definitive "ref absent" (exit 2), never on a transient network error, so
already-installed clients migrate themselves with no manual flip.
- Backend contract guard: tui_gateway reports DESKTOP_BACKEND_CONTRACT in
session runtime info; the desktop warns with a one-click "Update Hermes"
when the backend predates the GUI's required contract (e.g. a bb/gui app
pointed at a main checkout) instead of failing cryptically downstream.
* docs(desktop): rewrite README to match current install/update/build flow
The old README contradicted itself (claimed a bundled Python payload while
also saying it no longer bundles source) and predated cross-platform support.
Rewrite for accuracy: Linux is a first-class build target, install.sh/install.ps1
both drive the staged bootstrap, the real self-update handoff (Windows
Hermes-Setup vs in-app macOS/Linux), and the bb/gui→main self-heal + backend
contract guard.
* docs(desktop): rewrite README as a real product readme
Lead with what the app is and how to get it (download an installer, or
`hermes desktop` for existing CLI users) plus a plain-language feature list,
then keep contributor/build/internals as a clearly separated secondary section.
* docs(desktop): fix install framing — releases no longer auto-build installers
Lead with the install-with-Hermes path (`--include-desktop` / `hermes desktop`),
which always works, and describe prebuilt installers as manually published when
a release ships them rather than implying CI attaches them to every release.
* docs(desktop): match base repo README style
Adopt the root README's conventions: centered title + badge row, bold
one-liner intro, a feature <table> grid, --- section dividers, and a
Community / License footer.
* feat(desktop): recover from gateway boot failures + validate API keys on entry (#35864)
Fresh installs that hit a gateway boot failure had no recovery path: the
shell rendered dead ("gateway offline"), logs were undiscoverable, and a
mistyped API key was accepted because onboarding only checked credential
presence, not validity.
- Add BootFailureOverlay: a top-level recovery surface (Retry, Repair
install, Use local gateway, Open logs + inline recent logs) that mounts
on any hard boot failure, including post-install. Trims the now-redundant
recovery button from the onboarding Preparing panel.
- Add hermes:logs:reveal / :recent IPC (reveal desktop.log) and a
hermes:bootstrap:repair IPC that drops the bootstrap marker to force a
clean reinstall. Surface "Open logs" in Gateway settings too.
- Add POST /api/providers/validate: a live per-provider probe
(OpenRouter/OpenAI/xAI/Gemini key check, local endpoint connectivity)
wired into saveOnboardingApiKey so a rejected key blocks before it's
persisted, while an unreachable probe falls through (offline-safe).
* test(model-catalog): fix stale nous picker test after curated-list change
ac2e48907 made the GUI/picker Nous row use the curated list (curated["nous"]
= get_curated_nous_model_ids()) + Portal union, matching the `hermes model`
CLI — but test_picker_nous_row_uses_manifest still asserted the old 2-model
manifest snapshot, breaking the test shard.
Rewrite it as an invariant: stub the Portal union to passthrough and assert the
row equals get_curated_nous_model_ids() computed under the same conditions, so
it tracks the real contract instead of a hardcoded model list that rots on every
catalog update.
---------
Co-authored-by: emozilla <emozilla@nousresearch.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ethernet <arilotter@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Native Windows is out of beta. Removes the early-beta warnings, headings,
and rough-edge framing across the README and docs (EN + zh-Hans), keeping
the WSL2-only dashboard PTY caveat. Historical RELEASE_v0.14.0.md notes are
left intact since they accurately describe the state at that release.
- README: Windows install + cross-platform notes
- index.mdx, installation.md: headings, warning admonitions, parity note
- windows-native.md: title/sidebar_label/warning, provider-hunting tip
- contributing.md, nous-portal.md: cross-platform / Portal parity prose
- Repoint cross-links to the renamed installation#windows-native-powershell
anchor (EN) and #windows原生powershell (zh, also fixes pre-existing drift)
For grouped provider families, the descriptive text now lives only on the
collapsed top-level group row. The member sub-picker rows show just the
short provider label (no parenthetical tui_desc), so the description is not
duplicated one layer down.
Ungrouped providers are unaffected — they have no group layer, so their own
row keeps its full tui_desc.
- main.py: member sub-picker uses provider_labels (label) instead of
canonical_descs (tui_desc).
- Telegram already showed labels + model count on member buttons; group
buttons keep Label ▸ (count) since inline keyboards can't fit a long blurb.
Member labels retain their short disambiguators (e.g. 'MiniMax (OAuth)') so
the sub-picker rows stay distinguishable.
The 7 consolidated provider families (OpenAI, xAI Grok, GitHub Copilot,
Google Gemini, Kimi / Moonshot, MiniMax, OpenCode) collapse to one
top-level picker row. Previously that row showed only the bare group
label (e.g. `OpenAI ▸`); now it carries a short blurb describing the
endpoints folded inside (e.g. `OpenAI ▸ (Codex CLI or direct OpenAI API)`).
- models.py: extend PROVIDER_GROUPS tuples to (label, description, members);
group_providers() emits the description on group rows.
- main.py: CLI picker renders `<label> ▸ (<description>)` for group rows.
- telegram.py: update the group tuple unpack (button text keeps the member
count, which fits inline keyboards better than a long blurb).
- tests: assert every group has a non-empty description and the fold emits it.
Member-specific detail still lives in each member's tui_desc and shows in
the drill-down sub-picker. Slug identity, --provider, /model paths unchanged.
Update the tui_desc text shown for each provider in the interactive
`hermes model` / setup wizard / `/model` pickers. Pure copy refresh —
slugs, labels, PROVIDER_GROUPS folding, and all typed paths are unchanged,
so the 7 grouped families (OpenAI, xAI Grok, GitHub Copilot, Google Gemini,
Kimi / Moonshot, MiniMax, OpenCode) still fold identically.
Also aligns the auto-injected alibaba-coding-plan provider description to
the same parenthetical style.
Follow-up to the synthetic-notification DM-topic routing fix. The new
_is_telegram_dm_topic_target probed the adapter's _get_dm_topic_info via
instance-level getattr, which a MagicMock auto-creates as a truthy callable —
so any test double with a non-dm chat_type and a thread_id would be
misclassified as a DM topic lane and have the fallback routing keys injected.
Resolve the method on type(adapter) and treat only dict-shaped returns as an
operator-declared topic, mirroring the existing guard in
_rename_telegram_topic_for_session_title. Update the home-channel startup test
to declare _get_dm_topic_info on a real adapter subclass instead of patching a
MagicMock onto the instance.
Background tasks on non-local backends (SSH/Docker/Modal/Daytona/Singularity)
go through `ProcessRegistry.spawn_via_env`, which builds a hand-crafted,
shell-safe wrapper:
mkdir -p T && ( nohup bash -lc CMD > LOG 2>&1; rc=$?; ... ) & echo $! > PID && cat PID
`BaseEnvironment.execute()` unconditionally ran `_rewrite_compound_background`
on every command, including this wrapper. The rewrite (meant to defuse the
`A && B &` subshell-wait trap for user commands) turns `( ... ) & echo $!` into
`{ ( ... ) & } echo $!` — note `} echo` with no separator, which is a bash
syntax error. The wrapper then never produces a PID, the redirected output file
is never created, and the agent sees an immediate exit code -1. This breaks
*every* background launch on a non-local backend (e.g. a simple
count-and-redirect script over SSH), not just edge cases.
Fix:
- Add `rewrite_compound_background: bool = True` to `BaseEnvironment.execute()`
(and the `BaseModalExecutionEnvironment` override, which accepts and ignores
it). Default preserves existing behavior; the user foreground terminal path
still rewrites.
- `spawn_via_env` passes `rewrite_compound_background=False` so its already
shell-safe wrapper is left intact.
- Treat a wrapper that produces no PID as a failed launch (mark the session
exited with a real exit code instead of exposing a fake running session), and
don't register/checkpoint a session that never started.
Verified empirically: with the rewrite skipped, the wrapper is valid bash,
launches the process, captures the PID, and writes the log/pid/exit files; the
old rewritten form fails `bash -n` with a syntax error.
Based on #33756 by @CharZhou (extracted from a multi-feature branch; the
unrelated image_gen / docker-media changes are not included here).
Co-authored-by: CharZhou <17255546+CharZhou@users.noreply.github.com>
terminal_tool re-sent the init-time/config cwd on every command, clobbering
session-local `cd` state: the environment tracked the new directory in
`env.cwd`, but foreground/background calls forced the old cwd back. A small
`_resolve_command_cwd` resolver now applies the precedence
`workdir > live env.cwd > config/override cwd` to:
- foreground `env.execute(...)`
- background `process_registry.spawn_local(...)`
- background `process_registry.spawn_via_env(...)`
Additionally, syncing the cwd onto the live cached env when a `cwd` override is
(re-)registered. Preferring live `env.cwd` would otherwise demote the ACP
`update_cwd` override (registered via `register_task_env_overrides` on
`session/load` / `session/resume`) below an already-set `env.cwd`, silently
ignoring an editor's mid-session project-root change once any command had run.
`register_task_env_overrides` now pushes a new cwd onto the cached env so an
explicit ACP cwd change wins, while ordinary in-session `cd` tracking is
preserved.
Regression coverage:
- foreground/background commands follow live `env.cwd`
- explicit `workdir` still overrides everything
- registering a cwd override updates the live env cwd (ACP authority)
- no-op when no live env exists; non-cwd overrides leave env.cwd untouched
Based on #35510 by @Dusk1e.
Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
In a per-user thread (thread_sessions_per_user=True), each participant
gets an isolated session key (...:{thread_id}:{user_id}). A run another
user started lives under a different key, so the caller's own /stop found
nothing and replied 'no active task to stop'.
When /stop finds no run under the caller's own key, fall back to
interrupting any running agent(s) sharing the caller's thread prefix
({chat_id}:{thread_id}), gated on _is_user_authorized. Thread-only — the
fallback returns [] for non-thread channels, and a prefix-collision guard
prevents thr1 from matching thr11.
* feat(setup): Quick Setup routes through Nous Portal (OAuth + model + messaging)
First-time quick setup now goes straight to the Nous Portal provider
instead of showing the full provider picker. Runs the device-code OAuth
login, selects a Nous model, configures the terminal backend, and offers
messaging setup — applying recommended defaults for everything else.
- Rename menu entry to 'Quick Setup (Nous Portal)'.
- _run_first_time_quick_setup now calls _model_flow_nous (handles both the
logged-out OAuth+model-select path and the logged-in curated picker),
then re-syncs config from disk to avoid the #4172 stale-overwrite.
- Terminal / defaults / messaging steps unchanged.
* feat(setup): thin out Full Setup with happy defaults
Full Setup no longer asks for every config knob — anything with an
obvious default is applied silently and stays tunable via the per-section
commands (hermes setup agent|terminal|tts, hermes auth add).
- Model section: drop the same-provider rotation pool, vision-backend
picker, and TTS provider sub-flows. Vision auto-detects from the main
provider; TTS defaults to Edge; rotation lives in hermes auth add.
- Terminal section: keep the backend picker (Local default) and any
required credentials (Modal token, SSH host/user/key, Daytona key),
but stop prompting for container image, CPU/mem/disk resources, gateway
cwd, and sudo password — all use defaults.
- Agent Settings: removed from the wizard. First installs get recommended
defaults silently; existing installs keep their tuned values.
- New defaults: max_turns 90 -> 150, session_reset both -> none.
- Tests: reconfigure tests assert agent settings are no longer prompted
on existing installs; drop 3 tests covering the deleted in-setup
rotation flow.
* fix(tui): persist gateway lifecycle breadcrumbs to crash log
A backend SIGTERM (`=== SIGTERM received ===` in tui_gateway_crash.log) is
always a parent action — `gw.kill()` (graceful-exit on a signal to Node, or an
explicit /quit) or `start()` replacing a live child. #31051 added parent-side
lifecycle breadcrumbs but left them in an in-memory CircularBuffer that dies
with the process, so SIGTERM crash reports arrive with no parent context and no
way to tell a signal-driven kill from a memory-critical `process.exit(137)`
(which closes the child's stdin → clean EOF, not SIGTERM).
Persist the death-explaining breadcrumbs (spawn / transport-exit / child-exit /
replace-live-child / kill-reason / startup-timeout) plus the graceful-exit
signal name and the memory-critical exit into the same crash log the Python
side writes, so they interleave by timestamp next to the child's panic entry —
making these recurring reports diagnosable.
Gated off under VITEST so unit tests stay hermetic.
* feat(tui): auto-recover the session when the gateway dies unexpectedly
When a still-owned gateway child dies while the TUI is alive (a crash, OOM
process.exit, or a SIGTERM/SIGHUP forwarded to it), the app currently nulls the
session and drops to an inert "gateway exited" state — the user loses a long
session and has to restart + re-run everything. That single behavior is most of
the "TUI doesn't survive heavy work" complaint, independent of what does the
killing.
The 'exit' event only reaches this handler on an *unexpected* death: a user
/quit calls process.exit before it fires, and a replaced child is identity-
skipped in GatewayClient. So on exit we now respawn the gateway and resume the
session that was live (history is persisted in SQLite) via a one-shot
recoverSidRef the next gateway.ready consults before forging a new session. The
in-flight reply is lost (it died with the process) but the session survives.
Bounded to GATEWAY_RECOVERY_LIMIT (3) attempts per GATEWAY_RECOVERY_WINDOW_MS
(60s) so a gateway that crash-loops on startup can't spawn-storm; past the
budget we fall back to the inert state.
* fix(tui): sanitize newlines + soften SIGTERM-cause claim in parentLog
Address PR review:
- recordParentLifecycle collapses embedded \r\n so a multi-line value (e.g. an
error message) stays a single breadcrumb and can't masquerade as a separate
entry or as the child's panic output sharing the crash log.
- Reword the header: a backend SIGTERM is *usually* a parent action but can come
straight from an external supervisor (s6, cgroup OOM, stray kill); the
presence/absence of a [tui-parent] line before the child's panic is precisely
what disambiguates the two.
* fix(tui): clear sid during recovery + extract/test the recovery budget
Address PR review:
- Null `sid` immediately in the gateway exit handler. While the gateway is down
(busy=false) the old sid would otherwise let sid-guarded effects (the 1.5s
session.active_list poll, queue drain) fire RPCs at a dead/respawning gateway.
recoverSidRef carries the session forward; resumeById restores sid on ready.
- Extract the respawn budget into a pure evalRecovery() (gatewayRecovery.ts) and
unit-test the bound: allows GATEWAY_RECOVERY_LIMIT within the window, blocks
past it, and prunes attempts older than the window so recovery re-arms.
* fix(tui): cap parent-log breadcrumb length (PR review)
Truncate a single persisted breadcrumb to 4096 chars (matching GatewayClient's
in-memory log-line cap) so a pathological value — e.g. a giant error string —
can't bloat the shared crash log or add noticeable blocking on the synchronous
append during a failure path. Covered by a test.
* fix(tui): keep "recovering session…" status visible during resume (PR review)
resumeById() synchronously sets status to 'resuming…' on entry, so the
recovery branch now applies its 'recovering session…' label *after* calling
resumeById — the distinct label sticks for the duration of the resume RPC
(which later flips to 'ready') instead of being immediately clobbered. Test
updated to assert the ordering.
* fix(tui): keep recovery budget alive across a startup crash-loop (PR review)
deadSid was read from getUiState().sid, which the first exit nulls — so if the
respawned gateway crash-looped before gateway.ready (resumeById never restored
sid), later exits saw null and abandoned the session after a single attempt,
defeating the bounded retry budget.
Lift the whole decision into a pure planGatewayRecovery() that falls back to the
pending recoverSidRef target when the live sid is already cleared, and unit-test
the crash-loop sequence (keeps retrying the same session up to the limit, then
falls back to inert). Supersedes evalRecovery.
* chore(tui): drop non-null assertion + clarify breadcrumb cap comment (PR review)
- Recovery branch guards on `recoverSidRef && recoverSid` so the ref write needs
no `!` assertion (avoids a future unsafe refactor).
- Reword the parentLog cap comment: it slices the value to 4096 chars and
appends a short truncation marker (so the written line is slightly longer),
rather than implying a strict 4096-byte limit.
* chore(tui): soften "absence ⇒ external signal" + "any in-flight reply" (PR review)
- parentLog header: a missing [tui-parent] line only *suggests* an external
signal (the logger is best-effort: VITEST-disabled, failed append swallowed),
not a definitive conclusion.
- Recovery notice says "any in-flight reply was lost" since the gateway can also
exit while idle.
Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.
When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:
messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were
in the original response.
Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.
Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.
Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
PR #35718 added a per-slot "cumulative-resend" latch to the universal
streaming tool-call accumulator to fix DeepSeek / Baidu Qianfan (#35592).
The latch fires when a delta is a strict superset of the accumulated
buffer (len(_new) > len(_prev) and _new.startswith(_prev)) and then
REPLACES the buffer instead of appending.
That superset test is not an unambiguous cumulative signature. A normal
incremental stream can emit a single fragment that restates an already-
accumulated prefix — trivially common in large code-patch arguments with
repeated lines / indentation — which trips the latch and clobbers the
accumulated buffer, corrupting the tool call. Observed in the wild on
Anthropic Opus (the primary model) building a large patch: corrupted /
short arguments → finish_reason='length' dead-end → session killed.
A guessing heuristic that can silently clobber a tool-call buffer has no
place on the path every provider and model shares. Reverting restores the
known-good plain `+=` accumulator. The #35592 narrow provider bug should
be re-addressed provider-gated so it is structurally impossible to touch
Anthropic / OpenAI incremental streams, rather than via a heuristic on the
shared path.
Reverts ca03486b6.
The status bar read context_compressor.last_prompt_tokens directly with
an 'or 0' guard that only catches 0/None. Right after a compression the
compressor parks last_prompt_tokens at the -1 sentinel
(awaiting_real_usage_after_compression) until the next API call reports
real usage. -1 is truthy, so it sailed through and rendered as '-1/200K'
and '-1%' for that one transitional turn.
Clamp negative token/context-length values to 0 in the status-bar
snapshot so the gap reads as empty context until real usage arrives.
* feat(tools): always show Nous Tool Gateway backends, login on select
The Nous-managed Tool Gateway rows in `hermes tools` (Firecrawl, OpenAI
TTS, Browser Use, FAL image/video) were hidden unless the user was already
logged into Nous Portal with paid access. Now they are always listed.
Selecting one runs an inline Nous Portal device-code OAuth + entitlement
check — auth only, no inference-provider switch and no bulk 'enable all
tools' prompt (that stays in `hermes model`). The row only activates the
gateway once paid access is confirmed.
- _visible_providers: stop hiding managed_nous_feature rows (incl. those
also flagged requires_nous_auth); pure pre-auth UX rows still gate on login
- nous_subscription.ensure_nous_portal_access(): auth + entitlement gate
that preserves the user's active inference provider
- _configure_provider / _reconfigure_provider: run the inline gate for
managed backends; write config only when entitled
- picker marker: 'via Nous Portal (login on select)' for logged-out users
- _hidden_nous_gateway_message: now a no-op (rows are never hidden)
* docs: hermes tools is a first-class Tool Gateway entry point
The Tool Gateway docs framed `hermes setup --portal` / `hermes model` as
the activation path and only mentioned `hermes tools` for mixing in your
own keys. With the inline-login change, picking a Nous-managed backend in
`hermes tools` is a complete path on its own — it logs you into Nous
Portal on select if needed, without switching your inference provider or
prompting to enable every other tool.
- tool-gateway.md: Get started now lists three peer entry points; new
paragraph explaining login-on-select and the no-prompt fast path when
OAuth is already active
- nous-portal.md + run-hermes-with-nous-portal.md: note that managed rows
appear logged-out and trigger inline login on select
The three curses menus (curses_checklist / curses_radiolist /
curses_single_select) each hand-rolled an identical event loop: cursor
hide + color-pair init, the per-frame clear/getmaxyx/refresh cycle,
scroll-offset math, row iteration, the read_menu_key dispatch with
NAV_UP/NAV_DOWN cursor wrap, flush_stdin, and the
KeyboardInterrupt/curses-unavailable fallback. Terminal-behavior changes
(e.g. Ghostty raw-escape handling, scroll tweaks, a new key) had to be
made in three places.
Extract that boilerplate into one _run_curses_menu driver. Each public
menu now supplies small callbacks for the parts that genuinely differ:
draw_header (returns the item-list start row), draw_row (checkbox vs
radio vs bare prefix), an on_action reducer (toggle-set vs return-cursor
vs return-None + the single_select cancel-row guard), an optional
draw_footer (the checklist status bar), reserve_bottom, and the numbered
fallback. Behavior is passed as functions; the loop is the only stateful
piece — so future terminal/Ghostty work is a one-place edit.
Duplicated event-loop primitives drop 3 -> 1 (stdscr.clear, read_menu_key
dispatch, scroll math). Verified byte-identical: a render harness records
every addnstr(y, x, clamped-text, attr) call across frames plus the
return value for 6 cases (checklist, checklist+status, radiolist,
radiolist+description, single_select, single_select ESC-cancel); output
diffs clean against origin/main. Non-TTY returns the cancel value
directly (not the input()-based numbered fallback), matching the old
per-menu guard. 150 menu/setup/browse/plugins tests pass.
The setup provider->model sub-menu (and three sibling pickers) used
simple_term_menu.TerminalMenu, whose ESC and arrow-key handling was
unreliable across terminals — notably ESC failed to back out of the
model selection list on terminals that emit raw escape sequences (e.g.
Ghostty). The codebase already notes simple_term_menu 'conflicts with
/dev/tty' and causes 'ghost-duplication rendering', and a prior attempt
to migrate these (closed PR) confirmed the same root cause.
Route all four single-select pickers through the shared, already-hardened
curses_radiolist (which decodes raw CSI/SS3 escape sequences and handles
ESC consistently, fixed in #35776):
- auth.py _prompt_model_selection — model picker; the pricing column
header and the unavailable-models block are passed as the radiolist
description so they survive the curses screen clear. ESC now cancels.
- main.py _prompt_reasoning_effort_selection — reasoning-effort picker.
- main.py _model_flow_named_custom — named custom-provider model picker.
- main.py _remove_custom_provider — provider-removal picker.
simple_term_menu is no longer imported anywhere (only stale comments
referenced it; one in setup.py is corrected). The numbered-input
fallbacks are unchanged and still trigger on curses errors / non-TTY.
Tests: updated test_terminal_menu_fallbacks / test_reasoning_effort_menu
/ test_custom_provider_model_switch / test_model_provider_persistence to
drive the fallback via curses_radiolist errors instead of breaking
simple_term_menu. New test_setup_menu_curses_migration.py asserts each
picker routes through curses_radiolist, ESC cancels, and the pricing
header is preserved. Net -147/+183 (mostly the new test file; production
code shrinks by removing TerminalMenu boilerplate).
The setup wizard's provider/model pickers (curses_radiolist via
prompt_choice) bailed to the numbered "Select [1-N]" fallback the moment
a user pressed up or down. Root cause: even with keypad(True) — which
curses.wrapper sets — many terminals/terminfo entries deliver cursor keys
to getch() as raw CSI/SS3 byte sequences (e.g. 27, 91, 66 for arrow-down)
rather than the translated curses.KEY_DOWN. The menus matched only
curses.KEY_UP/KEY_DOWN and treated the leading 27 (ESC) as cancel, so
navigation dropped into the text fallback and the trailing bytes leaked
into the next input().
Add a shared read_menu_key() helper that decodes CSI/SS3 escape sequences
into normalized NAV_* actions (only a lone ESC, with no continuation byte
within a short timeout, still cancels) and consumes the tail of unhandled
sequences so stray bytes can't corrupt later input(). Route all three
curses menus (checklist, radiolist, single_select) through it.
Add regression tests covering raw CSI/SS3 arrows, translated KEY_*
constants, vim keys, lone-ESC cancel, and full consumption of unhandled
sequences (Delete/Home/End).
* feat(kanban): goal_mode cards run workers in a /goal loop
A goal_mode card wraps its dispatched worker in the Ralph-style goal
loop behind /goal: after each turn an auxiliary judge checks the
worker's response against the card title+body, and if not done the
worker keeps going in the SAME session until the judge agrees, the
worker terminates the task itself, or the turn budget runs out (which
blocks the card for human review — never a silent exit).
- kanban_db: goal_mode + goal_max_turns columns (additive migration),
Task fields, create_task params, INSERT wiring, created-event payload.
- kanban_tools: goal_mode/goal_max_turns on the kanban_create tool so
orchestrators can opt cards in when fanning out.
- kanban CLI: --goal / --goal-max-turns on 'kanban create'.
- dashboard API: goal_mode/goal_max_turns on the create endpoint
(auto-surfaced back via asdict).
- _default_spawn: sets HERMES_KANBAN_GOAL_MODE / _GOAL_MAX_TURNS only
when the card opts in.
- goals.run_kanban_goal_loop: standalone, callback-injected loop engine
(no SessionDB persistence; ephemeral worker). cli.py quiet path calls
it after the worker's first turn when the env vars are set.
- Docs: orchestrator skill + kanban feature page.
Tests: DB roundtrip + legacy migration, spawn env gating, and the loop's
continuation/completion/budget-block/finalize-nudge branches. E2E run
against a real kanban DB confirms a budget-exhausted goal worker lands
in a sticky blocked state.
* feat(kanban/dashboard): goal-mode toggle in the create form
Wires the goal_mode card setting into the dashboard UI (the plugin's
hand-written IIFE bundle, no build step):
- InlineCreate: 'goal mode' checkbox after the skills field; checking it
reveals an optional 'max turns' number input. Both reset on submit and
only post goal_mode/goal_max_turns when enabled.
- TaskDrawer: a 'Goal mode: on (max N turns)' MetaRow so a card's
goal-mode setting is visible after creation (auto-fed by asdict via the
existing _task_dict).
Live-tested through the running dashboard with a browser: created a
goal-mode card with max-turns=8, confirmed it persisted to the kanban DB
(goal_mode=1, goal_max_turns=8) and rendered back in the drawer as
'on (max 8 turns)'. No JS console errors.
Self-review follow-up on top of the salvaged perf fixes:
- gateway/run.py (both watcher-drain sites): the salvaged O(n^2) fix
(#32708) replaced `while pending_watchers: pop(0)` with iterate-then-
`watchers.clear()`, but `watchers` aliased the registry's live list.
A watcher appended by a concurrent session during the `await
asyncio.sleep(0)` yield would be cleared without ever being scheduled.
Detach the batch atomically (`pending_watchers = []`) before iterating.
- gateway/platforms/bluebubbles.py: normalize the salvaged _guid_cache
LRU (#30523) to match feishu/codebase precedent — module-level
`_GUID_CACHE_SIZE` constant, `while len > cap`, and drop the redundant
post-insert `move_to_end` (a fresh insert is already most-recent).
- gateway/platforms/feishu.py: drop the same redundant post-insert
`move_to_end` from the salvaged _message_text_cache LRU (#23706).
- scripts/release.py: add AUTHOR_MAP entries for the salvaged commits'
authors (amathxbt #22155, ErnestHysa #32636/#32708) so the contributor
audit passes when these commits land on main.
- tests/tools/test_tool_output_limits.py: autouse fixture resets the new
module-level limits cache between tests.
- tests/gateway/test_feishu.py: hand-built adapter fixture seeded
_message_text_cache as a plain dict; it's now an OrderedDict, so the
fixture type had to match.
N43 — Silent plugin/bundle errors:
- Plugin command dispatch: logger.debug() -> logger.warning()
- Bundle dispatch: logger.debug() -> logger.warning()
Plugin/auth failures are no longer invisible to operators.
N42 — O(n^2) pending_watchers recovery:
- Both recovery loops (startup + per-message) used while+pop(0) which is O(n) per pop
- Replaced with enumerate() over the list + periodic asyncio.sleep(0) yield points
- Clears the list after iteration instead of per-pop
- Batch size of 100 balances throughput vs event-loop responsiveness
PAIN BEFORE:
Inside _handle_auth_error_and_retry() (a sync function that runs on the MCP
event loop thread), there was a blocking polling loop:
while time.monotonic() < deadline:
if srv.session is not None and srv._ready.is_set():
break
time.sleep(0.25) # BLOCKS THE ENTIRE EVENT LOOP
Since _handle_auth_error_and_retry is invoked from tool handlers that run ON
the MCP event loop, time.sleep(0.25) blocked ALL concurrent MCP operations
(including other tools, keepalive heartbeats, OAuth refreshes) for 250ms per
iteration. With a 15-second deadline, worst case = 60 * 250ms = 15 seconds
of fully blocked concurrency.
WHAT WAS FIXED:
Extracted the blocking poll into an async helper _await_ready() that uses
asyncio.sleep(0.25) (non-blocking), and runs it via _run_on_mcp_loop().
_run_on_mcp_loop() properly awaits the coroutine on the event loop without
blocking the caller's thread. Added exception handling around the poll so
stuck reconnects still fall through to the error path.
The sync _handle_auth_error_and_retry now:
1. Fires reconnect signal (threadsafe)
2. Calls _run_on_mcp_loop(_await_ready(), timeout=15) — non-blocking
3. Returns; the event loop handles the polling
File: tools/mcp_tool.py
Lines: _handle_auth_error_and_retry() (~1886-1920)
Found by: exhaustive multi-pass audit (10 strategies, 1901 files, 913K lines)
The _guid_cache dict grows without bound as new contacts/groups are
resolved. In a long-running gateway instance with many unique targets
this becomes a slow memory leak.
Replace the plain dict with an OrderedDict capped at 500 entries.
When the cap is exceeded the oldest (least-recently-used) entries are
evicted.
_message_text_cache was a plain dict with no size limit. Every unique
message_id whose text was fetched (for reply-context lookups) stayed in
memory permanently, causing unbounded growth in long-running deployments
with active group chats.
Replace with an OrderedDict and evict the least-recently-used entry
whenever the cache exceeds _FEISHU_MESSAGE_TEXT_CACHE_SIZE (512). Cache
hits call move_to_end() to refresh LRU order. Mirrors the identical
pattern already used by _pending_processing_reactions in the same class.
Lower the model_catalog disk-cache TTL from 24h to 1h so freshly
published model-catalog.json deploys reach the picker within an hour
instead of up to a day. The picker now refetches on the next
`hermes model` / `/model` once the cache is older than 1h; younger
than 1h still serves the cache (no network hit), and network failures
still fall back to the stale copy.
- DEFAULT_TTL_HOURS 24 -> 1 (model_catalog.py)
- DEFAULT_CONFIG model_catalog.ttl_hours 24 -> 1, _config_version 24 -> 25
- migration v24->25 rewrites a stale ttl_hours:24 to 1, preserving any
custom value the user set
E2E: verified >1h refetches / <1h skips, and migration rewrites 24->1
while preserving a custom 6.
DeepSeek / Baidu Qianfan stream tool-call arguments in cumulative mode:
each chunk resends the full arguments-so-far instead of the new fragment.
The stream accumulator blindly concatenated arg deltas with +=, turning
that into '{...}{...}{...}', which failed json.loads and got nuked to '{}'
— a silently corrupted tool call (#35592). Worse on multi-param tools
(search_files, session_search, memory replace) because longer args take
more chunks, giving more resend opportunities.
- Per-slot cumulative latch in the stream accumulator: a delta that is a
strict superset of the accumulated buffer marks the slot cumulative and
replaces (not appends); exact duplicates are dropped only after latching.
Incremental fragments are untouched (default += path).
- Backstop _collapse_repeated_json_arguments() in the repair pipeline
collapses pure identical-resend buffers (K exact repeats of a valid-JSON
unit) for providers that resend the complete object from chunk 1. Only
reached after json.loads already failed, so compliant single objects are
never touched.
Not a gateway or DeepSeek-model bug — any OpenAI-wire provider in
cumulative streaming mode is affected.
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.
Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
SSH sessions hard-failed voice mode on the presence of SSH_* env vars
alone, even when a PulseAudio/PipeWire server is running on the host and
audio works (ffplay/aplay/pw-play -> pulseaudio). Probe the default
sound-server sockets (PULSE_SERVER unix path, PULSE_RUNTIME_PATH/native,
$XDG_RUNTIME_DIR/{pulse/native,pipewire-0}) and actually connect() so a
stale socket doesn't count; downgrade the SSH branch to a notice when
audio is reachable. Mirrors the existing Docker/WSL forwarding handling.
Fixes#35622
The all/* wildcard expands to every registered toolset, but a handful of
tools have an additional check_fn gate on top of toolset membership and
are intentionally NOT turned on by all/* alone:
- Capability-gated tools (browser, computer_use, code_execution, Feishu,
Home Assistant, cronjob) require their backend/credential prerequisite.
- The kanban toolset is workflow-gated and deliberately opt-in. Kanban
tools mutate shared board state, so they stay off by default even under
all/* — you must list 'kanban' by name (or be a dispatcher-spawned
worker with HERMES_KANBAN_TASK set).
This was the expectations gap behind #35581 — the docs previously said
all/* expands to 'every registered toolset' without noting the carve-out.
Closes#35581.
Follow-up to the salvaged #30728:
- Gateway already exports _HERMES_GATEWAY=1 at startup (gateway/run.py) and
cli.py already keys off it. Drop the redundant new HERMES_IN_GATEWAY var;
guard stop/restart on _HERMES_GATEWAY instead. One marker for one fact.
- Drop the greedy \bgateway.*restart alternation from the cron lifecycle
filter — it false-positived on legit prompts that merely mention an
unrelated gateway + a restart (API/payment gateway monitoring). The
specific 'hermes gateway (restart|stop|start)' pattern already covers the
real command.
- Rework the two negative guard tests to sentinel the first downstream call
so they don't drive real signal delivery (tripped the live-system guard).
- Add false-positive regression cases to test_safe_commands.
Three defenses against SIGTERM-respawn loops when agent schedules its
own gateway restart under launchd/systemd KeepAlive:
1. HERMES_IN_GATEWAY env var: gateway sets it at startup; stop/restart
subcommands refuse to run when set (exit 1 with clear message).
2. Cron create payload filter: regex pre-flight rejects prompts/scripts
containing hermes gateway restart/stop, launchctl kickstart/unload,
systemctl restart/stop, and pkill patterns.
3. 30 new tests: pattern matching (14), cron block (5), gateway guard (4),
safe command negatives (7).
The per-turn file-mutation verifier footer rendered failed-write paths as
bare absolute paths in the user-facing response. The gateway's
extract_local_files() scans response text for bare paths ending in a
deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and
auto-attaches matches as native uploads — so a denied write to
~/.hermes/config.yaml surfaced the path in the footer and got the
credential file silently uploaded to the messaging channel.
The gateway denylist (validate_media_delivery_path) already blocks the
config.yaml case after #35634. This is defense-in-depth at the source:
backtick-wrap every path the footer emits — both the bullet path and any
path echoed inside the tool's error preview (the protected-file denial
message embeds the path in single quotes, which do NOT block the
extractor regex). extract_local_files skips paths inside inline-code
spans, so wrapping defeats auto-attachment for ANY protected file while
keeping the path human-readable.
- run_agent.py: _format_file_mutation_failure_footer wraps bullet paths;
new _neutralize_footer_paths backticks any remaining bare path (covers
the preview echo). staticmethod -> classmethod (caller unaffected).
- tests: backtick-wrap assertion + end-to-end extract_local_files leak test.
When PTB's general httpx pool is exhausted, it converts httpx.PoolTimeout
into telegram.error.TimedOut whose message states the request was *not*
sent to Telegram. The send retry loop treated all non-connect TimedOut as
non-retryable, so a pool timeout raised immediately, skipped all 3 retry
attempts, and was returned as retryable=False -- silently dropping the
message (agent responses, cron reports, etc.).
A pool timeout means the request never left the process, making it the
safest case to retry. Add _looks_like_pool_timeout() and treat it like a
connect timeout in both the in-loop retry decision and the outer retryable
determination, so pool timeouts flow through the existing backoff loop and
stay retryable on exhaustion.
Reported-by: q3874758 (#35610)
* feat(models): add deepseek-v4-flash to OpenRouter + Nous curated lists
deepseek/deepseek-v4-flash was already in the native deepseek provider
catalog but missing from the curated OpenRouter and Nous Portal picker
lists. Added it to both and regenerated the model-catalog.json manifest
(drift guard requires same-PR regeneration).
* refactor(models): trim redundant variants, group curated lists by maker
Remove claude-opus-4.7/4.6, gpt-5.4-nano, gpt-5.3-codex,
gemini-3-pro-image-preview, gemini-3.1-flash-lite-preview, grok-4.20,
and the older gemini-3-pro-preview (Nous). Reorder both OPENROUTER_MODELS
and _PROVIDER_MODELS[nous] into contiguous per-maker blocks with comment
headers. Regenerated model-catalog.json (openrouter 27, nous 20).
* feat(models): add gemini-3-pro-preview to OpenRouter + Nous curated lists
Adds google/gemini-3-pro-preview to both curated pickers (new on
OpenRouter, restored on Nous). Regenerated model-catalog.json
(openrouter 28, nous 21).
* test(models): use claude-opus-4.8 in OpenRouter fetch fixtures
The two TestFetchOpenRouterModels tests mocked a live OpenRouter
response with claude-opus-4.6 and relied on it surviving the curated-list
filter. Since 4.6 was removed from OPENROUTER_MODELS, those models got
filtered out and the recommended tag shifted. Swap the fixture to
claude-opus-4.8 (still curated, still first in the Anthropic block).
Rewrite TestDiscordMentions as negative assertions (mentions survive the
redactor) and clean up the orphaned comment + dangling whitespace left by
removing _DISCORD_MENTION_RE. Follow-up to the salvaged #32259 fix for #35611.
The 'hermes update' config-migration prompt printed only counts ('1 new
config option available') then asked 'configure them now?' without ever
saying what the options were. Users said no because they couldn't tell what
they were agreeing to. For pure config-format version bumps (no new
env/config keys) it still asked the question, where saying yes just bumped
the version and looked like a no-op.
- List each new env var / config key by name + description before prompting
(cap at 8, then '… and N more'). The data was already available; we just
threw it away and printed a count.
- Pure version bump (no new options): apply the format migration
non-interactively and print what happened, instead of asking a misleading
yes/no.
Reported by ScottFive and Tt2021.
Some hosts (notably WSL) report a junk window size such as 131072 columns
by 1 row. Both the Ink fork and our components only guard against
0/null/undefined/NaN (stdout.columns || 80), so a positive-but-absurd
width sails through into createScreen(width*height), allocating tens to
hundreds of MB per frame and tripping the TUI memory monitor's hard exit.
Add clampStdoutDimensions(), installed in entry.tsx before ink.render: it
patches process.stdout.columns/rows with clamping getters (cols 1-2000,
rows 1-1000; out-of-range -> 80x24). One install point fixes the renderer,
its resize handler, and every component read. Live resizes still propagate
through the original descriptor, just clamped.
* fix(tui): swallow degraded mouse-burst noise so a stalled loop can't lock the composer
When the Node event loop blocks during a heavy render/tool-call burst, stdin
stops being drained. Mode-1003 any-motion mouse reports pile up in the kernel
buffer, get partially read, and arrive as text with the `\x1b[<` prefix AND
coordinate digits chewed off across many partial reads. The existing fragment
recovery (SGR_MOUSE_FRAGMENT_RE) only handles clean `button;col;row[Mm]`
triples, so the degraded shards leak into the composer as typed text — the user
can no longer type or exit until the stall clears.
Captured leak (Windows Terminal, during tool calls):
M6M35;220;56M6M35;218;56M169;48M;157;47M;44M20;43M79;40M78;40M0M7M35;49;41M
48;41M;47;40M9;15;32M[I;31M5;211;26M35;211;25M7M;220;1MM0M09;25M24M23M3;22M
M18M99;26M32MM38M63;44M47MM1;51M M4M54M
Add two recovery layers in parseTextWithSgrMouseFragments / the text-token path:
- MOUSE_BURST_NOISE_RE: whole-text fast path. If a text token is drawn only
from the mouse-leak alphabet (`[ ] < ; I M m`, digits, spaces) AND carries
the structural signature of mouse coordinates (>=3 M/m terminators, a digit,
and a `;`), swallow it wholesale.
- MOUSE_BURST_RESIDUE_RE: swallows pure-noise residue in the gaps between and
after recovered fragments, so a partially-recovered burst doesn't trail a
chewed-up tail into the prompt.
All three constraints together preserve real prose: `Mmm MMM mmm yummy` has no
digit/`;`, `see 1;2;3M for details` has disqualifying letters, and
`1234;56;78M9;10;11M` has only two terminators — none are swallowed.
This is defense-in-depth: it stops the leak/lockout regardless of what blocks
the loop. The underlying event-loop stall during streaming is a separate,
still-open issue that needs live-turn instrumentation to root-cause.
* fix(tui): check mouse-burst noise before fragment recovery; drop test cast
Copilot review on #35512:
- MOUSE_BURST_NOISE_RE was only evaluated when parseTextWithSgrMouseFragments
returned null. A noise blob that contains any intact `<b;c;r M` fragment makes
fragment recovery return non-null, so the whole-text swallow never fired and
the code emitted a pile of recovered mouse events instead of dropping the blob
wholesale (contradicting the comment, and doing extra work mid-stall). Move the
noise check ahead of fragment recovery so pure-noise tokens are dropped early.
Add a regression test for a noise blob carrying intact fragments.
- Drop the unnecessary `(e as { isPasted?: boolean })` cast in the test;
discriminated-union narrowing on `e.kind === 'key'` exposes isPasted directly.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The installer's ensure_fts5() handled a no-FTS5 Python by running
'uv python install --reinstall', but WHICH Python builds a uv can
install is baked into the uv binary's download manifest. A stale uv
(e.g. 'pip install uv==0.7.20', which predates python-build-standalone
#694) only knows about pre-FTS5 builds, so --reinstall just pulls the
same FTS5-less interpreter — a no-op for FTS5. Result: 'Could not obtain
an FTS5-capable Python' and a broken session search even on the
supported installer path.
ensure_fts5() now escalates uv itself: reinstall with current uv ->
'uv self update' + reinstall (stale standalone uv) -> install a fresh
standalone uv into a temp dir and reinstall with that (externally-managed
uv that can't self-update, the reported case). Pythons live in uv's
shared store, so the fresh uv's --reinstall overwrites the stale
interpreter in place and the installer's later 'uv python find' resolves
to the FTS5-capable build.
Verified against the reporter's exact repro (ubuntu:24.04 +
pip install uv==0.7.20): Python 3.11.13 (no FTS5) -> 3.11.15 (FTS5).
Defense-in-depth on top of the EphemeralReply gate: even if a config.yaml
path reaches response text via some other path, it can never be delivered
as a native attachment. Matches existing protection for .env, auth.json,
and credentials/.
Co-authored-by: JezzaHehn <jezzahehn@gmail.com>
The compact "<n>|content" gutter from #35368 is now the sole behavior.
Removes the HERMES_READ_GUTTER=padded escape hatch and its env lookup —
no legacy fixed-width path to maintain. Padding was pure token overhead
(~48% more tokens than bare content, ~16% more than compact) with no
measured accuracy gain in the original A/B.
- file_operations.py: drop env lookup + os import; gutter always f"{i}|{line}"
- tests: drop the padded env-override test; compact assertions retained
Over SSH the OSC 11 background-color query round-trip routinely exceeds
the 100ms read budget, so _query_osc11_background() gives up and the late
reply lands after prompt_toolkit has grabbed the tty. prompt_toolkit then
injects the OSC payload as typed text and reads its BEL terminator
(\x07 = Ctrl+G) as a keystroke — Ctrl+G is the open-external-editor
binding, dropping the user into vi with garbage and no obvious way out.
- Skip the OSC 11 probe on remote sessions (SSH_CONNECTION/CLIENT/TTY);
fall back to COLORFGBG / env hints / the dark default.
- Restore the tty with TCSAFLUSH instead of TCSANOW so any partial/late
reply is scrubbed from the input buffer before pt reads it.
Relative paths in write_file/patch could resolve against the agent PROCESS cwd
instead of the terminal's working directory. In a git-worktree session with a
stale TERMINAL_CWD='.' (a relative base), early edits silently landed in the
MAIN checkout, verified there, and reported success — while the agent inspected
the worktree and saw nothing, misreading it as the patch tool no-op'ing.
- _resolve_base_dir(): resolution base is now ALWAYS absolute. A relative
TERMINAL_CWD is anchored to the process cwd once, deterministically, instead
of being left to resolve()-time cwd. Live terminal cwd stays authoritative.
- write_file/patch pass the resolved absolute path to the shell FileOps layer
so the tool layer and shell layer can't disagree about which file is edited.
- Responses now report the absolute resolved_path and files_modified, so a
wrong-cwd mismatch is visible on the first call.
- _path_resolution_warning(): emits a _warning when a relative path resolves
OUTSIDE the live terminal cwd (e.g. a worktree session writing into main).
Validation: 11 new unit tests + 43 live E2E assertions (worktree routing,
mid-session cd, V4A patches, divergence warning, absolute paths, consecutive
patches); 466 existing file/path/terminal tests green.
Tasks can now carry file attachments (PDFs, images, source docs) that
workers read directly — closes the gap where source material had to be
pasted as a path into the task body.
- kanban_db: task_attachments table (additive), Attachment dataclass,
add/list/get/delete accessors, attachments_root/task_attachments_dir
path helpers (per-board, HERMES_KANBAN_ATTACHMENTS_ROOT override)
- build_worker_context: surfaces each attachment's absolute path so the
worker (full file/terminal tool access) reads it via read_file/pdftotext
- dashboard API: POST/GET/DELETE attachment routes (multipart upload,
25MB cap, traversal-safe filenames, root-containment check on download)
- dashboard UI: Attachments section in the task drawer — upload button,
list with download, per-row remove
- docs + tests (13 cases: DB accessors, REST round-trip, traversal
rejection, collision suffixing, worker-context surfacing)
Closes#35338
test_windows_path_not_matched asserted the pre-fix POSIX-only behavior.
The Windows drive-letter support now intentionally matches these paths,
so replace it with parametrized positive cases plus a relative-path
negative guard, mirroring tests/gateway/test_platform_base.py.
The MEDIA_TAG_CLEANUP_RE and extract_local_files path regex both used
(?:~/|/) to anchor paths, which only matches Unix-style absolute and
home-relative paths. Two additional _TOOL_MEDIA_RE patterns in run.py
had the same limitation. Windows absolute paths (C:\Users\..., D:/...)
were silently ignored, causing MEDIA directive delivery to fail.
Add [A-Za-z]:[/\\] as a third anchor alternative in all four regex
locations (base.py x2, run.py x2). Also update path separators in
extract_local_files from / to [/\\] so it can traverse Windows
directory trees.
Revert accidental + quantifier in MEDIA_TAG_CLEANUP_RE lookahead
that changed match-one to match-one-or-more (unrelated to fix).
Fixes: #34632
The per-platform reconnect watcher auto-paused a platform after 10
consecutive reconnect failures, setting next_retry=inf and requiring a
manual /platform resume to recover. But both pause sites only ever fire
on *retryable* failures — non-retryable errors (bad auth) already drop
out of the retry queue earlier. So a transient DNS outage that spanned
the watcher's backoff window would silently park the bot forever, even
after connectivity returned.
The watcher's own docstring already promised 'retryable failures keep
retrying at the backoff cap indefinitely' — the code contradicted it.
Remove the auto-pause from both reconnect-failure branches. Retryable
failures now retry at the 5-min backoff cap forever and self-heal once
the network recovers. The circuit breaker (_pause_failed_platform /
_resume_paused_platform) stays for manual /platform pause|resume.
Fixes#35284.
Convert the salvaged text-debounce delays from HERMES_* env vars to
config.yaml (gateway.platforms.<name>.extra.text_batch_delay_seconds /
text_batch_split_delay_seconds), per the '.env is for secrets only'
policy. Adds a finite/non-negative guard so bad YAML values fall back to
the defaults instead of crashing asyncio.sleep().
- whatsapp.py / weixin.py: read delays via _coerce_float_extra(config.extra)
- update Weixin content-dedup regression test for the deferred dispatch path
- add text-debounce coverage (whatsapp + weixin): defaults, config override,
bad-value fallback, env-var-ignored, burst-collapse, lone-message
- docs: WhatsApp + Weixin config keys
WhatsApp and WeChat (Weixin/iLink) both deliver messages individually
without any client-side batching, so rapid multi-message bursts (forwarded
batches, paste-splits, etc.) each trigger a separate agent invocation.
This wastes tokens (redundant system prompts / context for each fragment)
and degrades UX (the user receives reply fragments instead of a single
coherent response).
Both adapters now mirror the Telegram adapter's proven text-debounce
pattern:
- _text_batch_delay_seconds / _text_batch_split_delay_seconds
(configurable via env vars)
- _pending_text_batches dict for per-session aggregation
- _enqueue_text_event() concatenates successive TEXT messages and
resets the flush timer
- _flush_text_batch() dispatches after the quiet period expires
Configurable via env vars:
HERMES_WHATSAPP_TEXT_BATCH_DELAY_SECONDS (default 5.0)
HERMES_WHATSAPP_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 10.0)
HERMES_WEIXIN_TEXT_BATCH_DELAY_SECONDS (default 3.0)
HERMES_WEIXIN_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 5.0)
The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.
Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.
Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
Layer an exception guard on top of the empty-response fix so a crash
inside the agent (e.g. OSError from prompt_toolkit/Vt100 when stdout is a
non-TTY pipe, per #30623) is surfaced on the real stderr with rc=1 instead
of crashing past the redirect_stderr block. KeyboardInterrupt/SystemExit
are re-raised so Ctrl-C and explicit exits still propagate.
Also map briancl2 in scripts/release.py AUTHOR_MAP for the cherry-picked
empty-response commit.
Adapts the exception-guard approach from sweetcornna's PR #33818.
Co-authored-by: sweetcornna <96944678+ymylive@users.noreply.github.com>
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.
With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.
- browser_supervisor.evaluate_runtime: on that specific error, retry once
with returnByValue=false so Chrome returns the node's description string —
the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
retry, so convert the reference-chain error into actionable guidance
(extract a primitive / use JSON.stringify) instead of leaking it raw.
No expression rewriting — normal evals (1+41 -> 42) are untouched.
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.
- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
them in _strip_summary_prefix + _is_context_summary_content so resumed
stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
#26290 (reverse-signal cancellation) and #32787 (capture open questions /
decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
The Active Task field in compression summaries is the single most important
field for task continuity across context boundaries. The previous template
described it narrowly as a 'task assignment' or 'request', which caused the
summary LLM to write 'None' whenever the user's most recent input was a
question, a decision request, or a discussion turn rather than an
imperative command. The assistant on the other side of the compaction then
treated the conversation as resolved and gave a generic recap instead of
answering the still-open question.
Expand the template guidance to cover:
* explicit task assignments
* questions awaiting an answer
* decisions awaiting input (A vs B)
* ongoing discussions where the assistant owes the next substantive reply
Reserve 'None' for the rare case where the last exchange was fully
resolved (e.g. user said 'thanks, that's all').
Also tighten the trailing CRITICAL instruction in the summary prompt so the
LLM cannot fall back to the old 'no imperative command → None' heuristic.
No behavioural code changes — template strings only. All 83 existing
compressor tests pass.
SUMMARY_PREFIX previously contained two contradictory directives:
1. "treat it as background reference, NOT as active instructions"
"Do NOT answer questions or fulfill requests mentioned in this summary"
"Respond ONLY to the latest user message that appears AFTER this summary"
2. "Your current task is identified in the '## Active Task' section of the
summary — resume exactly from there."
When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.
This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
of truth; it WINS on conflict; cancelled Active Task / In Progress /
Pending User Asks / Remaining Work must be discarded entirely (no
'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
just verify, topic change) so the model recognizes them as cancellation
triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
mechanically copy a cancelled task into Active Task on the next
compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
the 'don't repeat work already reflected in session state' clause.
Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.
Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
tests/agent/test_context_compressor_summary_continuity.py,
tests/run_agent/test_413_compression.py,
tests/run_agent/test_compression_persistence.py,
tests/run_agent/test_compression_boundary_hook.py,
tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
Follow-up on the #35309 regression test: the trailing `with _lock: pass`
asserted nothing. Replace it with a concrete assertion that
_interrupted_threads is empty after the worker exits, directly verifying
the leak the fix prevents.
When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt),
the cleanup code at the end of _run_tool was bypassed because it sat outside
the except block (which only catches Exception). ThreadPoolExecutor recycles
thread IDs, so the leaked tid in _interrupted_threads poisons the next tool
scheduled on that thread — it instantly aborts with 'Interrupted'.
Move the discard + _set_interrupt(False) into a finally block so cleanup
runs regardless of how the worker exits.
Fixes#35309
Empty model could reach the API on a recovery turn after stream_interrupt_abort,
failing HTTP 400 "No models provided" with no recovery — the session went
silent until the user manually re-sent (#35314).
- gateway/run.py: cache last-successfully-resolved model per session (+ a
process-wide slot); when a fresh config read returns an empty model on a
recovery turn, reuse the last-known-good instead of building model="".
- run_agent.py + agent/conversation_loop.py: only emit "trying fallback..."
status when a fallback chain actually exists, so the UI stops announcing a
fallback that will never run (also #17446).
- tests: empty-model recovery + _has_pending_fallback gate.
* fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.
Now:
- read_file / read_file_raw strip a single leading BOM so the model
never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
first-line match works) and its post-write verification compares
BOM-stripped content.
- write_file restores the BOM when the original file had one and the
new content doesn't, mirroring the existing line-ending preservation
(detect on disk via a cheap `head -c 3` probe or reuse pre_content,
re-prepend across the edit). Guards against double-BOM.
Mid-content U+FEFF is left alone (it's data there, not a file marker).
Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
* fix(kanban): respect mobile safe areas in task detail drawer
The task detail drawer is a body-level z-60 fixed overlay using
height:100vh starting at the viewport top. On mobile this puts the
drawer header behind the dashboard's fixed top bar (min-h-14, z-40)
and lets the bottom comment input sit under the browser's collapsing
nav bar.
- drawer: 100vh -> 100dvh (+ max-height:100dvh), 100vh kept as fallback
- head: padding-top honors env(safe-area-inset-top); mobile (<1024px,
matching the lg breakpoint where the fixed bar shows) clears the
3.5rem header
- comment-row + body: bottom padding extended with
env(safe-area-inset-bottom) so the bottom-most element clears the
mobile browser chrome
Mirrors the host shell idiom (100dvh + env(safe-area-inset-bottom) in
web/), and web/index.html already sets viewport-fit=cover so the insets
resolve. max()/calc() fallbacks leave desktop unchanged.
Closes#35324
read_file's gutter used a fixed-width zero/space-padded prefix
(" 1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.
Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
- padded : 4/4 PASS both passes
- compact : 4/4 PASS both passes ← keeps line-referencing + patch
- none : 3/4 PASS both passes ← dropping numbers entirely made
the model hand-count lines and answer off-by-one (33 vs 34)
So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.
patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.
Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.
Now:
- read_file / read_file_raw strip a single leading BOM so the model
never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
first-line match works) and its post-write verification compares
BOM-stripped content.
- write_file restores the BOM when the original file had one and the
new content doesn't, mirroring the existing line-ending preservation
(detect on disk via a cheap `head -c 3` probe or reuse pre_content,
re-prepend across the edit). Guards against double-BOM.
Mid-content U+FEFF is left alone (it's data there, not a file marker).
Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
The plugin apply_yaml_config_fn dispatch loop only ran when a top-level
platform block (e.g. `discord:`) existed. Configs that defined a platform
only under `platforms.<name>` or `gateway.platforms.<name>` skipped the
hook, so `platforms.discord.extra.allow_from` never reached
DISCORD_ALLOWED_USERS. Fall back to those nested blocks when the top-level
one is absent.
Also map byquenox@gmail.com -> Que0x for the salvaged commits.
Adds a 'hermes prompt-size' command that reports the fixed prompt budget
for a fresh session: system prompt total, skills index, memory, user
profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy
credentials force the direct-construction path, no network call).
Lets users see which block dominates their per-call payload — the skills
index is often the largest single block when many skills are installed
(issue #34667). Zero model-tool footprint: it's a top-level CLI
subcommand, not an agent tool.
--platform <name> simulates a channel's platform hint; --json emits a
machine-readable breakdown.
Closes#34667
The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.
Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).
Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
The no-home-channel error for send_message derived the env var name
generically as <PLATFORM>_HOME_CHANNEL, producing EMAIL_HOME_CHANNEL for
the email platform. But gateway/config.py reads EMAIL_HOME_ADDRESS, so a
user following the error's guidance would set a variable that is never
consulted. Add a per-platform override map so the email hint names the
variable actually read; all other platforms keep the generic hint.
When using send_message with the email platform, valid email addresses
like user@example.com were not recognized as explicit targets by
_parse_target_ref(). This caused the function to return (None, None,
False), forcing the system into channel-name resolution which has no
way to resolve a raw email address, resulting in 'No home channel set
for email' errors.
Add _EMAIL_TARGET_RE pattern and email platform handler in
_parse_target_ref() so email addresses are treated as explicit targets
and routed directly without requiring a home target configuration.
Adds two real-client tests on top of the salvaged #34783 fix:
- config-less custom:<name> endpoint routes via the carried live base_url
(guards the #34777 symptom directly, not just the wiring)
- named custom:<name> WITH a config entry still resolves via the
named-custom branch (regression guard against collapsing to bare custom)
When a user configures a custom: provider (e.g. custom:openclaw-router),
set_runtime_main() only stored provider and model in process-local globals.
_resolve_auto() then had no base_url or api_key for the custom endpoint,
causing Step 1 to fail and auxiliary tasks (approval, compression, title
generation) to fall through to the aggregator chain and route to wrong
providers.
Fix: extend set_runtime_main() to accept base_url, api_key, and api_mode
keyword arguments; store them in new globals alongside the existing provider
and model; fall back to these globals in _resolve_auto() when the main_runtime
dict is empty. The call site in conversation_loop.py now passes all five
fields from the agent object.
Fixes#34777
hermes update on Windows still aborted with 'Another hermes.exe is running',
listing its own launcher shim(s) as concurrent instances (issues #29341,
#34795). The distlib Scripts\hermes.exe launcher spawns python.exe and waits;
detection runs in the python child, so the launcher shim shows up in
process_iter.
The prior fix walked the ancestor chain with per-hop current.parent() inside
'except: break' — the first psutil AccessDenied/NoSuchProcess (common on
Windows across session/elevation boundaries) bailed the walk early, leaving
the launcher in the candidate set and re-triggering the false positive.
- Switch to proc.parents() (whole ancestor list in one call), evaluate each
ancestor independently so one unreadable hop never strands the launcher.
- Only exclude ancestors whose exe is itself a shim, so a genuine second
hermes.exe under a non-Hermes parent (Desktop backend child) is still flagged.
- Message now prints a copy-pasteable 'taskkill /PID … /F' for the exact stale
PIDs so a user who already closed everything can self-remediate.
Conservative shim-only ancestor approach credited to the parallel attempts in
PRs #29358 (xxxigm) and #31808 (jquesnelle).
Normalize Gmail API message header names to lowercase before lookup so
gmail get/search/reply populate to/subject/from regardless of the casing
the message was stored with. Emit conventional MIME header casing
(To/Subject/Cc/From) on send and reply.
Fixes#34806
Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>
In the concurrent tool-execution path, checkpoint preflight (write_file,
patch, destructive terminal) fired BEFORE plugin guardrail block_result
was computed. A blocked write_file could still dirty checkpoint state
(doc_modified_this_turn, _last_write_file_call_id, turn_counter).
Move checkpoint preflight to AFTER block_result computation, gated on
`if block_result is None:` — matching the invariant the sequential path
already enforces.
Follow-up to LengR's #35181 salvage:
- gateway text-path uses getattr(self, '_session_db', None) to match the
picker callback path (defensive for object.__new__() gateway test pattern).
- add SessionDB.update_session_model test asserting it overwrites the
COALESCE-pinned model and survives subsequent token updates (#34850).
When a user switches models mid-session via /model, the gateway updates
the in-memory agent and session overrides, but the database was never
updated. The COALESCE(model, ?) in update_token_counts() only fills NULL
values, so the dashboard always showed the original model.
Fix: Add SessionDB.update_session_model() that unconditionally sets the
model column, and call it from both the interactive picker and direct
/model command paths in the gateway.
Fixes#34850
asyncio.create_subprocess_exec cannot run .cmd/.bat files on Windows
because CreateProcess expects a valid PE executable. npm-installed LSP
servers (intelephense, typescript-language-server, etc.) ship as .cmd
shims on Windows, causing WinError 193 on spawn.
Detect .cmd/.bat extensions and wrap with cmd.exe /c before spawning.
Gated behind sys.platform == 'win32' — no code path changes elsewhere.
Fixes#34864
The salvaged grandchild-reaping tests reference os.getpgid/os.killpg and
pytest.mark/skip/importorskip directly, but the file only imported asyncio,
signal, and unittest.mock. Add the missing imports so collection succeeds
on current main.
The orphan reaper for stdio MCP subprocesses only tracked the direct child
PID spawned by ``stdio_client`` (e.g. ``openclaw mcp serve``). When that
wrapper itself spawned a helper (``claude mcp serve``) and then exited, the
helper reparented to ``systemd --user`` and survived shutdown.
The MCP SDK already spawns stdio children with ``start_new_session=True``,
so the wrapper is its own pgroup leader and same-pgroup descendants are
reachable via ``killpg``. Capture the pgid at spawn time and reap via
``killpg(pgid, sig)`` so reparented grandchildren are reaped alongside the
direct child, even after the wrapper itself exits. Falls back to per-pid
``os.kill`` on Windows or when no pgid was recorded.
Fixes part 2 (orphan ``claude mcp serve``) of #23799. Part 1 (per-invocation
respawn) was confirmed by the reporter to be an environmental artifact, not
a code bug.
Extends the uv-tool detection (briandevans, #29703) to cover the
remaining no-venv install layouts that hit the same uv 'No virtual
environment found' error:
- pipx-managed installs (sys.prefix under .../pipx/...) -> 'pipx upgrade',
matching scripts/auto-update.sh (pipx-detection idea from
inchargeautomation-lab, #29852)
- bare pip outside any venv -> 'uv pip install --system --upgrade'
- venv (launcher shim) keeps the VIRTUAL_ENV overlay from #35224 and never
gets --system, so the install always targets the venv, not system Python
The four branches are mutually exclusive; VIRTUAL_ENV is exported only for
the uv-pip-in-venv path (uv tool / pipx upgrade ignore it).
Co-authored-by: Joshua Kimbrell <incharge.automation@gmail.com>
Copilot review on PR #29703 flagged two issues with the `uv tool list`
fallback in `is_uv_tool_install`:
1. False positive: `uv tool list` returns the *machine*'s installed
tools, not the active install. A regular pip/venv Hermes on a host
that also has `uv tool install hermes-agent` available would be
misclassified as a uv-tool install, and `hermes update` would
upgrade the wrong copy.
2. Overhead: the subprocess call (up to a 15s timeout) was triggered
even from `recommended_update_command_for_method`, which just
computes a display string.
Restrict detection to properties of the running interpreter
(`sys.prefix` and `sys.executable` — both can carry the uv-tool layout
marker depending on entry point). Drop the `uv tool list` fallback and
the `uv_path` parameter entirely. `_cmd_update_pip` now also surfaces a
clear hint when the runtime looks like a uv-tool install but `uv` is
missing from PATH, instead of silently falling back to `python -m pip`.
Hermes installed via `uv tool install hermes-agent` lives outside any
venv. `_cmd_update_pip` previously ran `uv pip install --upgrade`, which
errors with `No virtual environment found; run uv venv ...`. The user
hits this on the very first `hermes update` after a standard
non-`--system` install with `uv` on PATH.
Add `is_uv_tool_install()` in `hermes_cli/config.py`: fast path inspects
`sys.prefix` for the standard `uv/tools/hermes-agent/` layout, falls
back to `uv tool list` for non-standard prefixes. Both the
user-facing `recommended_update_command_for_method("pip")` string and
the actual subprocess invocation in `_cmd_update_pip` now switch to
`uv tool upgrade hermes-agent` when detected. Non-tool installs and the
no-`uv` fallback keep their existing commands unchanged.
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* fix(file-tools): make write_file/patch atomic (temp-file + rename)
write_file streamed content straight into the target via `cat > path`, so
a crash, SIGKILL, or truncated pipe mid-write left the file half-written
and corrupt. patch_replace routes through write_file, so it shared the flaw.
Now writes stream into a temp file in the SAME directory and `mv` it over
the target — a real same-filesystem rename, which is atomic on POSIX and on
every terminal backend (local/docker/ssh/modal). A failed write leaves the
original byte-intact and leaks no temp file. The existing file's mode is
preserved across the swap (stat + chmod, GNU/BSD), and content still rides
stdin so there's no ARG_MAX limit. A trap cleans the temp on any error path.
Tests: added TestAtomicWrite (real LocalEnvironment, no mocks) covering
inode-change-on-overwrite, mode preservation, failed-write-leaves-original,
no-temp-leak, special chars, and patch routing. Updated two mocks in
test_file_operations.py that keyed on the literal `cat >` write command to
key on the stdin_data behavioral signal instead. 200 file-tool tests green.
The hermetic CI env (slice 4/6) redirects HERMES_HOME, so a post-restore
_read_manifest() can resolve to an empty/redirected manifest path and return
{}. Assert on sync_skills's in-memory return value (synced["copied"]) instead,
which is the resilient signal that the skill was re-copied and is no longer in
limbo.
The cherry-picked fix's onerror handler chmod'd only the failing path, but
unlinking a child requires write permission on its PARENT directory. On a true
Nix-store copy (r-xr-xr-x dirs + files) rmtree still failed. Now chmod the
parent dir as well before retrying.
Also rewrites the regression test: the original asserted the helper FAILS on a
read-only dir (documenting the limitation), which is the wrong success criterion.
Split into two tests — restore succeeds on a full read-only tree (real Nix case),
and manifest is preserved when removal genuinely cannot proceed (monkeypatched).
Two related bugs in tools/skills_sync.py affecting Nix-store and
immutable-package installs:
**#34972 — reset_bundled_skill corrupts manifest on rmtree failure:**
The function deleted the manifest entry BEFORE attempting rmtree. If
rmtree failed (read-only files from Nix store), the function returned
early — leaving the skill in a manifest-less limbo state where future
syncs silently skip it forever.
Fix: reorder steps — attempt rmtree FIRST, only delete manifest entry
after rmtree succeeds. If rmtree fails, nothing is changed.
**#34860 — stale .bak directories after sync:**
sync_skills() called shutil.rmtree(backup, ignore_errors=True) which
silently failed on read-only files, leaving persistent .bak dirs.
Fix: add _rmtree_writable() helper that makes files writable via an
onerror callback before retrying removal. Used in both sync_skills()
backup cleanup and reset_bundled_skill().
Fixes#34972Fixes#34860
mcp_serve.py was missing from the setuptools py-modules list, causing
hermes mcp serve to crash with ModuleNotFoundError on standard pip installs.
Fixes#34871
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* feat(model-picker): group multi-endpoint providers under one row
The interactive provider pickers (hermes model, setup wizard, Telegram
/model) listed every provider slug flat, so vendors with several endpoints
(Kimi/Moonshot, MiniMax, xAI Grok, Google Gemini, OpenAI, OpenCode, GitHub
Copilot) each occupied multiple top-level rows. Now related slugs fold into
one top-level row that drills down to the specific endpoint.
- models.py: add PROVIDER_GROUPS table + group_providers() fold (display
only — CANONICAL_PROVIDERS, slugs, --provider, /model <provider:model>
all unchanged and individually addressable).
- hermes model (main.py): group rows drill into a member sub-picker, then
dispatch to the existing _model_flow_* unchanged. setup wizard inherits it.
- Telegram /model: new mpg:<group> callback expands to member mp:<slug>
buttons; single authenticated member degrades to a direct button.
- Grouping is the single shared fold across all three surfaces.
Validation: 163 targeted tests pass; E2E confirms group->member->model
resolves to the correct concrete slug for all families.
Follow-up to the budget-exhaustion recovery fix. recompute_ready's
new circuit-breaker guard resolved its effective limit from per-task
max_retries -> DEFAULT_FAILURE_LIMIT, skipping the dispatcher's
configured kanban.failure_limit. _record_task_failure resolves
max_retries -> failure_limit(config) -> DEFAULT, so the two disagreed
whenever an operator set kanban.failure_limit != 2:
- config > 2: a task could get stuck at DEFAULT(2) before reaching its
allowed retry count.
- config < 2: a task the breaker already blocked could be auto-recovered
back to ready, defeating the stricter limit.
Thread the dispatcher's failure_limit through dispatch_once into
recompute_ready so the guard and the breaker share one resolution order.
Updated test_circuit_breaker_block_still_auto_promotes (it asserted a
failures=5 block auto-recovers and resets the counter — that's the
pre-#35072 behavior the loop fix removes); it now exercises a below-limit
transient block, with the at-limit case covered in test_kanban_db.py.
Added two tests for the config-tier and per-task override resolution.
recompute_ready() previously reset consecutive_failures to 0 when
auto-recovering a blocked task. This defeated the circuit-breaker:
a task that repeatedly exhausted its iteration budget would cycle
forever (block → auto-recover with counter=0 → respawn → budget
exhausted → block → …) with no signal to the operator.
Fix: don't auto-recover tasks whose consecutive_failures has reached
the effective failure limit (per-task max_retries or
DEFAULT_FAILURE_LIMIT). The counter is also preserved across
recovery so the breaker can accumulate across cycles.
Fixes#35072
Legacy kanban boards (pre-AUTOINCREMENT schema) crashed the gateway
notifier on every tick — int(None) on a NULL id in unseen_events_for_sub
— silently losing all kanban notifications. CREATE TABLE IF NOT EXISTS
skips existing tables regardless of schema and _add_column_if_missing
only adds columns, so neither could fix a drifted primary-key type.
_rebuild_drifted_tables() detects the legacy shape via PRAGMA table_info
and rebuilds task_events/task_comments/task_runs (TEXT PK -> INTEGER
AUTOINCREMENT) and kanban_notify_subs.last_event_id (TEXT/NULL -> INTEGER
NOT NULL DEFAULT 0), preserving data. The whole pass is one transaction
so an interruption can't leave a table half-renamed, and recreates every
index DROP TABLE would otherwise take down (including idx_events_run).
Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
The per-entry psScript callback was identical for every PowerShell entry,
so the function-valued union member added structure without behavior. Collapse
WriteCmd to a plain stdin boolean and apply the one shared base64 script in the
write loop. Document the CP936 root cause inline.
Co-authored-by: BROCCOLO1D <279959838+BROCCOLO1D@users.noreply.github.com>
When writing text to the clipboard via PowerShell (WSL2 and native Windows),
the previous implementation piped text through stdin using `Set-Clipboard
-Value $input`. PowerShell reads stdin using the Windows system's default
ANSI code page (e.g. CP936 for Chinese Windows), causing all non-ASCII
characters (CJK, emoji, accented) to become garbled.
Fix: encode the text as base64 in Node.js and pass it as a command argument.
PowerShell decodes it from base64 using explicit UTF-8, bypassing the code
page issue entirely.
Fixes#35107
_download_image() wrapped every download attempt in a blanket
`except Exception` and retried 3x with 2s/4s/8s backoff regardless of
cause. A 404/403 image URL would never resolve on retry, so it just
burned up to 6s of wall-clock + extra GETs before failing — inflating
latency for a deterministic failure (issue #32296, umbrella #35114).
Add _is_retryable_download_error(): 4xx client errors (except 429),
website-policy PermissionError, and too-large/SSRF ValueError now raise
on the first attempt. 429, 5xx, and unclassified network errors stay
retryable. Removed the now-unreachable fall-through branch since the
loop always returns on success or re-raises on the final/terminal attempt.
The subagent spawn-observability overlay added a `(/agents)` hint, but
only on the standalone "Spawn tree" panel, gated behind `!inlineDelegateKey`
— it never showed for a single delegate_task call, and only appeared once
subagents had already registered. A nudge that arrives at the end (or only
after spawn) is useless for the actual goal: letting users open the live
monitor *while* delegation is running.
Surface it the moment delegation starts, on both surfaces:
TUI (ui-tui/src/components/thinking.tsx)
- Show `(/agents)` on any "Delegate Task" tool group as soon as it appears
(in-flight, before any subagent registers), not gated on subagents
already existing. Same `startsWith('Delegate Task')` predicate already
used for delegateGroups.
CLI (agent/tool_executor.py)
- Append `· /agents to monitor` to the delegate spinner label, which is
displayed for the full duration of the delegate_task call. The previous
attempt put the hint on the completion line (get_cute_tool_message),
which only renders after the call finishes — reverted.
TUI tsc clean (pre-existing execFileNoThrow type errors unrelated);
subagentTree 35/35; display.py reverted to upstream.
A git conflict resolution (reset --hard or merge) can revert
hermes_cli/__init__.py to a stale __version__ while pyproject.toml stays
current, so 'hermes --version' silently reports the wrong version. Nothing
cross-checked the two files.
Add a version-consistency check to the doctor 'Python Environment' section:
reads the [project] version from pyproject.toml and compares it to
hermes_cli.__version__. Reports OK when they match, fails with a re-sync
hint when they drift, and is a silent no-op for installed wheels where
pyproject.toml isn't present.
Closes#35070
Adds _set_process_title() in hermes_cli/main.py, called first thing in
main(). Tries setproctitle (optional) for a full ps-args rewrite, then
falls back to ctypes prctl(PR_SET_NAME) on Linux / pthread_setname_np on
macOS. No-op on Windows and on any failure. No new dependency: the
setproctitle path is best-effort via ImportError guard.
Fixes#35108
Allow non-loopback websocket peers when the dashboard is explicitly exposed with --host 0.0.0.0/:: and --insecure.
This fixes the failure mode where /chat rendered over LAN but /api/ws and /api/events were rejected with HTTP 403, leaving the embedded TUI chat disconnected.
Add regression coverage for the insecure public bind case in the dashboard websocket auth tests.
The max-iterations summary path (`handle_max_iterations`) hand-builds its
message list and calls `chat.completions.create()` directly, bypassing
`ChatCompletionsTransport.convert_messages()`. It only popped
("reasoning", "finish_reason", "_thinking_prefill"), so `tool_name` (SQLite
FTS bookkeeping), the `codex_*` reasoning carriers, and other internal
`_`-prefixed scaffolding leaked to the wire.
Strict OpenAI-compatible gateways (Fireworks-backed OpenCode Go, Mistral,
Moonshot/Kimi) reject these with HTTP 400 "Extra inputs are not permitted,
field: 'messages[N].tool_name'", so a long tool-using session that exhausts
the iteration budget fails to summarise instead of returning the result.
Mirror convert_messages() in this path: also drop tool_name,
codex_reasoning_items, codex_message_items, and every `_`-prefixed key.
Copy-on-write is already in place, so internal history keeps the fields for
FTS / Codex-fallback.
Adds a regression test to TestHandleMaxIterations asserting the summary
request carries none of the schema-foreign keys (fails on main, passes here).
detect_install_method() returned "docker" for any container (is_container()),
before the .git check. Both supported installs already self-identify via the
.install_method stamp read first: the curl installer (scripts/install.sh)
git-clones and stamps "git"; the published nousresearch/hermes-agent image
stamps "docker" at boot via docker/stage2-hook.sh. An unsupported manual
install dropped into a container has no stamp, so the bare container check
hijacked it to "docker" and 'hermes update' bailed with the docker-pull
guidance.
Drop the redundant is_container() -> docker fallback. Unstamped installs now
fall through to the .git/pip checks like any off-path install; both supported
paths are unaffected because the stamp wins first.
Fixes#34397.
The TUI already ships a rich /agents spawn-tree dashboard (live tree,
timeline, per-child tokens/cost/files/tools, kill/pause), but nothing
surfaced it — during delegation the transcript stayed quiet and users
had to already know to type /agents.
Drop a one-time transient activity hint ("subagents working · /agents
to watch live") the first time a turn starts delegating, matching the
existing "· /logs to inspect" house style. Guards keep it unobtrusive:
- fires at most once per turn (resets on message.start)
- silent when the /agents overlay is already open
- gated by display.tui_agents_nudge (default true)
Hooked on subagent.start, not subagent.spawn_requested: the delegate
progress callback in tools/delegate_tool.py only relays start/complete
to the gateway and drops spawn_requested, so start is the first
delegation event the TUI reliably receives. spawn_requested is wired
too for the future case, guarded once-per-turn.
Adds the display.tui_agents_nudge config default and gatewayTypes entry.
These tests asserted that hardcoded curated model lists/constants still
contained specific model strings (e.g. 'glm-5' in provider_model_ids('zai'),
exact context-length values per model key, PROVIDER_TO_MODELS_DEV entries).
They mirror a constant rather than exercise logic, so they only ever break
when models are added/retired and never catch a real bug.
Removed 22 such functions across 7 files (149 deletions, 0 additions).
Behavioral siblings are kept: live-catalog-wins, fallback ordering,
substring/longest-match resolution, normalization, credential discovery,
and probe-tier stepping all still tested.
Starlette < 1.0.1 is affected by CVE-2026-48710 ("BadHost", CWE-444).
The HTTP Host header was not validated before being used to rebuild
`request.url`, so a malformed Host could make `request.url.path` desync
from the raw ASGI path the router actually dispatched. Middleware and
endpoints that apply path-based authorization off `request.url` (rather
than `scope["path"]`) can therefore be bypassed.
Hermes pulls Starlette transitively, never directly:
- [web] -> fastapi==0.133.1 (starlette>=0.40.0, no upper bound)
- [mcp] -> mcp==1.26.0 + sse-starlette (starlette>=0.27 / >=0.49.1)
- [computer-use] -> mcp==1.26.0
- [dev] -> mcp==1.26.0
A fresh resolve landed starlette 0.52.1 — vulnerable. With no upper
bound on the transitive specs, pip/uv could resolve any pre-1.0.1
release on a fresh install.
Fix: pin starlette==1.0.1 directly in every extra that exposes a
Starlette-backed server surface, regenerate uv.lock (only starlette
moves: 0.52.1 -> 1.0.1, hash-verified), and mirror the pin in the
lazy-install map (tools/lazy_deps.py `tool.dashboard`) so `hermes`
on-demand dashboard installs can't re-resolve a vulnerable version.
1.0.1 is the advisory's named fix floor and the oldest patched release
(more bake time than 1.1.0/1.2.0, which are days old); it satisfies
every carrier constraint and our requires-python>=3.11.
Scope note: this is a dependency-level fix complementing the
application-layer Host-header validator added in #34162
(`hermes_cli/web_server.py` `_is_accepted_host`). Defense in depth at
both the framework and app layers.
Guards: two invariant tests in tests/test_packaging_metadata.py assert
every server-surface extra pins starlette and that pyproject + uv.lock
both resolve >= the 1.0.1 CVE floor — a dropped pin or stale lock fails
in CI instead of shipping the bypass.
Closes#35067
Self-hosted Honcho setup had four sharp edges:
- local/cloud URLs ending in /vN double-prefixed by the SDK (/v3/v3/... 404)
- authenticated local servers had no setup prompt for a JWT/bearer token
- profile-derived host keys could be dot-containing workspace IDs Honcho rejects
- memory-provider config files with API keys written world-readable per umask
This keeps existing behavior but makes those paths safer:
- strip a trailing /vN version segment from any configured baseUrl before SDK
init (the SDK's route builders always prepend their own version prefix);
auth-skipping stays loopback-only
- add an optional local JWT/bearer prompt in honcho setup, stored under
hosts.<host>.apiKey
- derive new profile host keys with underscores, still reading legacy
hermes.<profile> blocks
- write memory-provider config files atomically with 0600 via a shared
utils.atomic_json_write(mode=) arg (honcho/hindsight/mem0/supermemory)
- skip honcho.json parsing in gateway cache-busting unless Honcho is the active
memory provider; memoize by honcho.json mtime when active
- bust the gateway agent cache on memory.provider change
- add a hermes memory setup <provider> one-liner so fresh installs can configure
a named provider without the picker (the per-provider hermes <provider>
subcommand only registers once that provider is active)
Closes#20688, #29885, #26459, #30246, #33382, #32244.
Co-authored-by: BROCCOLO1D
apply_nous_managed_defaults() was adding image_gen and video_gen to the
'changed' return set without writing any config values. The caller
(tools_command first_install flow) uses 'changed' to skip manual
configuration, so these tools ended up in platform_toolsets but with no
video_gen.provider, video_gen.use_gateway, or image_gen.use_gateway in
config.yaml.
At runtime the FAL plugin's is_available() returned False because there
was no FAL_KEY and no use_gateway config — the tool never loaded despite
being 'enabled' in the toolset list.
For image_gen this was a latent bug masked by the gateway offer prompt
(prompt_enable_tool_gateway) running earlier in the setup flow and
writing image_gen.use_gateway=True via apply_gateway_defaults(). But if
the user skipped the gateway offer, image_gen would silently break the
same way.
For video_gen (added in PR #33259) the bug was always hit because the
gateway offer ran before the user checked video_gen in the toolset
checklist.
Fix: write provider/use_gateway config values before adding to 'changed',
matching the pattern used by web, tts, and browser.
When FTS5 is missing the warning now explains the likely cause (an
unsupported / pip-managed Python whose bundled SQLite lacks FTS5) and
links the supported install at hermes-agent.nousresearch.com, instead
of just logging the raw error.
uv's python-build-standalone distributions only gained FTS5 in mid-2025
(#694). A stale interpreter already in uv's store — which `uv python find`
reuses without checking — can lack it, leaving the supported install with
a SQLite that can't create the FTS5 virtual tables hermes_state.py needs
for full-text session search ("no such module: fts5").
check_python now probes the resolved interpreter for FTS5 and, if missing,
reinstalls the latest patch for $PYTHON_VERSION (which has FTS5) and
re-resolves. If an FTS5-capable Python still can't be obtained (offline,
pinned env), it warns and continues — Hermes degrades gracefully and only
disables session search. No bundled second SQLite, no user action.
The salvaged contributor commit guarded only messages_fts. Current main
also creates a second virtual table, messages_fts_trigram (CJK substring
search), whose CREATE VIRTUAL TABLE ... USING fts5 still raised
"no such module: fts5" on builds without FTS5 — re-crashing SessionDB
init. Wrap the trigram setup with the same guard, and broaden the test's
no-fts5 mock to fail BOTH tables so the regression test actually
exercises a faithful no-FTS5 build.
The compression threshold is threshold × context_length where context_length
is the MAIN agent model's window, not the auxiliary/summary model's. On a
262,144-token model at the default 0.50 the threshold is 131,072 — close to a
common 128K figure by coincidence of the percentage, which has led to confusion
that the auxiliary model's context limit is the trigger. Add a note preempting
that misreading and pointing to the separate summary-model-context constraint.
Follow-up to the salvaged #34452 turn-completion explainer:
- Register display.turn_completion_explainer: True in DEFAULT_CONFIG so the
setting is discoverable, matching the file_mutation_verifier precedent.
- Shorten the repeated footer prefix from 'Turn ended without a usable
reply: ' to 'No reply: ' so the 10 reason variants don't all open with
the same 8-word boilerplate.
- Update the 7 assertions that referenced the old prefix.
PR #34470 adds an explainer suffix to abnormal turn endings (e.g.
max_iterations_reached) so users see why the response is short instead
of receiving a bare/blank reply. test_tool_call_validation_accepts_dict_arguments
runs the agent at max_iterations=3 which hits the explainer path; the
existing strict-equality assertion (== "done") no longer matches once
the suffix is appended.
Switch the assertion to .startswith("done") so the test continues to
verify that the models actual text survives intact while leaving the
explainer suffix wording owned by conversation_loop (where it belongs).
Test now passes (1 passed in 0.88s).
When a turn ends abnormally after substantive tool calls (empty content
after retries, a partial/truncated stream, exhausted retries, or an
iteration/budget limit), the CLI/TUI response area was left blank or
showed only a fragment (e.g. "The") with no consolidated reason. The
internal turn_exit_reason values (empty_response_exhausted,
partial_stream_recovery, etc.) were never surfaced to the user.
Add a turn-completion explainer that mirrors the existing file-mutation
verifier footer: at turn end, map an abnormal turn_exit_reason to a
short, actionable message and either replace the bare "(empty)"
sentinel or append the reason after a partial fragment. Normal
text_response exits (e.g. a terse "Done.") stay quiet.
Gated by display.turn_completion_explainer (default on) with
HERMES_TURN_COMPLETION_EXPLAINER env override, matching the
file-mutation verifier seam.
Closes#34452
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* fix: keep CLI context display in sync with preflight token estimate
The status bar reads compressor.last_prompt_tokens, which only updates
from a successful API response. When loaded history is oversized but
compression no-ops (e.g. the auxiliary summary model times out), no fresh
usage arrives and the bar stays frozen at the old, smaller value while the
preflight estimate reports a much larger number — looking permanently out
of sync (reported: 74.4K display vs ~144,669 preflight).
Seed last_prompt_tokens with the fresh preflight estimate (upward-only, so
a real usage figure is never clobbered and a successful compression's
downward correction still wins). Display-only; no behavioral change to
compression, caching, or the agent loop.
Sharpen the label from 'Session usage (cumulative)' to 'Cumulative API
tokens (re-sent each call)'. The number is real provider-reported usage
summed across every API call in the session — not context size. In an
agentic loop the same context is re-sent each iteration, so a one-hour
tool-heavy session legitimately reaches tens of millions of tokens. The
new label explains the magnitude so users stop reading it as a bug or as
a total across all sessions.
Hallucinated 'silence' tokens (*(silent)*, _silent_, the bare '.', '...',
'silent', no response/reply, the mute emoji) are emitted when a persona has
nothing actionable to say. In bot-to-bot channels the receiving bot mirrors
the token back, creating a tight loop that burns API tokens and can crash a
model with 'no content after all retries'. SOUL.md/prompt rules drift across
providers and have already failed in practice, so add a substrate-level guard.
_deliver_to_platform now drops a message whose finalized content is only a
silence-narration token, logs a WARNING with platform/chat_id/truncated
content, and returns {success: True, filtered: 'silence_narration',
delivered: False} instead of calling the adapter. Single chokepoint covers
every platform adapter; the regex is anchored start/end with a 64-char guard
so prose like 'Silence is golden — here is the plan...' or 'Silent install
completed' is never dropped. Local/file delivery is a separate path and is
left untouched. Opt out via gateway.filter_silence_narration: false or the
HERMES_FILTER_SILENCE_NARRATION env override (env wins when set).
Closes#34616
The rough-estimate mock supplied only 2 side_effect values but the
conversation loop calls estimate_request_tokens_rough a third time for
the post-response real-token estimate, exhausting the iterator. Use a
callable side_effect that returns 125k once (to fire preflight) then
sub-threshold values, independent of call count.
handle_enter dispatches /steer and /model inline on the UI thread while
the agent is running, calling buffer.reset() then returning. Unlike every
other early-return branch in the handler, these two skipped
event.app.invalidate(). process_command() prints through patch_stdout
(scrolls output above the prompt without redrawing the input line), so the
just-cleared input area could keep showing the submitted '/steer <text>'
until an unrelated redraw fired — looking unsent and inviting an accidental
re-submit.
Add event.app.invalidate() after reset in both inline branches to match
the sibling branches. AST regression test pins the invariant: every
reset-then-return branch in handle_enter must invalidate first.
Fixes#34569
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
The POSIX installer drops node/npm/npx symlinks in ~/.local/bin pointing
into $HERMES_HOME/node and prepends ~/.local/bin to PATH, shadowing an
existing nvm. Uninstall removed the hermes wrapper but left these behind,
so the user's default node/npm/npx stayed redirected after uninstall.
Add remove_node_symlinks() and call it from run_uninstall. It removes
~/.local/bin/{node,npm,npx} only when each is a symlink resolving into the
current Hermes home's node dir, so a link the user repointed at nvm or a
real binary is never touched. Handles dangling links too.
Closes#34536
* fix(auxiliary): stop capping output with max_tokens by default
Auxiliary LLM calls (compression, titles, vision, etc.) no longer send
max_tokens on the OpenAI-compatible chat-completions path. Most providers
treat an omitted max_tokens as "use the model max", which is what we want;
an explicit cap only risks truncation or a wire-format 400.
This was surfaced by GitHub Copilot / GPT-5 (#34530): those models reject
max_tokens and require max_completion_tokens, so compression 400'd and fell
back to a static context marker. Omitting the param sidesteps that quirk
(and ZAI vision's error 1210) entirely.
The Anthropic Messages wire (MiniMax + /anthropic endpoints) keeps
max_tokens because it is a mandatory field there.
* test(auxiliary): update temperature-retry assertions for omitted max_tokens
The temperature-retry tests asserted retry_kwargs["max_tokens"] == 500 on an
api.openai.com endpoint. Now that auxiliary calls omit max_tokens on
OpenAI-compatible endpoints (#34530), that key is absent. Assert it's absent
in both first and retry kwargs and use model as the survives-the-retry witness.
* fix(deps): declare setuptools in dev extra for packaging tests
tests/test_packaging_metadata.py imports `from setuptools import
find_packages` at module scope to validate package discovery against
the live tree. setuptools was being picked up ambiently from the CI
runner image, but recent ubuntu-latest images no longer ship it in the
test venv, so collection fails with ModuleNotFoundError on every PR.
Declare setuptools==82.0.1 in the dev optional-dependencies so `.[all,dev]`
installs it explicitly rather than relying on the runner environment.
* test(packaging): skip packaging-metadata tests when setuptools absent
Belt-and-suspenders alongside declaring setuptools in [dev]: guard the
module-level `from setuptools import find_packages` with
pytest.importorskip so a runner missing setuptools SKIPS these checks
instead of erroring out collection for the entire test shard.
* chore(deps): sync uv.lock for setuptools dev dependency
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* feat(cli): warn on unsupported pip installs + fix stale update-check cache after pip upgrade
Banner now shows a yellow warning when detect_install_method() == 'pip':
'pip install hermes-agent' isn't the supported install path (it exists on
PyPI for internal/CI reasons), so updates and issue support don't behave
correctly. Reuses existing install-method detection; warn, never block.
Also fixes#34491: check_for_updates() keyed its 6h cache only on ts+rev.
On the pip path (no HERMES_REVISION), rev is always None, so a
'pip install --upgrade' changed VERSION but left the cache valid — the
stale 'N commits behind' count survived the upgrade. Cache now also keys
on the installed VERSION and invalidates on mismatch.
asyncio.gather(return_exceptions=True) captures CancelledError as a
BaseException value. The previous isinstance(result, Exception) check
missed CancelledError, silently dropping it without logging.
Since Python 3.9, CancelledError is a BaseException subclass (not
Exception). This one-line change ensures all failure types from MCP
server connections are properly logged.
FixesNousResearch/hermes-agent#34443
test_notification_poller_requeues_when_busy drained and reused the
process-global process_registry.completion_queue, so a concurrent test
in the same xdist worker could put/get on the shared singleton mid-run
and empty the event the poller requeues — flaking 'assert not
completion_queue.empty()' under parallel CI load only.
Monkeypatch a fresh Queue onto the singleton for the test's duration so
nothing external can interleave. The poller reads completion_queue by
attribute at runtime, so the isolated queue is what it operates on.
monkeypatch restores the original on teardown. Verified immune: 50/50
passes under a background thread hammering the global queue.
A bare `/resume` printed the recent-sessions list but armed no selection
state, so typing just `3` on the next line was sent to the agent as chat
instead of resuming session #3. `/resume 3` worked, but the natural
list-then-pick flow did not.
Arm a one-shot pending-resume prompt when bare `/resume` shows the list,
and consume the next bare numeric input as the selection (out-of-range is
reported, non-numeric/other commands disarm it). Resolves against the same
_list_recent_sessions(limit=10) list used everywhere else.
Closes#34584.
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted
PyPI quarantined mistralai on 2026-05-12 after the malicious 2.4.6
release (Mini Shai-Hulud worm). 2.4.6 has since been removed from the
registry and clean releases resumed (2.4.7 2026-05-25, 2.4.8 2026-05-28).
This rolls back the blanket runtime ban so Voxtral STT + TTS work again,
following the restoration checklist the repo left in pyproject.toml.
Verified against the real SDK: 2.4.8 keeps the import path the code uses
(from mistralai.client import Mistral) and the audio.transcriptions.complete
/ audio.speech.complete surfaces.
Changes:
- pyproject.toml: re-add mistral extra pinned to mistralai==2.4.8; left
OUT of [all] per the 2026-05-12 lazy-install policy (one quarantined
release must not break fresh installs). uv.lock regenerated.
- tools/lazy_deps.py: add stt.mistral / tts.mistral entries so the SDK
lazy-installs on first use (matches edge / elevenlabs).
- tools/transcription_tools.py: restore explicit-provider gate
(_HAS_MISTRAL + key) and auto-detect entry (local>groq>openai>mistral>xai);
_transcribe_mistral lazy-installs before import.
- tools/tts_tool.py: dispatcher routes back to _generate_mistral_tts;
_import_mistral_client lazy-installs the SDK.
- hermes_cli/tools_config.py, hermes_cli/web_server.py: un-hide Mistral
from the TTS provider picker and dashboard STT options.
- hermes_cli/security_advisories.py: KEEP the shai-hulud-2026-05 advisory
(module policy forbids removal) — it is scoped to 2.4.6 only, so it
still warns anyone with the poisoned build cached and never fires on
2.4.8. Summary note updated to reflect the un-quarantine.
- tests: revert the disabled-behavior assertions added by the ban commit
back to routing/positive expectations; add mistral to the
lazy-installable-extras-excluded-from-[all] contract.
Reported by @SkYNewZ (#34503).
Validation: 189 targeted STT/TTS/lazy_deps/metadata tests pass; E2E with
the real mistralai 2.4.8 SDK routes both STT and TTS to mistral.
MEDIA:<path> tags for .md/.json/.yaml/.xml/.html and other document
extensions were silently dropped. extract_media() carried a narrow
extension allowlist that omitted them, while extract_local_files()
had a broad one. The dispatch sites then ran an unconditional
re.sub(r'MEDIA:\\s*\\S+', '') that stripped the tag from the body even
when extract_media had not matched it — so extract_local_files (broad
list) ran on text where the path was already gone, and the file was
delivered by neither path.
- Add MEDIA_DELIVERY_EXTS in gateway/platforms/base.py as the single
source of truth; extract_media and extract_local_files both derive
their extension set from it (no more drift).
- Replace the loose MEDIA cleanup at the non-streaming dispatch site
(base.py) and the streaming consumer (stream_consumer.py) with the
shared, extension-anchored MEDIA_TAG_CLEANUP_RE. A MEDIA: tag with an
unknown extension is left in the body so the bare-path detector can
still pick it up instead of being black-holed.
- Chain cleaned text through extract_media -> extract_images ->
extract_local_files in run.py's post-stream media delivery (it was
dropping the cleaned text and rescanning raw text with MEDIA: tags).
- Regression tests covering both halves: previously-dropped extensions
now extract, and unknown-ext paths survive the cleanup.
Consolidates the MEDIA extension-allowlist PR cluster.
Co-authored-by: Bartok9 <259807879+Bartok9@users.noreply.github.com>
Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: Kyzcreig <9063726+Kyzcreig@users.noreply.github.com>
Config-version migrations have been observed to leave cron/jobs.json
valid-but-empty after `hermes update`, silently dropping every scheduled
job (#34600). The existing malformed-shape guards in cron/jobs.py don't
catch this because {"jobs": []} is valid JSON.
Add restore_cron_jobs_if_emptied() as a post-migration safety net: if the
live cron/jobs.json now has zero jobs while the pre-update snapshot held
one or more, restore the snapshot copy in place and warn loudly. The
check is conservative — it only restores on unambiguous evidence of loss
(snapshot had jobs, live file readable-and-empty), so a user who genuinely
cleared their jobs is never second-guessed and an unreadable live file is
left untouched so real corruption still surfaces.
Wired into _cmd_update_impl after migrate_config(), reusing the existing
pre-update quick snapshot (which already captures cron/jobs.json).
Closes#34600
The Feishu adapter stored one asyncio.Lock per chat_id in a plain dict
with no upper bound, so a long-running gateway that saw many distinct
chats grew _chat_locks without limit. Port the LRU-eviction pattern
already used by the yuanbao adapter: OrderedDict + move_to_end on access,
CHAT_LOCK_MAX_SIZE cap (1000), and eviction that skips currently-held
locks (falling back to dropping the LRU entry only if all are held).
Adds tests for the echo-loop fix (outgoing X-Tags header, inbound skip
on tagged events, genuine tags pass through) and extends the tag to the
out-of-process _standalone_send() path so cron / send_message deliveries
to a self-subscribed topic are also skipped. Maps both contributors in
release.py AUTHOR_MAP.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
When publish_topic equals the subscribe topic, the agent's own replies
are echoed back by ntfy as incoming messages, creating an infinite
reply spiral.
Fix: tag outgoing messages with X-Tags: hermes-agent header, and skip
incoming messages that carry this tag. This is zero-config — works
automatically regardless of topic configuration.
FixesNousResearch/hermes-agent#34447
The post-run scan that appends tool-emitted MEDIA: tags to the final
response iterated every tool/function message in the full conversation
and relied solely on path-based dedup against paths reconstructed from
the replayable transcript. When that reconstruction does not byte-match
the in-memory tool content (timestamp stripping, observed-context
withholding, compression rewrites), a stale path emitted several turns
earlier is absent from the dedup set and leaks onto a later text-only
reply (Telegram 'Sending media group of 1 photo(s)' with no MEDIA
directive present).
Scope the scan to this turn's new messages by slicing result['messages']
at len(agent_history) (agent_history is passed as conversation_history
into run_conversation, so the returned list is history + this turn).
Retain path-based dedup as a secondary guard and as the sole guard on
the compression-shrink fallback, preserving the #160 behaviour.
Closes#34608
test_system_unit_has_no_root_paths asserted the system unit's
WorkingDirectory was the remapped *checkout* path
(/home/alice/.hermes/hermes-agent). That is the brittle pin this PR
fixes — the system unit now anchors cwd at the target user's HERMES_HOME
(/home/alice/.hermes). The test's intent (no root-home leak, target-user
paths present) is unchanged and still holds.
The systemd unit (and launchd plist) pinned WorkingDirectory to PROJECT_ROOT
(the checkout the unit was generated from). When that checkout is transient —
a git worktree, or a clone hermes update later relocates/removes — the path
rots. systemd then fails the start at the CHDIR step (status=200/CHDIR) BEFORE
Python loads, so the on-boot refresh_systemd_unit_if_needed() self-heal never
runs and Restart=always crash-loops forever on a dead directory. Observed in
the wild: a gateway that crash-looped 153 times overnight, bot offline until a
manual 'hermes gateway restart' regenerated the unit.
Anchor cwd at HERMES_HOME instead — it never moves, always exists, and the
gateway never needed cwd to be the checkout (ExecStart uses an absolute python
+ -m hermes_cli.main). Existing broken units now differ from the generated unit
and self-heal on the next start/restart/update.
[tool.setuptools.packages.find] listed 'hermes_cli' without the
'hermes_cli.*' wildcard, so the wheel shipped hermes_cli/*.py but
dropped the dashboard_auth and proxy subpackages. The dashboard died
on every install with ModuleNotFoundError: No module named
'hermes_cli.dashboard_auth' (#34701); 'hermes proxy' was equally
broken.
Add the wildcard, and add a regression test that drives setuptools'
own find_packages against the live tree so any future subpackage
dropped from the include list fails CI instead of a user's container.
The profile alias --name path in main.py rewrote the wrapper with a
hardcoded #!/bin/sh script right after create_wrapper_script(), clobbering
the .bat on Windows and reintroducing the exact bug for custom aliases.
create_wrapper_script() now takes an optional target so the alias file is
named after the alias while the -p content references the profile — one
platform-aware code path, no post-hoc rewrite.
On Windows, hermes profile create produced a #!/bin/sh script that the
shell cannot execute. Now creates a .bat file with @echo off + %* on
Windows, and keeps the POSIX shell script on macOS/Linux.
Also fixes check_alias_collision to use 'where' instead of 'which' on
Windows, and remove_wrapper_script to find .bat files.
Fixes#34708
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix(cli): clarify panel clips choices off-screen on short terminals
The clarify multiple-choice panel is a height-less Window inside a
non-full-screen HSplit. When its content exceeds the viewport,
prompt_toolkit distributes height per child and clips the panel's tail
— where the choices live — so options render invisible/cut off (issue
#34645, reported on macOS Terminal.app).
Two budget-accounting bugs let the panel overflow:
- the compact-chrome decision ignored the question rows, so full chrome
(3 blank separators) was kept even with no room
- the '… (question truncated)' marker was not counted against the
question's row budget, overshooting by one row at a 1-row budget
Fix: reserve one question row in the compact decision, count the
truncation marker against the budget, and drop the question entirely
when the choices alone already exceed the viewport (choices are the
must-see content for a selection).
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix(mcp): stop reporting false OAuth success when no token was obtained
`hermes mcp login` reported "Authenticated — N tool(s) available" for
servers that serve tools/list without auth (e.g. Google's official Drive
MCP server) even when the OAuth flow never completed — dynamic client
registration 400'd because the provider doesn't support RFC 7591, so no
token was ever acquired. Every real tool call then hung until timeout
with no indication of why.
Login now verifies a token actually landed on disk after the probe. When
it didn't, it warns that authentication didn't complete and shows the
config needed to supply a pre-registered client_id/client_secret (the
existing, already-supported workaround for DCR-less providers).
Adds a docs pitfall for Google Drive / Atlassian-style providers.
Fixes#34775
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix(api_server): emit per-turn transcript on run.completed (#34703)
WebUI clients lost intermediate (pre-tool-call) assistant text after
switching session pages mid-stream. The session-chat SSE stream delivers
all assistant text as assistant.delta events under one message_id
interleaved with tool.* events, then a single assistant.completed
carrying only the final reply — so a client accumulating deltas into one
buffer cannot reconstruct intermediate text segments that preceded tool
calls, and they vanish from the live view (state.db persists them
correctly).
run.completed now carries the authoritative per-turn transcript
(assistant + tool messages for this turn, in client-safe shape) so any
SSE consumer can reconcile its live view against ground truth without a
separate GET /messages round-trip. Purely additive — clients that ignore
the field are unaffected.
A GitHub tap can ship a repo-root skills.sh.json (the published skills.sh
schema) declaring category groupings. The Skills Hub now reads it at index
time and uses each grouping title as the skill's category label, instead of
the tag-derived guess. Generic: any tap that ships the file gets real
categorization — NVIDIA's groupings (Inference AI, Decision Optimization,
GPU Development, etc.) flow through automatically.
- GitHubSource: _get_skillsh_groupings() fetches+caches the sidecar per repo;
_parse_skillsh_groupings() flattens it to {skill_name: title};
_list_skills_in_repo() stamps meta.extra['category']; _meta_to_dict now
serializes extra so the category survives the index cache round-trip.
- extract-skills.py: prefers extra['category'] over the tag heuristic and
exempts sidecar categories from the small-category to Other collapse.
- Docs + 12 tests.
NVIDIA/skills is now a default trusted tap in the Hermes Skills Hub —
discoverable, browsable, searchable, and auto-updating through the same
pipeline that already serves OpenAI, Anthropic, and HuggingFace skills.
Rebased onto current main.
The Chat/TUI dashboard tab showed a false "Session token unavailable"
error and never rendered the terminal whenever the dashboard ran in
gated mode (OAuth auth gate active, --insecure not set), even though
the user was fully authenticated and every other tab worked.
Two checks in ChatPage.tsx gated purely on window.__HERMES_SESSION_TOKEN__,
which the server intentionally omits in gated mode (web_server.py only
injects __HERMES_AUTH_REQUIRED__=true there; the SPA is expected to use
cookie auth + a single-use WS ticket). buildWsAuthParam() already resolves
WS auth correctly for both modes, but the early bail prevented the effect
from ever reaching it.
Both checks now also honor __HERMES_AUTH_REQUIRED__: the banner no longer
fires and the xterm/WS effect no longer bails in gated mode.
Reported-by: wbrione <wbrione@users.noreply.github.com>
Closes#34755
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround
The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.
Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.
* fix: drain thread no longer crashes on fd-less stdout streams
The _wait_for_process drain thread called proc.stdout.fileno()
unconditionally. ProcessHandle implementations whose stdout is not
backed by a real OS fd (iterator-style in-memory streams, mock procs)
raised 'list_iterator' object has no attribute 'fileno' (or
'fileno() returned a non-integer' from select.select), killing the
daemon thread and silently losing all process output.
Resolve the fd defensively at the top of _drain; when stdout has no
usable integer fileno, fall back to draining it as an iterable (the
legacy 'for line in proc.stdout' contract). The real subprocess /
os.pipe-backed select() fast path is unchanged.
The curator docs stated that any skill not bundled/hub-installed was
'agent-created' and subject to curation — including foreground-created
skills and hand-written ones. Since PR #19621 (May 2026), the curator
requires an explicit marker in .usage.json, which
only the background self-improvement review fork sets.
Changes:
- Rewrite 'What agent-created means' to document the 3-step eligibility
check (not bundled + not hub + created_by=agent marker)
- Explain that foreground skill_manage(create) does NOT mark skills as
agent-created (user-directed by design)
- Warn that hand-written skills are NOT curated
- Add note in Per-run reports explaining the '(not resolved)' display
when no candidates exist (LLM pass skipped, not a config error)
- Link to skill_provenance.py for the write-origin ContextVar
Ref: PR #19621, tools/skill_provenance.py, tools/skill_manager_tool.py
'hermes config get <key>' is referenced in three guides but is not a
valid subcommand. The valid subcommands under 'hermes config' are
{show,edit,set,path,env-path,check,migrate}. 'hermes config show' is
already used elsewhere in the docs (including 'hermes config show |
grep <pattern>' in the FAQ), so it's the idiomatic replacement.
- work-with-skills.md: 'View all skill config' now uses
'hermes config show | grep ^skills\.config'
- migrate-from-openclaw.md: session-policy check now reads the value
from 'hermes config show'
- configuring-models.md: 'inspect what the CLI will actually use'
now uses 'hermes config show | grep ^model\.'
Refs #30195
Fixes#24809
The docs site uses baseUrl='/docs/' but the <img> tags in sessions.md
and cli.md referenced images at /img/docs/... which resolves to a 404.
The static files are served at /docs/img/docs/... instead.
Before: <img src="/img/docs/session-recap.svg"> → 404
After: <img src="/docs/img/docs/session-recap.svg"> → 200
Also fixes cli-layout.svg which had the same issue.
Use the current top-level fallback_providers list in fallback docs and keep fallback_model documented only as the legacy compatibility shape. Also align cron and delegation fallback coverage with current runtime behavior.
Closes#19691
Co-authored-by: Codex <codex@openai.com>
- Replace outdated linear ordering in prompt-assembly guide with
current stable/context/volatile tier contract from system_prompt.py
- Clarify where memory/profile snapshots live versus skills guidance
- Document that pre_llm_call context is user-message injection, not
cached system-prompt mutation
- Update architecture guide wording to reference system_prompt.py +
prompt_builder.py tiered assembly
Closes#34118
* fix(gateway): only fire planned-stop watcher for markers targeting self
Salvaged from #34599 — rebased onto current main.
The planned-stop watcher now only fires shutdown for a marker that targets
the current process, instead of any marker that exists on disk. Fixes the
Windows crash loop (#34597) where a stale marker from a previous Gateway
instance kills a freshly booted Gateway ~400ms after start with a false
"Received UNKNOWN — initiating shutdown".
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
* fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable
Follow-up to the #34599 salvage. The watcher's non-destructive probe
(planned_stop_marker_targets_self) already falls back to PID equality when
a process start_time is unavailable, but the authoritative consume it gates
(_consume_pid_marker_for_self) still required a non-None start_time match.
_get_process_start_time reads /proc/<pid>/stat and returns None on macOS and
native Windows — the only platform the planned-stop watcher exists for. So on
Windows the probe would fire the shutdown handler (PID matches) but the
handler's consume_planned_stop_marker_for_self() would return False, and a
legitimate 'hermes gateway stop' was still misclassified as an unexpected
UNKNOWN exit (exit 1) and revived by the service manager — a residual half of
the #34597 crash loop on the legitimate-stop path.
Align the consume with the probe: when both start_times are known they must
match (PID-reuse guard preserved on Linux); when either is unavailable, fall
back to PID equality alone, bounded by the existing short marker TTL. This
also fixes the parallel --replace takeover consume on Windows, which shares
the same helper.
Adds regression tests for the Windows (None start_time) path, the foreign-PID
rejection under that fallback, and confirmation the start_time-mismatch guard
still rejects when both are known.
---------
Co-authored-by: Bartok9 <danielrpike9@gmail.com>
FAL veo3.1 API expects duration as "4s"/"6s"/"8s" (with unit suffix),
not bare "4"/"6"/"8" like other families. Add per-family duration_suffix
field and apply it in _build_payload. Also add "4k" to veo3.1 resolutions
per FAL API docs.
Note: the managed gateway currently rejects the "4s" format (expects
integer duration). Gateway-side fix needed for veo3.1 to work through
the Nous subscription path.
Add the same managed-gateway UX that image_gen already has:
- TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row
with managed_nous_feature='video_gen' + video_gen_plugin_name='fal'
- NousSubscriptionFeatures gains a video_gen property + feature state
computation (managed/active/available using the fal-queue gateway)
- _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS,
_get_gateway_direct_credentials, opted_in all include video_gen
- apply_nous_managed_defaults and apply_gateway_defaults handle video_gen
- _is_toolset_satisfied checks Nous features for video_gen
- _is_provider_active detects managed video_gen (use_gateway + fal provider)
- _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated
from all 4 call sites in _configure_provider when managed_feature is set
- hermes setup status shows 'Video Generation (FAL via Nous subscription)'
Users on a Nous subscription can now pick 'Nous Subscription' under
hermes tools → Video Generation, which sets video_gen.provider=fal +
video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway
then routes through the managed queue gateway — no FAL_KEY needed.
- Add test for 4xx ValueError with actionable remediation message
- Add test for is_available() returning True via managed gateway
- Add test for prefers_gateway overriding direct FAL_KEY
- Add test for is_available() via gateway in plugin test file
Wire plugins/video_gen/fal/__init__.py to use the same
_ManagedFalSyncClient pattern that image gen already uses.
Changes:
- Add managed gateway resolution, client caching, and
_submit_fal_video_request() that routes between direct FAL_KEY
and Nous gateway modes
- Update is_available() to return True when either FAL_KEY or the
managed gateway is reachable
- Update generate() to use submit+get handle pattern instead of
fal_client.subscribe() directly
- Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches
the tool-gateway allowlist from fal-video-gen branch)
- Surface actionable error on 4xx gateway rejections
Tests:
- 4 new tests in test_managed_media_gateways.py (gateway routing,
client reuse, direct mode fallback, alibaba namespace)
- Updated existing test_fal_plugin.py fixture to use submit/handle
pattern and patch _resolve_managed_fal_video_gateway for isolation
The scan-gate / dep-bounds-gate jobs use needs.changes; if the changes
job itself fails, its dependents would be skipped via a failed dependency
(not a conditional skip), leaving the required check unreported — the same
"pending forever" failure this PR fixes. Add always() and switch the gate
condition from == 'false' to != 'true' so the gate still fires (and reports
SUCCESS) when changes fails and its output is empty.
Remove paths filters from contributor-check and supply-chain-audit
workflows. When no matching files changed, the workflows never ran and
the required checks (check-attribution, supply chain scan, dep bounds)
stayed "pending" forever, blocking merge.
Now both workflows always trigger. A path-check step/job determines
whether the real work should run; gate jobs with matching names report
success when the real job was skipped, so branch protection always
gets a check status.
Also fixes dep-bounds: the old condition
if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true
was always true (the || true made it unconditional). Now uses the
proper changes.deps output from the shared filter job.
Have vacuum() return optimize_fts()'s count so the CLI 'sessions optimize'
summary uses the real merged-index count instead of probing the private
_FTS_TABLES / _fts_table_exists() members.
The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of
incremental b-tree segments — one per trigger-driven insert batch. SQLite's
automerge caps at ~16 segments, so a long-lived store keeps scanning many
segments per MATCH and never collapses them unless the special 'optimize'
command runs. Nothing in the codebase ever ran it: vacuum() only fired after
a prune that deleted rows, and even then never merged FTS segments.
Changes:
- SessionDB.optimize_fts(): merges each FTS5 index to a single segment,
probing for the (optional/lazy) trigram table first so it is safe to call
unconditionally. Layout-only — search results and snippet() are unchanged.
- vacuum() now calls optimize_fts() before VACUUM so freed index pages are
returned to the OS in the same pass.
- 'hermes sessions optimize' CLI subcommand for on-demand reclamation +
segment compaction (previously there was no way to compact the store
without a prune deleting rows), with before/after size reporting.
Benchmark (8000 msgs, fragmented to 8 segments/index):
- segments 8 -> 1 on both indexes
- porter MATCH 5.5x faster (0.449 -> 0.081 ms/q)
- trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q)
- 8000 matches before == 8000 after, identical row ids (no functional change)
Orthogonal to the structural FTS-size PRs (#20239 external-content,
#27770 optional trigram) — segment merge helps regardless of those.
Tests: TestOptimizeFts covers index count, search+snippet preservation,
missing-trigram path, and idempotency. Full test_hermes_state.py green (227).
The rotation flowchart only described the generic 'retry once, rotate on
second 429' path. ChatGPT/Codex plan-limit 429s carry a usage_limit_reached
reason and rotate to the next pool key immediately (no retry, since the cap
won't clear on retry). Document that case so the docs match the code.
Adds a `grok` skill under `skills/autonomous-ai-agents/`, a third coding-agent orchestration guide alongside `codex` and `claude-code`. It teaches Hermes to delegate coding tasks to Grok Build (xAI's `grok` CLI).
- Headless `-p` one-shots (preferred)
- Interactive TUI via pty + tmux
- Session resume, background tasks, structured JSON output
- PR review and parallel worktree patterns
- Auth via SuperGrok / X Premium+ (`grok login`)
- Full pitfalls and config notes
* chore(models): swap gemini-3-flash-preview for gemini-3.5-flash in OpenRouter + Nous lists
* chore(models): regenerate model-catalog.json for gemini-3.5-flash swap
Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch;
pre-existing host-dependent flakes):
1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup
tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But
spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL),
falling back to proc.kill() only on ProcessLookupError/PermissionError/
OSError. When the fake PID happens to exist on a busy host, os.getpgid
succeeds, os.killpg fires against an UNRELATED real process group, and
proc.kill() is never reached -> flaky AssertionError (and a real risk of
SIGKILLing an innocent process group from a unit test). Patch os.getpgid
to raise ProcessLookupError so the fallback path runs deterministically
and no real killpg is ever issued.
2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls
the blocking conn.receive_bytes() with no exception guard. Once the child
prints its winsize and exits, the PTY closes; on a missed-marker run the
next recv blocks until the 30s pytest-timeout instead of failing fast.
Add a try/except break (matching the working sibling tests) and bump the
child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first.
Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced
(os.getpgid(1) succeeds -> old code skips proc.kill).
_adapter_enforces_own_access_policy accessed self.adapters directly, but
several auth tests build a bare GatewayRunner via object.__new__ without
setting .adapters (pitfalls.md #17). Read it defensively with getattr so a
missing/empty adapter map means "no adapter owns the policy" instead of
raising AttributeError.
Fixes 4 tests: test_feishu_bot_auth_bypass, test_discord_bot_auth_bypass (x2),
test_signal::test_signal_in_allowlist_maps.
Config-driven platform policies (dm_policy / group_policy / allow_from /
group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without
also setting a PLATFORM_ALLOWED_USERS env var.
These adapters enforce their access policy at intake — a message is dropped
inside the adapter and never dispatched unless it already passed the policy.
The gateway's env-based check (_is_user_authorized) ran afterward and, with
no env allowlist set, fell through to an env-only default-deny — silently
rejecting `dm_policy: open` and config-only allowlists the adapter had
already authorized.
Rather than re-implement each adapter's policy a second time in run.py
(which would drift), adapters that own their gate now declare it via a new
BasePlatformAdapter.enforces_own_access_policy property (default False). The
gateway trusts that flag and skips the env-only default-deny for those
platforms. Env allowlists still take precedence when set.
Also resolves unauthorized DM behavior from config dm_policy so allowlist /
disabled policies drop unauthorized DMs silently instead of leaking pairing
codes, while an explicit pairing policy opts back in.
Co-authored-by: Frowtek <frowte3k@gmail.com>
Vulture + per-symbol verification (whole-repo grep incl. tests, string
literals, getattr, decorator/registry/argparse dispatch) confirmed each of
these has zero callers anywhere — not reachable via any dynamic-dispatch path,
not referenced by tests, not re-exported.
Removed:
- acp_adapter/tools.py: _build_patch_mode_content
- agent/anthropic_adapter.py: read_claude_managed_key (diagnostics-only, never called)
- agent/bedrock_adapter.py: get_bedrock_model_ids
- agent/browser_registry.py: get_active_browser_provider
- agent/chat_completion_helpers.py: _take_request_client (x2 nested closures, never invoked)
- gateway/platforms/weixin.py: _rewrite_headers_for_weixin, _rewrite_table_block_for_weixin
- hermes_cli/banner.py: _skin_branding
- hermes_cli/debug.py: _delete_hint
- hermes_cli/gateway.py: _setup_email, _setup_sms, _setup_yuanbao
(platform keys absent from the _builtin_setup_fn dispatch dict; handled by
the _setup_standard_platform fallback)
- hermes_cli/kanban_db.py: set_max_runtime, active_run
- hermes_cli/kanban_diagnostics.py: severity_of_highest, _latest_clean_event_ts
- hermes_cli/main.py: _build_provider_choices, cmd_portal
(portal subcommand is wired via portal_cli.add_parser, not this wrapper)
- hermes_cli/model_switch.py: CustomAutoResult (orphaned by the switch_model() extraction)
- hermes_cli/models.py: format_model_pricing_table, fetch_nous_account_tier
- hermes_cli/portal_cli.py: _nous_portal_base_url
- hermes_cli/proxy/server.py: handle_models_fallback (defined but never registered on the router)
- tools/computer_use/cua_backend.py: _parse_element, _is_arm_mac
- tools/file_operations.py: _get_safe_write_root (prod uses the imported
agent.file_safety.get_safe_write_root directly)
- tools/skills_tool.py: _load_category_description
Also dropped two imports left unused by the removals:
- tools/file_operations.py: get_safe_write_root alias
- tools/computer_use/cua_backend.py: import platform
Pure deletion: -551 LOC. No behavior change. Test files covering the edited
modules pass (640/640); the broader suite's pre-existing/env-dependent
failures reproduce unchanged on origin/main.
Follow-up for #30240 — the new page was not referenced in sidebars.ts,
leaving it orphaned (unreachable via nav and flagged as a broken relative
link to ./profiles.md). Added under Using Hermes after profile-distributions.
Covers running multiple Hermes profiles as managed services on one host:
- A shell-loop wrapper pattern for start/stop/restart/status across every
profile (the per-profile CLI commands stay unchanged).
- Per-platform service file locations (LaunchAgent on macOS, systemd user
unit on Linux), plus the rules around clashes.
- Log paths per profile and how to tail every gateway at once.
- Config file layout per profile and the restart-after-edit workflow.
- Keeping the host awake: caffeinate flags on macOS,
systemd-inhibit + loginctl enable-linger on Linux.
- Token-conflict auditing across .env files.
- Troubleshooting for the common "Could not find service in domain for
user gui: 501" message and stale PIDs after a crash.
Tested locally with five profiles on macOS launchd.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(security): block AWS SDK creds from subprocess env
* fix(security): narrow Bedrock subprocess strip to inference bearer token only
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.
Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.
Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.
Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
* feat: embedder environment-hint hook for the system prompt
Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint)
so a host wrapping Hermes (sandbox runner, managed platform) can describe the
runtime environment — proxy, credential handling, mount layout — in the system
prompt's environment-hints block, without editing the identity slot (SOUL.md).
Read once at prompt-build time, so it lands in the stable, cache-safe portion
of the system prompt. Env var overrides the config key (build-time/container
mechanism). Empty by default — no behavior change for existing installs.
---------
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
The setup guide was missing the specific Feishu permission scopes to
configure and the event subscription (im.message.receive_v1) needed
for the bot to receive messages. Users had to reference external
OpenClaw documentation to complete the setup.
Adds:
- Required permissions table (im:message, im:message:send_as_bot,
im:resource, im:chat, im:chat:readonly)
- Recommended permissions (reactions, app info, contact)
- Event subscription step (im.message.receive_v1)
- App version publish reminder (permissions require published version)
Web search/extract dispatch read agent.web_search_registry before plugin
discovery had run, so in any process that hadn't imported model_tools.py
(subprocess agent runs, delegate children, standalone scripts) the registry
was empty: get_provider('firecrawl') returned None and the dispatcher emitted
the misleading 'No web extract provider configured' error even with
web.extract_backend set and FIRECRAWL_API_KEY exported.
Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors
tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of
both the web_search_tool and web_extract_tool dispatch sites before the
registry lookup.
Fixes#27580.
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.
- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
The _on_reaction approval handler used:
if self._allowed_user_ids and sender not in self._allowed_user_ids:
When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an
empty set. The short-circuit on the empty set caused the deny block to
never execute, allowing any Matrix room member to approve or deny tool
calls via ✅/❎ reactions — even users that run.py's _is_user_authorized
would reject for regular messages.
Fix mirrors the Telegram _is_callback_user_authorized fix (commit
89d32052e, PR #28494): deny by default when no allowlist is configured,
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
Follow-up mitigation for the #27303 env-scrub tightening. Dropping the
broad HERMES_ prefix in favor of a 4-var operational allowlist is correct
hardening, but a sandbox script that imports a repo module reading a
non-allowlisted HERMES_* var at import time would otherwise see it
silently unset. _scrub_child_env now emits a one-shot debug log naming the
dropped non-secret HERMES_* vars and pointing at the env_passthrough
opt-in escape hatch. Secret-shaped vars are never named in the log.
Tests: dropped vars are logged + env_passthrough named; no log when
nothing is dropped; secret vars excluded from the diagnostic.
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial.
Refs #4146, #27303, #30882, #33057
When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways.
Refs #30882
Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle.
Refs #4146, #27303, #30882, #33057
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape.
Refs #33057
After the legacy session-key path was removed, two parameters became dead
surface on the Nous runtime-resolution chain:
- min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through /
telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state,
_nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the
now-deleted agent-key mint TTL and drives no behavior.
- inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally
identical; the value only fed _normalize_nous_inference_auth_mode validation and
oauth trace output, never a branch.
Removing inference_auth_mode orphaned its whole supporting cluster
(NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES,
_normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here.
Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter,
runtime_provider, web_server, main, auth_commands, setup) and pruned the matching
test kwargs. Deleted two tests that exercised the removed surface
(test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode).
No behavior change: net -134 LOC of dead code.
The import-failure fallback returned any 3-segment token without scope/
expiry validation, a divergent reimplementation of the canonical
_nous_invoke_jwt_is_usable check. The import is from the same module that
provides resolve_nous_runtime_credentials, so a failure means the whole
auxiliary Nous path is unavailable anyway; return "" instead so the caller
falls through to the clear 'run: hermes auth add nous' guidance rather than
handing back an unvalidated token.
The live harness runs against a real OpenRouter key; record['error'] is a
full traceback that, on an auth failure, could echo a request header or URL
containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY,
any sk-/sk-or- bearer token, and Authorization/Bearer headers before
final_response and error enter the transcript or the console print. Addresses
the CodeQL clear-text-storage/logging findings at the source.
The scoping fix added enabled_toolsets/disabled_toolsets to the
agent_runtime_helpers sequential dispatch into handle_function_call, so
test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with
(exact match) needs the two new kwargs. Both are None for the default agent
fixture.
Brings in the tool_search live-test harness from the original PR but leaves
out the 11 checked-in scripts/out/*.json transcript files — those are
non-deterministic model output that goes stale the moment the model changes
and were the bulk of the diff. scripts/out/ is now gitignored so a harness
run never re-commits them.
Fixes on top:
- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv
instead of hand-parsing ~/.hermes/.env and assigning the value to a local.
The canonical loader never materializes the secret in a local variable in
this module, which clears the four CodeQL high alerts
(py/clear-text-storage / py/clear-text-logging-sensitive-data at the
transcript write/print sites — they were tracing the key from the
hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts
(Windows-footgun hygiene).
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:
1. tool_search the entire process registry, not just its granted tools, and
2. tool_call any registered plugin/MCP tool it was never given, because
registry.dispatch() has no enabled_tools gate for non-execute_code tools.
A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.
Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
dispatch scopes get_tool_definitions to them (also stops polluting the
process-global _last_resolved_tool_names with out-of-scope tools, which
leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
same scope before dispatch, since it unwraps tool_call -> underlying name
and bypasses the bridge branch. New _tool_search_scoped_names() helper,
cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.
Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.
Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.
Design carefully reflects the OpenClaw production failure modes
documented in the openclaw-tool-search-report:
- Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
'tools silently missing from isolated cron turns' regression class
(openclaw#84141) by construction: there is no code path that can
drop a core tool.
- Catalog is stateless across turns — rebuilt from the live tool-defs
list on every assembly. No session-keyed Map that can drift out of
sync with the registry.
- tool_call unwraps the bridge call before any hook fires, so plugin
pre/post hooks, guardrails, approval flows, and the activity feed
all see the underlying tool name, not the bridge (addresses
openclaw#85588 and the verbose-mode complaint on openclaw#79823).
- The unwrap happens in both the parallel and sequential paths of
agent/tool_executor.py and also in handle_function_call, so direct
callers (sandboxed code, eval harnesses) are covered too.
- Bridge tools cannot invoke each other (recursion guard) and cannot
invoke core tools (those must be called directly).
- Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
- Token estimation via cheap char/4 heuristic; precision isn't needed
for the threshold decision.
Files:
- tools/tool_search.py — new module (BM25 retrieval, classification,
threshold gate, bridge dispatch, unwrap helper).
- tests/tools/test_tool_search.py — 35 tests including the OpenClaw
#84141 regression guard.
- model_tools.py — wires assembly into _compute_tool_definitions as the
final step, adds skip_tool_search_assembly kwarg so the bridge can
see the real catalog, dispatches the three bridge tools.
- agent/tool_executor.py — unwraps tool_call in both parallel and
sequential parsing loops so checkpointing, guardrails, plugin hooks,
and tool-progress callbacks all observe the underlying tool name.
- hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
- website/docs/user-guide/features/tool-search.md — user docs.
Validation:
- 35/35 new tests pass.
- Existing tool/registry/model_tools/config/coercion/executor tests
(82 + 74 + small adjacents) green.
- Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
3 bridges, tool_search returns top 3 hits, tool_describe returns
full schema, tool_call dispatches to the real underlying handler
and the underlying result is what the model sees.
- Reserved-name recursion guard verified live.
- Core-tool refusal via tool_call verified live.
The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in
backend admission / prompt prefill before emitting its first SSE event. The
12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as
'Codex stream produced no bytes within 12s' through all retries (Discord
reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was
~50x more aggressive than the transport layer would have tolerated.
Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default
to 120s so it no longer clamps the new default back to 20s. Disabling stays
available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable,
_STRICT override, and post-first-event idle watchdog are unchanged.
Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.
Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.
Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.
Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).
Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).
Also guards get_compression_lock_holder against the same skew.
Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.
- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
"outside allowed roots") via _log_safe_path, which strips control chars
and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
so a ~\x00 path returns None / is skipped instead of raising and dropping
every other attachment in the response.
Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.
Co-authored-by: wysie <wysie@users.noreply.github.com>
The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its
plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero
plugins and ALL gateway platforms failed with "No adapter available for
<platform>" (discord, slack, mattermost, ...). Same gap also dropped the
web-search provider manifests (#28149).
Declare manifest coverage in both packaging channels:
- wheel: [tool.setuptools.package-data] plugins += **/plugin.yaml, **/plugin.yml
- sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml
(Homebrew and other downstream packagers build from the sdist)
Verified by building the wheel before/after: plugin.yaml count went 0 -> 69,
discord's manifest now ships. Adds a regression test asserting both channels
cover manifests.
Fixes#34034
Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com>
Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com>
Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com>
Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>
OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env
vars). On a headless/CLI-only Linux box with no GUI browser, none of those
trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links)
and launched it INSIDE the terminal — hijacking the user's TTY with the
xAI 'Account Management' login page instead of letting them copy the URL.
Add _can_open_graphical_browser(): returns False when webbrowser would
resolve to a known console browser, when $BROWSER names one, when there's
no display server on Linux, or when no browser resolves at all. Gate all 5
OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device
code, Anthropic, Google) on it in addition to the existing remote check.
Headless boxes now print the URL / fall through to manual-paste instead.
Add an in-memory resourceId->local-path cache (24h TTL, 256-entry LRU) to
MediaResolveMiddleware so the same Yuanbao resource isn't re-downloaded when
it's referenced more than once in a session (own attachment, then quoted, then
group-observed backfill). Each reference otherwise triggers a fresh token
exchange + COS download.
The cache verifies the file still exists on disk before returning a hit (cache
dir may be swept) and is threaded through all three resolve paths:
_resolve_media_urls (rid parsed from placeholder URL), _collect_observed_media,
and the DispatchMiddleware quote path.
Salvaged from PR #30418 by @loongfay; the broader middleware refactor in that
PR converged with work already merged on main, so only the net-new download
cache is carried over.
The original PR diff updated two guides (oauth-over-ssh.md and
xai-grok-oauth.md) but only the oauth-over-ssh.md edit landed in the
PR's actual commit. Mirror the note to the primary xai-grok-oauth.md
guide too so users reading the main entry point don't miss the
bare-code form that already shipped in #33880.
Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab,
etc.) can deliver terminal focus reports (CSI I / CSI O) to the
running TUI. prompt_toolkit does not map those sequences by default,
so its parser falls back to literal key presses (ESC, [, I/O) and
inserts `[I` / `[O` into the prompt buffer after the ESC byte is
handled.
Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at
parser level, plus a no-op kb.add(Keys.Ignore) handler so the
default self-insert path never inserts focus-report bytes.
Salvage notes: original PR put the helper in cli.py. Salvaged into
hermes_cli/pt_input_extras.py alongside install_shift_enter_alias /
install_ctrl_enter_alias to match the established pattern for
ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user
registration wins.
Closes#16780
xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials
when an OAuth2 access token has expired or is invalid. The existing
_is_auth_error() only checked for 401 status codes, so these tokens
were never refreshed and the 403 propagated as a generic permission
denied error.
Three fixes:
1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as
an auth failure, triggering token refresh instead of silent failure.
2. _refresh_provider_credentials: Add xai-oauth branch with
pool-level refresh (try_refresh_current with select to ensure
current entry) then fallback to singleton resolver with
force_refresh=True.
3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth
pool for auto-resolved providers, matching existing pattern for
openai-codex/openrouter/nous/anthropic.
Includes 14 tests covering the new detection logic, host mapping,
and graceful fallback behavior.
Signed-off-by: moikapy <moikapy@devmoi.com>
CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring
checks as py/incomplete-url-substring-sanitization. The rule is about URL
allowlist checks in production code, not test routing — there's no
security boundary here. Switch to url == self.PRIMARY / self.FALLBACK,
which is the same semantic and silences the rule.
The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot
mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for
non-browser User-Agents — including urllib (what the CLI uses) and curl.
When that happens, get_catalog() falls back to the stale disk cache and
new model releases (Opus 4.8, etc.) never reach the /model picker even
though they're already in OPENROUTER_MODELS and the live OpenRouter API.
Adds a fallback URL chain: when the primary catalog URL fails, walk
DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com
copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays
reachable through Vercel firewall hiccups. Per-provider override URLs
keep their direct-fetch semantics (operators configure those specifically,
no implicit fallback).
Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the
OpenRouter + Nous Portal curated picker lists. Native stepfun provider
configuration (api.stepfun.ai) is left alone — that depends on what
stepfun.ai itself serves, not what OpenRouter routes.
Test plan: 5 new TestFallbackChain tests cover primary-success,
primary-failure-fallback-success, all-fail, primary==fallback-dedup, and
end-to-end get_catalog routing through the new helper. Existing 23 tests
in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/
sweep: 5701/5701 pass.
Closes the termination-control gap left by PR #28432, which shipped the
read-only sibling endpoints (/workers/active, /runs/{run_id},
/runs/{run_id}/inspect) but no way to stop a misbehaving worker from
the dashboard without dropping to the CLI.
The new endpoint resolves run_id -> task_id and delegates to the
existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL
escalation, run-outcome bookkeeping, and event-log append all match
POST /tasks/{task_id}/reclaim exactly. No new termination semantics
introduced.
Responses:
200 {ok, run_id, task_id} on success
404 unknown run_id
409 run already ended OR task no longer reclaimable
Refs: #23762
The two tests in TestRunConversation now verify the new behavior:
- test_kanban_block_called_on_iteration_exhaustion → verifies
_record_task_failure(outcome='timed_out') is called instead of
kanban_block
- test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge
is a no-op when HERMES_KANBAN_TASK is unset
The function names are kept for diff stability; both assert against
_record_task_failure now, which is the correct contract per the gap-2
fix in this PR.
Reporter diagnosed three independent gaps that together allowed infinite
'unblock → re-stuck' loops with no surfacing or escalation:
GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked`
event, so a task that cycles every few minutes is invisible to it
regardless of how many times it cycles.
Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`)
that counts block→unblock cycles in a sliding window. Default threshold
3 cycles within 24h, configurable via `block_cycle_threshold` /
`block_cycle_window_seconds`. Walks events in arrival order (event id)
since multiple events can share the same `created_at` second. Fires as a
warning with a CLI hint to inspect the block reasons.
GAP 2: Iteration-budget-exhausted runs in kanban workers map to
`kanban_block` (status=blocked, but a clean exit from the kernel's
perspective). `_rule_repeated_failures` reads `consecutive_failures`,
which `_record_task_failure` increments only for crashed/timed_out/
spawn_failed — `blocked` outcome bypasses the failure counter, so the
`kanban.failure_limit` circuit breaker never trips on budget-exhaustion
loops.
Fix: `agent/conversation_loop.py` budget-exhaustion path now calls
`_record_task_failure(outcome="timed_out")` instead of `kanban_block`.
Budget exhaustion is genuinely a timeout-shaped failure (the task ran out
of allowed iterations), so this is more honest semantics; it also routes
through the unified failure counter, so repeated budget exhaustions trip
the circuit breaker and the task auto-blocks with `gave_up` after
`failure_limit` retries.
GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and
ignores `last_heartbeat_at`. Reporter observed a 91-min run that held
its claim with frozen heartbeat because the worker entered a logic loop
with no tool calls — `_pid_alive` kept returning True so the claim was
extended every 15 minutes indefinitely.
Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older
than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim
even if the PID is alive. NULL `last_heartbeat_at` preserves backward
compatibility (no heartbeat yet = extend, as before). The reclaim event
payload now includes a `heartbeat_stale` boolean so operators see why a
live-PID worker was reclaimed.
This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat
bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a
side effect of normal API traffic, the backstop only fires for genuinely
wedged workers (no chunks, no tool results, no progress at all).
Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>
The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at
to decide whether to reclaim a running task. The agent maintains its own
in-process `_last_activity_ts` for every chunk/tool result, but those
liveness ticks never reach the board unless the model explicitly calls
the `kanban_heartbeat` tool — so a worker actively executing a long run
without tool-level heartbeats can be reclaimed mid-flight as 'stale',
returning the task to ready and orphaning the in-flight worker's progress.
Fix: in `_touch_activity` (the canonical 'we just did work' hook in
run_agent.py), call a new `heartbeat_current_worker_from_env` helper
in `tools/kanban_tools.py` that:
- No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK).
- Rate-limited to one DB write per 60s (runtime activity ticks too often
to faithfully mirror; we just need the watchdog to see liveness).
- Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are
individually try/except'd; any DB error logs at debug and returns.
- Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID +
HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time).
- No durable note on auto-heartbeats — that's reserved for the explicit
`kanban_heartbeat` tool which carries a model-supplied note.
The explicit `kanban_heartbeat` tool stays available unchanged for
workers that want to attach a note or pre-emptively extend a claim
across a known-long single tool call.
Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com>
Kanban workers run headless — no live user is on the other side of `clarify`,
so the call times out (~120s default) and the task sits silently in `running`
with no signal to the operator that input is needed. Reporter observed a real
incident where a worker asked 'promote to production, or check staging first?'
via clarify, the call timed out, the agent hallucinated a fallback, and the
task sat 'running' for hours.
Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker
sees —
- `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected
into every dispatcher-spawned worker run).
- `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled
worker skill).
Both point at the right pattern: `kanban_comment` (context) + `kanban_block`
(decision needed) — the task surfaces on the board as blocked, the operator
sees it, unblocks with their answer in a comment, and the worker respawns
with the thread.
Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>
When OpenAI Codex returns 401 token_invalidated or token_revoked, the
credential is broken upstream — retrying after a TTL cooldown cannot
fix it. The existing code treated every 401/429 the same way:
STATUS_EXHAUSTED with a TTL cooldown (5 min for 401, 1 hour for 429).
After the TTL elapsed, the broken credential re-entered rotation and
immediately failed again with the same 401, surfacing as 'Failed to
generate context summary' on every context-compression cycle.
Reporter observed 7 separate 401 token_invalidated failures from the
same revoked credential in a single day; the only workaround was
removing it manually via 'hermes auth'.
Add a STATUS_DEAD terminal state. Only 401 responses whose
error.code/reason matches a known terminal OAuth state (token_invalidated,
token_revoked, invalid_token, invalid_grant, unauthorized_client,
refresh_token_reused) transition to DEAD. Everything else keeps the
existing TTL semantics — 429 rate limits are transient and should
recover.
DEAD entries are excluded from rotation unconditionally. They only
clear when an explicit write-side re-auth sync rewrites the tokens
(the existing _sync_codex_pool_entries / _sync_*_entry_from_auth_store
paths already clear last_status to None). The read-side
auth.json-sync paths also now fire on DEAD so an in-flight pool entry
can adopt fresh tokens written by another process without needing
explicit re-auth.
After 24 hours, DEAD manual entries (source='manual:*') are pruned
from the pool automatically so dead state doesn't accumulate forever.
Singleton-seeded DEAD entries (source='device_code' etc.) are kept
because _seed_from_singletons would recreate them on the next load
with the same stale tokens — pruning would be pointless. The audit
trail stays visible (label, last_error_reason, timestamps).
Closes#32849.
`hermes kanban unblock <id> review-required: ...` parsed every trailing word
as another task_id (since `task_ids` is `nargs='+'`), then quietly failed on
each non-existent id with "cannot unblock review-required: (not blocked/scheduled?)".
Reporter saw this as asymmetric with `block <id> <reason...>` which accepts
positional reason words.
Fix: add a `--reason "..."` flag that, when provided, is appended as a
`UNBLOCK: <reason>` comment before the unblock transition. Bulk syntax
(`unblock t_a t_b t_c`) is preserved unchanged.
Co-authored-by: julio-cloudvisor <211828103+julio-cloudvisor@users.noreply.github.com>
The Bitwarden Secrets Manager disk cache introduced in #31968 stores
plaintext secret values at <hermes_home>/cache/bws_cache.json to avoid
re-fetching across back-to-back CLI invocations. The file was not added
to get_read_block_error()'s credential_file_names list, leaving the
agent able to read it directly via the read_file tool.
Add os.path.join("cache", "bws_cache.json") to credential_file_names
so both HERMES_HOME and the global root are covered, matching the
existing pattern used for auth.json, .anthropic_oauth.json, etc.
Other files under cache/ (images, documents, audio) are unaffected —
the check is an exact-file match, not a prefix match.
Verified: 11/11 exploit/regression scenarios pass; 38/38 existing
file_safety tests pass.
The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format)
mutate their input in place — that's their documented contract. The two
call sites (chat_completion_helpers.build_api_kwargs and the auxiliary
client) were passing agent.tools straight through, so the first xAI
request would permanently strip slash-containing enum constraints and
pattern/format keywords from the per-agent tool registry.
Effect: any subsequent non-xAI call from the same agent (auxiliary task
routed to Anthropic, OpenRouter fallback, mid-session model switch) saw
the already-stripped schema with no way for the user to notice from
their config.
Fix: deepcopy tools_for_api before sanitizing at both call sites.
The slash-enum bug itself (xAI 400ing on enums with '/') was fixed
earlier by #32443 (Nami4D) — that PR landed the strip but used the
sanitizers directly without copying. This salvages #27907's correctness
contribution (the deepcopy) while skipping its redundant parallel
sanitizer (strip_xai_incompatible_enum_values is functionally
equivalent to the existing strip_slash_enum) and its preflight-
neutrality argument (we chose model-gated preflight in #32443).
3 new tests in tests/run_agent/test_run_agent_codex_responses.py:
- strips_slash_enum_from_outgoing_request — outgoing kwargs has no
slash-containing enum values (functional contract preserved).
- does_not_mutate_agent_tools — headline #27907 regression. Snapshot
agent.tools before build_api_kwargs, assert it survives intact
after. Pre-fix this assertion would have caught the mutation.
- is_idempotent_across_repeated_calls — three xAI requests in a row
each strip cleanly AND don't progressively erode the source schema.
344/344 across tests/agent/test_auxiliary_client.py,
tests/agent/transports/test_codex_transport.py,
tests/run_agent/test_run_agent_codex_responses.py, and
tests/tools/test_schema_sanitizer.py.
Co-authored-by: Gabor Barany <barany.gabor@gmail.com>
Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.
Two-pronged fix:
1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
called immediately after each of the three session_id rotation sites
in _handle_message_with_agent (hygiene compression, agent-result
compression rotation, /compress command). Keeps future bindings
fresh.
2. Read-path self-heal: when resolving an existing topic binding, walk
SessionDB.get_compression_tip() forward and switch_session to the
descendant instead of the stored parent. Rewrites the binding row to
the tip so subsequent messages skip the walk. Heals existing stale
state on the next user message without requiring a gateway restart.
Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
#28870/#33416) — preserves end_reason='compression' analytics nicety
but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
already mutates tmp_agent.session_id on the cached object so the
in-memory agent self-corrects.
- Startup repair pass (#33416) — redundant once the read path heals on
the next message; one-line CLI follow-up can address bindings for
topics users never reopen.
Closes#20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
When users bind-mount /var/run/docker.sock to use TERMINAL_ENV=docker from
inside the container, the supervised hermes user (UID 10000) lacks
permission to talk to the socket — every `docker` invocation EACCES'es and
check_terminal_requirements() returns False. In messaging mode this also
silently strips the file/terminal toolset from the registered tool list,
so the agent rationalizes the missing tools as a platform restriction.
The naive workaround (docker run --group-add <socket-gid>) does NOT work
with our s6-setuidgid privilege drop: s6-setuidgid calls initgroups() for
the target user, which rebuilds supp groups from /etc/group. Without a
matching /etc/group entry the kernel-granted supp group is wiped between
PID 1 and the dropped hermes process. Verified empirically:
--group-add 998 alone: PID 1 Groups: 0 998 → after drop: Groups: 10000
This fix's /etc/group add: id hermes shows 998 → after drop: Groups: 998 10000
Detect the socket's GID at boot in stage2-hook (runs as root before the
privilege drop), reuse an existing group name if one matches the GID,
otherwise create 'hostdocker'. Idempotent across container restarts.
Silent no-op when no socket is mounted.
End-to-end verified by building the image and running the supervised
hermes user against the real host Docker daemon: `docker version`
succeeds and check_terminal_requirements() returns True.
Fixes#16703
Salvages #25872 by @konsisumer against current main.
NAS users (UGOS, Synology, unRAID) expect the LinuxServer.io
PUID/PGID convention and bind-mount /opt/data from a host directory
owned by their own UID. Without this alias those vars are silently
ignored and the s6-setuidgid drop to UID 10000 leaves the runtime
unable to read the volume. HERMES_UID/HERMES_GID still take
precedence when both are set.
The original PR targeted docker/entrypoint.sh, which is now a 27-line
deprecation shim under s6-overlay (the May 2026 rework moved all
bootstrap logic to docker/stage2-hook.sh, installed as
/etc/cont-init.d/01-hermes-setup). Re-applied the same 2-line
alias resolution at the equivalent spot in stage2-hook.sh just
before the existing UID/GID remap block. Test was retargeted at
docker/stage2-hook.sh; docs hunk adapted to current main's wording
("stage2 hook" + s6-setuidgid, not the obsolete "entrypoint drops
via gosu") with the NAS bind-mount example preserved verbatim.
Test-first regression verification: reverted just docker/stage2-hook.sh
to origin/main and re-ran the new tests. Result:
FAILED test_stage2_hook_resolves_puid_pgid_aliases
FAILED test_puid_pgid_populate_hermes_uid_gid
AssertionError: assert ':' == '1000:10'
That's the exact bug shape — PUID=1000 PGID=10 silently ignored,
HERMES_UID/HERMES_GID stay empty. With the salvage applied, all 4
tests pass.
Closes#25872
Co-authored-by: konsisumer <11262660+konsisumer@users.noreply.github.com>
The previous ruff prune commit removed two categories of test-file
imports whose value is the side effect of importing them, not their
binding:
tests/tools/test_kanban_tools.py — 5 sites
`import tools.kanban_tools # ensure registered`
The import itself runs tools/kanban_tools.py's @registry.register
calls; without it, the kanban tool registry is empty and
test_kanban_tools_visible_with_env_var asserts {} != {7 kanban tools}.
tests/tools/test_command_guards.py — 1 site
`import tools.tirith_security # Ensure the module is importable so we can patch it`
The comment names the requirement: keep the bare module reference
so subsequent mock.patch("tools.tirith_security.<fn>") calls find
a registered submodule.
CI failure: test (5) shard, tests/tools/test_kanban_tools.py:58
AssertionError: expected {kanban_*}, got set()
The mechanical ruff prune in the previous commit removed several names that
`appear` unused inside their defining module but are external test/runtime
anchors:
run_agent
OpenAI, _SafeWriter
get_tool_definitions, handle_function_call, check_toolset_requirements
estimate_request_tokens_rough
DEFAULT_AGENT_IDENTITY, build_context_files_prompt,
build_environment_hints, build_nous_subscription_prompt
_is_destructive_command, _extract_parallel_scope_path, _paths_overlap,
_append_subdir_hint_to_multimodal, _trajectory_normalize_msg
tools/web_tools
Firecrawl, _get_firecrawl_client
These get accessed via four channels that are invisible to ruff's
in-module usage analysis:
1. `mock.patch('module.name', ...)` in tests — resolves the attribute
lazily, so `pytest --collect-only` passes even when the name is
gone, but every test using the patch fails at runtime with
AttributeError.
2. `from run_agent import X` in production siblings (agent/transports
/codex.py, etc.).
3. The `_ra().X` indirection pattern in agent/system_prompt.py et al.
— explicitly documented ("Many tests patch('run_agent.load_soul_md')")
to preserve the patch contract.
4. `from tools.web_tools import _get_firecrawl_client` in tests.
Each re-added import carries an explicit `# noqa: F401` with a comment
naming the channel, so future cleanup passes won't strip them again.
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.
- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
unused in their defining module are kept with explicit # noqa:
F401 (gateway/run.py load_dotenv; run_agent re-exports from
agent.message_sanitization, agent.context_compressor,
agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
selected); this is a one-time cleanup, not a config change
Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
module still resolves
* fix(codex): surface error code in Responses 'failed' status errors
When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").
Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
(e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists
Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.
Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.
Adapted from anomalyco/opencode#28757.
* feat(prompt): universal task-completion guidance + local Python toolchain probe
Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.
## What changed
`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).
`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.
Example output on a broken environment (the actual case):
Python toolchain: python3=3.11.15 (no pip module),
python=missing (use python3), pip→python3.12 (mismatch),
PEP 668=yes (use venv or uv).
## Config
Both flags live under `agent.` in config.yaml, default True:
agent:
task_completion_guidance: true # universal "finish the job" block
environment_probe: true # local Python toolchain hints
Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.
## Validation
| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
External rotation (logrotate, manual `mv gateway.log gateway.log.1`,
another process rotating the file) leaves `_ManagedRotatingFileHandler`'s
open fd pinned to the renamed inode. All subsequent writes go to the
rotated backup instead of the file every operator expects to read,
producing the symptom 'gateway.log frozen mid-write while agent.log
keeps growing with gateway.* records'.
PR #16229 fixed the original CLI->gateway init-order bug (#8404) so the
handler attaches in the first place. This is the sibling fix for what
happens after attach, when something external rotates underneath us.
Adds a WatchedFileHandler-style inode check on emit(): if baseFilename
no longer matches the open stream's (dev,ino), close the stale fd and
reopen at the expected path. doRollover() refreshes the snapshot so our
own rollover isn't misidentified as external.
Five regression tests cover the matrix: external rename, external
unlink, external truncate (must NOT trigger reopen — inode unchanged),
normal doRollover() (must still work), and the end-to-end
Allen-reproduction (rotate + re-call setup_logging).
55/55 tests in tests/test_hermes_logging.py pass; 5972/5972 in
tests/gateway/ pass.
* fix(compression): prevent session-id fork from concurrent compressions
When two AIAgent instances share the same session_id (most commonly the
parent-turn agent and its background-review fork, which inherits
session_id verbatim via background_review.py L451), both can call
compress_context() on overlapping snapshots of the same conversation.
Each ends the parent and creates its own NEW child session in state.db,
both parented to the same old id. The gateway SessionEntry only catches
one rotation; the other becomes an orphan that silently accumulates
writes — Damien's incident shape (parent 20260527_234659_e65f0e → two
children, only one visible).
Adds a state.db-backed per-session compression lock. Acquired before
the rotation in conversation_compression.compress_context(); on
failure, the caller returns messages unchanged so the auto-compress
retry loop stops cleanly. TTL (5min default) reclaims locks abandoned
by crashed compressors. Lock holder identity (pid:tid:agent:nonce) is
preserved for diagnostics via get_compression_lock_holder().
Schema bumped 13 -> 14 to track the new compression_locks table.
Reconciled additively via the existing declarative-column pattern;
no data migration needed for existing DBs.
Regression test reproduces Damien's shape: two threads racing
_compress_context on a shared parent_sid. Without the lock the test
deterministically produces 2 child sessions; with the lock, exactly 1.
Covers all six compression entry points (preflight in conversation_loop,
mid-turn fallback, hygiene compression in gateway, /compact, CLI
/compress, TUI /compress). ACP /compress was already protected by
nulling out _session_db before its compress call.
* ci: trigger rerun (transient GitHub API rate limit on CodeQL workflow)
Tailwind v4 defines its own --font-sans and --font-mono tokens
independently of the Hermes theme variables. Components using
font-sans/font-mono utility classes bypass --theme-font-sans and
--theme-font-mono, so theme font changes have no effect.
Add --font-sans and --font-mono bridges in the @theme inline block
so Tailwind's font tokens follow the active Hermes theme.
Fixes#20380
Replace the runner-introspection trick in #32998 with an explicit
`set_topic_recovery_fn` setter on `BasePlatformAdapter`. The gateway
runner installs it once at adapter init; the adapter calls
`_apply_topic_recovery(event)` before any session keying.
Also apply the hook in `BasePlatformAdapter.handle_message` so the
running-agent guard and pending-message queue key off the recovered
thread_id too — not just the text-batch coalescence.
Net change vs #32998 alone: -2 files of indirection (no
`_message_handler.__self__` peek, no separate `_normalize_text_batch_source`),
+1 generic mechanism (other adapters can install their own hook later).
Salvages #24490 by @liuhao1024 against current main.
The Docker daemon will silently auto-create a directory at the host
path of any `-v <host>:<container>` bind mount when the host path
doesn't exist. In Docker-in-Docker setups (where the outer host's
real credential file isn't visible inside the agent's parent
container), this leaves a directory at the credential mount source —
and the inner `docker run` then refuses to mount a directory over a
file destination with exit 125.
Add defensive shape guards to all three mount loops in
DockerEnvironment.__init__:
* credentials (expected: file) — skip + warn on directory or missing
* skills (expected: dir) — skip + warn when not a directory
* cache (expected: dir) — skip + warn when not a directory
Failed mounts surface as WARN logs rather than crashing the container
start. Existing well-formed sources mount unchanged.
The original PR's branch was on a pre-container-reuse-rework base
(May 12) and conflicted with the post-May-28 driver work (label
tagging, container reuse, orphan reaper). Reconstructed the same
intent on current main; the three guard blocks slot cleanly into
`tools/environments/docker.py` around the existing mount loops.
Three new tests pinned in `tests/tools/test_docker_environment.py`:
directory-source skip, missing-source skip, valid-file mounts. Test-
first regression verification: reverted just the production code to
`origin/main` and confirmed the new tests fail with
`'deleted_token.json' is contained here: /root/.hermes/...` — the
fixed code makes them pass. Full file passes (54/54).
Closes#24490
Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>
Two small bugs in the kanban dispatcher's CLI surface that were
silently degrading two distinct workflows. Bundled because the test
files and the surrounding code surface overlap.
## #33488: hermes kanban dispatch ignored kanban.max_in_progress / max_spawn
The CLI wrapper in hermes_cli/kanban.py:_cmd_dispatch only passed
default_assignee and max_in_progress_per_profile through to
dispatch_once. The global concurrency cap (kanban.max_in_progress)
and the per-tick spawn limit (kanban.max_spawn) were silently dropped,
so operators using 'hermes kanban dispatch' as a one-shot or in a
custom loop couldn't reach either cap from config — only the gateway
embedded dispatcher honored them.
Fix: read both keys from config in the same coerce-positive-int
helper that already handled max_in_progress_per_profile. CLI --max
still wins over config kanban.max_spawn when both are present
(explicit operator signal beats default), but absent --max falls
back to config.
## #29415: synthesizer crashed in retry loop on missing skill
hermes_cli/kanban_swarm.py:212 hardcoded skills=['avoid-ai-writing'],
a skill that doesn't exist in the bundled skills/ directory or any
registered hub source. Every synthesizer worker spawn failed at CLI
startup with 'Unknown skill(s): avoid-ai-writing' before the agent
loop even started — the dispatcher retried up to failure_limit
(default 2), then auto-blocked the task, then dependency rules could
re-promote it, looping forever until manual intervention.
Fix: replace with 'humanizer' which is bundled at
skills/creative/humanizer/SKILL.md (description: 'Humanize text:
strip AI-isms and add real voice'). That's the obvious intent behind
the 'avoid-ai-writing' name, and the skill is platform-portable
(linux/macos/windows) so it works on every supported runtime.
## Tests
tests/hermes_cli/test_kanban_cli_dispatch_passthrough.py — 4 cases:
- CLI passes max_in_progress / max_spawn / default_assignee /
max_in_progress_per_profile from config to dispatch_once
- CLI --max flag overrides config kanban.max_spawn
- Invalid cap values (0, -1, 'abc', '1.5') silently fall through to None
- kanban_swarm.py no longer references 'avoid-ai-writing' AND the
replacement 'humanizer' skill exists at the expected on-disk path
Kanban suite: 468/468 pass (was 464; +4 new regression tests).
The opencode-go relay defaults max_tokens to 262144 when none is sent,
but Xiami mimo-v2.5-pro only supports 131072 completion tokens — every
request 400s with "max_tokens is too large: 262144" before the agent
can do anything.
Add a get_max_tokens(model) hook on ProviderProfile (default returns
default_max_tokens) so profiles fronting multiple upstreams can vary
the cap per-model. Wire chat_completions transport through the hook.
Override on OpenCodeGoProfile with mimo-v2.5-pro=131072.
Only mimo-v2.5-pro is capped — other opencode-go models (kimi, glm,
qwen, minimax, other mimo variants) unchanged.
When HOME=/root (Docker containers) and the process runs as unprivileged
user (hermes, uid 10000), Path.home() / '.modal.toml' raises PermissionError
because /root/ is inaccessible. This crashes the dashboard /api/skills endpoint.
Catch PermissionError/OSError and treat as 'no config file'. Env vars still
take priority (tested).
Fixes#33525
NVIDIA's verified skills catalog (https://github.com/NVIDIA/skills) ships
NVIDIA-signed skills for CUDA-X, AIQ, cuOpt, cuPyNumeric, DeepStream, NeMo,
NemoClaw and the Skill Card Generator — each bundle carrying a detached
`skill.oms.sig` signature, a governance `skill-card.md`, and `evals/`. The
sync pipeline drops any skill missing those artifacts before publishing.
Changes:
- tools/skills_hub.py: add NVIDIA/skills to GitHubSource.DEFAULT_TAPS so
it lights up in `hermes skills browse`, `hermes skills search <q>`, the
twice-daily skills-index build, and the docs-site Skills Hub page
(https://hermes-agent.nousresearch.com/docs/skills) automatically.
- tools/skills_guard.py: add NVIDIA/skills to TRUSTED_REPOS so installs
resolve to trust_level="trusted" (looser install policy than community).
- website/scripts/extract-skills.py: map the `github` source id to a
friendly "NVIDIA" pill label for the docs hub page.
- website/src/pages/skills/index.tsx: register the NVIDIA pill (green
#76b900) and slot it into SOURCE_ORDER after HuggingFace.
- website/docs/user-guide/features/skills.md (+ zh-Hans i18n): document
the new default tap and the expanded trusted-repos list.
- tests/tools/test_skills_guard.py: assert NVIDIA/skills resolves to
"trusted" (including the skills-sh-wrapped form).
- tests/tools/test_skills_hub.py: invariant — every TRUSTED_REPOS entry
must be reachable via GitHubSource.DEFAULT_TAPS (prevents future
trusted repos from being declared but never browseable).
Validation:
- Live GitHub fetch: `src.fetch('NVIDIA/skills/skills/aiq-deploy')` pulled
17 files including SKILL.md (13 KB), skill-card.md, skill.oms.sig, and
the full references/ + evals/ tree. trust_level="trusted".
- Live inspect resolved name, description, and trust correctly.
- All 193 existing skills_guard + skills_hub tests still pass.
When docker.sock is mounted (common Docker Compose pattern), the agent
can restart/stop/kill containers without user approval. hermes gateway
restart is already protected, but docker restart, docker stop,
docker kill, and their docker compose equivalents were not.
This caused repeated self-termination: the agent ran docker restart
hermes, killed its own container, Docker restarted it (restart policy),
and the agent resumed the same session — creating a restart loop.
Added patterns mirror the existing gateway lifecycle protection:
- docker compose restart/stop/kill/down
- docker restart/stop/kill
Co-authored-by: Sarbai <sarbai@users.noreply.github.com>
ping is a fundamental network diagnostic tool that most users expect to have available in the container. This adds iputils-ping to the apt install list in the Dockerfile.
Co-authored-by: ninjmnky <ninjmnky@users.noreply.github.com>
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.
Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.
Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.
Endpoint inventory (verified safe to remain public):
* /api/status — version, gateway state, active session count,
auth-gate shape. Portal liveness probe target.
* /api/config/defaults — config-defaults feed for the SPA's Config page
* /api/config/schema — config schema for the SPA's Config page
* /api/model/info — model catalogue metadata (context windows)
* /api/dashboard/themes — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard
No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.
Tests:
* New: test_gated_status_is_public (regression guard with the NAS
fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
never the response)
* New: docker integration check #3 in
test_dashboard_oauth_gate_engaged_by_default — /api/status
remains 200 under the gate AND reports auth_required=True so the
portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
/api/sessions instead of /api/status (status is public, so it
can no longer distinguish 'logged in' from 'gate accidentally
disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
test_dashboard_oauth_gate_engaged_by_default probes
/api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
test_dashboard_auth_status_endpoint.py (no longer needed since
/api/status is reachable cold)
Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
Two related dispatcher behaviors that have been missing for a while.
## kanban.default_assignee (#27145)
Reporter (@agarzon): dashboard creates a task without an assignee, task
parks in 'ready' forever even though the operator's intent ('default')
is perfectly clear. The dispatcher already had a 'skipped_unassigned'
bucket but no fallback routing — users had to manually type 'default'
in the assignee field every time.
Behavior: when 'kanban.default_assignee' is set in config.yaml, the
dispatcher applies that assignee to any unassigned ready task before
deciding whether to spawn. The row is mutated (assignee column + an
'assigned' event with source='kanban.default_assignee' for the audit
trail). Empty/whitespace config value = no fallback, preserving the
existing skipped_unassigned behavior.
Dry-run mode reports what WOULD happen via the new
'auto_assigned_default' bucket on DispatchResult, but does NOT mutate
the DB — operators using 'hermes kanban dispatch --dry-run' see the
routing decision before committing.
## kanban.max_in_progress_per_profile (#21582)
Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads
saturate one profile's local model / API quota / browser pool while
other profiles sit idle. The existing global 'max_in_progress' caps
total workers but doesn't balance across profiles.
Behavior: when 'kanban.max_in_progress_per_profile' is set to a
positive int, the dispatcher tracks per-assignee running counts (one
query at tick start) and refuses to spawn for any assignee already at
the cap. Tasks blocked this way go to a new
'skipped_per_profile_capped' bucket on DispatchResult as
(task_id, assignee, current_running_count) tuples — NOT an
operator-actionable failure, just 'try again next tick when the
profile has capacity'.
Pre-existing 'running' tasks count against the cap (verified via
regression test). The cap respects dry_run mode by incrementing
its in-memory counter on each would-be spawn so dry_run reports
the same balanced subset that a real tick would.
Invalid cap values (0, negative, non-int, None) are treated as 'no
cap', preserving the existing behavior. Backward-compatible for
installs that don't set the config.
## Surfaces
- 'hermes kanban dispatch' CLI now prints 'Auto-assigned to
kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap,
N running): ...' lines, plus matching JSON keys in --json output.
- Gateway dispatcher logs the configured values at startup
('default_assignee=X', 'max_in_progress_per_profile=N').
- 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with
inline docs.
## Validation
- tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap
baseline, auto-assign + DB mutation, dry-run reports without
mutating, whitespace treated as None, explicit assignees untouched,
DispatchResult field schema.
- tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including
4 parametrized): no-cap baseline, balanced 2-profile fan-out,
pre-existing running counts against cap, invalid cap values
(0/-1/'abc'/None), capped tasks dispatched on next tick after
running task completes, DispatchResult field schema.
- Broader kanban suite: 464/464 pass (was 449 baseline; +15 new
regression tests across both features).
## Credit
#27145 — Jimmy Johansson reported the dispatcher skipped-unassigned
gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix
that matches the existing config knob.
#21582 — @edwardchenchen filed the per-profile cap ask after hitting
model 429s on fan-out research projects; @simlu confirmed the same
pain on local-model setups.
The page was last meaningfully rewritten in the pre-s6 (tini) era and had
drifted on five points that no longer matched the image:
1. "Running the dashboard" claimed the entrypoint backgrounds
`hermes dashboard` and prefixes its output with `[dashboard]`. That
was the pre-s6 entrypoint.sh path; under s6 the dashboard is a
supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no
sed-prefix pipeline. Rewrote the section accordingly.
2. The default for `HERMES_DASHBOARD_HOST` was documented as
`127.0.0.1`. The s6 run script defaults it to `0.0.0.0`
(`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table
and the surrounding prose.
3. Multi-profile was documented as "not recommended in Docker — run
one container per profile." That advice was load-bearing when
there was no in-container supervisor, but the s6 architecture
explicitly adds per-profile gateway supervision: each profile
created via `hermes profile create <name>` gets a slot under
`/run/service/gateway-<name>/`, the `02-reconcile-profiles`
cont-init script restores them across `docker restart` from
`gateway_state.json`, and `hermes gateway start/stop/restart` is
intercepted by `_dispatch_via_service_manager_if_s6` to route
through `s6-svc`. Pivoted the section to "one container, many
supervised profile gateways" as the default, with a comparison
table and a "When you DO want a separate container" escape
hatch for the genuine resource-isolation / network-segmentation
cases.
4. The Compose example trailer also claimed `[dashboard]` log
prefixing. Replaced with the actual log routing.
5. Added a new "Where the logs go" section covering all four log
surfaces: per-profile gateways (tee'd to `docker logs` AND
`${HERMES_HOME}/logs/gateways/<profile>/current` since PR
b34532319), dashboard (`docker logs`, no prefix), boot reconciler
(`container-boot.log`), and `hermes logs`. The gateway-mode and
Compose sections cross-reference this rather than each carrying
their own routing prose.
Added a new "docker exec automatically drops to the hermes user"
subsection under "What the Dockerfile does", next to the existing
Privilege model warning. Documents the `/opt/hermes/bin/hermes` shim
(landed via the docker-exec privilege-drop work) — operators don't
need to remember `--user hermes` for `docker exec hermes login`,
`docker exec hermes profile create …`, etc. The historical footgun
(`auth.json` written as `root:root`, supervised gateway then can't
read its own auth file) is mentioned only as context for what the
fail-loud `exit 126` is protecting against, not as a problem the
reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is
documented for diagnostic sessions.
The "Permission denied" troubleshooting subsection now carries a
single-line pointer to the new section instead of duplicating it.
The `--insecure` framing reflects PR #fb5125362 (opt-in via
`HERMES_DASHBOARD_INSECURE`, not derived from bind host): the OAuth
gate is the authority, the bind host alone never implies
`--insecure`, and opting out is an explicit security trade-off.
Anchors verified resolve. i18n zh-Hans mirror left for the
translation flow to catch up.
Updates the Docker Backend section of the user-guide configuration page
to match the actual behavior shipped in PR #33645. Pre-PR the docs
claimed "container is stopped and removed on shutdown," which was
never quite true for the documented happy path and is now actively
wrong: in default mode the container survives across Hermes processes
so background processes (npm watchers, dev servers, long-running
pytest) carry over the way the "ONE long-lived container shared
across sessions" promise requires.
Changes to `website/docs/user-guide/configuration.md`:
* Reworked the intro paragraph at the top of the Docker Backend
section to describe the actual cross-process reuse contract.
* Expanded the YAML example with the new keys
`docker_persist_across_processes` and `docker_orphan_reaper`, plus
the pre-existing-but-undocumented `docker_env`, `timeout`, and
`lifetime_seconds`. Clarified the `container_persistent` comment
to disambiguate from `docker_persist_across_processes`.
* Added a `docker_env` vs `docker_forward_env` explainer (one
injects literal KEY=value, the other forwards values from the
host/.env — easy to confuse).
* Replaced the one-line "Container lifecycle" paragraph with a full
subsection covering:
- the three labels Hermes tags every container with
(hermes-agent, hermes-task-id, hermes-profile)
- the label-probe reuse mechanism on startup
- a teardown-trigger table with four rows for every situation
that destroys the container in default mode
- edge cases (OOM kill, profile switching)
* Added an "Environment variable overrides" table covering all
TERMINAL_* env vars relevant to the Docker backend, including the
previously-undocumented `TERMINAL_DOCKER_ENV` and
`HERMES_DOCKER_BINARY`.
Changes to `website/docs/user-guide/docker.md`:
* Extended the cross-link admonition (around l.227) so the
Hermes-in-Docker page points at the new terminal-backend keys
(`docker_env`, `docker_persist_across_processes`,
`docker_orphan_reaper`) alongside the ones already mentioned.
No code changes. Behavior already covered by tests added in earlier
commits on this branch (#33645 commits 1-5).
Refs #20561
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.
Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.
The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.
Two new unit tests pin the behavior:
* `test_cleanup_vm_default_honors_persist_mode` — asserts
`cleanup_vm(task_id)` does neither docker stop nor docker rm on a
persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
asserts the kwarg still flows through the runtime-signature-inspection
plumbing to the backend's cleanup().
E2E verified against real Docker (in addition to all 17 existing checks):
✓ Default cleanup_vm() leaves persist-mode container running
✓ cleanup_vm(force_remove=True) removed the container
Refs #20561
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.
This commit splits the cleanup paths cleanly:
* **Persist mode (default)** — cleanup() is a NO-OP for the
container. Container stays running, processes survive, next Hermes
process attaches via the existing label probe in ~ms instead of
waiting for docker start. Resource reclamation happens via the
orphan reaper at next startup (2 × lifetime_seconds threshold), which
covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
overrides persist mode and tears the container down unconditionally.
cleanup_vm(task_id) now defaults to force_remove=True since
it's the user-driven reset path (called from AIAgent.close(),
/reset-style flows, and the idle reaper's per-turn cleanup).
The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.
E2E verified against real Docker:
✓ Container is still running after cleanup()
✓ Background process (sleep loop) survived cleanup()
✓ Filesystem state preserved across cleanup()
✓ In-process container_id cleared (next __init__ will re-probe)
✓ Background process visible from reused env (no docker start happened)
✓ force_remove=True removed the container even in persist mode
✓ cleanup_vm() removed the container (defaults to force_remove=True)
Test changes:
* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
`test_cleanup_with_persist_is_noop_for_container` — asserts neither
stop nor rm runs in persist mode, and the in-process handle is
cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
— covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
`test_wait_for_cleanup_after_cleanup_returns_true` to pass
`force_remove=True` so they actually exercise the docker code path
(default no-op would trivially pass).
cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.
Refs #20561
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.
But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.
This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.
Implementation:
* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
``label=hermes-profile=<current>``. Per-container ``docker inspect``
parses ``State.FinishedAt`` (with nanosecond-precision trimming for
Python's microsecond-bound ``fromisoformat``); containers older than
the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
load-bearing — a running container may belong to a sibling Hermes
process whose reuse path will pick it up; killing it would crash the
sibling mid-command. Single-container failures are logged and the
sweep continues to the next candidate.
* New ``_maybe_reap_docker_orphans()`` helper in
``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
``env_type == "docker"``. Gated by:
- ``terminal.docker_orphan_reaper: true`` (default; opt-out for
operators running multiple Hermes processes in the same profile
who don't trust the conservative defaults)
- ``_docker_orphan_reaper_ran`` module flag with double-checked
locking — parallel subagents and RL rollouts don't trigger N
concurrent docker ps storms
- Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
(so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
setup)
- Profile scoping — a research profile NEVER reaps the default
profile's stragglers
- Exception swallow — a janitor failure must never block container
creation
* New config ``terminal.docker_orphan_reaper`` wired through all four
config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
tests/conftest.py) and pinned by
``test_docker_orphan_reaper_is_bridged_everywhere``.
Coverage:
* 9 new unit tests in test_docker_environment.py — happy path, recent-
container sparing, profile scoping, unparseable-timestamp safety,
docker-ps-failure handling, partial-failure continuation, nanosecond
timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
— once-per-process gate, disable-flag respected, lifetime doubling
with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.
Closes#20561 (combined with the two prior commits on this branch).
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).
Three root causes, all addressed by this commit:
1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
which raced with parent-process exit — when Python exited promptly the
detached shell child got killed mid-\`docker stop\`, leaving stopped
containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
(the bind-mount-persistence flag). Default config sets
\`container_persistent: true\`, so the default happy path skipped \`rm\`
entirely — even when the user explicitly didn't want cross-process
reuse, containers leaked.
Fix:
* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
true, init probes
\`docker ps -a --filter label=hermes-agent=1
--filter label=hermes-task-id=<task>
--filter label=hermes-profile=<profile>\`
and reuses a matching container (running → attach; stopped →
\`docker start\` → attach; \`docker start\` failure → fall through to a
fresh \`docker run\`). Multiple matches prefer the running one, with the
stragglers left for the orphan reaper (next commit) to clean up.
* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
\`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
whenever \`persist_across_processes\` is false, regardless of the
bind-mount-persistence setting. The leak class is gone in all
combinations.
* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
hook calls this on every active env, blocking up to 15s for the
cleanup thread before interpreter exit. Without this, \`hermes /quit\`
raced the daemon-thread teardown and dropped the stop/rm work.
* New config \`terminal.docker_persist_across_processes\` (default
\`true\` — restores the documented contract). Set \`false\` for hard
per-process isolation. Wired through all four config-bridge sites
(cli.py env_mappings, gateway/run.py _terminal_env_map,
hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
list); regression-pinned by
\`test_docker_persist_across_processes_is_bridged_everywhere\` matching
the existing pattern for docker_run_as_host_user / docker_env.
Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.
Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.
Refs #20561
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.
This commit adds three labels at `docker run` time:
--label hermes-agent=1 # global sweep target
--label hermes-task-id=<sanitized> # per-task reuse key
--label hermes-profile=<sanitized> # per-profile isolation key
Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.
No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.
Refs #20561
tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.
At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:
Fatal Python error: _enter_buffered_busy: could not acquire lock
for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
possibly due to daemon threads
pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).
Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.
Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py
The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.
Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
The `vercel` optional-dependency was removed from pyproject.toml in
#33067, but `nix/packages.nix` (added a few hours later in #33108)
still references `"vercel"` in the `#full` variant's
`extraDependencyGroups`. uv2nix fails evaluation with:
error: Extra/group name 'vercel' does not match either extra or
dependency group
Because `nix/devShell.nix` does
`inputsFrom = builtins.attrValues self'.packages`, the broken `#full`
derivation is pulled into the dev shell too, so `nix develop` /
direnv breaks on a fresh clone — not just `nix build .#full`.
2026-05-28 11:52:31 +03:00
1426 changed files with 157241 additions and 9512 deletions
@@ -43,7 +43,7 @@ Bundled skills (in `skills/`) ship with every Hermes install. They should be **b
- Document handling, web research, common dev workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, builtin trust).
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, built-in trust).
If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a skills registry and share it in the [Nous Research Discord](https://discord.gg/NousResearch). Users can install it with `hermes skills install`.
> **Heads up:** Native Windows support is **early beta**. It installs and runs, but hasn't been road-tested as broadly as our Linux/macOS/WSL2 paths. Please [file issues](https://github.com/NousResearch/hermes-agent/issues) when you hit rough edges. For the most battle-tested Windows setup today, run the Linux/macOS one-liner above inside **WSL2**.
> **Heads up:** Native Windows runs Hermes without WSL — CLI, gateway, TUI, and tools all work natively. If you'd rather use WSL2, the Linux/macOS one-liner above works there too. Found a bug? Please [file issues](https://github.com/NousResearch/hermes-agent/issues).
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is supported as an **early beta** — the PowerShell one-liner above installs everything, but expect rough edges and please file issues when you hit them. If you'd rather use WSL2 (our most battle-tested Windows path), the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
After installation:
@@ -104,17 +104,17 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset`| `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo`| `/retry`, `/undo` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
@@ -124,23 +124,23 @@ For the full command lists, see the [CLI guide](https://hermes-agent.nousresearc
All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| Section | What's Covered |
|---------|---------------|
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
> The Foundation Release — Hermes Agent installs and runs anywhere now. Native Windows ships in early beta with a full PowerShell installer story, a `pip install hermes-agent` wheel lands on PyPI, lazy-deps reshape what `pip install hermes-agent` actually pulls down, the supply-chain checker scans every install/upgrade for unsafe versions, and a new OpenAI-compatible local proxy lets Codex / Aider / Cline talk to OAuth-only providers (Claude Pro, ChatGPT Pro, SuperGrok). The cold-start wave shaves ~19 seconds off `hermes` launch, browser-tool CDP calls run 180x faster, and `hermes tools` All-Platforms drops from 14s to under 1.5s. Two new messaging platforms (LINE and SimpleX Chat) and a Microsoft Graph foundation (Teams pipeline + webhook adapter) land alongside `/handoff` that finally transfers sessions live, `vision_analyze` passing pixels through to vision-capable models, `x_search` as a first-class tool, LSP semantic diagnostics on every `write_file` / `patch`, a unified pluggable `video_generate`, a `computer_use` cua-driver backend, cross-session 1-hour Claude prompt caching, a per-turn file-mutation verifier, plus 9 new optional skills. 50+ P1 closures, 12 P0 closures.
---
## ✨ Highlights
- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and an SSH-to-tunnel docs page for when you're SSH'd into a remote box and need to complete the OAuth flow. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
- **Native Windows support (early beta)** — full PowerShell installer, native subprocess/PTY paths, taskkill-based process management, MinGit auto-install, Microsoft Store python stub detection, foreground Ctrl+C preservation, taskkill+ps2 fallback, npm prefix handling, and ~40 follow-up Windows-only fixes across CLI / gateway / TUI / curator / tools. Hermes finally runs natively on `cmd.exe` and PowerShell, no WSL required. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561), [#22130](https://github.com/NousResearch/hermes-agent/pull/22130), [#22752](https://github.com/NousResearch/hermes-agent/pull/22752), [#26618](https://github.com/NousResearch/hermes-agent/pull/26618), and many more)
- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get a `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. One command, no clone, no git, no shell installer. Wheel includes the Ink TUI bundle and shell launcher. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Cold-start performance wave — ~19s off `hermes` launch** — skills cache, lazy Feishu import, no Nous HTTP at startup, plus PEP-562 lazy adapter imports (QQ, Yuanbao, Teams, Google Chat), deferred `fal_client` / `google-cloud` / `httpx` loads, models.dev disk-cache-first lookup, parallel doctor API checks, eager-skip plugin discovery on built-in subcommands, `hermes tools` All-Platforms drops from 14s to <1.5s, welcome banner skipped on `chat -q`. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **180x faster `browser_console` evaluations** — routed through the supervisor's persistent CDP WebSocket instead of spawning a fresh DevTools session per call. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Debloating wave — lighter installs, less you don't use** — A clean `pip install hermes-agent` used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
- **Supply-chain advisory checker + lazy-deps framework + tiered install fallback** — every `pip install` / `hermes update` scans dependencies against an advisory list, lazy-deps replace heavy import-time loads with first-use installs, and the installer falls back through extras tiers when a wheel rejects on the target platform. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
- **OpenAI-compatible local proxy** — `hermes proxy` exposes any OAuth-authed provider (Claude Pro, ChatGPT Pro, SuperGrok) as an OpenAI-compatible endpoint that Codex / Aider / Cline / VS Code Continue can hit. Your subscription, your tools. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
- **Cross-session 1-hour Claude prompt cache** — Anthropic / OpenRouter / Nous Portal now share a 1h prefix cache across sessions for Claude models. Fast resume, fast `/new`, lower cost on repeat work. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE Messaging API lands as a first-class platform, SimpleX Chat salvages #2558 onto the modern adapter spec. Hermes is now on 22 platforms. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Graph foundation — Teams pipeline + webhook adapter** — `msgraph` auth/client foundation, webhook listener platform, Teams pipeline plugin runtime, and Teams outbound delivery via the existing adapter — Hermes can now read and post to Teams. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **`/handoff` actually transfers the session live** — the agent's active session moves to a different model / persona / profile mid-conversation, with messages, tool history, and context preserved. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`x_search` — first-class X (Twitter) search tool** — gated tool with OAuth-or-API-key auth, no skill needed to query the timeline. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **`vision_analyze` returns pixels to vision-capable models** — when the active model can see, `vision_analyze` now hands the image straight through instead of falling back to a text description. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **LSP semantic diagnostics on every write** — `write_file` and `patch` now run real language-server diagnostics on the post-edit file (delta-only) and surface real errors before they ship downstream. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Per-turn file-mutation verifier footer** — after every turn that wrote files, the agent gets a verifier footer summarizing what actually changed on disk — catches silent overwrites and "wrote it but it didn't land" bugs. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Unified `video_generate` with pluggable provider backends** — single tool, any backend. Drop in a new video provider as a plugin, no core changes. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`computer_use` cua-driver backend** — proper focus-safe ops, non-Anthropic provider support, refresh on `hermes update`. Computer-use is no longer locked to a single SDK. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **xAI Grok OAuth provider — SuperGrok via subscription** — sign in with your xAI account, talk to Grok models from Hermes. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Clarify with buttons — native inline keyboards on Telegram + Discord** — the `clarify` tool renders multi-choice prompts as platform-native buttons instead of typed responses. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Discord channel history backfill (default on)** — Hermes reads recent channel history when joining a thread so it actually knows what's been said. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **Watchers skill — RSS / HTTP JSON / GitHub polling via cron `no_agent` mode** — skill recipes that wire change-detection sources directly into cron's script-only watchdog mode. ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Zed ACP Registry integration + uvx distribution** — Hermes is in the Zed registry, installable via `uvx` (no npm). Plus `hermes acp --setup-browser` bootstraps browser tools for registry installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **OpenRouter Pareto Code router** — wire a new OpenRouter router with `min_coding_score` knob. Pick the cheapest model that meets your quality bar. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification — the kind of plumbing that makes long agentic runs not fall over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Optional codex app-server runtime for OpenAI/Codex models** — drives the OpenAI Codex CLI under the hood for OpenAI/Codex paths, with session reuse, wedge retirement, and OAuth refresh classification. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **`hermes-skills/huggingface` as a trusted default tap** — community skills index from huggingface.co/skills is available by default in the Skills Hub. ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **API server exposes run approval events** — long-running runs surface approval requests over the API stream, no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **`/subgoal` — user-added criteria appended to active `/goal`** — layer extra success criteria onto a running goal loop. The judge sees them in the prompt, no behavior change when subgoals are empty. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Plugins can run any LLM call via `ctx.llm`** — plugins get a first-class hook to make their own LLM requests through the active provider/credentials, no manual wiring. Plus `tool_override` flag for replacing built-in tools. ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — two new free search backends alongside Tavily / SearXNG / Exa. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS classification** — closes the `sudo -S`brute-force avenue; approval gates classify stdin-fed and askpass-stripped sudo invocations as dangerous. (salvages of #22194 + #21128) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There's still rough edges (this is the "early beta" stamp) — ~40 follow-up Windows-only fixes already landed in the window — but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
- **Provider rename — Alibaba Cloud → Qwen Cloud, picker reorder** — matches what the world calls it. Existing config keys still work. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
Recovered from a deterministic fallback because the LLM context summarizer was unavailable. Continue from the protected recent messages after this summary and use current file/system state for exact details.{previous_summary_note}
## Constraints & Preferences
- This fallback was generated locally without an LLM summary call.
- Secrets and credentials were redacted before preservation.
- The summary may be incomplete; prefer verifying current files, git state, processes, and test results instead of assuming omitted details.
## Completed Actions
{chr(10).join(completed)ifcompletedelse"None recoverable from compacted turns."}
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{active_task}
## Blocked
{_bullets(blockers,limit=5)}
## Key Decisions
None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{active_task}
## Relevant Files
{_bullets(relevant_files,limit=12)}
## Remaining Work
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
{_bullets(last_dropped_turns,limit=8)}
## Critical Context
Summary generation was unavailable, so this is a best-effort deterministic fallback for {len(turns_to_summarize)} compacted message(s).{reason_text}"""
"User chose option A; awaiting implementation of step 2"
If the user's most recent message was a reverse signal (stop, undo, roll
back, never mind, just verify, change of topic) that supersedes earlier
work, write the reverse signal verbatim and DO NOT carry forward the
cancelled task. Example: "User asked: 'Stop the i18n refactor and just
verify the current diff' — earlier i18n in-flight work is cancelled."
If no outstanding task exists, write "None."]
## Goal
@@ -1029,7 +1352,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled request — this is the most important field for task continuity.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled input — this includes any question, decision request, or discussion turn that the assistant has not yet answered. Only write "None" if the last exchange was fully resolved.
{_template_sections}"""
else:
@@ -1193,9 +1516,16 @@ The user has requested that this compaction PRIORITISE preserving all informatio
@staticmethod
def_strip_summary_prefix(summary:str)->str:
"""Return summary body without the current or legacy handoff prefix."""
"""Return summary body without the current, legacy, or any historical
handoff prefix.
Historical prefixes must be stripped too: a handoff persisted under an
older prefix can be inherited into a resumed lineage (#35344), and if we
only re-prepend the current prefix without removing the old one, the
stale directive it carried stays embedded in the body.
"description":"Capabilities required by Hermes Setup. Narrowly scoped: we don't write user files outside HERMES_HOME, we don't read arbitrary paths, and the only external network call goes through reqwest (Rust side, not exposed to the webview).",
**The native desktop app for [Hermes Agent](../../README.md) — the self-improving AI agent from [Nous Research](https://nousresearch.com).** Same agent, same skills, same memory as the CLI and gateway, in a polished native window — chat with streaming tool output, side-by-side previews, a file browser, voice, and settings, no terminal required. Available for **macOS, Windows, and Linux**.
<table>
<tr><td><b>Chat with the full agent</b></td><td>Streaming responses, live tool activity, structured tool summaries, and the same conversation history as every other Hermes surface.</td></tr>
<tr><td><b>Side-by-side previews</b></td><td>Render web pages, files, and tool outputs in a right-hand pane while you keep chatting.</td></tr>
<tr><td><b>File browser</b></td><td>Explore and preview the working directory without leaving the app.</td></tr>
<tr><td><b>Voice</b></td><td>Talk to Hermes and hear it back.</td></tr>
<tr><td><b>Settings & onboarding</b></td><td>Manage providers, models, tools, and credentials from a real UI. First-run setup gets you to your first message in seconds.</td></tr>
<tr><td><b>Stays current</b></td><td>Built-in updates pull the latest agent and rebuild the app in place.</td></tr>
</table>
---
## Install
### Install with Hermes (recommended)
Add `--include-desktop` to the [one-line installer](../../README.md#quick-install) and it sets up the agent and builds the desktop app in one go:
It builds and launches the GUI against your existing install — same config, keys, sessions, and skills. On first launch Hermes walks you through picking a provider and model; nothing else to configure.
### Prebuilt installers
When a release ships desktop installers they're attached to its [releases page](https://github.com/NousResearch/hermes-agent/releases) — `.dmg` (macOS), `.exe` / `.msi` (Windows), `.AppImage` / `.deb` / `.rpm` (Linux). These are published manually, so the install-with-Hermes path above is the most reliable way to get the latest.
---
## Updating
The app checks for updates in the background and offers a one-click update when one is ready. You can also update any time from the CLI:
```bash
hermes update
```
---
## Requirements
The installer handles everything for you (Python 3.11+, a portable Git, ripgrep). The only thing worth knowing:
- **Windows** — the installer bundles its own Git and Python; no admin rights or system changes required.
- **macOS / Linux** — uses your system Python 3.11+ (installed automatically if missing).
---
## Development
Want to hack on the app itself? Install workspace deps from the repo root once, then run the dev server from this directory:
npm run dev # Vite renderer + Electron, which boots the Python backend
```
Point the app at a specific source checkout, or sandbox it away from your real config:
```bash
HERMES_DESKTOP_HERMES_ROOT=/path/to/clone npm run dev
HERMES_HOME=/tmp/throwaway npm run dev
npm run dev:fake-boot # exercise the startup overlay with deterministic delays
```
### Building installers
```bash
npm run dist:mac # DMG + zip
npm run dist:win # NSIS + MSI
npm run dist:linux # AppImage + deb + rpm
npm run pack # unpacked app under release/ (no installer)
```
Installers are built and uploaded to GitHub Releases manually. macOS/Windows signing & notarization happen automatically when the relevant credentials are present in the environment (`CSC_LINK` / `CSC_KEY_PASSWORD` / `APPLE_*` for macOS, `WIN_CSC_*` for Windows).
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard --tui` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification
Run before opening a PR (lint may surface pre-existing warnings but must exit cleanly):
```bash
npm run fix
npm run type-check
npm run lint
npm run test:desktop:all
```
### Troubleshooting
Boot logs land in `HERMES_HOME/logs/desktop.log` (includes backend output and recent Python tracebacks) — check it first if the app reports a boot failure.
File diff suppressed because it is too large
Load Diff
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.