Surfaces the usage_report()/provenance() data layer added in #36701 as a
user-facing CLI command. Unlike `hermes curator status` (scoped to
curator-managed agent-created candidates), `usage` lists every skill on disk
— bundled built-ins and hub-installed included — with per-skill use/view/patch
counts and an agent/bundled/hub provenance tag.
Flags: --sort {activity,recent,name}, --provenance {agent,bundled,hub} filter,
--json for machine-readable output.
A stream that drops mid-response after tokens are delivered (peer-closed
connection, stale-stream reconnect) is converted into a synthetic
finish_reason="length" stub. The conversation loop treated that network
stall as a max-output-tokens truncation: when the dropped content was a
tool call it retried exactly once, then hard-failed with "Response
truncated due to output length limit" — even on large-output models that
never hit any cap (e.g. Opus).
- Tool-call truncation now retries up to 3 times (was 1) with a
progressive max_tokens boost, and is stub-aware: a PARTIAL_STREAM_STUB_ID
stall prints "Stream interrupted mid tool-call — retrying (n/3)" instead
of the false "model hit max output tokens", and the give-up message
distinguishes a network drop from a real truncation.
- Length-continuation retries preserve the original request's output cap
as a floor, so a high provider/model default isn't silently downshifted
to 8K/12K on retry.
- Added _requested_output_cap_from_api_kwargs() helper.
Tests: stub-stall mid-tool-call recovery within 3 retries; continuation
preserves a large provider-default output cap.
Fixes#26425. Salvages the substance of #26427 (cap floor) and #9525
(retry bump), adapted to the post-refactor conversation_loop.py which
handles all three api_modes uniformly.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
* feat(dashboard): backend API for MCP, pairing, webhooks, credential pool, memory, gateway lifecycle
Adds REST endpoints so a remote admin can manage these without CLI access:
- MCP servers: list/add/remove/test (config.yaml parity with hermes mcp)
- Pairing: list/approve/revoke/clear-pending messaging codes
- Webhooks: list/subscribe/remove (hot-reloaded JSON store)
- Credential pool: list/add/remove rotation keys (via CredentialPool API)
- Memory provider: status/select/disable/reset
- Gateway lifecycle: start/stop (restart+update already existed)
Secrets redacted on read; usable values only reach the agent at session start.
All endpoints sit behind the existing dashboard auth gate.
* feat(dashboard): backend API for ops + skills hub
- Ops actions (spawned, log-tailed via /api/actions): doctor, security audit,
backup, import, checkpoints prune
- Ops reads (structured JSON): hooks list + allowlist status, checkpoints list
with per-session size
- Skills hub actions (spawned): install / uninstall / update
- Registers new action log files for all spawn-based endpoints
All gated by the existing dashboard auth middleware.
* feat(dashboard): admin pages for MCP, pairing, webhooks, and system ops
Adds four new dashboard pages + nav entries so a remote admin can manage
Hermes without CLI access:
- MCP: list/add/remove/test MCP servers
- Webhooks: list/create/delete subscriptions (one-time secret reveal)
- Pairing: approve/revoke/clear messaging pairing codes
- System: gateway start/stop/restart, memory provider + reset, credential
pool add/remove, ops (doctor/audit/backup/import/skills update) with a
live action-log viewer, checkpoints prune, shell-hooks status
api.ts: client methods + types for all new endpoints.
App.tsx: routes + sidebar nav (plain labels, no i18n key required).
Verified: tsc -b clean, production build succeeds, new pages lint clean,
zero new eslint errors in App.tsx.
* test(dashboard): cover admin API endpoints
20 tests across MCP, credential pool, memory, pairing, webhooks, ops, plus
an auth-gate parametrize that asserts every admin endpoint requires the
session token. Asserts request contract + CLI-config parity, not catalog
values (per the no-change-detector-tests rule).
* docs(dashboard): document MCP, Webhooks, Pairing, and System admin pages
Adds Pages sections for the four new admin tabs and an Admin-endpoints table
to the REST API reference. Updates the page description to reflect the
dashboard's expanded role as a full administration panel.
* feat(install): --no-skills flag for blank-slate default profile
Add an install-time --no-skills flag so the default ~/.hermes profile can
be created with zero bundled skills, matching what
`hermes profile create --no-skills` already does for named profiles.
The flag writes $HERMES_HOME/.no-bundled-skills and skips the install-time
seed. sync_skills() now honors that marker with an early return
(skipped_opt_out=True), so neither the installer, a later `hermes update`,
nor a direct sync re-injects bundled skills into a profile that opted out.
Previously the marker was only checked by seed_profile_skills() (named
profiles); the default profile had no opt-out and `hermes update` would
re-seed it every time.
Tests: TestNoBundledSkillsOptOut covers marker-present (no-op) and
marker-absent (normal seed) paths.
* feat(skills): hermes skills opt-out / opt-in for existing profiles
Adds an interactive counterpart to the install-time --no-skills flag so
an already-installed profile (default or named) can toggle the
.no-bundled-skills marker without reinstalling.
- `hermes skills opt-out` writes the marker (stop future seeding). Safe
by default: nothing on disk is touched.
- `hermes skills opt-out --remove` ALSO deletes already-present bundled
skills, but ONLY ones that are manifest-tracked AND byte-identical to
their origin hash. User-edited bundled skills, hub-installed skills, and
hand-written skills are never removed. Previews + confirms before
deleting (--yes to skip).
- `hermes skills opt-in [--sync]` removes the marker and optionally
re-seeds immediately.
Core logic lives in tools/skills_sync.py (set_bundled_skills_opt_out,
is_bundled_skills_opt_out, remove_pristine_bundled_skills) reusing the
existing manifest origin-hash machinery for the safety check.
Tests: TestOptOutToggleAndRemove covers marker toggle idempotency and
proves user-modified + non-bundled skills survive --remove.
* docs: blank-slate skills — install --no-skills + opt-out/opt-in
- features/skills.md: new 'Starting with a blank slate' section covering
the install flag, profile-create flag, and runtime opt-out/opt-in, with
a safe-by-default note.
- reference/cli-commands.md: document the new skills opt-out / opt-in
subcommands + examples.
- reference/profile-commands.md: fix the marker filename (was .no-skills,
actually .no-bundled-skills) and cross-link the runtime commands.
Validated with a full docusaurus build (exit 0); the three edited pages
compile clean with no new warnings.
Two related changes to the skill curator:
1. Built-in pruning. New curator.prune_builtins config (default on) lets the
curator archive bundled built-in skills after the inactivity period, not
just agent-created ones. A .curator_suppressed list tells the update-time
re-seeder (tools/skills_sync) to leave pruned built-ins archived, so the
prune is durable across `hermes update`. Built-ins are seeded with a
baseline record on first sight, so the inactivity clock starts at upgrade
time -- no mass-prune on the first run. Hub-installed skills are never
pruned regardless of the flag. Restoring a built-in clears its suppression.
2. Usage tracking for all skills. Telemetry (view/use/patch) was wrongly gated
behind curation-eligibility, so built-ins were tracked only when prunable
and hub skills never. Telemetry is observability and is now decoupled from
curation: every skill accrues usage counts regardless of provenance, while
lifecycle mutators (set_state/set_pinned/mark_agent_created) stay
curation-gated. New usage_report() + provenance() expose all skills with an
agent/bundled/hub tag.
Gateway /undo was wired into every platform but still ran the old
single-turn hard-truncate. Now it matches the CLI/TUI: /undo [N] backs
up N user turns (default 1, clamps to oldest), soft-deletes the
truncated rows on disk (active=0, kept for audit, hidden from re-prompts
and search) via SessionDB.rewind_to_message, evicts the cached agent so
the next turn rebuilds from the active-only transcript (the gateway's
equivalent of the CLI's in-place history surgery + memory invalidation),
and echoes the backed-up message text so the user can copy/edit and
resend — platforms have no editable composer to prefill.
- gateway/session.py: SessionStore.rewind_session(session_id, n) wraps
the soft-delete primitive; load_transcript already returns active-only
- gateway/run.py: _handle_undo_command parses [N], calls rewind_session,
evicts the agent, echoes target text; confirm-prompt detail is count-aware
- locales: undo.removed gains {turns}; new undo.invalid_count, all 16 langs
- tests: tests/gateway/test_undo_rewind_session.py (6 cases)
The skill security scanner blocked legitimate community skills on three
intrinsic false-positive patterns:
- read_secrets_file matched `cat > file.env <<` heredocs (writing the
user's own keys into their own local .env), not just `cat file.env`
reads. Exclude output redirections.
- allowed-tools frontmatter is REQUIRED by the agent-skill spec; every
compliant skill declares it. Drop from HIGH privilege_escalation to a
LOW informational finding so it no longer drives the verdict.
- python_os_environ flagged `os.environ.get("CONFIG_VAR")` config reads
as HIGH exfiltration. Exempt non-secret `.get()` reads; add a dedicated
CRITICAL python_environ_get_secret pattern so secret-named reads
(OPENAI_API_KEY etc.) are still caught.
Also: scan_skill() now honors a skill-provided .skillignore / .clawhubignore
(gitignore-style) so dev/docs artifacts shipped in a skill root are excluded
from both structural checks and pattern scanning. SKILL.md is never ignorable.
80 tests pass (64 existing + 16 new).
The setup-mode chooser showed two bare labels ('Quick Setup (Nous
Portal) — OAuth login, model & messaging' / 'Full setup — configure
everything') that didn't explain what Quick Setup actually is. Expand
both labels inline so each choice line carries a concise explanation:
Quick Setup (Nous Portal) — free OAuth login, no API keys, model + tools
Full setup — configure every provider, tool & option yourself (bring your own keys)
Single-file change to the choice labels; no new plumbing.
Two pre-existing failures on main, unrelated to each other:
- test_model_catalog: website/static/api/model-catalog.json was stale vs
_PROVIDER_MODELS — minimax/minimax-m2.7 was renamed to minimax/minimax-m3
without regenerating the committed manifest. Ran scripts/build_model_catalog.py.
- test_gui_command: the macOS relaunchable-signing fixup
(_desktop_macos_relaunchable_fixup) makes two subprocess.run calls (xattr +
codesign) on darwin before launch. The two darwin GUI tests set
sys.platform='darwin' and mock subprocess.run with a 2-element side_effect
(pack + launch), so the fixup's calls drained the iterator -> StopIteration.
Mock out the fixup in those two tests so the subprocess accounting stays
focused on pack/launch.
The on_session_switch fan-out passed rewound=rewound unconditionally,
injecting rewound=False into every provider's **kwargs on the common
/resume, /branch, /new, and compression paths. Providers that capture
extra kwargs into an 'extra' dict (and the exact-dict-equality tests
guarding them) broke. Forward rewound only when truthy; /undo sets it
explicitly, everyone else stays clean.
Extends the existing /undo command from a single in-memory exchange
removal into a full rewind: back up N user turns (default 1), soft-delete
the truncated rows in SessionDB (active=0, kept for audit, hidden from
re-prompts and search), notify memory providers, and prefill the composer
with the backed-up message text for editing — CLI and TUI.
Reuses the SessionDB rewind primitives, the on_session_switch(rewound=True)
memory hook, and the TUI command.dispatch prefill payload from SaguaroDev's
#21910 work, wired to /undo [N] instead of a separate /rewind picker.
- cli.py: undo_last(n, prefill) — in-memory truncate + SQLite soft-delete
+ agent surgery (system-prompt invalidate, flush-index reset) + memory
notify + editable buffer prefill; /undo dispatch parses optional count;
checkpoint-rollback caller passes prefill=False
- tui_gateway/server.py: command.dispatch undo branch (was rewind) parses
count, picks Nth-from-last user turn, clamps to oldest
- commands.py: /undo gains [N] args_hint
- tests: rename + expand TUI suite (multi-turn, clamp, invalid-count)
- release.py: AUTHOR_MAP entry for SaguaroDev
Co-authored-by: SaguaroDev <74339271+SaguaroDev@users.noreply.github.com>
Adds the TUI half of the /rewind feature so the Ink terminal UI gets
the same affordance as the prompt_toolkit CLI.
Python side (tui_gateway/server.py):
- /rewind added to _PENDING_INPUT_COMMANDS so slash.exec rejects it
and the TUI falls through to command.dispatch (the only path with
access to live session state + memory hooks).
- New command.dispatch branch for name == "rewind":
v1 auto-picks the most recent user turn (Claude-Code-style single-
step undo), calls SessionDB.rewind_to_message, refreshes the
in-memory history, fires _memory_manager.on_session_switch with
rewound=True, and returns the new "prefill" payload.
- A dedicated picker overlay (multi-step rewind) is tracked as a
follow-up to #21910.
TS side (ui-tui/src/):
- New "prefill" variant on CommandDispatchResponse + asCommandDispatch
validator. Mirrors "send" but does NOT auto-submit; the client drops
the message into the composer for editing.
- createSlashHandler renders the optional notice via sys() and calls
ctx.composer.setInput(d.message), letting the user edit-and-resubmit
the rewound turn — the core UX promised by the issue.
Tests:
- 7 new tui_gateway tests covering prefill payload shape, in-memory
history truncation, DB soft-delete, memory-provider notification
(rewound=True), busy-session refusal, missing-session error, and
registry placement in _PENDING_INPUT_COMMANDS.
- Extended asCommandDispatch vitest covering the new prefill variant
(with + without notice, and rejection of malformed payloads).
Out of scope for v1 (tracked as #21910 follow-up):
- Dedicated picker overlay in Ink (the multi-step rewind UI). v1 auto-
picks the most recent user turn, matching the most common case.
- Gateway platforms (Telegram, Discord, etc.) — issue scopes v1 to
CLI + TUI only.
Self-review of the code-block masking fix: the cleanup path ran
media_pattern.sub('') over the _mask_protected_spans() copy of the text and
assigned that back to 'cleaned', so whenever a real MEDIA: tag was delivered
(if media: branch), every fenced code block / inline code / blockquote in the
reply was blanked to whitespace in the user-visible text.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned' — masking is a locator,
not a text rewrite. Protected spans survive verbatim. Strengthens the existing
mixed-code test (it only asserted 'Done.' survived, not the code block) and
adds an inline-code-survives regression test. Both fail on the old sub-based
code and pass now.
extract_media() scanned the full response text without distinguishing
live delivery tags from example paths in fenced code blocks, inline code
spans, and blockquotes. This caused false positives where the agent's
explanation of MEDIA: syntax (or tool output containing example paths)
was stripped from user-visible text and the path was added to the media
delivery list.
Added _mask_protected_spans() helper that replaces protected regions
with equal-length whitespace before regex matching, preserving match
offsets. The helper skips backtick-quoted paths in MEDIA: tags to
maintain existing path extraction behavior.
Fixes#35695
Self-review of #34375 fix: the cleanup path ran media_pattern.sub('') over
the JSON-masked copy of the text, which baked the masking spaces into the
user-visible 'cleaned' string — a serialized tool result like
{"old":"MEDIA:/x.png"} came back as {"old":" "}.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned'. Real tags are stripped;
JSON-embedded MEDIA: text reads back verbatim. Masking 'cleaned' (not the
original 'content') keeps offsets valid after the [[audio_as_voice]] /
[[as_document]] directives are removed. Adds two cleaned-text regression tests.
Serialized tool results frequently embed a prior reply's text, e.g.
{"result": "MEDIA:/path/stale.png"}. The bare-path branch of
MEDIA_TAG_CLEANUP_RE matched these and re-delivered stale files (#34375).
Adds BasePlatformAdapter._mask_json_string_media, which blanks (offset-
preserving) only MEDIA:<bare-path> tokens that sit inside a JSON value-
context string (opened by : , { or [). Legitimate tags at line start,
after prose, indented, MEDIA:"quoted" form, and two-line TTS output are
all left untouched.
Reworked from the approach in #34388 (a line-start regex anchor), which
no longer applied to current main and regressed same-line/indented tags.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Adds nicsequenzy@gmail.com -> polnikale to AUTHOR_MAP so the
check-attribution gate passes for the Playwright headless_shell browser
discovery fix (#35717).
* feat(desktop): drop files anywhere in the chat area
File drops were only wired to the composer input. Add a reusable
useFileDropZone hook (enter/leave depth counting + capture-phase reset so
the affordance clears even when the composer claims the drop) and a
pointer-events-none ChatDropOverlay, wired onto the conversation viewport.
Drops funnel through the existing onAttachDroppedItems; composer drops keep
their own inline-ref behavior.
* fix(desktop): chat-area drops insert inline @file refs, not attachment cards
Match the composer-input drop behavior — funnel dropped paths through
droppedFileInlineRef + the composer insert bus so they render as inline
ref chips instead of attachment cards.
* fix(desktop): don't render bare file paths as tool images (404)
vision_analyze reports its input image as a local filesystem path, which
toolImageUrl handed straight to <img src>. In the renderer that resolves
against the dev-server origin and 404s. Restrict inline tool images to
fetchable sources (data: URLs and remote http(s)); bare paths now fall
back to the tool's codicon.
When an unauthenticated SPA fetch hit a gated /api/* endpoint (e.g.
GET /api/analytics/models?days=30 fired from ModelsPage on mount or
after a session expiry), the gated middleware stamped the request's
own path into next= on the 401 envelope's login_url. The SPA's global
401 handler in web/src/lib/api.ts full-page-navigated to that URL,
the PKCE cookie carried the encoded /api/* value through the OAuth
round trip to Portal, and /auth/callback's _validate_post_login_target
accepted it as same-origin and redirected the user to the raw JSON
endpoint instead of the dashboard.
Symptom Ben reported: after the OAuth screen he kept landing on
$DOMAIN/api/analytics/models?days=30 (raw JSON) rather than /models.
The bug was deterministic per page — whichever /api/* call ModelsPage,
AnalyticsPage, or SessionsPage fired first owned the redirect race.
Fix: both validators now reject /api/* targets in addition to the
existing /login, /auth/, /api/auth/ exclusions:
- _safe_next_target in middleware.py drops the value before it ever
enters login_url, so the SPA's 401 handler navigates to a bare
/login (which the SPA itself can return-from via its own
sessionStorage["hermes.lastLocation"] fallback that was already
saving the actual browser location).
- _validate_post_login_target in routes.py drops it as second-line
defence at the callback boundary, so a legacy cookie, a regressed
middleware, or an attacker-crafted /auth/login?next=/api/... value
can't smuggle the redirect through. Either layer alone is enough;
pairing them means a regression in one is caught by the other.
The match is anchored: ``decoded == "/api"`` or
``decoded.startswith("/api/")``. SPA route lookalikes like /apidocs
or /api-keys remain valid landing targets — tests pin that.
Test additions in test_dashboard_auth_401_reauth.py:
- TestApi401Envelope: rewrote test_login_url_carries_next_for_deep_
api_path (which asserted the pre-fix behaviour) as
test_login_url_drops_next_for_deep_api_path, plus added the
specific analytics-models repro case from Ben's report.
- TestNextSameOriginValidation: rejects-api-paths + does-not-reject-
api-prefix-lookalikes (covers /apidocs, /api-keys).
- TestAuthCallbackNext: end-to-end test_callback_with_api_next_
lands_at_root drives /auth/login?next=/api/... through to the
callback and asserts the user lands at "/", not the API URL.
- TestValidatePostLoginTarget: new class covering the callback-side
validator directly, including the URL-encoded ``%2Fapi%2F...``
form the PKCE cookie actually carries.
Mutation-tested: reverting both validators causes exactly the 5 new
or rewritten /api/*-related assertions to fail (each fix layer is
independently tested), while the 31 other assertions in the file
remain green. Full tests/hermes_cli/ suite (288 files, 5,938 tests)
passes with the fix applied.
The desktop rename dialog sent PATCH /api/sessions/{id}, but the backend
only defined GET and DELETE for that path — FastAPI returned 405 Method
Not Allowed, surfaced to the user as "Rename failed". Add the PATCH route
backed by SessionDB.set_session_title (handles sanitization, uniqueness,
and clearing the title when empty).
Also fix a misleading notification: any 405 was summarized as an unrelated
"does not support that audio endpoint" message. Make it a generic 405 hint.
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.
Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.
This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.
Co-authored-by: x1am1 <2663402852@qq.com>
Shiki's github-light-default colors comments #6e7781 (~4.2:1 on the code
card background), which is borderline unreadable at the 11px code font
size — and worst for shell snippets, where a single `#` turns the rest
of the line into one long comment span. Remap light-mode comments to
GitHub's darker muted gray (#57606a, ~6.4:1) via per-theme
colorReplacements. Dark mode (~6.1:1) reads fine and is left untouched.
s6-overlay images (e.g. hermes-agent:latest) use /init as PID 1 and exec
/run/s6/basedir/bin/init during stage0 startup. The Docker terminal backend
unconditionally added Docker --init and mounted /run as noexec, which broke
those images in two ways: --init created a second competing PID-1 init, and
the noexec /run made s6 stage0 fail with "exec: /run/s6/basedir/bin/init:
Permission denied" (exit 126), so the container died and terminal commands
reported a generic "container is not running" error.
Detect images whose entrypoint is /init via 'docker image inspect' and, for
those images only, skip Docker --init and mount /run with exec. All other
images keep the hardened --init + noexec defaults. Detection is best-effort:
any inspect failure falls back to the safe defaults.
#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup
with:
/usr/bin/tini: No such file or directory
The image moved from tini to s6-overlay as PID 1 (/init) earlier in
2026. Orchestration templates that still pin /usr/bin/tini as the
entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no
binary to exec and the container crashes immediately.
Hermes has no control over the Hostinger catalog template, but we can
make the image backward-compatible by symlinking /usr/bin/tini -> /init
during the s6-overlay install step. External wrappers that exec
/usr/bin/tini will land on the same s6-overlay reaper they would have
landed on if they'd used the canonical /init entrypoint.
The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim
is purely for legacy external wrappers, not for the image's own
runtime path. Once affected catalogs are updated, the symlink can be
removed.
Other issues #34192 raises that are NOT addressed by this PR:
* Problem #2 (UID 1024 vs 10000 mismatch): already fixed by #33148
(S6_KEEP_ENV=1) and #32412 (with-contenv shebangs). The Hostinger
template likely needs to update its env-var propagation.
* Problem #3 (incompatible session formats): RFC for pluggable
SessionDB is tracked in #23717.
* Problem #4 (Telegram polling conflict): an operations problem on
Hostinger's side, not in this codebase.
This PR is scoped to the one issue that can be fixed inside
Dockerfile: the missing /usr/bin/tini binary.
Tests (3 in test_dockerfile_tini_compat_shim.py):
- test_tini_compat_symlink_present
Guard: the symlink line must exist in Dockerfile.
- test_tini_compat_comment_explains_why
The #34192 anchor comment must be present so future readers know
why the shim is there (avoid accidental removal).
- test_entrypoint_still_init_not_tini
Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is
only for external wrappers.
Refs: #34192
Partial fix: addresses the immediate tini-binary crash. Catalog-side
fixes still needed by Hostinger for the UID and session-format
problems documented in the issue.
Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes#34107. When Hermes runs in Docker with HERMES_UID=1000 /
HERMES_GID=911, the entrypoint chowns the top-level HERMES_HOME once
at startup — but subdirectories created at runtime by
ensure_hermes_home() (especially for profile namespaces under
profiles/<name>/ spawned by kanban workers) were landing as root:root
and blocking subsequent uid-mapped worker invocations with:
PermissionError: [Errno 13] Permission denied:
'/opt/data/profiles/charles/logs/curator'
Fix: add _resolve_hermes_uid_gid + _chown_to_hermes_uid helpers that
read the env vars and apply chown after mkdir. Invoke from _secure_dir
which already runs after every directory creation in the home-init path,
so all newly-created subdirs (including the profile namespaces) get the
right ownership.
Safety properties:
- No-op when HERMES_UID/HERMES_GID unset (the dominant non-Docker path)
- No-op on Windows (os.chown doesn't exist; AttributeError swallowed)
- No-op when running as non-root (EPERM swallowed — the entrypoint's
startup chown -R picks it up on next restart, and in most cases the
dir was already correctly-owned by the calling user)
- Uses -1 sentinel for missing field so only the set value applies
- Empty-string env vars treated as unset
Adds 14 tests across:
- TestResolveHermesUidGid (7) — env-var parsing
- TestChownToHermesUid (5) — chown helper invariants
- TestSecureDirChown (2) — end-to-end through _secure_dir
Co-authored-by: Cursor <cursoragent@cursor.com>
Add MiniMax-M3 to the minimax, minimax-oauth, and minimax-cn curated
lists (these are hardcoded — the native Anthropic-format endpoint has no
/v1/models listing and the providers aren't in _MODELS_DEV_PREFERRED, so
new models don't auto-pull). Add a DEFAULT_CONTEXT_LENGTHS key
'minimax-m3' -> 1,000,000 so M3 resolves to its 1M context on every
surface (native ID + OpenRouter/Nous slug) via longest-key-first
substring match, while the M2.x series stays at 204,800.
On macOS the desktop app is built locally and ad-hoc signed (no Developer ID
on the user's machine). An ad-hoc bundle has no stable Designated Requirement,
so when the self-updater rebuilds it in place with a fresh build (new cdhash)
— plus the com.apple.quarantine flag inherited from the downloaded installer
process chain — Gatekeeper/LaunchServices treats the changed code as tampering
and macOS reports "Hermes is damaged and can't be opened," and the app fails to
relaunch. First launch works (fresh registration); the in-place update relaunch
is what breaks.
Fix: after building the desktop app locally, strip quarantine xattrs and
re-apply a clean deep ad-hoc signature (omitting the hardened-runtime flag,
which an ad-hoc build can't satisfy). Applied in both build entry points:
- hermes_cli/main.py cmd_gui (the `hermes desktop --build-only` path the
updater drives) — so the fix ships via `hermes update` (git), no installer
re-download needed.
- scripts/install.sh install_desktop (first install) for parity.
Both are no-ops on non-macOS and when a real signing identity (CSC_LINK /
APPLE_SIGNING_IDENTITY) is configured, so signed/notarized builds are untouched.
The docker_forward_env build loop only consulted the ~/.hermes/.env disk
fallback when a key was unset (value is None), not when it was present
but empty (""). A transient empty value in os.environ was therefore
forwarded into the sandbox container as `-e KEY=`, clobbering the correct
value on disk. Sandboxed workloads then read a zero-length secret and
failed auth (observed as intermittent Linear API 401s) with no gateway
restart and no .env rewrite.
Treat empty-string like unset (`if not value:` on the fallback) and never
forward a blank secret (`if value:` on the guard).
Fixes#35580
Adds me@simontaggart.com → SiTaggart to AUTHOR_MAP so the
check-attribution gate passes for the docker_forward_env empty-secret
fix (#35583, fixes#35580).
Read the Portal's tool_access claim (JWT + /api/oauth/account) into NousToolAccessInfo and gate managed Tool Gateway access on it: tool_gateway_entitled (paid OR live pool) and per-category tool_gateway_entitled_for(). The pool funds web/image/tts/browser but not video, so per-backend availability, the charge picker (ensure_nous_portal_access coverage_category), and managed defaults all respect coverage.
Setup: rebuild prompt_enable_tool_gateway as a per-tool checklist that renders whenever the pool is enabled, lists only pool-covered tools (video excluded for free-pool users), and is framed as the free tool pool for $0 subscribers rather than a paid subscription. get_gateway_eligible_tools now gates and filters off the entitlement snapshot.
The thin installer (apps/bootstrap-installer) drives install.sh stage-by-stage,
each in its own process. The `desktop` stage never called check_node, so the
Hermes-managed Node provisioned earlier (at $HERMES_HOME/node/bin) wasn't on
PATH. install_desktop's `command -v npm` check then failed and the build was
skipped — yet the stage still reported {"ok":true,"skipped":false}, so the
installer showed "Installation Complete" and only failed at the end with
"Couldn't find a built Hermes desktop ... the desktop build step may have been
skipped or failed."
Fix:
- Call check_node in the `desktop` stage (mirrors every other Node-dependent
stage) so the managed Node is on PATH (or installed).
- Make install_desktop self-provision via check_node and hard-fail (return 1)
if npm is still unavailable, instead of a silent `return 0`. The desktop
stage only runs when a build is explicitly requested (--include-desktop), so
an unavailable toolchain is a real failure, not graceful degradation.
Verified on macOS arm64: the `desktop` stage now builds
release/mac-arm64/Hermes.app, which matches resolve_hermes_desktop_exe, so the
installer's "Launch Hermes" succeeds.
The lazy session.create path hand-builds a partial info dict that omitted
desktop_contract. The desktop GUI reads a missing contract as undefined and
treats it as an out-of-date backend, so it surfaced a "Backend out of date"
toast on every launch even against a current backend. Carry the contract in
the lazy payload like _session_info already does for resume/branch.
The desktop self-update branch defaulted to bb/gui, the pre-merge feature
branch. Now that the desktop app is on main, flip DEFAULT_UPDATE_BRANCH to
main so freshly built apps check for updates against the right branch
instead of relying on the runtime self-heal fallback.
* feat: better composer etc
* docs: add desktop and dashboard run instructions
* fix(desktop): address security scan findings
* fix(dashboard): resolve @nous-research/ui path under npm workspaces
The sync-assets prebuild step shelled out to 'cp -r
node_modules/@nous-research/ui/dist/fonts ...' with a path relative
to apps/dashboard/. That works only when the dep is installed
locally in the dashboard workspace, but 'npm install' at the repo
root (the documented setup — see apps/desktop/README.md) hoists
shared deps to the root node_modules under npm workspaces. The
relative cp then fails with 'No such file or directory', sync-assets
exits 1, the Vite build aborts, and 'hermes dashboard' surfaces a
generic 'Web UI build failed' message.
Replace the shell one-liner with scripts/sync-assets.cjs, which
walks up from the dashboard directory looking for node_modules/
@nous-research/ui — working in both the hoisted (workspaces) and
co-located (standalone) layouts. Also guards against a missing
dist/fonts or dist/assets with a clearer error pointing at a
rebuild of the UI package rather than silently copying nothing.
* feat(desktop): support connecting to a remote Hermes backend
Add HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN env
vars that, when set, short-circuit the local-child spawn in
startHermes() and connect the Electron renderer to an already-
running 'hermes dashboard' server reachable over the network.
Motivating use case: WSL2 users who want to run the Hermes core
(agent loop, tools, filesystem access) inside their WSL
distribution while rendering the Electron GUI on native Windows.
Before this change, the desktop app always spawned a local Python
child on the same host as the renderer, which doesn't cross the
WSL/Windows boundary.
The remote path reuses waitForHermes() as a liveness probe
(/api/status is in the backend's public endpoint allowlist), so
the connection is only returned once the backend is actually
ready. WebSocket URL derivation picks ws:// or wss:// based on
the input scheme. URL validation rejects non-http(s) schemes and
requires both env vars together to avoid a half-configured
connection that would silently fall through to the spawn path.
No behaviour change when the env vars are unset — the default
local-spawn flow is untouched.
Typical usage:
# in WSL2
hermes dashboard --tui --no-open --host 0.0.0.0 --port 9119 --insecure
# on Windows
set HERMES_DESKTOP_REMOTE_URL=http://localhost:9119
set HERMES_DESKTOP_REMOTE_TOKEN=<session token>
set HERMES_DESKTOP_IGNORE_EXISTING=1
(launch Hermes desktop)
* ci(desktop): automate desktop releases
Add GitHub Actions release channels for signed desktop installers and document the stable/nightly download paths.
* feat: file tabs
* refactor(desktop): tighten right-rail tab close API
Promote closeRightRailTab/closeActiveRightRailTab as the single
public entry point. Drops the activeTabRef + handleCloseDocument
indirection in ChatPreviewRail, the unused $rightRailHasContent
atom, and the legacy dismissFilePreviewTarget alias. -70 LOC.
* feat(desktop): polish composer pill toward reference look
Solid foreground-on-background send/voice-conversation circle (black-on-white
in light, white-on-black in dark) anchors the right edge as the primary CTA
instead of the orange theme primary. Bumps the primary control to 2.125rem so
it visually outranks the ghost mic/plus controls. Opens up the surface padding
(0.625rem x / 0.5rem y) so the input row breathes around its controls, and
nudges the corner radius from 20 to 24px for a slightly pill-ier silhouette.
LiquidGlass distortion is preserved.
* feat(desktop): add startup and onboarding flow
Add phase-based desktop boot progress, fresh-install sandbox testing, and first-run provider credential onboarding so packaged installs can start cleanly without manual settings detours.
* fix(desktop): gate prompts on provider setup
Show the desktop provider onboarding flow before prompt submission when no inference provider is configured, preventing fresh installs from falling through to backend credential errors.
* fix(desktop): surface provider onboarding from session warnings
Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors.
* fix(desktop): route gateway provider errors to onboarding
The "No inference provider configured" auth error reaches the renderer through gateway error events, not the prompt.submit promise; the previous patch only caught the latter, so the error toast still surfaced and onboarding never opened.
Also strip credential-shaped env vars from the test:desktop:fresh sandbox so the packaged backend can't see provider keys leaking from the launching shell.
* fix(desktop): use strict runtime check to drive onboarding
setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding.
Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it.
* feat(desktop): OAuth-first onboarding using existing dashboard provider API
Replace the engineer-flavored API key form with a Sign-in-first onboarding overlay that uses the dashboard's existing /api/providers/oauth catalog and PKCE/device-code endpoints (Anthropic, Nous, OpenAI Codex, etc.). API key entry is now a fallback tab with friendly provider names instead of env var prefixes, and the loud raw resolver error is gone in favor of a one-line welcome message.
* fix(desktop): polish onboarding provider list
Reorder OAuth providers so Nous Portal is first, give the segmented Sign in / API key control equal column widths, and replace the engineer-flavored backend names like "Anthropic (Claude API)" / "MiniMax (OAuth)" with friendlier in-app titles. External-CLI providers now show a softer subtitle and an external-link icon instead of a chevron.
* refactor(desktop): split onboarding overlay into store + view
Move the OAuth state machine, runtime check, copy-to-clipboard, and api-key save into store/onboarding.ts (matching the boot.ts pattern), leaving the overlay as a presentation layer that subscribes via useStore. Tabs are now table-driven, child panels read flow from the store instead of prop-drilling, and the polling/PKCE/error/success branches share a small Status atom.
* fix(desktop): external CLI providers + center mode tabs
External-CLI providers (Claude Code, Qwen Code) now open an in-overlay panel with the CLI command, copy button, and an "I've signed in" recheck instead of firing an invisible toast. Center the Sign in / API key tab control so it sits under the heading instead of hugging the left edge.
* fix(desktop): drop onboarding tabs for an inline link, group device-code waiting state
Replace the Sign in / API key tab pair with an "I have an API key" footer link under the OAuth provider list, with a "Back to sign in" affordance inside the API key form. Group the device-code "Waiting for you to authorize..." status next to the Cancel button so the alignment matches the action.
* refactor(desktop): tighten onboarding store + overlay
Drop the dead isOnboardingBusy/BUSY set, factor the catch-fallback dance into safeReq, and share a single reloadAndConnect helper between PKCE submit, device-code success, external recheck, and api-key save.
In the overlay, extract Step / CodeBlock / FlowFooter / CancelBtn / DocsLink atoms so the four sign-in panels share the same chrome instead of repeating it inline. Net effect: fewer literal divs, one place to touch the spacing, and the code-block + footer rows are reusable across future flows.
* fix(desktop): mount onboarding from frame 1 to kill the FOUT
Default onboarding.configured to null (unknown until the runtime check resolves) and have the onboarding overlay render whenever it's not yet confirmed true. The boot overlay now yields to it, so the very first paint is the Welcome card with a "While we get you set up..." progress strip instead of a flash of the chat shell between boot dismiss and onboarding mount.
The picker swaps in cleanly once the gateway opens and the runtime check confirms the user is not configured. Already-configured users see the same prep card briefly while their existing runtime warms up, then the overlay dismisses without touching the chat shell.
* fix(desktop): top-align empty sessions placeholder
The "Start a chat to build your history." empty state used a min-h-35 grid place-items-center container, which floated the text in a tall dead zone. Render it as a flat paragraph that sits right under the section header like the empty pinned state does.
* refactor(desktop): drop dead boot overlay
Onboarding overlay subsumes the boot card now that it mounts from frame 1 and renders boot progress inline. The standalone DesktopBootOverlay is unreachable in every flow (yields whenever onboarding has not confirmed configured, dismisses once it has).
* fix(desktop): hide pinned/recents sections until first session
A fresh sidebar showed the Pinned and Recent chats headers with floating empty-state copy underneath. Drop both sections (and the now-orphan SidebarEmptySessionState) when there are no sessions yet — they reappear after the first chat. Skeletons during initial load are unchanged.
* feat(gui): route embedded TUI through dashboard gateway (#21979)
Inject HERMES_TUI_GATEWAY_URL into dashboard PTY sessions so embedded ui-tui instances attach to the in-process websocket gateway, with coverage for the new env wiring.
* Add desktop remote gateway settings
Make the desktop gateway connection configurable from settings so local remains the default while remote backends can be saved, tested, and applied without environment variables.
* feat(gui): first-class Messaging page + gateway menu redesign
- Add Messaging page to the desktop app with per-platform setup,
status, and inline guidance. Catalog derives from gateway.config
Platform enum + plugin registry, so every messaging adapter the CLI
supports (Telegram, Discord, Slack, Mattermost, Matrix, WhatsApp,
Signal, BlueBubbles, Home Assistant, Email, SMS, DingTalk, Feishu,
WeCom, Weixin, QQ, Yuanbao, API server, Webhooks, plugins) shows up
without per-platform code.
- New REST endpoints: GET /api/messaging/platforms, PUT and POST
/test on the same path. Secrets go through the existing .env
pipeline; enable/disable writes config.yaml.
- Replace gateway statusbar dropdown with a richer panel: status row,
icon-only restart + system-panel actions, recent activity (with
timestamps trimmed in display, full text on hover), platform list.
- Auto-poll the messaging page every 6s (paused when hidden) so
status updates without a manual check.
- Drop Settings / Command Center from the sidebar nav (still
reachable via shortcuts and the titlebar cog).
- Flatten top corners on Messaging/Skills/Artifacts/Chat panes.
- Share new StatusDot component across messaging + gateway menu.
- Fix gateway/config.py so an explicit platforms.<name>.enabled=false
in config.yaml is honored when env tokens are present.
- pb-9 on the chat content area for breathing room above the composer.
* Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* pin electron version
* hide application menu on non-mac systems
* interpret compactPreview for non-string vlaues as JSON or an empty string
* fix(desktop): keep composer contenteditable mounted across stacked toggle
The composer rendered {input} inside two different parent fragments
depending on `stacked`. When auto-expand flipped `stacked` (e.g. the
moment typed text wrapped past two lines), React reconciled the two
branches as different positions and unmounted/remounted the
contenteditable. The fresh mount started empty, so any in-flight
characters — most reliably reproduced by holding a key — were lost.
Replace the conditional with a single CSS Grid whose template-areas
swap on `stacked`. The three children (menu, input, controls) keep
stable identities across the toggle; only their grid placement
changes, which the browser handles without React tearing down the
editor.
* refactor(desktop): align install layout with install.ps1 / install.sh
Make the desktop app's runtime layout match what scripts/install.ps1 and
scripts/install.sh produce, so a desktop-only user and a CLI-only user end
up with the same files in the same places and can share one install.
Layout
- ACTIVE_HERMES_ROOT = HERMES_HOME/hermes-agent (was: process.resourcesPath/hermes-agent, read-only)
- VENV_ROOT = HERMES_HOME/hermes-agent/venv (was: userData/hermes-runtime)
- desktop.log = HERMES_HOME/logs/desktop.log (was: userData/desktop.log)
- HERMES_HOME default: %LOCALAPPDATA%\hermes on Windows, ~/.hermes elsewhere
The packaged .app/.exe still ships a read-only payload at
process.resourcesPath/hermes-agent (FACTORY_HERMES_ROOT). On first launch
or after an installer-driven upgrade we sync factory -> active, then
provision the venv and run pip install -e . against the active root.
Key behaviors
- Pin HERMES_HOME in the spawned Python's env so get_hermes_home() resolves
to the same path resolveHermesHome() picked. Without this, Python falls
back to ~/.hermes on every platform - fine on mac/linux, a split-state
bug on Windows where our default is %LOCALAPPDATA%\hermes.
- Detect developer installs by .git presence at ACTIVE; never overwrite
a user's checkout via factory sync.
- Marker at ACTIVE/.hermes-desktop-runtime.json (schema v4) tracks
pyproject hash + factory version + runtime schema version. depsFresh
fast-paths when nothing changed.
- Dev (npm run dev) prefers SOURCE_REPO_ROOT over ACTIVE so devs run
their local edits, not whatever's under HERMES_HOME.
- Better error messages distinguish "no payload" from "no Python".
- Preserve a legacy ~/.hermes on Windows when no %LOCALAPPDATA%\hermes
exists, so users with prior pip/manual installs aren't orphaned.
pyproject.toml
- Promote fastapi, uvicorn[standard], ptyprocess (non-Windows), and
pywinpty (Windows) to main dependencies. The dashboard backend
(hermes dashboard) needs them at runtime; the previous lazy-import
fallback was a footgun for fresh installs.
- Empty the [pty] optional-extra; kept as a no-op back-compat alias for
any existing pip install hermes-agent[pty] invocations.
Drops the hardcoded BUNDLED_RUNTIME_REQUIREMENTS list in main.cjs - the
desktop now installs whatever pyproject.toml says, single source of truth.
Files
- apps/desktop/electron/main.cjs: runtime layout, HERMES_HOME pin,
factory->active sync, marker v4
- apps/desktop/scripts/test-desktop.mjs: track new venv location
- apps/desktop/README.md: new Setup, Runtime Bootstrap, and
Debugging sections
- pyproject.toml: fastapi/uvicorn/pty backends in main
dependencies; [pty] extra emptied
Tested locally on Windows: npm run dev boots cleanly, sessions land at
the new location, type-check + lint + test:desktop:platforms all pass.
Verified end-to-end on a fresh Win11 VM via dist:win installer.
Known gaps (filed as follow-ups, not in this PR):
- Skills not seeded on packaged installs (sync_skills only runs in
cmd_chat, not cmd_dashboard). Need to move to shared pre-dispatch.
- Git Bash not bundled or detected; agent's terminal tool errors out
with a useful message but desktop bootstrapper should pre-flight it.
- install.ps1 / install.sh should be decomposed into composable phase
libraries so the desktop bootstrapper can reuse them as a single
source of truth across all install surfaces.
* feat(desktop): theme polish, prose chat typography, composer chrome
- DS tokens/midground, Backdrop, scoped scrollbars, typography plugin + prose
- Composer liquid/radius utilities, thread font parity, tool/thinking cues
- File tree label scale, preview flex, thread retry loading + streaming tests
* feat(desktop): NSIS prereq detection page + auto-install via winget
The packaged Windows installer now detects Python 3.11+ and Git for Windows
at install time and offers to install missing prereqs via winget. Mirrors
the prereq logic scripts/install.ps1 already runs for CLI installs, so
desktop installer users get the same out-of-the-box experience as
install.ps1 users.
Why
- Hermes' terminal tool calls bash.exe directly (tools/environments/
local.py); on Windows that's Git Bash from Git for Windows. Without it,
the agent fails on the first terminal() call.
- Hermes' Python runtime needs 3.11+. Without it, the desktop bootstrapper
errors out at venv creation.
- Both gaps surfaced on a fresh Windows 11 VM smoke test: VM had Python
pre-installed but no Git, so the agent's first terminal call failed
with "Git Bash isn't installed."
- install.ps1 has had Install-Git + Install-Uv functions for ages. The
desktop installer was the asymmetric outlier.
How — NSIS prereq page
- New file: apps/desktop/installer/prereq-check.nsh (plugged into
electron-builder via build.nsis.include)
- Real Wizard page using nsDialogs, inserted via customPageAfterChangeDir
hook (between the Directory page and InstFiles).
- Group boxes for Python and Git, each showing detection status.
- Pre-checked install checkboxes when winget is available.
- Auto-skips silently if both prereqs are already installed.
- Falls back to manual download URLs when winget itself is missing.
- Detection:
- Python: probes `py -3.11`/`-3.12`/`-3.13`/`-3.14` via the Python
launcher. Microsoft Store "Python stub" (no py.exe) is correctly
classified as not-installed.
- Git: `where git`.
- winget: `where winget` (Win10 1809+ / Win11 with App Installer).
- Install execution (in customInstall macro):
- Python: nsExec::ExecToLog with `--scope user --silent`. Per-user
install, no UAC prompt, output streams to install log.
- Git: ExecShellWait via Windows ShellExecute. Critical because Git
always installs per-machine and triggers UAC; ShellExecute preserves
the foreground focus chain across non-elevated → elevated process
spawns, so UAC actually comes to the foreground. nsExec::ExecToLog
breaks the chain because winget runs hidden.
- Both pass `--disable-interactivity --accept-package-agreements
--accept-source-agreements` to suppress winget's own dialogs.
- Verification: probes Git's standard install locations via FileExists
rather than `where git`. NSIS's process inherits PATH at startup, so
a freshly-installed Git won't be visible to `where` until restart.
- Silent installs (/S) skip the prompts; managed deploys handle prereqs
out-of-band via Group Policy / Intune.
How — Electron-side safety net
- New findGitBash() in main.cjs, parallel to findSystemPython(). Probes
the same locations as tools/environments/local.py:_find_bash() so a
positive result here means the agent's terminal tool will work.
- ensureRuntime now throws a clear, actionable error on Windows when Git
Bash isn't found, matching the existing "Python 3.11+ is required"
error path.
- Catches users the NSIS page doesn't: .msi installer users (NSIS prereq
page doesn't run for MSI), `npm run dev` users, manual installers,
anyone who unchecked the install boxes on the NSIS prereq page.
- All gated on `IS_WINDOWS`; macOS / Linux unaffected.
NSIS build issue (resolved)
- electron-builder defaults to `-WX` (warnings as errors). NSIS optimizer
emits "warning 6010: function not referenced" for our page functions
because Page custom directives don't count as references in its
static-analysis pass. The functions ARE called at runtime when NSIS
invokes the page; the optimizer just can't see it statically.
- Set `build.nsis.warningsAsErrors=false` in package.json so this
spurious warning doesn't fail the build. (Documented option from
electron-builder's nsisOptions.)
Out of scope (filed for future work)
- MSI prereq detection: Windows Installer custom actions are a different
mechanism. Enterprise deploys typically handle prereqs via GP/Intune.
- Bundle PortableGit + python-build-standalone in extraResources for
zero-network installs. ~80MB increase.
- Mac / Linux GUI prereq flows (different installer formats; Xcode CLT
covers most macOS prereqs already; Linux is per-distro hard).
Files
- apps/desktop/installer/prereq-check.nsh (new, ~290 lines NSIS)
- apps/desktop/package.json (build.nsis.include +
warningsAsErrors)
- apps/desktop/electron/main.cjs (findGitBash + preflight)
- apps/desktop/README.md (Runtime prerequisites
section)
Cross-platform impact
- macOS / Linux builds (dist:mac, dist:mac:dmg, dist:mac:zip): nsis
config is ignored entirely; .nsh is dormant.
- npm run dev: .nsh dormant; main.cjs preflight gated on IS_WINDOWS.
- scripts/install.ps1, scripts/install.sh: no reference to any new
files; CLI install paths untouched.
- Hermes CLI / dashboard / gateway: no reference; runtime untouched.
- All checks: node --check on main.cjs and test-desktop.mjs pass;
npm run test:desktop:platforms 4/4 passing; node --test green.
Tested
- npm run dist:win produces signed .exe and .msi without errors.
- Fresh Win11 VM (Python pre-installed, no Git): prereq page renders,
Python check shows detected, Git checkbox pre-checked. Click Next →
Git installs via winget with UAC prompt in foreground.
- After install completes, Hermes launches and the agent's terminal
tool can run bash commands. Verified Git Bash is detected at
`C:\Program Files\Git\bin\bash.exe` by ensureRuntime's preflight.
* feat: theme changes, composer tweaks, in app update ux, finesse
* fix(cli): seed bundled skills on dashboard + gateway entrypoints
`sync_skills(quiet=True)` was only being called from inside `cmd_chat`,
which meant `hermes dashboard` (the desktop GUI's backend) and `hermes
gateway` (Telegram/Discord/Slack/etc daemons) never seeded the bundled
skill library into ~/.hermes/skills/.
This surfaced as "No skills found" in the desktop GUI's skills panel on
fresh installs, despite the agent having access to the full bundled
library when invoked via `hermes chat`. scripts/install.ps1 worked
around it by running skills_sync.py as part of Copy-ConfigTemplates,
but that's not part of the desktop installer's bootstrap chain.
Fix
- Extract the skills-sync block from cmd_chat into a module-level
`_sync_bundled_skills_quietly()` helper.
- Call the helper from cmd_chat (preserving existing behavior),
cmd_dashboard (after the --status/--stop early-return paths and
fastapi import check, so we don't run skills_sync on management
commands or when deps aren't installed), and cmd_gateway.
Why these three entrypoints
- cmd_chat: the user's primary CLI entrypoint
- cmd_dashboard: the desktop GUI's backend; this is what `hermes
dashboard --tui` invokes when the desktop bootstrapper spawns Hermes
- cmd_gateway: long-running daemons where the user expects the agent
to have full skill access
Other entrypoints (cmd_config, cmd_doctor, cmd_login, cmd_status,
etc.) are management commands that don't need skill discovery and were
never running skills_sync in the first place — leaving them alone.
Idempotence
- tools/skills_sync.py is manifest-based: skipped skills cost
milliseconds. Calling it from multiple entrypoints adds no real
cost, and users running `hermes chat` then `hermes dashboard` get
two fast no-ops on the second call.
Failure handling
- Helper wraps skills_sync in try/except. Skills are an enhancement,
not a hard dependency — Hermes runs fine with an empty skills/ dir.
Files
- hermes_cli/main.py:
+ new helper `_sync_bundled_skills_quietly()` at module level
+ cmd_chat: replace inline block with helper call
+ cmd_dashboard: add helper call after fastapi import succeeds
+ cmd_gateway: add helper call before delegating to gateway_command
* feat(desktop): hoisted todo widget, JSON tool summaries, history grouping & timer fixes
- Hoist todo to first-class widget (shadcn checkboxes, brand colors, no
tool-accordion). Header derives label from active task; non-active rows fade.
- Replace raw JSON dumps with structured key/value summaries via
formatToolResultSummary; nested error extraction for clearer failures.
- Fix loaded-session grouping: stitch interleaved assistant/tool iterations
into one bubble instead of orphaned synthetic messages.
- Stable tool/thinking timers via keyed registry so unmount/scroll doesn't
reset elapsed counts; gate "running" on real live thread state.
- Reorganize chat-only assistant-ui components under components/chat/.
* fix(desktop): address CodeQL alerts on PR #20059
- settings/helpers.ts: harden setNested against prototype pollution.
POLLUTING_PATH_PARTS check is now applied at every assignment site
(loop + leaf) and uses Object.defineProperty so CodeQL can see the
guard inline rather than via a helper function call.
- lib/markdown-preprocess.ts: rebuild the dangling-fence close regex
from a fence-char + length instead of marker.replace(...). The marker
is captured by `(`{3,}|~{3,})` so it can only be backticks or tildes,
but CodeQL was tracing tainted input text into the RegExp source and
flagging hostname dots from input as part of the pattern (false
positive js/incomplete-hostname-regexp on the test fixture URLs).
Reconstructing from a literal char breaks the dataflow.
- scripts/notarize-artifact.cjs: drop args from the run() rejection
message. Args carry --key-id / --issuer / key file path; the existing
outer catch already squashes errors to a generic line, but CodeQL was
flagging the args.join(' ') as clear-text logging of APPLE_API_KEY_ID.
Composer DOM-text-as-HTML alerts (composer/index.tsx:379, :547) are
already addressed in 4dd9732a9 — innerHTML assignment was replaced with
renderComposerContents which builds DOM via replaceChildren / append
text nodes (no HTML interpretation).
* fix(desktop): inline prototype-pollution guard so CodeQL sees it
CodeQL's dataflow doesn't follow the helper-function guard inside
`safeSet`, so it kept flagging Object.defineProperty as prototype-
polluting. Inline the literal `__proto__`/`constructor`/`prototype`
check at the assignment site to break the dataflow.
Behavior unchanged — same set of disallowed keys, same throw.
* feat(ui-tui): resolve links to readable page titles
Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.
* fix(desktop): drop RegExp from dangling-fence close detection
Previous attempt tried to break the dataflow by reconstructing the
close-fence regex from a literal char + marker.length, but CodeQL still
traced marker.length back to input and kept flagging the test-fixture
URLs as hostname-regex sources (js/incomplete-hostname-regexp).
Replace `new RegExp(...)` + `closeRe.test(body)` with a string-only
hasCloseFenceLine() helper that splits on '\n' and uses ===. No regex
on this path now, so input data can no longer reach a RegExp source.
Behavior preserved: matches lines that are (whitespace + marker +
whitespace), which is what the original `\n[ \t]*${marker}[ \t]*(?=\n|$)`
matched. All 12 markdown-text tests still pass.
* fix(process-registry): suppress windows-footgun false positive on guarded killpg
Keep the existing POSIX-only process-group teardown path, but make the
signal selection explicit via getattr and add an inline windows-footgun
suppression marker on the guarded os.killpg line so the Windows footgun
check no longer blocks CI on this intentionally platform-gated code.
* feat(desktop): reconcile live tool events, polish thread chrome, harden boot
- chat-messages: match tool rows by overlapping query/context/preview values
so preview-first `tool.progress` rows reliably adopt later stable-id
`tool.start` payloads instead of spawning ghost rows or mis-merging
parallel same-name calls; preserve prior args/result across phases.
- tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`,
drop redundant `tool.started` re-emit from `tool.progress`.
- electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so
local backend edits actually run; split hardening helpers into
`electron/hardening.cjs` with tests.
- thread/tool UI: one-shot enter animation keyed by stable ids, braille
spinner for running rows, Cursor-like disclosure rows, drill-down +
duration/count formatting via new tool-fallback-model.
- composer: extract `text-utils`, drop liquid-glass overrides.
- right-rail: split preview-pane into preview-console / preview-file.
- runtime: incremental external-store runtime + runtime-readiness gate;
onboarding store + tests; route-resume hook test.
- regression tests for live tool reconciliation (parallel tools, id-less
progress, preview-first rows, structured args/results).
* feat(desktop): add ripgrep to NSIS prereq page + polish layout
Add ripgrep as a third (recommended) prereq alongside Python and Git in
the NSIS prereq detection page, and clean up the page layout based on
on-VM testing.
Why ripgrep
- Hermes' search_files tool calls `rg` directly for content + filename
search (tools/file_operations.py:1382). Falls back to grep/find from
Git Bash when missing — works but slower and noisier (no .gitignore
awareness).
- ~5MB winget install via `BurntSushi.ripgrep.MSVC --scope user` — no
UAC prompt, parallel to how Python installs.
- scripts/install.ps1 already installs ripgrep as part of
Install-SystemPackages; this brings the desktop installer to parity.
Why "recommended" not "required"
- Python and Git are hard requirements: without them the agent runtime
or terminal tool refuses to start. The bootstrapper preflight throws.
- ripgrep is a performance enhancement: missing it just means slower
searches. Page wording reflects this; failure to install is logged
but doesn't show a MessageBox or block.
Layout polish (response to on-VM screenshot review)
- Wizard header now correctly reads "System Requirements" instead of
the leftover "Choose Install Location" from the previous page. Set
via `GetDlgItem $HWNDPARENT 1037/1038` + WM_SETTEXT — the standard
NSIS pattern for overriding the page header on a custom Page.
- Removed redundant in-body title + verbose intro paragraph; the
wizard header IS the title now. Body has one short intro line.
- Group boxes tightened to 26u with content positioned just below the
groupbox title (not top-anchored status + bottom-anchored checkbox
with empty space in the middle). All three panels + footer fit
comfortably in 126u, well under the 140u page limit.
- Checkbox labels simplified: dropped "(per-user, no admin prompt)"
and "(administrator approval required)" suffixes. The footer note
still calls out UAC for Git when relevant.
- Footer text trimmed to fit cleanly without clipping.
Install order (in customInstall macro)
- Python → ripgrep → Git
- Python and ripgrep are silent and run first; Git's UAC prompt comes
last so the user's approval interaction isn't interrupted by silent
activity afterwards.
Skip behavior unchanged
- All three detected → page auto-skips via Abort
- Silent install (/S) → customInstall winget block skips
- User unchecks all → page advances without running winget
Files
- apps/desktop/installer/prereq-check.nsh: ripgrep detection block,
ripgrep page panel + checkbox, ripgrep customInstall block,
GetDlgItem header override, layout reflow
- apps/desktop/README.md: Runtime prerequisites section updated to
list ripgrep as recommended, with manual winget command
* feat(desktop): add model-confirmation step to onboarding
After OAuth/API-key login completes, onboarding now shows a confirmation
card with the curated default model and a Change button before dropping
the user into chat. Closes the gap where the desktop's `model.default`
was empty after first launch and the agent had to fall back to whatever
heuristic happened to fire — leaving users wondering "why am I getting
sonnet-4 when I logged into Nous Portal?"
Why
- Desktop onboarding only persisted credentials, never `model.default`.
The CLI's `hermes model` command pairs provider + model selection,
but the desktop's onboarding skipped the model step entirely.
- Result: users saw whichever model the agent's auto-fallback picked,
unpredictably and undocumented.
- For the BUILD demo we want users to land on the model they expect
for their provider, with a clear "this is what you're getting" UI
and a one-click path to change it before chatting.
How
- New `confirming_model` flow status carries the just-authenticated
provider slug, current default model, label, and a saving flag.
- `completeWithModelConfirm()` runs after credentials succeed: reloads
env, verifies runtime, fetches /api/model/options to find the curated
first-model for the provider, persists it via /api/model/set, then
transitions into `confirming_model`.
- If anything fails (no providers returned, network error), falls
through to the previous behaviour — onboarding completes without
the confirm step. Polish, not a hard requirement.
- All four credential paths (device_code OAuth, PKCE OAuth, external
CLI flow, API key) now use completeWithModelConfirm instead of
reloadAndConnect.
UI
- `ConfirmingModelPanel` shows: green "<provider> connected" banner,
card with "Default model: <name>" + Change button, and a "Start
chatting" CTA that finalises onboarding.
- Reuses the existing `ModelPickerDialog` (the same picker available
from the chat shell) for the change-model UX. Search, filtering,
multi-provider listing — all already built.
- Stacking: ModelPickerDialog defaults to z-130, which renders UNDER
the onboarding overlay (z-1300) and breaks pointer events. Added
optional `contentClassName` prop to ModelPickerDialog so callers
can override; onboarding passes `z-[1310]`.
Provider-slug matching
- For OAuth flows: pass `provider.id` directly as the preferred slug.
- For API-key flows: `OPENROUTER_API_KEY` → "openrouter" via env-key
prefix strip. Also includes the user-visible label as a fallback
candidate.
- fetchProviderDefaultModel falls back to the first authenticated
provider in the response if no preferred slug matches — so even a
miss still surfaces a reasonable default.
Files
- apps/desktop/src/store/onboarding.ts:
+ new `confirming_model` flow variant
+ fetchProviderDefaultModel + completeWithModelConfirm helpers
+ setOnboardingModel (optimistic update + revert on failure)
+ confirmOnboardingModel (finalises onboarding from the card)
- reloadAndConnect (replaced; the four call sites now go through
completeWithModelConfirm)
- apps/desktop/src/components/desktop-onboarding-overlay.tsx:
+ ConfirmingModelPanel component
+ new branch in FlowPanel for status `confirming_model`
+ ModelPickerDialog usage with z-[1310] content class
- apps/desktop/src/components/model-picker.tsx:
+ optional `contentClassName` prop on ModelPickerDialog so the
dialog can be stacked on top of other fixed overlays
Tested
- `npm run type-check` passes
- `npx eslint` clean on touched files
- Live test in `npm run dev`: cleared onboarding cache, walked
through Nous device-code flow, saw confirm card with curated
default, clicked Change → ModelPickerDialog rendered above the
onboarding overlay with working pointer events, picked a different
model, "Start chatting" persisted to ~/.hermes/config.yaml.
* fix(desktop): suppress generic provider warning in onboarding
Hide the red setup notice when the message is the generic missing-provider guidance, since onboarding already presents provider auth actions. Centralize provider-setup matching across desktop hooks and add coverage for the matcher.
* fix(desktop): add 2u clearance below prereq checkboxes
Group box bottom border was clipping the checkboxes by 1-2px.
Bumped each box height 26u→30u; checkboxes now sit 2u above the bottom border.
* fix(nix): refresh dashboard lockfile hash
Update the web npm deps hash in nix/web.nix to match the committed apps/dashboard/package-lock.json so bb/gui passes the nix lockfile check.
* fix(desktop): install TUI deps in release workflow
Ensure desktop release builds install the standalone ui-tui package before bundling the TUI payload.
* fix(desktop): run release builder from app package
Invoke the desktop builder through the package script so electron-builder uses apps/desktop/package.json.
* fix(desktop): expand release artifact names safely
Build desktop artifact names from workflow version/channel while preserving electron-builder platform macros.
* fix(desktop): use package artifact naming in release workflow
Let electron-builder's desktop package config provide platform-specific artifact extensions while the workflow injects the release version/channel metadata.
* fix(nix): fetch dashboard npm deps from package root
Point the dashboard npm dependency fetch at apps/dashboard so Nix can find the package lockfile after the dashboard move.
* fix(nix): build dashboard from package directory
Set the web package source root to apps/dashboard so npm patch/build phases run beside the dashboard lockfile while keeping apps/shared available as a sibling.
* feat(desktop): render LaTeX math via KaTeX after streaming completes
Add @streamdown/math plugin to the chat markdown renderer.
Inline ($x^2$) and block ($$...$$) math both supported with
singleDollarTextMath enabled. Plugin is gated to non-streaming state
to match the existing pattern for syntax highlighting — math renders
when the message completes, avoiding KaTeX re-render churn during
streaming. KaTeX CSS is imported in styles.css; ~30KB CSS + ~430KB
JS added to the bundle. Smoothness improvements during streaming
deferred to a follow-up.
* perf(desktop): memoize KaTeX renders so math streams without re-rendering
Wrap rehype-katex with a per-equation LRU cache (keyed by
displayMode + source text) and re-enable math during streaming.
Stock @streamdown/math runs rehype-katex on every markdown commit,
so each new token re-katexes every equation in the message. For
math-heavy responses (an equation derived step-by-step) that's
hundreds of ms of wasted work per token and the streaming UI
chokes. With memoization, each equation pays katex.renderToString
exactly once; subsequent tokens re-walk the tree but hit cache for
unchanged equations.
The wrapper mirrors rehype-katex's semantics exactly: same class
detection (language-math, math-inline, math-display), same
<pre>-walk-up for fenced math blocks, same parent.children.splice
replacement, same SKIP traversal, same strict-then-lenient render
strategy with VFile message reporting.
Cached children are structuredCloned on each splice so downstream
rehype plugins or toJsxRuntime can't mutate the cache.
* fix(desktop): declare katex-memo deps directly + drop per-app lockfile
katex-memo.ts (added in 112cad59b) imports hast-util-from-html-isomorphic,
hast-util-to-text, remark-math, katex, and unist-util-visit-parents but
those were never added to apps/desktop/package.json. They were silently
resolving via @streamdown/math at the workspace root, which broke the
moment `npm i --prefix apps/desktop` ran with the per-workspace lockfile
because that install only consults apps/desktop/package.json. Add them
as direct deps, plus unified/vfile/@types/hast for the type imports.
Also delete apps/desktop/package-lock.json — root package.json declares
workspaces: ["apps/*"], so npm manages all lockfile state at the root.
The stale per-app lockfile is what made `npm i --prefix apps/desktop`
diverge from the workspace install in the first place and left an empty
apps/desktop/node_modules/@assistant-ui/ stub that Vite's dep optimizer
then tried (and failed) to open at @assistant-ui/core/dist/internal.js.
* feat(desktop): disable Backdrop noise overlay by default
The noise overlay defaulted to on, which adds a busy speckle layer over
the whole window for every new user. Flip the Leva default to off; the
toggle stays in Backdrop / Noise for anyone who wants it back.
* fix(desktop): polish LaTeX rendering — currency, code blocks, brackets
Five distinct bugs surfaced from a math-heavy stress test:
1. Adjacent code fences glued together. scrubBacktickNoise's
second-pass regex /``\s*``/g matched the LAST 2 backticks of
one fence + whitespace + FIRST 2 backticks of the next, collapsing
two blocks into one. Fixed with lookbehind/lookahead so we only
match exactly 2 backticks not part of a longer run.
2. Whitespace eaten between fences and following content.
stripPreviewTargets internally calls .trim() which strips leading/
trailing whitespace from each split-segment. For segments between
two fences this collapsed \n\n to '', gluing fence close to next
block. Fixed by capturing leading/trailing whitespace at the call
site and restoring it after the transform.
3. Currency dollar signs eaten as math. With singleDollarTextMath:true
remark-math greedy-matched any pair of $, so '$5 ... $10' became
one inline math span. Added escapeCurrencyDollars to escape $<digit>
patterns to \$<digit> in prose segments (not in code). Trade-off:
math expressions starting with a digit (rare — '$5x = 10$') get
escaped too. Mirrors the convention in ChatGPT/Claude's UIs.
4. \(...\) and \[...\] LaTeX brackets unsupported. Models often
emit these instead of $...$ / $$...$$. Added
rewriteLatexBracketDelimiters preprocessor pass.
5. ```latex / ```tex blocks were being routed to KaTeX via a
rewrite to ```math. Aligns with GitHub markdown convention:
```math = render as math; ```latex / ```tex = LaTeX/TeX
source code (syntax highlighted, not rendered). Conflating them
broke teaching/showing-source use cases. MATH_FENCE_LANGUAGES
pruned to {'math'} only.
Also flipped parseIncompleteMarkdown to true (was !isStreaming) so
the math parser can't see $ inside streaming-but-not-yet-closed code
fences. Shiki was already deferred via defer={isStreaming} so this
doesn't introduce new tokenization cost.
Test: 18/18 existing tests still pass; one test updated to expect
escaped \$ in currency-prose-with-URL case.
* fix(desktop): detect Python via registry/filesystem; pin to 3.11–3.13
Two related fixes for Python detection on Windows:
1. py.exe (Python launcher) is missing from per-user installs that
didn't check the launcher option, so 'py -3.X --version' alone
misses real Python installs. User-reported case: clean Win11 +
official Python.org 3.14 install -> 'where py' returned nothing,
our installer offered to install Python again. Both NSIS prereq
page and main.cjs now probe in this order:
1. py.exe launcher (when present)
2. PEP 514 registry: HKLM/HKCU\SOFTWARE\Python\PythonCore\<v>\InstallPath
3. Filesystem: %ProgramFiles%\Python<v>, %LocalAppData%\Programs\Python\Python<v>
Crucially, we never fall back to running 'python.exe' from PATH
on Windows — the WindowsApps stub at %LOCALAPPDATA%\Microsoft\
WindowsApps\python.exe is a redirector that opens the Microsoft
Store window if no Store Python is installed. Triggering that
during boot would be terrible UX. Registry/filesystem probes
never execute the binary.
2. Drop 3.14 from the supported version set. Several Hermes deps
(notably pywinpty, which carries Rust crates like
windows_x86_64_msvc) don't yet publish 3.14 wheels. With wheels
missing, 'pip install -e .' falls back to building from sdist,
which needs a Rust toolchain — users see 'could not compile
windows_x86_64_msvc build script' on first run. install.ps1
sidesteps this by pinning to 3.11 via uv; the desktop installer
doesn't yet have the same uv-managed-Python pathway, so for now
we accept 3.11/3.12/3.13 and tell winget to install 3.11 if
none of those are present. Revisit when the wheel ecosystem
catches up to 3.14 (~early 2026).
* feat(desktop): Cron, Profiles, usage analytics, and titlebar fixes
- Add Cron and Profiles sidebar routes with full CRUD-style flows and API wiring.
- Extend Command Center with auxiliary task overrides and a Usage panel (7d/30d/90d).
- Fix titlebar geometry for WSL/Windows (native overlay width, tool spacing).
- Remove stray merge conflict markers from pyproject.toml optional deps.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(title-bar): position sidebar toggle button
* feat(desktop): composer queue — queue many, edit/delete/cancel-edit, Cursor-style
Press Enter while busy with a draft to queue it; with no draft to interrupt
and send the next queued turn. Auto-drains one queued turn each time the
session settles, same as Cursor. Queue persists across reloads so an
interrupted-and-queued turn isn't lost on refresh.
Each queued row supports edit-in-composer (with explicit Save/Cancel),
send-now (↑), and delete. Drain skips only the entry currently being
edited so the rest of the queue keeps flowing.
Queue dequeue is transactional — an entry only leaves the queue after
`prompt.submit` is accepted, so a rejected submit doesn't drop the turn.
Also shrinks the `[interrupted]` marker to a muted one-liner and drops
its assistant footer so it stops looking like a real reply.
* fix(desktop): handle empty usage analytics totals
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): address PR review titlebar and usage races
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(desktop): add MCP settings and live subagent tree
Surface configured MCP servers in Settings with JSON edit/save and a gateway-backed reload action so users can manage tool servers without falling back to slash commands.
Track live subagent gateway events in a desktop store, show active subagent counts in the Agents statusbar item, and replace the Agents overlay stub with a live spawn tree for the active session.
* fix(desktop): move power-user views out of sidebar
Keep Cron and Profiles available through lower-prominence chrome entry points so the workspace sidebar stays focused on core chat navigation.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(desktop): subagent overlay reads like a live transcript, not a dashboard
Strip the card chrome and rewire /agents to feel like peeking into the
child agent's stream:
- subagents store: single `stream` of typed entries (thinking/tool/progress/
summary) replaces the parallel notes/thinking/tools arrays. Drop unused
fields (toolsets, depth, apiCalls, reasoningTokens, sessionId).
- agents view: no OverlayCards, no boxed stream, no per-row borders. Goal +
status pill + indented stream lines, full row width.
- Group root spawns into "Delegation N" sections when batch shape + spawn
time match — hides task-index interleaving and makes hierarchy obvious.
- Sort tree by spawn time, then task_index. Step indicator is one colored
pill (primary while running, emerald when done) inside the row, not a
trailing pill that wrapped under the chevron.
- Tree picks up `subagent.start` (not only `spawn_requested`) and prunes
delegate-tool fallback rows once native subagent events land for the
session — fixes duplicate "Delegated task" rows alongside the real ones.
* feat(desktop): Esc closes every OverlayView-based overlay
Lift the keyboard handler into the shared OverlayView so Agents, Settings,
Command Center — and anything we build on top of it later — all dismiss on
Esc by default. Nested Radix dialogs stop propagation themselves, so a
modal opened inside an overlay (e.g. model picker inside Settings) still
closes the modal first, not the overlay underneath.
Drop the now-redundant Esc handlers in Settings (kept Cmd/Ctrl+P) and
Command Center.
* fix(desktop): drop numbered step pill on subagent rows
The pill was getting clipped at the overlay edge anyway. Just use the
status glyph (●/✓/✗/■/○) — the delegation header already conveys
"3 workers, 3 active", and order in the list implies which step you're
looking at.
* fix(desktop): drop noisy "returned N items / empty object" stub strings
When a tool returns nothing useful, the row should be silent — the title
("Search Files", etc.) already tells the user what happened. Counting the
fields in an opaque payload is engineer-noise.
`formatToolResultSummary` and `minimalValueSummary` now return '' for
empty arrays / records / unrecognized values; tool-fallback already hides
the detail section when its body is empty.
* refactor(desktop): subagent rows borrow chat tool patterns (fade-in, lucide glyphs, shimmer)
Pull the agents view closer to how chat tool blocks render:
- statusGlyph() returns the same lucide BrailleSpinner / CheckCircle2 /
AlertCircle vocabulary as tool-fallback's statusGlyph
- Stream lines fade-in via useEnterAnimation (one-shot WAAPI), keyed per
entry so streamed deltas settle in instead of popping
- Subagent rows fade in too, and pick up the existing data-slot=tool-block
spacing rules between blocks
- Active stream line trails a BrailleSpinner instead of a hand-rolled
pulsing rectangle
- Goal text drops FadeText (which forces nowrap); keep FadeText only for
the single-line meta subtitle
- Running rows shimmer the title — same affordance the chat thinking row
uses
* refactor(desktop): make /agents subagent-only, drop sidebar + dead sections
Activity rail and History stub were both noise. Strip the split layout,
sidebar, route enum, and the rail/stub helpers — the overlay is now just
the spawn tree, centered in a max-w-3xl column so it stops claiming the
whole screen for one section's worth of content.
* feat: update cron modals
* Add dedicated GUI log stream for dashboard debugging.
Capture dashboard and PTY websocket lifecycle failures in gui.log and expose it via hermes logs.
* Improve desktop runtime UX by surfacing inference readiness in gateway status and hardening WSL link opening.
This also stabilizes markdown code/table block spacing and adds root-install guards so desktop dev runs use a healthy workspace dependency tree.
* Log detailed GUI websocket failure metadata.
Capture richer reject/disconnect/send/parse context for dashboard gateway websocket flows so GUI connection failures are diagnosable from logs.
* Default dashboard startup logging to GUI mode.
Detect the dashboard subcommand during early CLI bootstrap so gui.log is attached from process start and GUI startup failures are always captured.
* Clean up gateway status conditionals and logging bootstrap mode detection.
Simplify nested dashboard gateway status branches for readability and use a concise first-subcommand check when selecting early GUI logging mode.
* add logging to nsis installer
* feat: glass ui pass
* fix(desktop): persist inline assistant errors across hydrate/resume
- Detect provider failure text arriving via message.complete
(HTTP 4xx, "API call failed after N retries", Provider/Gateway
error: ...) and persist as an inline assistant error instead of
regular completion text, blocking the hydrate that was wiping it.
- preserveLocalAssistantErrors: merge by id so same-id hydrated
messages keep their local error, and preserve the optimistic
user+error pair as a unit (with tail-user dedupe).
- Hook all hydrate/resume writers (use-session-actions resume +
fallback, hydrateFromStoredSession, syncSessionStateToView) into
the merge so stale snapshots can't clobber a failed turn.
- Add error to chatMessagesEquivalent so the resume diff actually
sees error-only changes and paints them.
- editMessage on a failed turn now submits a plain resend (no
truncate_before_user_ordinal) and retries plainly on the
"no longer in session history" race.
Style polish on touched files:
- Inline error: text-only treatment (no card).
- User stop / edit-composer send: shared Tabler IconPlayerStopFilled
glyph + shared icon-button class slot for parity.
* feat(desktop): theme xterm with active light/dark mode
The right-sidebar terminal hardcoded a light palette, which read poorly
on the dark glass surface. Subscribe to `useTheme().resolvedMode` and
hot-swap `term.options.theme` so Shift+X (and any other mode change)
updates the terminal in place without tearing down the PTY session.
Dark mode uses xterm's built-in defaults (white fg/cursor + vivid ANSI
16) with just a transparent background so the glass shows through;
light mode keeps the existing hand-tuned overrides for legibility on a
bright surface.
* feat(sidebar): right-click + drag-reorder sessions and workspaces
- Wire right-click on session rows to open the same actions menu;
suppresses the OS-native context menu so Windows stops looking awful.
- Share dropdown + context menu items via useSessionActions() driving
a single declarative ItemSpec[]; render polymorphic over MenuItem.
- New shadcn ContextMenu primitive mirroring DropdownMenu styling.
- Restore drag-and-drop reordering for Agents (lost during the cwd
cleanup) and add reordering of workspace groups via a right-side
grab handle. Pinned reorder unchanged.
- Generic orderByIds<T> replaces the duplicated session/group orderers;
useSortableBindings() hook collapses the two Sortable wrappers.
- cursor-pointer on every actionable element; cursor-grab on handles.
- KISS pass: baseName() helper, AGE_TICKS table, single WORKSPACE_PAGE
constant, flatter SidebarSessionsSection render.
* feat(desktop): solarize the xterm palette in both light & dark
xterm's default ANSI 16 is tuned for dark and reads candy-bright on the
light glass surface (vivid cyans/greens). Ship the canonical Solarized
palette (Schoonover) for both modes — same 16 accents either way, only
fg/cursor swap between `base00/01` (light) and `base0/1` (dark), so a
prompt's colors look uniform across a Shift+X toggle.
Background stays transparent in both modes — Solarized's cream/slate
backgrounds would fight the glass.
* feat(desktop): virtualize chat thread + sidebar via TanStack Virtual
Replaces `use-stick-to-bottom` and per-row session rendering with
`@tanstack/react-virtual`, matching what Cursor uses.
Chat thread (`thread-virtualizer.tsx`):
- Natural-flow virtualization (padding spacers, not absolute items) so
`position: sticky` on the human bubble still resolves cleanly against
the scroller.
- Custom at-bottom anchor: pins when armed, disarms on user-driven
upward scroll, re-arms at bottom, jumps on session switch +
`thread.runStart`.
- Loading indicator and `--thread-last-message-clearance` move to a
real `[data-slot=aui_composer-clearance]` node; drops the brittle
`:nth-last-child(1 of …)` rule that can't fire reliably under
virtualization.
Sidebar (`virtual-session-list.tsx`):
- Flat agents list virtualizes at >=25 rows; pinned and
workspace-grouped paths stay direct-render.
- `SortableContext` keeps all IDs; only the window mounts; dnd-kit's
`setNodeRef` is merged with `virtualizer.measureElement` so rows
participate in both DnD hit-testing and TanStack measurement.
Drops `use-stick-to-bottom`. Streaming test gets a global
`offsetWidth/offsetHeight` stub so the virtualizer's viewport sizing
works in jsdom; the scroll-up-doesn't-pull-back invariant still passes.
* feat: more ui qa
* fix(desktop): trim sidebar terminal startup spacer
Drop zsh's initial spacer row before writing the first terminal prompt so new sidebar terminal sessions do not open with a selectable blank line.
* chore: uptick
* feat(desktop): thin installer + first-launch install.ps1 bootstrap
Converges the Windows packaged desktop installer onto a single canonical
install topology: drop the Electron shell only (~80MB instead of ~500MB),
clone Hermes Agent at a build-time-pinned commit on first launch via
install.ps1's stage protocol, and treat the resulting git checkout at
%LOCALAPPDATA%\hermes\hermes-agent\ as the canonical install location
(same path the CLI installer uses). Future updates flow through the
existing applyUpdates() git-pull path.
Replaces the previous fat-installer architecture where the .exe bundled
a pre-staged hermes-agent source tree under resources/hermes-agent/ that
was then sync'd into ACTIVE_HERMES_ROOT at launch -- a complicated
factory-vs-active dance with several footguns (FACTORY_HERMES_ROOT
mismatch on path resolve, isGitCheckout guard regressions, pyproject
hash drift detection inside the sync loop).
Architecture overview
---------------------
Build time
apps/desktop/scripts/write-build-stamp.cjs writes
apps/desktop/build/install-stamp.json with {commit, branch, builtAt,
dirty}. Honours $GITHUB_SHA / $GITHUB_REF_NAME in CI, falls back to
`git rev-parse HEAD` locally.
apps/desktop/scripts/stage-native-deps.cjs copies the runtime subset
of @homebridge/node-pty-prebuilt-multiarch from the workspace-root
node_modules into apps/desktop/build/native-deps/. Workspace dedup
hoists this dep to the root, out of reach of electron-builder's
`files:`-restricted collector; staging gives us a deterministic
path to extraResources.
electron-builder ships both into resources/install-stamp.json and
resources/native-deps/ respectively.
Boot resolver (electron/main.cjs)
Resolver order:
1. HERMES_DESKTOP_HERMES_ROOT override
2. SOURCE_REPO_ROOT (dev mode)
3. ACTIVE_HERMES_ROOT git checkout WITH .hermes-bootstrap-complete
marker -- the post-install fast path
4. `hermes` on PATH (CLI-installed user adding the desktop)
5. pip-installed hermes_cli via system Python
6. bootstrap-needed sentinel -> hand off to runBootstrap
Deletes the entire FACTORY_HERMES_ROOT / RUNTIME_MARKER /
syncTreeExcludingVenv machinery (-200 lines). The isGitCheckout
guard that bit us in the install.ps1 PR is gone.
First-launch bootstrap (electron/bootstrap-runner.cjs)
1. Resolve install.ps1: prefer SOURCE_REPO_ROOT/scripts (dev), else
download from GitHub raw at INSTALL_STAMP.commit (cached at
HERMES_HOME\bootstrap-cache\install-<sha>.ps1).
2. Fetch the stage manifest via install.ps1 -Manifest -Commit X
-Branch Y.
3. Iterate stages: install.ps1 -Stage <name> -NonInteractive -Json
-Commit X -Branch Y per stage.
4. On all stages green: write the .hermes-bootstrap-complete
marker with {schemaVersion, pinnedCommit, pinnedBranch,
completedAt, desktopVersion}.
Per-run log to HERMES_HOME\logs\bootstrap-<ts>.log. Cancellation
via AbortSignal. Manifest cache so retries don't re-download.
Install overlay (src/components/desktop-install-overlay.tsx)
Mounted alongside the existing onboarding overlay; flexbox card
with header (static) + middle (scrollable) + footer (failure-only,
static). Subscribes to hermes:bootstrap:event IPC + resyncs from
hermes:bootstrap:get on mount/reload. Renders:
- 14-stage checklist with per-stage state icons
- Overall progress bar + current-stage spotlight
- Auto-expanded installer-output panel on failure
- "Copy output" button (full ring buffer + error to clipboard)
- "Reload and retry" wired through hermes:bootstrap:reset to
clear main.cjs's latched failure
Synthetic empty-manifest event from main.cjs flips the overlay to
'active' immediately so the slow install.ps1 download doesn't
leave the user staring at the generic Preparing splash.
Failure latching (main.cjs)
bootstrapFailure module-scope variable holds the rejection after
install.ps1 fails. startHermes() throws the latched error
immediately when set, bypassing the entire ensureRuntime +
runBootstrap chain. Without this, the renderer's ensureGatewayOpen
retries would re-run install.ps1 in a 5-10 min hot loop while the
user was still reading the failure overlay. Cleared via
hermes:bootstrap:reset on user-driven retry.
Unsupported-platform overlay (1F)
macOS / Linux packaged builds (no install.sh stage protocol yet)
emit an unsupported-platform event with a copy-pasteable install
command + docs URL. Dedicated overlay branch with "Copy command"
+ "I've run it -- retry" buttons.
install.ps1 additions (Phase 1F.3 + 1F.5)
-----------------------------------------
New -Commit and -Tag string params. Precedence Commit > Tag >
Branch. Honoured by all three code paths (update / fresh clone /
ZIP fallback), with archive URL selection that handles each
ref-type variant. Detached-HEAD checkouts intentionally -- they're
pins, not branches the user pulls into.
EAP=Continue wrap around the new pin-step git invocations. `git
fetch origin <commit>` writes the routine 'From <url>' info line to
stderr; under the script's global EAP=Stop that terminates the
script even though fetch+checkout succeed. Matches the established
pattern in Install-Uv, Test-Python, _Run-NpmInstall.
Backend fix (hermes_cli/web_server.py)
--------------------------------------
CORS allow_origin_regex now accepts Origin: 'null'. Packaged
Electron loads index.html via file://; Chromium sets the WebSocket
upgrade Origin header to the opaque origin 'null', which the old
regex rejected with HTTP 403 before gateway_ws() ever ran. This
failure mode was masked in the older FACTORY_HERMES_ROOT
architecture because the resolver often found an existing hermes
on PATH with different binding behavior.
Security maintained: localhost-only bind keeps cross-machine pages
out; per-process session token still gates every authenticated
/api/ endpoint regardless of Origin.
Desktop QoL
-----------
DevTools is now enabled in packaged builds (F12 / Cmd+Opt+I).
Field-debugging trade-off: tiny attack surface increase versus
a much better support story when CSP / WS / theme issues surface.
NSIS prereq-check page deleted (-767 lines). The standard
Welcome -> License -> Directory -> InstallFiles -> Finish wizard
now installs without custom Python/Git/ripgrep detection -- those
prereqs are install.ps1's job at first launch.
Test infrastructure (Phase 1G)
------------------------------
apps/desktop/scripts/test-desktop.mjs rewritten as a cross-platform
bundle validator (was darwin-only and asserted on dead factory-
payload paths):
NEGATIVE: hermes_cli/main.py is NOT shipped (regression guard)
POSITIVE: install-stamp.json carries a real commit + branch
POSITIVE: node-pty native deps shipped under resources/native-deps
POSITIVE: renderer dist/index.html reachable (asar or unpacked)
New nsis mode and npm run test:desktop:nsis script.
Validated end-to-end on clean Win10 VM
--------------------------------------
Confirmed: NSIS installer drops Electron shell, app launches,
install overlay shows progress, install.ps1 clones the pinned
commit, 14 stages run to completion, marker written, backend
spawns, WebSocket connects, onboarding overlay asks for API key,
main UI loads, integrated terminal works.
Failures handled: bootstrap stays failed (no hot-loop retry),
"Copy output" gives actionable transcript, "Reload and retry"
explicitly re-runs install.ps1.
What's deferred
---------------
- MSIX wrapping (Phase 2): same Electron .exe under MSIX manifest
with runFullTrust, signed and submitted to Microsoft Store.
- install.sh stage protocol parity (Phase 2): once shipped, the
unsupported-platform overlay becomes drive-it-yourself and
macOS/Linux packaged installers gain feature parity with Windows.
* feat(desktop): persistent terminal pane + fullscreen takeover
Adds a VSCode-style "focus terminal" toggle to the right sidebar's Terminal
tab that takes over the chat pane area without unmounting the shell. The
xterm host is mounted once at the layout root and CSS-overlayed onto
whichever <TerminalSlot /> is currently active, so the PTY session,
scrollback, selection, focus, and WebGL renderer survive every toggle.
Also:
- WebGL renderer (matching dashboard ChatPage) so Hermes' TUI skins paint
faithfully instead of muting through xterm's default DOM renderer
- File drag/drop from the project tree or OS into xterm — paths are
shell-quoted (zsh/bash/pwsh/cmd) and written straight into the PTY
- Solarized dark canvas with brights promoted to real accent variants
(Schoonover's UI-gray brights washed out every TUI accent)
- Strip NO_COLOR/FORCE_COLOR/COLORFGBG/TERM=dumb leaking from non-tty
parents (CI runners, Cursor's agent shell) so the embedded shell gets
truecolor regardless of how Electron was launched
- rAF-debounced ResizeObserver — running fit.fit() synchronously during
sibling pane transitions crashed the WebGL texture-atlas rebuild
* fix(install.ps1): strip UTF-8 BOM regression that broke 'irm | iex'
The canonical install flow
irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex
fails on PowerShell 5.1 with a cascade of 'The assignment expression
is not valid' errors at every param() default value:
[string]$Branch = 'main',
~~~~~~
The assignment expression is not valid. The input to an assignment
operator must be an object that is able to accept assignments...
Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF)
as its first three bytes. 'irm' returns the response body as a string;
on PS 5.1 the BOM survives into that string as a leading \ufeff
character. 'iex' then evaluates the string and PS's parser chokes
on the invisible character before param() -- error recovery proceeds
into the body but every assignment is reported as broken.
This was the exact failure mode the install.ps1 hardening pass (PR
#27224) deliberately fixed by stripping the BOM and ensuring the
file body is pure ASCII. Commit 4279da4db ('fix(windows): make
PowerShell installer parse in 5.1') re-introduced the BOM later,
unintentionally undoing the irm|iex compatibility fix; the merge
that brought it into bb/gui carried it forward.
Fix: strip the three BOM bytes. File body is verified pure ASCII
(any-byte > 127 returns false), so PS 5.1 with no BOM falls back to
Windows-1252 decoding which is identical to ASCII for our content.
Both install paths now work:
- 'irm ... | iex' (canonical CLI)
- 'powershell -File install.ps1' (programmatic / desktop bootstrap)
* install.ps1: detect ARM64 Windows reliably for Node and Git stages
Add a Get-WindowsArch helper that reads Win32_Processor.Architecture
via CIM (invariant to PowerShell host bitness) with PROCESSOR_ARCHITEW6432
fallback. Use it in:
- Install-Git: previously only triggered the arm64 PortableGit asset
when invoked from a native-ARM64 PowerShell host. WoW64 / emulated
x64 hosts (the default powershell.exe on Windows-on-ARM) saw
PROCESSOR_ARCHITECTURE=AMD64 and fell through to the x64 PortableGit
build, leaving ARM64 users on emulated Git for Windows.
- Test-Node: previously hardcoded the Node download to win-x64 on any
64-bit OS, so ARM64 users always got x64 Node under Prism emulation
even though Node ships an arm64 build for Windows. The winget
fallback now also passes --architecture arm64 on ARM64.
Python remains x86_64 by design: uv intentionally prefers
windows-x86_64 cpython on ARM64 hosts for ecosystem (wheel)
compatibility (see astral-sh/uv#19015).
* install.ps1: harden Install-SystemPackages against winget msstore failures
The previous winget invocation discarded stdout/stderr and trusted no
signal at all -- not the exit code (winget exits 0 even when it bails
"please specify --source"), not output (sent to Out-Null), not the
catch handler (winget returning 0 means no exception fires). The only
trust signal was a post-install Get-Command rg / Get-Command ffmpeg
check, which would also miss the package because %LOCALAPPDATA%\
Microsoft\WinGet\Links (where winget puts command aliases) is added to
PATH by AppExecutionAlias machinery only in fresh shells. End result on
machines where the msstore source has a cert problem (0x8a15005e --
common on Windows-on-ARM and some corporate networks): silent failure,
no log, no breadcrumb, and the user is told the install succeeded.
Specifically:
- Pin --source winget on every winget install call. Defeats the broken-
msstore-source path. We ship nothing from msstore so this is safe and
forward-compatible.
- Add --exact --id for a tighter package match.
- Capture each winget invocation's combined stdout/stderr + exit code to
%TEMP%\hermes-winget-<pkg>-<n>.log instead of Out-Null. On the happy
path the log is deleted after the post-install check confirms the
binary is on PATH; on failure the log is kept and its path is named in
a Write-Warn so the user has something to grep.
- Refresh PATH to include %LOCALAPPDATA%\Microsoft\WinGet\Links in
addition to the User/Machine env-var hives, so Get-Command sees newly-
installed winget aliases in the same process.
- No behavior change on the happy path. Same Write-Info/Success/Warn
cadence, same fallback order (winget -> choco -> scoop -> manual),
same $script:HasRipgrep / $script:HasFfmpeg outputs.
Verified end-to-end on a real Snapdragon ARM64 Windows host: ripgrep
uninstalled, stage re-run, [OK] ripgrep installed in 1.4s, ok:true.
* desktop: swap node-pty fork for upstream microsoft/node-pty 1.1.0
The previous dependency, @homebridge/node-pty-prebuilt-multiarch@0.13.1,
publishes no win32-arm64 prebuilds on its v0.13.x line, and its v0.14.x
betas (which do add an arm64 Windows build) ship no electron-vXXX-win32-
arm64 prebuilds at all -- so packaged Electron 40 builds (NMV 143) would
fail at runtime even on a successful npm install. Net effect: the
desktop's integrated terminal was unbuildable on Windows-on-ARM, in
both dev (npm install fails: 404 fetching the node-vXXX-win32-arm64
prebuilt) and packaged builds (no Electron-ABI prebuilt exists).
The homebridge fork was originally created because upstream node-pty
shipped no prebuilds at all. That hasn't been true since node-pty@1.0
(April 2024), which:
- bundles prebuilts for mac (arm64+x64) and Windows (arm64+x64) directly
inside the npm tarball -- no GitHub-Releases fetch, no missing-binary
failure mode
- uses N-API (node-addon-api) for ABI stability across Node and Electron
major versions, so the same pty.node binary loads under Node 22 (dev)
and Electron 40+ (packaged) without per-ABI rebuilds
- is what VS Code, Hyper, and Theia actually ship
API surface is identical (spawn / onData / onExit / write / resize /
kill) -- no call-site changes needed.
Specifically:
- apps/desktop/package.json: replace the @homebridge fork with
node-pty@1.1.0 (exact pin). Widen `asarUnpack` from `["**/*.node"]`
to also unpack `**/prebuilds/**`, because node-pty ships runtime-
execed helpers alongside its .node files (darwin spawn-helper has no
extension and would not be matched by `**/*.node`; conpty.dll,
OpenConsole.exe, winpty.dll, winpty-agent.exe on Windows are also
exec'd at runtime and cannot live inside asar).
- apps/desktop/electron/main.cjs: update both require() strings to
match the new package name and the new staged path under
resources/native-deps/node-pty/.
- apps/desktop/scripts/stage-native-deps.cjs: point at node_modules/
node-pty. node-pty's prebuilts live under prebuilds/<plat>-<arch>/
(not build/Release/), so update the include glob to copy that dir.
Per-arch staging keeps the resource bundle small (target arch comes
from npm_config_arch when electron-builder cross-builds, else
process.arch). Explicitly enumerate file types in the prebuilds glob
so the ~25 MB of .pdb debug symbols that prebuild-install bundles
for Windows crash analysis don't bloat the installer (29 MB -> 2.6 MB
staged on win32-arm64). Re-assert +x on the darwin spawn-helper
defensively, since a stripped mode bit would manifest as a silent
ENOENT at first pty.spawn().
- apps/desktop/scripts/test-desktop.mjs: update expectedNativeDepPaths()
and its assertion site to look at prebuilds/<plat>-<arch>/ instead of
build/Release/. Add an explicit spawn-helper-exists check on darwin
so a regression in the asarUnpack glob would fail loudly in CI rather
than at first PTY spawn.
Trade-off: Linux end-users lose prebuilts and fall back to building
node-pty from source on `npm install`. Acceptable because Hermes
ships no Linux desktop builds (desktop-release.yml matrix is mac + win
only, package.json declares no `linux` target), and Linux developers
hacking on the desktop already need a C++ toolchain for the rest of
the stack.
Verified on Windows 11 ARM64 (Snapdragon):
npm install -> exit 0
node -e "require('node-pty').spawn(...)" round-trip -> OK
stage-native-deps -> 27 files, 2.6 MB
load from staged tree (simulates packaged fallback) -> ConPTY
round-trip OK
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe (#28873)
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe
Fix Slack Socket Mode reliability by adding a watchdog/reconnect path so silent socket task drops no longer leave the adapter stuck. Harden Windows gateway lifecycle by avoiding desktop-binary path collisions, making gateway PID scans case/extension tolerant, and reusing in-flight restart actions to prevent duplicate gateway spawns.
* test(slack): add Socket Mode watchdog/reconnect behavioural coverage
Drive the new Slack Socket Mode self-healing logic through a fake AsyncSocketModeHandler so we can simulate the P0 silent-hang failure mode (task exit, transport disconnected, intentional shutdown, concurrent reconnect attempts) without touching real Slack.
* fix(slack,desktop): address Copilot review on watchdog races and path normalization
- connect(): explicitly cancel + await the prior socket watchdog before flipping _running, so an old monitor cannot exit between teardown and respawn (Copilot #1)
- _socket_watchdog_loop: wrap the body in try/except + add a done-callback that respawns on unexpected crash, so a transient bug cannot permanently disable self-healing (Copilot #2)
- normalizeExecutablePathForCompare: use the resolved path for realpathSync so non-string inputs cannot leak through (Copilot #3)
- Add tests for crash-recovery and atomic watchdog replacement across reconnects
* fix(slack): tighten connect() error path and clarify watchdog test intent
Address Copilot review round 2.
- connect(): wrap _start_socket_mode_handler/_ensure_socket_watchdog in a focused try/except so any failure rolls back partially-started handler/task state and leaves _running=False, ensuring the platform lock is always released by the outer finally
- Defer _running=True until after the handler is actually started so the watchdog observes a live socket task immediately and never spins against a half-built adapter
- Rename test_watchdog_self_restarts_after_unexpected_crash to test_watchdog_cancellation_does_not_respawn (matches what it actually asserts) and add test_watchdog_unexpected_exit_respawns_via_done_callback that drives a real RuntimeError through _on_socket_watchdog_done and verifies a fresh task replaces the crashed one
* fix(web_server): serialize action spawn check+store under a threading lock
Address Copilot review round 3.
FastAPI runs sync handlers on its threadpool, so two near-simultaneous /api/gateway/restart (or /api/hermes/update) requests could both observe "no live process" in _spawn_hermes_action's poll-based dedupe and double-spawn. Add a module-level _ACTION_SPAWN_LOCK around the entire check + Popen + _ACTION_PROCS store sequence so the dedupe is atomic across threads.
* fix: address Copilot review round 4
- slack.disconnect(): mirror connect()'s defensive cleanup — catch the broad Exception path on watchdog await so handler shutdown and lock release still run if the watchdog raised before cancellation took effect
- web_server._spawn_hermes_action: wrap subprocess.Popen in try/except so a missing executable / permission error closes the log file handle, writes a failure marker, and re-raises instead of leaking a file descriptor
- gateway._scan_gateway_pids: drop the over-broad "hermes.exe --profile" / "hermes.exe -p" patterns that would match any Hermes CLI subcommand using a profile flag (e.g. `hermes.exe --profile foo dashboard`); rely on the "hermes.exe gateway" + "hermes-gateway.exe" tokens instead
- tests: tighten _fake_create_task to assert coroutine input and return a real asyncio.Task that stays pending until pytest teardown, and update the three callsites whose mocked AsyncSocketModeHandler.start_async returned a non-coroutine value
* fix(slack): reset multi-workspace state on reconnect
Address Copilot review round 5.
connect() is reentrant (gateway restart, in-process reconnect), but it was leaving _bot_user_id / _team_clients / _team_bot_user_ids populated from the previous session. A reconnect that rotated the primary token or dropped a workspace would silently keep the stale bot user id and stale workspace client maps, leading to dispatch against gone workspaces.
Clear these three pieces of state right after _stop_socket_mode_handler() and before the auth_test loop, then let the loop repopulate from the current tokens. Add test_reconnect_refreshes_multi_workspace_state to lock it in.
* nix: package apps/desktop as .#desktop (#28964)
Adds nix/desktop.nix building the Electron renderer with buildNpmPackage
and wrapping nixpkgs' electron binary. Reuses .#default by setting
HERMES_DESKTOP_HERMES to its hermes binary, so the desktop's resolver
picks up the fully-wired nix hermes (venv, bundled skills/plugins,
runtime PATH) without reimplementing agent resolution.
- nix/desktop.nix: renderer + electron wrapper
- nix/hermes-agent.nix: finalAttrs form, exposes hermesDesktop in passthru
- nix/packages.nix: exposes .#desktop + adds to fix-lockfiles
- apps/desktop/package-lock.json: standalone hermetic lockfile
nix build .#desktop && nix run .#desktop both clean.
* fix(desktop): probe steps 4 & 5 of resolveHermesBackend before trusting
A user-reported failure on Windows-on-ARM: a pre-installed Python 3.13
on PATH makes findSystemPython() succeed, so resolveHermesBackend
returns a backend pointing at it -- but hermes_cli isn't in that
interpreter's site-packages. The spawn dies with ModuleNotFoundError
and the user sees a dead GUI instead of the first-launch installer.
Same shape can hit step 4 (existing `hermes` on PATH) when a stale
shim survives a partial uninstall.
Add cheap exit-code probes -- `python -c "import hermes_cli"` for
step 5, `<hermes> --version` for step 4 -- and fall through to step 6
(bootstrap-needed) on failure. install.ps1 then runs as if on a clean
box and the venv gets built.
Probes live in a standalone electron/backend-probes.cjs module so they
can be unit-tested with node --test, same pattern as bootstrap-platform.cjs
and hardening.cjs. New test file wired into test:desktop:platforms.
* test(desktop): allow `node-pty` bare-require in packaged entrypoints
Pre-existing failure on bb/gui since c858484b4 swapped the node-pty
fork for upstream microsoft/node-pty 1.1.0. main.cjs intentionally
bare-requires node-pty (it's hoisted by workspace dedup in dev, and
staged to resources/native-deps via scripts/stage-native-deps.cjs +
extraResources for packaged builds, with a try/catch fallback at
line ~38). The allowlist hadn't been updated to match -- same shape
as `electron`, which was already allowed.
* chore(deps): refresh root lockfile for dashboard @nous-research/ui 0.14.0
apps/dashboard/package.json was bumped to @nous-research/ui 0.14.0 (+
flag-icons ^7.5.0, motion ^12.38.0) but the root package-lock.json was
never refreshed. Running `npm install` from the repo root now
materialises 0.14.0's transitive closure (launder, bumps for
@nanostores/react, nanostores, sanitize-html, tailwind-merge).
No code changes; purely a lockfile catch-up so fresh checkouts on bb/gui
get a working dashboard install.
* chore(desktop): bump version to 0.0.1
First non-placeholder version so electron-builder's artifactName template
produces `Hermes-0.0.1-win-x64.exe` instead of the obviously-unreleased
`Hermes-0.0.0-...`. No release process yet; this just stops the artifact
filename from telling users "you got a debug build."
Bumped in three slots that all carry the desktop app's version:
- apps/desktop/package.json (source of truth)
- apps/desktop/package-lock.json (per-app lockfile, kept for CI parity)
- root package-lock.json's apps/desktop workspace entry
Identity-of-build for first-launch bootstrap continues to come from
build/install-stamp.json (commit SHA + builtAt), unchanged.
* fix: fs icon color
* perf(desktop): cut per-keystroke layout + listener churn in chat composer
Empirical work via CDP harnesses under apps/desktop/scripts/ (see
profile-typing-lag.md):
jsListeners growth (per round of 200 chars + GC):
before: +35 (verified leak — listeners stuck after 1st trigger popover use)
after: +0
Four narrow edits in src/app/chat/composer/index.tsx:
1. Drop the per-keystroke `editorRef.current.scrollHeight` read used to
decide composer expansion. Replace with `draft.length > 60` heuristic;
the existing ResizeObserver still catches edge cases. `scrollHeight`
is a forced-layout call and was firing on every char until the first
wrap.
2. Bucket measured composer height to 8px before writing
`--composer-measured-height` / `--composer-surface-measured-height`
on `documentElement`. Without this, the editor grows ~1px per char,
setProperty fires every keystroke, computed style is invalidated tree-
wide.
3. Remove the dead `$composerDraft` two-way sync. Nothing outside the
composer subscribed to that atom (verified via grep). Two useEffects
on `[draft]` were pushing draft→atom and atom→aui per keystroke for
no consumer. Also drop the per-keystroke
`reconcileComposerTerminalSelections` call; it was pruning stale
labels for `terminalContextBlocksFromDraft`, but that helper already
ignores labels not in the current submitted text, so pruning per
keystroke was just bookkeeping.
4. `refreshTrigger` fast-bails when the draft contains neither `@` nor
`/`. Previously `textBeforeCaret(editor)` ran on every input/keyup
regardless; `range.toString()` inside is O(n) over draft length.
Synthetic typing latency p50/p90/p99 is similar before vs after on a
freshly-loaded session (Blink can already handle ~30cps typing into a
contentEditable on its own); the real win is the listener leak being
gone and the global computed-style invalidations dropping ~8× when the
composer is sitting at a fixed height row.
The `Enter → stall` follow-up (see profile-typing-lag.md §"Submit /
TTFT stall") is unmeasured here — needs a throwaway session because
the harness fires a real prompt. Not blocking this commit.
* perf(desktop): cut FadeText forced layouts during streaming
The slowest user-felt path is typing into the composer while the
assistant is streaming. Profile (scripts/profile-under-stream.mjs):
FadeText measureOverflow self time: 35.8 ms → 18.1 ms (-50%)
total active CPU during 7s window: ~150 ms → ~50 ms
Two changes in src/components/ui/fade-text.tsx:
1. Drop the `useEffect([children])` that re-ran `measureOverflow`
(reads scrollWidth + clientWidth — forced layout) on every parent
re-render. `useResizeObserver` already fires the same callback on
mount and whenever the host span's box size changes; that covers
the only case where overflow state can legitimately change. The
previous explicit useEffect was a forced-layout flush on every
parent render, which during streaming meant every token tick.
2. Wrap the component in `memo` with a custom comparator that
short-circuits the entire render when scalar string `children` and
the className/fadeWidth/style props are unchanged. The hot path
was tool-fallback's title chips being re-rendered by parent
streaming updates even though their text was stable; memo+
comparator skips that.
Also adds two harness scripts under apps/desktop/scripts/:
- latency-under-stream.mjs (key→paint latency while a turn streams)
- profile-under-stream.mjs (CPU profile while a turn streams)
Updates profile-typing-lag.md with the streaming numbers and confirms
the Enter→paint submit path is already fast (≤320ms on the populated
session; the 2s "stall after Enter" the user noticed once was a
one-time cold-start, not reproducible at the UI layer).
I'd guess the felt jank in real use is fast-burst typing during a
long-form streaming reply (code blocks + markdown lists multiply the
per-token render cost). The CPU savings here scale linearly with
token volume.
* chore(desktop): drop diag scratch scripts no longer needed
* docs(desktop): correct leak-typing numbers on a real session
Re-ran the leak harness on a populated session (Phaser thread) for both
unpatched and patched builds. The original 'listener leak' was transient
warm-up cost, not a steady-state leak — both versions show 0 listener
growth/round in steady state.
The load-bearing number is forced layouts per character:
unpatched (HEAD~2): 7.02 layouts/char
patched (HEAD): 2.35 layouts/char (3× fewer)
The patches reduce per-char forced-layout work to Blink's natural floor.
Document node count and heap are flat in both builds.
* perf(desktop): fix "Enter jumps up" on long threads
User reported: after pressing Enter on a long thread, the view jumps up
— the just-submitted message disappears below the fold. Confirmed via
apps/desktop/scripts/measure-jump.mjs:
before: distFromBottom 0 → 49.5px, sticks there permanently
after: distFromBottom 0 → ~0 (worst case 4px for one frame)
Root cause in useThreadScrollAnchor (thread-virtualizer.tsx):
1. The sticky-bottom logic disarmed on any scroll event where
`scrollTop < lastTopRef.current`. That check can't distinguish a
user scrolling up from a programmatic `pinToBottom` write that
the browser clamped short of bottom (because content also grew in
the same frame, so `scrollTop = scrollHeight` lands at
`scrollHeight - clientHeight` for the OLD scrollHeight, which is
now below the NEW scrollHeight). Result: sticky-bottom disarmed
permanently on the user's first submit.
2. There was no synchronous pin tied to React's commit phase. By the
time the ResizeObserver fired and re-pinned, the user had already
seen ~50ms of "message below the fold" — visually that reads as the
view jumping up.
Fix:
- `programmaticScrollPendingRef` counter tracks scroll events we
expect to be ours (one per `pinToBottom` write). The scroll handler
skips the disarm check when consuming a pending tick, keeps the
arm bit true, and re-pins synchronously if the browser clamped us
short of bottom. A depth cap (8) breaks runaway loops in
pathological streaming-burst layouts.
- `useLayoutEffect` on `groupCount` increase pins BEFORE the browser
paints, eliminating the visible ~50ms window between optimistic
user-message insert and the RO/scroll-event chain firing.
Verified on the long Cloud Shadows thread (7-8 turns, ~11k px tall):
all three repro runs now hold within 0–4 px of bottom across the
post-Enter transition. Submit latency unchanged (paint 77–107 ms),
streaming-typing latency unchanged.
Also adds three debug harnesses:
- measure-jump.mjs — sample thread scroll across Enter
- probe-thread.mjs — dump current thread / scroll state
- diag-jump.mjs — intercept scrollTop + RO + mutations across Enter
* perf(desktop): rate-limit thread auto-pin during streaming
Follow-up to the Enter-jump fix. The first version did a synchronous
re-pin loop inside the on-scroll handler when the browser clamped our
`scrollTop = scrollHeight` write short of the new bottom; that gave a
tight 4 px visible jump on Enter, but during streaming the
ResizeObserver fires many times per second as content grows, and each
RO callback re-entered the pin loop. CPU profile showed
`Virtualizer.getMaxScrollOffset` climbing to 22 ms self over a typing-
during-streaming window — the sync re-pin path was paying tanstack-
virtual's recompute cost ~3× per token.
Re-architect:
- RO callback coalesces to one pin per animation frame. Streaming-rate
RO bursts now cost the same as a single per-frame pin.
- The on-scroll programmatic-counter guard remains (it's what prevents
the false-disarm bug when the browser clamps a write). It no longer
does sync re-pins; the next RO/rAF will catch up.
- The useLayoutEffect on groupCount (the path that fires on user
submit / new turn arrival) ALSO schedules one rAF pin in addition to
the synchronous pin. This catches the case where React mounts the
new message in a second commit (after our layout effect ran), which
grows scrollHeight again. Two pins instead of a tight loop, paid only
once per turn change.
Net effect on the Cloud Shadows long thread:
enter-jump transient: 12–20 px for 1 frame (was 49 px permanent)
CPU during stream+type: `getMaxScrollOffset` dropped out of top-5
self-time list
typing-during-stream: p50 ~10 ms paint, p99 ~20 ms (1 frame),
occasional 40 ms+ outliers during burst
token arrivals
Also adds scripts/profile-long-stream.mjs: 20-second streaming profile
with per-500ms FPS histogram + content-length tracking, so we can see
whether streaming render cost grows with message length (it doesn't —
sustained 60 fps).
* perf(desktop): use textContent for trigger precondition
Replace composerPlainText() call inside refreshTrigger's no-trigger
fast-bail with a textContent check. textContent is a browser-native
flat traversal; composerPlainText walks recursively with chip-aware
logic. We only need to know if @ or / appears; either way the trigger
char will be in textContent because chips contain @ in their refText.
Profile shows composerPlainText was ~18ms self over a 12s typing-during-
stream window, called from refreshTrigger on every keystroke. Most of
that was the precondition check (the trigger detection path is the
slow path but only runs when a trigger char is present).
* Revert "perf(desktop): use textContent for trigger precondition"
This reverts commit a6a78ff08a.
* Revert "perf(desktop): cut FadeText forced layouts during streaming"
This reverts commit 88e7d7537c.
* Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer"
This reverts commit bff1b3261d.
* Revert "Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer""
This reverts commit b7b378e3a4.
* Revert "Revert "perf(desktop): use textContent for trigger precondition""
This reverts commit 0739588f48.
* chore(desktop): synthetic-stream perf harness + scripts
Drops the React `<Profiler>` approach (no-op because Vite is currently
serving the production React build) in favor of an externally-observable
measurement stack: rAF frame intervals, `PerformanceObserver({entryTypes:
['longtask']})`, and a `MutationObserver` on the live streaming message.
Adds a synthetic stream driver — `window.__PERF_DRIVE__.stream({...})` —
that pushes tokens through the live `$messages` atom at a controlled rate,
so the assistant-ui runtime, incremental repository, and Streamdown
markdown pipeline see the same workload they'd see during a real LLM
stream, without the LLM cost.
The driver lives in `src/app/chat/perf-probe.tsx`; `main.tsx` side-imports
it under `import.meta.env.MODE !== 'production'` so it tree-shakes out of
prod builds. (Using `MODE` rather than `DEV` because our Vite setup
currently reports `DEV=false` even under `vite dev` — see the dev-build
note in `profile-typing-lag.md`.)
Scripts:
- measure-synthetic-stream.mjs drive synthetic + record frame/longtask/mutation
- profile-synth-stream.mjs CPU profile + top self-time during synthetic
- measure-real-stream.mjs same harness, real LLM stream
- profile-real-stream.mjs CPU profile bracketing the real stream window
- eval.mjs / reload.mjs small CDP helpers
A real-LLM measurement on Cloud Shadows (gpt-4o-mini, 39 s window) showed
12 longtasks in the same 75-127 ms range the synthetic predicted, so the
synthetic is a faithful proxy.
* perf(desktop): memo FadeText so it skips re-renders when text unchanged
FadeText is used 110+ times inside `tool-fallback.tsx` on a tool-heavy
thread. During streaming each parent re-render previously triggered the
component's `useEffect([children])`, which forced a `scrollWidth` layout
read even when the title text was unchanged. The `useResizeObserver` was
already covering the genuine resize case, so that effect was strictly
redundant work.
Drops the effect and wraps the component in `React.memo` with a custom
comparator that field-compares `className`, `fadeWidth`, and `style`,
plus identity-compares `children` (scalar fast-path; correct for JSX
nodes too since a new node should force a re-render).
Verified via temporary render counter on the 34 MB
`session_20260514_215353_fe0ac8` thread (110 FadeText instances): a
2 s synthetic stream went from ~11k FadeText render calls to 122 —
roughly one render per truly-new instance instead of one per parent
commit per instance.
Doesn't move the longtask needle on its own (Streamdown's markdown
re-parse dwarfs it) but eliminates a steady CPU floor and a class of
forced layouts during streaming. Profile-typing-lag.md documents the
full investigation, including the remaining Streamdown cost as the
real source of the perceived "5 fps moment" hitches.
* perf(desktop): memoize MarkdownText plugins to stop churning Streamdown
The inline `plugins={{ math: mathPlugin, ...(isStreaming ? {} : { code }) }}`
on `<StreamdownTextPrimitive>` constructed a new object literal on every
parent render. That broke `<Streamdown>`'s outer memo and forced its
internal `rehypePlugins` / `remarkPlugins` array useMemos to rebuild,
which propagates a new identity into every `<Block>` and defeats Block's
memoization for stable historical blocks.
After memoizing on `[isStreaming]` (the only real dimension of variance),
CPU profile during a 5 s synthetic stream on the 34 MB session shows
`parser` self-time dropping out of the top 10, `compile` cut roughly in
half, and `bn$1` / `m$1` (micromark internals) leaving the top entries.
Doesn't move the visible longtask count on its own — Streamdown's
per-Block parse cost still dominates whenever the last block's content
changes — but it removes a class of unnecessary re-parses for historical
blocks during streaming. See `scripts/profile-typing-lag.md` for the
full investigation.
* perf(desktop): floor assistant-text flush gap to 33ms for predictable batching
`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame`
only. The "at most one flush per frame" guarantee that gives you is fine
for fast streams (>~80 tok/sec) where multiple tokens arrive within a
single frame, but breaks down at typical LLM token rates (30-80 tok/sec)
where each token arrives slower than the rAF cadence and triggers its
own React commit + Streamdown markdown re-parse.
Track `lastFlushAt` and require at least 33 ms between two flushes.
React 18+ auto-batching probabilistically already collapsed some of
these, but the floor makes it deterministic.
A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):
| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |
`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms,
which is the expected signature of a deterministic floor. Doesn't fully
solve the user's perceived hitches — Streamdown's per-Block parse cost
when the last block grows past ~2 k chars is still the elephant — but
it consistently shaves the worst-case longtask and makes the streaming
cadence visibly steadier.
Also threads a matching `flushMinMs` option through the synthetic
stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs`
so the harness can A/B both regimes without spending LLM credits.
See `scripts/profile-typing-lag.md` for the full investigation.
* perf(desktop): useDeferredValue for streaming markdown so parses don't block input
Streamdown's per-Block parse cost grows with the live tail's length and
is unavoidable inside the block-memo pattern (industry standard, see
findings doc). The fix is to stop having that work block the main thread.
`<DeferStreamingText>` is a 12-line wrapper that reads message-part state
via `useMessagePartText`, runs it through `useDeferredValue`, and
re-publishes via assistant-ui's `<TextMessagePartProvider>`. The inner
`<StreamdownTextPrimitive>` reads the deferred value through the normal
`useMessagePartText` hook — no fork, no internal-path imports, fully on
assistant-ui's public API. React's concurrent scheduler then:
- abandons in-flight deferred renders when a newer token arrives, so
intermediate states get skipped under fast streams
- deprioritises the markdown render when the main thread has urgent
work (typing, scroll), so input stays responsive even while a
100ms parse is queued
Streamdown already uses `useTransition` for its block-array setState;
this lifts the deferral up to the consumer boundary so it covers the
whole pipeline (preprocess → split → repair → parse → render).
A/B on the 34 MB session, 300 tokens at 50 tok/sec, markdown chunks
(four trials each, with the 33ms flush throttle on for both):
| | avgFps | p99 frame | LTs/5s | max LT | typing-while-stream p95 |
|---|---|---|---|---|---|
| pre | 54.3 | 41 ms | 1.7 | 110 ms | ~17 ms |
| post | 58.5 | 31 ms | 2.0 | 117 ms | 14-18 ms |
Longtask count + max LT unchanged — useDeferredValue doesn't reduce
CPU, only its priority. The avgFps lift and p99 frame drop are the
proof that the existing CPU is no longer blocking 60 fps cadence. One
clean run logged MUTATIONS=0 — React skipped every intermediate text
state and only committed the final one (textbook deferred-value
behaviour).
The actually-reduce-CPU path is replacing the parser with a state
machine like Flowdown — left for a future PR; see
`apps/desktop/scripts/profile-typing-lag.md` for the full investigation.
* feat(desktop): add hermes gui launcher
* feat(desktop): launch packaged gui builds by default
* bump gui version to 0.0.2
* fix(dashboard): allow file:// origin on loopback WS + diagnostic logging
Upstream commit 2e66eefbc ("fix(dashboard): validate WebSocket Host
and Origin") added a WebSocket Host/Origin guard to block DNS
rebinding against the dashboard. The guard rejects any Origin whose
scheme is not http/https or whose netloc is empty — which includes
Electron's renderer Origin: file:// when the desktop app loads its
bundle from disk in production mode.
That makes the bb/gui Electron desktop unable to open the gateway
WebSocket against the embedded backend on Windows / macOS prod
builds. The renderer reports "Desktop boot failed" and the backend
logs:
WARNING hermes_cli.web_server: gateway-ws reject
peer=127.0.0.1:NNNN reason=non_loopback_or_bad_origin
bound_host=127.0.0.1 close_code=4403
DNS-rebinding requires a DNS-resolvable hostname; file:// has no
host component and therefore cannot be the attack vector this guard
exists to block. When bound to a loopback interface (127.0.0.1 /
::1 / localhost), accept file:// origins so desktop wrappers can
attach. Non-loopback binds (operator opted into network exposure)
keep rejecting file:// — the loose policy doesn't apply.
Also adds per-reason diagnostic logging in
_ws_host_origin_is_allowed, so future ws-guard rejections name the
specific clause that fired (bad_host / bad_origin_scheme /
origin_host_mismatch) instead of the opaque
"non_loopback_or_bad_origin" surfaced at the call site.
Verified against tests/hermes_cli/test_web_server_host_header.py
(all 11 upstream tests still pass) and hand-tested by opening the
bb/gui Electron desktop dev build against the patched backend.
* fix(tui_gateway): restore _content_display_text helper
Bb/gui had dropped the helper but the orchestrator code merged from main
still calls it (_inflight_text, _message_preview). Re-add the definition
verbatim from main so session.create / _start_inflight_turn don't crash
with NameError on first prompt submit.
* fix(tui-gateway): restore _content_display_text helper lost in main merge
The May 27 merge of origin/main into bb/gui re-introduced two callers of
_content_display_text (in _inflight_text and _history_to_messages) but
dropped the helper definition itself, leaving an unresolved reference.
NameError fires on every user message via _start_inflight_turn ->
_inflight_text, taking down both the TUI and the desktop (which share
this gateway backend) the moment input is dispatched.
Restores the helper verbatim from main (commit 36c99af37) -- pure
structured-content text extractor, no other dependencies.
* fix(telegram): import Set for _dm_topic_chat_ids annotation
self._dm_topic_chat_ids: Set[str] = {...} at line 460 references Set
but only Dict, List, Optional, Any are imported from typing. The file
has no 'from __future__ import annotations', so the annotation is
evaluated at runtime and raises NameError on TelegramAdapter
construction.
* fix(setup): drop shadowing inner importlib.util re-imports
_print_setup_summary and _setup_tts_provider each had 'import
importlib.util' inside a try: block nested deeper in the function
body. Python flips importlib to function-local for the whole scope,
so earlier references in the same function (the neutts branches at
lines 493 / 1109) hit UnboundLocalError before the late import can
run.
The top-of-module 'import importlib.util' at line 14 already covers
both call sites, so dropping the redundant inner imports restores
the intended behavior.
* feat(install.ps1): add -IncludeDesktop switch + Stage-Desktop
The new Hermes-Setup.exe (Tauri bootstrap installer) passes -IncludeDesktop
so users who install via the GUI end up with a launchable Hermes.exe at
apps/desktop/release/<os>-unpacked/. Existing flows are unchanged:
* The 'irm install.ps1 | iex' CLI one-liner omits the flag — terminal
users don't need a prebuilt desktop binary; 'hermes desktop' builds
on demand.
* The Electron desktop's bootstrap-runner.cjs also omits the flag —
rebuilding apps/desktop from inside a running Hermes.exe would try
to overwrite the live binary on disk and fail.
Stage-Desktop runs after Stage-NodeDeps so workspace npm is already
installed when electron-builder fires. It does:
1. 'npm install' at repo root so apps/* workspaces resolve their deps
(Electron itself arrives via npm here, ~150MB)
2. 'npm run pack' in apps/desktop (tsc + vite + electron-builder --dir)
3. Probes apps/desktop/release/{win-unpacked,win-arm64-unpacked}/Hermes.exe
The --dir mode produces an unpacked launchable binary without an NSIS/MSI
installer artifact — we don't need one because Hermes-Setup.exe spawns the
unpacked binary directly via launch_hermes_desktop.
* feat(installer): Tauri bootstrap installer for first-time onboarding
Hermes-Setup.exe is a small signed Rust+Tauri binary that drives
scripts/install.ps1 stage-by-stage with a native UI matching the
desktop's design language. Replaces the chicken-and-egg pattern of
shipping a 200MB Electron app whose first launch existed only to
run install.ps1.
The architecture:
Rust backend (src-tauri/):
bootstrap.rs orchestrator -- Tauri commands, stage iteration
install_script.rs resolve install.ps1 (dev checkout, cache, GitHub raw)
powershell.rs spawn powershell, line-stream stdout/stderr, parse JSON
events.rs BootstrapEvent types -- mirror bootstrap-runner.cjs
paths.rs HERMES_HOME resolution + tracing log setup
build.rs bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from
'git rev-parse HEAD' at compile time
React frontend (src/):
Tauri webview rendering 4 screens (welcome / progress / success /
failure), driven by nanostores subscribing to the Rust event stream.
Visual layer reuses the desktop's styles.css wholesale via @import
so the installer and desktop never drift visually.
Distribution:
targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The
raw target/release/Hermes-Setup.exe IS the artifact on Windows;
.dmg + .app on macOS; AppImage on Linux. One file, double-click,
no installer-installing-an-installer pattern.
Compile-time pinning:
build.rs reads 'git rev-parse HEAD' and emits
cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>.
bootstrap.rs's option_env!() picks these up so the binary fetches
install.ps1 from the exact SHA it was tested against. CI / release
builds can override via HERMES_BUILD_PIN_COMMIT env var.
Windows manifest:
hermes-setup.manifest declares level='asInvoker' so the
productName 'Hermes Setup' doesn't trip Windows's installer-
detection heuristic and refuse to launch without elevation.
Also declares PerMonitorV2 DPI + UTF-8 active code page + Common
Controls v6.
Limitations of this initial version:
* No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe
('More info -> Run anyway'). The downstream binaries it produces
(Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and
therefore don't carry MOTW, so they launch without SmartScreen
intervention. Cert procurement tracked separately.
* macOS and Linux build paths defined but untested -- Windows-only V1.
* fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop
Three bugs found in the first VM end-to-end test:
1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the
manifest came back with the 14-stage list (no desktop stage), the
UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to
both the manifest fetch and the per-stage runs — install.ps1 gates
the desktop stage's inclusion on the flag.
2. The Success screen's Launch button silently swallowed the Tauri
error when no Hermes.exe existed (e.g. Stage-Desktop was skipped).
Wire the error through to inline UI with an alert callout, so the
user gets actionable text ('Hermes.exe missing, run hermes desktop
from a terminal') instead of an unresponsive button.
3. The Success screen tells users to run 'hermes desktop' from a
terminal but the CLI only accepted 'hermes gui' — invalid choice
for 'desktop'. Rename the subcommand canonically to 'desktop' with
'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets
used by session-flag arg parsing + logging-mode probe so both names
route to the same logic.
* fix(install.ps1): pre-warm electron-builder winCodeSign cache + fix Stage-Desktop $HasNode false-skip
Two bugs caught in the second VM end-to-end run:
1. electron-builder's winCodeSign extraction fails on grandma-class
Windows boxes because the .7z archive contains macOS symlinks
(darwin/10.12/lib/libcrypto.dylib and libssl.dylib pointing at
versioned siblings). Creating symlinks on Windows requires
SeCreateSymbolicLinkPrivilege, a per-user right that non-admin
accounts don't have on stock Windows. Result: every fresh install
on a non-admin user fails Stage-Desktop with a 7-Zip 'cannot create
symbolic link' error, retried four times, then bails.
Fix: Initialize-ElectronBuilderCache pre-extracts winCodeSign-2.6.0.7z
ourselves with -snl (don't preserve symlinks, store as resolved file
content) AND -x!darwin (skip the entire macOS subtree — irrelevant
on Windows). Writes to electron-builder's expected cache dir before
electron-builder gets a chance to try its own broken extraction.
Idempotent — fast-paths via signtool.exe sentinel check.
2. Install-Desktop's first guard was 'if (-not $HasNode) skip'.
$HasNode is set by Stage-Node into $script:HasNode, but in
cross-process driver mode (each -Stage NAME is a fresh powershell.exe
spawned by Hermes-Setup.exe), that script-scope variable from the
PREVIOUS process is invisible — so the guard always fired and
Install-Desktop returned in 900ms with a misleading
'Node.js not available' reason. The real npm probe below it never
got to run. Fix: re-probe npm directly via Get-Command when $HasNode
is empty/false, since by that point Stage-Node has already verified
Node is installed and the only question is whether *this* process
can see it on PATH (it can — installer-wide PATH update from Stage-Node).
* fix(install.ps1): tell electron-builder we're NOT signing instead of pre-extracting winCodeSign
The previous commit (c7e46f9f3) worked around the winCodeSign-symlinks-
on-Windows extraction crash by pre-extracting the archive ourselves with
-snl + -x!darwin. That fix was correct but addressed the wrong layer.
The deeper question: why was electron-builder fetching winCodeSign at all
when we have no signing cert configured? Answer: electron-builder
unconditionally pre-warms the toolchain assuming any build MIGHT sign.
The cert auto-discovery never finds anything (we never set CSC_LINK
or anything else), so the signing never happens — but the 100MB fetch
of winCodeSign and its broken-on-Windows symlink extraction does.
Set CSC_IDENTITY_AUTO_DISCOVERY=false (with WIN_CSC_LINK and
WIN_CSC_KEY_PASSWORD also explicitly cleared as belt-and-suspenders)
before invoking npm run pack, and electron-builder skips the entire
winCodeSign apparatus. No download, no extraction, no privilege check.
Env vars are saved/restored around the invocation so we don't leak
the override into Stage-PlatformSdks etc.
Net: removes the 100-line Initialize-ElectronBuilderCache helper that
manually downloaded + extracted winCodeSign-2.6.0.7z. Replaced with
3 env-var assignments. The produced Hermes.exe is functionally
identical — just no longer carries a code-signing-machinery dependency
we never used.
* fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line
Diagnosing the second VM failure was impossible because bootstrap-installer.log
contained only the 'starting' banner. Two causes:
1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the
floor under the default INFO env-filter.
2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only
emitted Tauri events to the frontend; they never tee'd to the log file
at all. When the failure route mounts, the Tauri event stream is the
only place the script output lived, and it gets discarded.
3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event()
were also Tauri-only — so even the 'which stage failed' frame never
reached the log.
Fixes:
* emit_log() → tracing::info!
* Sink callbacks tee stdout to info!, stderr to warn!, with stage label
as a structured field for grep'ability
* emit_event() now matches on the variant and logs each lifecycle frame
at the right level: Failed → tracing::error!, others → info!
Result: a failing install leaves a complete forensic trail in
bootstrap-installer.log — manifest stage list, every install.ps1
stdout/stderr line tagged by stage, the stage transitions, and the
final error. Same path as before so nothing the user does changes.
* fix(install.ps1): Stage-NodeDeps cross-process $HasNode + stream npm install output to bootstrap log
VM run 3 diagnosis: node-deps stage skipped on the VM (logged
'Skipping Node.js dependencies (Node not installed)') and then
desktop's npm install failed with exit 1 and zero diagnostic detail.
Two root causes:
1. $HasNode false-skip in Stage-NodeDeps — same cross-process bug
pattern we fixed for Stage-Desktop in c7e46f9f3. Stage-Node ran
in process A and set $script:HasNode = $true, then exited. Stage-
NodeDeps ran in fresh process B (Hermes-Setup.exe -Stage NAME
spawns each stage independently), where that variable doesn't
exist. Re-probe via Get-Command npm instead of trusting the
stale script-scope global. The previous stage already verified
Node so the re-probe succeeds.
2. npm install --silent + Tee to TEMP file hid the real error.
When the workspace install failed on the VM, the actual reason
was buffered in $env:TEMP\hermes-npm-desktop-install-*.log and
the user saw only 'exit 1'. Drop --silent so npm streams its
full output, drop the TEMP-file dance — the Tauri installer's
streaming sink already tees every stdout/stderr line to the
rolling bootstrap-installer.log, so a side log file is dead
weight that hides the very error we need.
After this, the bootstrap log on a failure will contain npm's full
output (deprecation warnings, ETARGET, native-module compile errors,
whatever) tagged with stage=desktop, making the actual cause
diagnosable instead of an opaque exit code.
* fix(install.ps1): restore Initialize-ElectronBuilderCache (CSC env vars alone aren't enough)
VM run 4 diagnosis: even with CSC_IDENTITY_AUTO_DISCOVERY=false set,
electron-builder still fetches winCodeSign and signs bundled binaries.
The log shows the signing happens BEFORE the cache extraction:
• signing with signtool.exe ...\winpty-agent.exe
• signing with signtool.exe ...\OpenConsole.exe
• downloading winCodeSign-2.6.0.7z
• <symlink privilege error>
Cause: node-pty's bundled prebuilds are listed in apps/desktop's
asarUnpack ['**/*.node', '**/prebuilds/**']. electron-builder
re-signs anything unpacked from asar, regardless of whether OUR
binary gets signed. The signtool invocation needs winCodeSign on
disk, which needs the .7z extracted, which hits the macOS-symlink
crash on non-admin Windows.
The CSC env vars I added in d5fe46727 only kill IDENTITY DISCOVERY
(so OUR Hermes.exe stays unsigned, which is fine — we have no cert).
They don't prevent the toolchain fetch for the bundled-prebuild
re-sign. I removed the pre-extract in d5fe46727 thinking the env
vars subsumed it; that was wrong. Both are needed.
Restoring Initialize-ElectronBuilderCache verbatim from c7e46f9f3
and keeping the CSC env vars. Wrote a clearer doc-comment at the
call site explaining the two-knob interaction so future maintainers
don't drop one half again.
* fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract
VM run 5 diagnosis: the pre-extract from 3b29e65c1 ran (extracted 83
files, 24MB) but produced ZERO files at the expected sentinel path
'/winCodeSign-2.6.0/windows-10/x64/signtool.exe'.
Cause: the .7z archive's root entries are 'windows-10/', 'darwin/',
'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with
'-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at
$cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory
nesting wrong from the start.
And then we observed: electron-builder downloads winCodeSign-2.6.0.7z
under a random numeric filename ('384387955.7z') regardless of what's
already extracted in the parent dir. The cache key isn't the dirname;
it's content-addressed. So the pre-extract approach was doomed even
if the path nesting had been right.
Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's
win build config. electron-builder honors this and skips the bundled-
prebuild signing entirely — no signtool invocation, no winCodeSign
fetch, no symlink-privilege crash. The previous failures all stemmed
from electron-builder pre-signing node-pty's bundled .exes
(winpty-agent.exe, OpenConsole.exe) which are already author-signed
upstream; re-signing with our nonexistent cert was overwriting good
sigs with nothing useful anyway.
Cost: when we DO get a real cert later, we'll add it back with the
sign function pointing at the cert chain. Until then, all-null is
the correct config and unblocks every non-admin Windows user.
Removed Initialize-ElectronBuilderCache (the dead pre-extract).
Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env
vars as belt-and-suspenders against a future electron-builder
change that might revive cert auto-discovery.
* fix(desktop): use no-op sign function instead of sign=null
VM run 6 still hit the symlink crash even with signtoolOptions.sign=null.
electron-builder 26.8.1 treats null as 'use the default signtool path'
rather than 'skip signing', so the winCodeSign fetch + extraction still
fired for the bundled prebuild re-sign.
The Electron docs (electronjs.org/docs/latest/tutorial/code-signing)
make it clear signing is OPTIONAL and unsigned apps work fine — users
just see SmartScreen on first launch. The electron-builder mechanism
for 'don't actually sign anything' is to supply a custom sign function
(via signtoolOptions.sign: '<path-to-cjs-module>') that resolves
without invoking signtool.
build-noop-sign.cjs is that module — a 5-line async function that
returns undefined. electron-builder calls it for every binary it would
have signed, gets back a resolved promise, and considers each binary
'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash.
When Nous's cert arrives, replace this file with a real signing hook
(@electron/windows-sign-based or a direct signtool invocation). The
architecture's signing-ready and the cutover is a one-file edit.
* fix(desktop): signAndEditExecutable=false to skip signtool path entirely
After reading app-builder-lib/winPackager.js line 216 + 231 directly:
signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits
both signApp() (which signs Hermes.exe + every shouldSignFile match
including bundled prebuilds) AND createTransformerForExtraFiles().
None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the
winCodeSign download — that happens before they're consulted.
What we lose: rcedit also runs through signAndEditResources, so
disabling this drops PE metadata (file properties showing 'Hermes' /
'Nous Research' / file description). Cost is real but bounded:
* Hermes.exe filename, icon, asar contents, app identity intact
* Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE
description) — minor downgrade
* Start menu, taskbar, window title all work normally
* SmartScreen will warn once (unsigned, same as before)
When the cert lands, flip signAndEditExecutable back to default true,
both signing AND rcedit return, PE metadata is restored.
Removes the no-op sign function (build-noop-sign.cjs) since
signAndEditExecutable=false prevents signtool from being invoked at
all — the custom hook never gets called either.
* feat(install.ps1): write .hermes-bootstrap-complete marker at end of install
The desktop app's main.cjs resolver ladder has a 'bootstrap-needed' rung
that fires when .hermes-bootstrap-complete is missing from
ACTIVE_HERMES_ROOT. Pre-Hermes-Setup, this marker was written by the
packaged-desktop's own bootstrap-runner.cjs at the end of its install
flow. Now that Hermes-Setup.exe runs install.ps1 directly, install.ps1
needs to own the marker — otherwise the desktop sees no marker on first
launch and triggers its legacy first-launch bootstrap (re-running
install.ps1 from inside Electron, the exact recursion Hermes-Setup.exe
was supposed to obviate).
Implementation:
* New Stage-BootstrapMarker (worker) → Write-BootstrapMarker (helper)
* Slotted in the manifest right after platform-sdks, before the
interactive configure/gateway stages, so it runs unconditionally
when the install reaches the finalize phase
* Schema mirrors apps/desktop/electron/main.cjs writeBootstrapMarker /
isBootstrapComplete EXACTLY: {schemaVersion: 1, pinnedCommit,
pinnedBranch, completedAt}. Schema version stays at 1 so old
desktops that read marker files written by future install.ps1s
can still parse them.
* pinnedCommit comes from -Commit flag (Hermes-Setup.exe passes it)
or falls back to 'git rev-parse HEAD' in InstallDir
* pinnedBranch from -Branch flag, defaults to 'main' matching
install.ps1's own param default
Two PS-5.1 gotchas baked into comments:
* The ?. null-conditional operator doesn't exist pre-PS7; use
explicit if-checks on Get-Command results
* Set-Content -Encoding UTF8 emits a BOM in 5.1 and Node's plain
JSON.parse rejects BOM — write via .NET's UTF8Encoding(false)
to produce BOM-less JSON the desktop's readJson() can parse
* feat(installer): drive in-app updates through the Tauri installer
Converge update on the same principle as bootstrap: one driver owns all
repo mutation. The desktop becomes a pure consumer that hands off to
Hermes-Setup.exe --update instead of re-implementing git/pip in Electron.
- hermes desktop --build-only: build without launching, so the installer
owns the post-update launch (CLI keeps build logic single-sourced).
- Installer AppMode {Install,Update} from argv; get_mode exposed to the UI.
- Installer self-copies to HERMES_HOME/hermes-setup.exe on install success
(no-op guard during --update re-invocation to avoid the locked-exe copy).
- Installer --update flow (update.rs): wait for the desktop to release the
venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other),
then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses
the bootstrap event channel + progress UI via a synthetic two-stage manifest.
- Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip
removed) -> thin handoff: spawn updater, app.quit() to free the shim.
Detection (checkUpdates, commit changelog, behind-count) kept intact.
- install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe
(never bare 'hermes desktop', which would rebuild every launch).
* test update
* fix(installer): pass --branch to hermes update in the --update flow
The install is a detached-HEAD checkout of a pinned commit. Without
--branch, 'hermes update' fell back to its default (main) and switched
the checkout to main — a divergent branch that lacks the desktop CLI
command — so the update targeted the wrong branch and the rebuild stage
failed with 'invalid choice: desktop'.
Thread BUILD_PIN_BRANCH (the branch this installer was built against,
and the same branch the desktop detected the update on) into
'hermes update --branch <b>' so update + rebuild stay on-branch.
* test update
* fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign)
The unpacked Hermes.exe showed the stock Electron icon + name in the
taskbar because build.win.signAndEditExecutable=false disables BOTH
electron-builder's signing AND its rcedit metadata/icon stamping. That
flag is load-bearing: enabling it re-triggers signtool -> winCodeSign,
whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end).
Decouple identity-stamping from signing entirely: after npm run pack,
run rcedit ourselves on the produced exe.
- Add rcedit as a direct devDependency of apps/desktop (the transitive
electron-winstaller copy is fragile).
- apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls
rcedit's named export to set icon + ProductName/FileDescription/
CompanyName. Node builds argv natively — avoids the PowerShell->exe
->JSON double-escaping that broke the app-builder rcedit path.
- install.ps1 Set-DesktopExeIdentity invokes the script after the build,
before shortcuts. Best-effort: failure keeps the stock icon, never
fails the install. rcedit is a pure PE editor — no signtool, no
winCodeSign, no symlinks.
Verified locally: stamping a copy of the built Hermes.exe embeds the
32x32 icon and sets ProductName=Hermes.
Also fix update-path success-screen flash: in update mode the installer
hands off + exits in ~600ms, so don't route to the 'launch Hermes'
success view (it flashed before the window closed).
* update test
* fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error
A user who installed via the CLI (irm|iex / install.sh) then ran
`hermes desktop` has no staged hermes-setup.exe, so clicking Update
in-app hit resolveUpdaterBinary()=null and showed a misleading error
('re-run the Hermes installer') with a Try-again button that could
never succeed — a dead loop for a perfectly valid install.
Treat the no-updater case as an intentional outcome, not a failure:
- main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' }
(no throw, no 'error' stage) when no updater binary exists.
- New 'manual' update stage + apply-state.command thread the command to the UI.
- updates-overlay ManualView: a polished terminal-native card with the
exact command and a copy button, framed as the correct path for a CLI
user rather than an error.
GUI-installer users are unaffected — hermes-setup.exe present => seamless
auto-update runs as before. Zero new process orchestration; can't fail
the update demo.
* update test
* fix(gui): pin /api/hermes/update to the current branch
The desktop command-center 'update' action hits POST /api/hermes/update,
which spawned bare `hermes update` with no --branch. cmd_update then
falls back to its default (main) and checks the working tree OUT of the
tracked branch — a bb/gui install silently jumped to main and lost the
desktop CLI.
Resolve the checkout's current branch and pass --branch <current> from
this endpoint only. The engine default (main) is DELIBERATELY unchanged:
bare `hermes update` from a terminal, the gateway /update bot command,
and the CLI/TUI relaunch path all keep their long-standing 'update against
main' contract for the existing user base. Only the GUI button is scoped
to update-the-branch-you're-on. Detached HEAD / git failure falls back to
the bare default.
* update test
* fix(desktop): branch-pin the CLI manual-update command card
The 'Update from your terminal' card (shown to CLI installs with no staged
updater) hardcoded bare `hermes update` — which defaults to main and would
switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for
the GUI button, leaked into the card's copy text.
Resolve the checkout's current branch and show `hermes update --branch
<current>` for non-main checkouts; keep it bare for main so the card stays
clean. Best-effort: bare fallback if branch detection fails. Matches the
GUI button + installer --update contract; bare terminal/bot/TUI update
paths still default to main, unchanged.
* docs: phragg was here
* feat(desktop): lead onboarding with Nous Portal + fix fresh-install detection (#34970)
- Feature Nous Portal as the primary onboarding card (Recommended tag,
app logo, single pitch line); collapse other OAuth providers behind an
"Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
an empty credential or only implicit Bedrock/IAM, so fresh installs never
saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
via the /proc kernel marker.
* feat(desktop): live elapsed timer on install bootstrap steps
The first-launch install overlay showed a static "Installing" with no
motion, so long steps (notably the repo clone) looked frozen. Stamp each
stage's start time on the running transition and tick once a second so the
active step shows live elapsed (e.g. "Installing · 1:23"), plus elapsed on
the overall current-step line. Completed steps keep their final duration.
* fix(desktop): resolve PortableGit for update checks + reserve titlebar tools space
- runGit() hardcoded spawn('git'), which ENOENTs on fresh installer-driven
Windows installs (git is PortableGit under %LOCALAPPDATA%\hermes\git, never
on PATH) — so "Check for updates" failed with "Couldn't check for updates".
Add resolveGitBinary() mirroring findGitBash (PortableGit → Git-for-Windows
→ PATH) and use it in runGit.
- PageSearchShell rendered a full-width search input in the titlebar row, so
on Windows its right edge slid under the fixed top-right tools + native
window controls. Reserve that footprint via --titlebar-tools-* vars.
* fix(desktop): stop streaming caret from shifting layout on completion
The streaming caret (::after on the running message's last child) was an
in-flow inline-block adding ~0.78em of inline width, which could wrap the
last line mid-stream; when the caret is removed on completion the line
un-wraps and reflows — the visible post-response layout shift. Net-zero its
inline advance with a compensating negative margin so it paints at the text
end without consuming layout width.
* fix(desktop): stop completed-message layout shift while streaming
The assistant message action bar used `hideWhenRunning`, which unmounts it
whenever the thread is streaming. Since the bar reserves vertical space in
each completed assistant message's footer (it's invisible-until-hover via
opacity, not via mount), unmounting it collapsed every prior turn by the
bar's height — then remounting on resolve grew them back, shifting the whole
conversation (visible as "padding appears above the last user message").
Drop hideWhenRunning so the footer height is constant; the bar stays
invisible during streaming via its existing opacity/pointer-events gating.
* fix(merge): keep windows-footgun suppressions inline
* fix(merge): keep remaining gateway footgun suppressions inline
* fix(merge): restore contracts caught by main-target CI
* fix(dashboard): honor injected HERMES_DASHBOARD_SESSION_TOKEN
The desktop shell mints a session token and signs its /api + /api/ws
calls with it via HERMES_DASHBOARD_SESSION_TOKEN, but the main-merge
restored a web_server.py that ignored the env var and minted its own
random _SESSION_TOKEN -- so every desktop request 401'd and the UI
reported "gateway offline". Read the injected token (fall back to a
fresh random one) so loopback HTTP + WS auth line up.
Adds a regression test so a future merge can't silently drop the read.
* fix(desktop): align fresh-install home so upgraders don't brick
Two related first-launch bugs on machines with a legacy ~/.hermes:
- install.ps1 hardcoded $HermesHome/$InstallDir to %LOCALAPPDATA%\hermes
and ignored the HERMES_HOME the desktop passes through. The desktop
freezes HERMES_HOME at module load and prefers a legacy ~/.hermes when
%LOCALAPPDATA%\hermes is absent, so the installer wrote to a different
home than the shell read -> "Could not connect to Hermes gateway". Honor
$env:HERMES_HOME in the param defaults.
- isBootstrapComplete() trusted the marker + checkout without verifying a
runnable venv, so an interrupted/split install spawned a dead backend
instead of re-bootstrapping. Also require the venv python to exist.
* fix(dashboard): allow packaged desktop file:// origin on loopback WS
The packaged Electron desktop loads its renderer over file://, so its
/api/ws handshake carries Origin: file:// (or null). The DNS-rebinding
WebSocket Origin guard only accepted http(s) origins matching the bound
host, so it rejected the desktop's own renderer with 4403 -> "Could not
connect to Hermes gateway" on macOS.
A browser DNS-rebinding attacker can only ever present an http(s) origin
(the site hosting the malicious page); it cannot forge file://, null, or
a custom app scheme AND hold the loopback session token. So on loopback
binds we now trust non-web origins -- the token in _ws_auth_ok remains
the real authenticator. Public/gated binds still reject them, and
cross-site http(s) origins are still rejected everywhere.
* fix(desktop): resolve renderer assets relative to BASE_URL
Absolute public asset paths (/apple-touch-icon.png, /ds-assets/...) work
under the dev server but break in the packaged app, where the renderer is
loaded from file://.../index.html and a leading slash resolves to the
filesystem root -> broken onboarding provider icon and backdrop image on
macOS. Prefix these with import.meta.env.BASE_URL so they resolve next to
the bundled index.html in both dev and packaged builds.
* feat(desktop): automate first-launch bootstrap on macOS/Linux
Previously a packaged macOS/Linux app with no Hermes install hit a
dead-end ("first-launch install is not yet automated -- run install.sh
manually") because install.sh lacked the staged protocol install.ps1
exposes. Now both platforms bootstrap on first launch with the same
structured, per-step progress UI as Windows.
- install.sh: add --manifest / --stage / --json / --non-interactive plus
a stage dispatcher (prerequisites, repository, venv, python-deps,
node-deps, path, config, setup, gateway, complete). User-input stages
(setup, gateway) are skipped under --non-interactive; the in-app
onboarding overlay owns API keys/model, matching the Windows flow.
Each stage runs inside the install dir (its own process) and a new
--commit flag pins the checkout to the build-stamp SHA.
- bootstrap-runner.cjs: drive the staged manifest/stage/JSON protocol for
both install.ps1 (PowerShell) and install.sh (bash), selected by
installer kind; removed the single-blob POSIX shim.
- main.cjs: drop the macOS/Linux unsupported-platform dead-end so the
bootstrap-needed path runs the installer on every platform.
* fix(dashboard): return 404 JSON for unmatched /api paths instead of SPA HTML
The SPA catch-all (serve_spa) served index.html for any unmatched GET,
including unregistered /api/* endpoints. A missing API route therefore
came back as <!doctype html> with status 200, and JSON clients (the
desktop app's fetchJson) crashed with an opaque
'SyntaxError: Unexpected token <' instead of a clear error.
- web_server.py: unmatched /api or /api/... now returns 404 JSON
('No such API endpoint'); non-api paths still serve the SPA for
client-side routing.
- main.cjs fetchJson: detect an HTML body / text/html content-type on a
2xx response and reject with a clear message naming the URL, rather
than a raw JSON.parse SyntaxError. Empty bodies resolve to null;
malformed JSON reports the URL plus a snippet.
* say 'OS appearance' instead of 'macOS appearance'
* feat(install): add --include-desktop stage + PowerShell-style flags to install.sh
Brings install.sh to parity with install.ps1's bootstrap surface so the
shared Rust/Tauri bootstrapper (apps/bootstrap-installer) can drive a
macOS/Linux install the same way it drives Windows.
- Accept the PowerShell-style aliases the bootstrapper emits to both
installers: -Commit / -Branch (alongside existing -Manifest / -Stage /
-Json / -NonInteractive).
- Add --include-desktop / -IncludeDesktop. When set, the manifest gains a
'desktop' stage (immediately before 'complete'), and a new install_desktop
runs a root workspace `npm install` + `npm run pack` (electron-builder
--dir, signing auto-discovery disabled) to produce release/mac*/Hermes.app
-- mirroring install.ps1's Install-Desktop / Stage-Desktop.
- The flag is opt-in, exactly like Windows: the signed bootstrap installer
passes it; the Electron app's own first-launch bootstrap and the CLI
one-liner omit it (building the desktop from inside the running app would
clobber it).
* fix: tts endpoints
* macOS desktop: install + in-app self-update (#35607)
* fix(installer): align macOS HERMES_HOME with the rest of the stack
paths.rs computed the macOS Hermes home as ~/Library/Application Support/
hermes, but nothing else does: hermes_constants.get_hermes_home() (Python),
scripts/install.sh, and the Electron desktop's resolveHermesHome() all use
~/.hermes on macOS. The drift meant the Tauri installer wrote the install to
one directory and the desktop looked for it in another, so a fresh GUI
install never found its backend (the file's own comment warned this exact
drift would break things). Use ~/.hermes on macOS to match.
* fix(install.sh): always emit a stage result frame on failure
Stage helpers (clone_repo, install_deps, check_python, …) were written for
the monolithic flow and call `exit 1` on failure. Under `--stage`, that
terminated the process before the JSON result frame was printed, so the
installer's parse_stage_result saw "no frame" instead of a clean
{ok:false,...} contract response. Run the stage body in a subshell so an
`exit` only unwinds the subshell and the parent still emits the frame.
* feat(install.sh): auto-provision git on macOS/Linux (parity with install.ps1)
install.ps1 downloads PortableGit on Windows, but install.sh just printed a
"please install git" hint and exited — so a fresh Mac with no developer tools
(no Xcode CLT → no git) couldn't get past the clone step. check_git now tries
to install git before bailing:
- macOS: Homebrew if present (headless), else `xcode-select --install`
(the CLT prompt also provides the compiler some wheels need), polling for
git to appear.
- Linux: apt/dnf/pacman via sudo when available.
Falls back to the manual instructions only if auto-provision fails.
* feat(desktop): in-app GUI+backend self-update on macOS/Linux
On Windows the staged Hermes-Setup binary drives updates (quit → hermes
update → hermes desktop --build-only → relaunch). The mac drag-install has no
such binary, so "Update now" previously just printed `hermes update`.
Since there's no venv-shim file lock on POSIX, the desktop can drive the whole
update itself. applyUpdates now, when no staged updater exists on mac/linux:
1. runs `hermes update --yes [--branch <current>]` (backend git pull + deps),
2. runs `hermes desktop --build-only` (OS-aware GUI rebuild) with the
Hermes-managed Node + venv on PATH,
3. spawns a detached swapper that waits for this process to exit, dittos the
freshly built Hermes.app over the running bundle, clears quarantine, and
relaunches.
Degrades to "backend updated — restart to load the new GUI" if the rebuild
fails or there's no .app bundle to swap (dev run, Linux AppImage).
* chore: uptick
* chore: uptick
* chore: linux build
* fix(install): detect xcode-select git stub on fresh macOS
* chore: bump
* fix(desktop): repair voice dictation on Windows
Voice dictation was broken on Windows in two ways:
1. Mic access was denied. The Electron permission request handler only
granted 'media' requests whose details.mediaTypes included 'audio',
but Chromium on Windows frequently fires the mic request with an empty
mediaTypes array, so getUserMedia threw NotAllowedError. The handler
now grants audio-capture when mediaTypes includes 'audio' OR is
empty/absent, handles the 'audioCapture' permission name, and adds a
setPermissionCheckHandler (the synchronous path Chromium also consults
for getUserMedia on Windows). Video is still denied.
2. Transcripts went nowhere. The composer's insertText handler (used by
dictation and other inserts) only updated the assistant-ui composer
store via setText, never the contentEditable editor DOM. The
draft->editor sync effect only re-renders the editor when it is NOT
focused, and dictation runs while the editor has/regains focus, so the
transcript was stored but never shown and could not be sent. insertText
now renders into the editor DOM and places the caret, mirroring
appendExternalText.
Also hardens fetchJson: a 2xx response with an HTML body (or text/html
content-type) now rejects with a clear message naming the URL instead of
an opaque JSON.parse 'Unexpected token <' error.
* feat(desktop): route Nous subscribers onto the Tool Gateway from the GUI
When the GUI sets the main provider to Nous via POST /api/model/set, call
the same apply_nous_managed_defaults the CLI uses after model selection, so
GUI/onboarding users land on the Nous Tool Gateway the same way CLI users do
— no separate prompt, no duplicated logic.
Purely additive: apply_nous_managed_defaults skips any tool where the user
has a direct key (FIRECRAWL_API_KEY, FAL_KEY, etc.) or explicit config, so it
never overwrites a user's own setup. Only unconfigured tools get routed.
- web_server.py: in set_model_assignment (scope=main, provider=nous), resolve
enabled toolsets and apply managed defaults; guarded so a Portal hiccup never
blocks saving the model. Returns routed tools as gateway_tools.
- onboarding.ts: surface a 'Tool Gateway enabled' toast listing routed tools.
- types/hermes.ts: add gateway_tools to ModelAssignmentResponse.
- tests: cover nous-applies, non-nous-skips, and failure-doesnt-block-save.
* feat(desktop): mirror hermes model free/paid curation in GUI onboarding
GUI onboarding picked models[0] from /api/model/options, which ignores the
Nous free/paid tier — a free user could land on a paid default (e.g.
anthropic/claude-opus-4). Now the recommended default mirrors what `hermes
model` does.
- web_server.py: new GET /api/model/recommended-default?provider=<slug>. For
Nous it runs the same curation as the CLI (get_curated_nous_model_ids +
pricing + check_nous_free_tier + union_with_portal_{free,paid}_recommendations
+ partition_nous_models_by_tier) so free users get a free model and paid users
get the curated default. Other providers fall back to the first curated model.
Never 500s — returns empty model on error so onboarding degrades gracefully.
- hermes.ts: getRecommendedDefaultModel client + RecommendedDefaultModel type.
- onboarding.ts: fetchProviderDefaultModel prefers the recommended endpoint,
falls back to models[0] when unavailable.
- tests: free-tier picks free model, paid-tier picks curated default, failure
returns empty without 500.
* feat(desktop): show model pricing + free/paid tier gating in GUI picker
The CLI `hermes model` picker shows per-model $/Mtok pricing and gates paid
models on free Nous accounts. The GUI picker showed bare model names. Bring it
to parity across both the model-picker dialog and onboarding confirm card.
Backend:
- inventory.build_models_payload gains a pricing=True flag → _apply_pricing
enriches each provider row with formatted per-model pricing
({input,output,cache,free}) via the same _format_price_per_mtok the CLI uses,
and for Nous adds free_tier + unavailable_models (paid models a free user
can't select) via check_nous_free_tier + partition_nous_models_by_tier.
Best-effort: any pricing/tier failure is swallowed and fails open (no gating).
- /api/model/options and TUI model.options now pass pricing=True so the
global picker and in-session picker both carry pricing.
Frontend:
- ModelOptionProvider gains pricing/free_tier/unavailable_models; new
ModelPricing type.
- model-picker dialog renders In/Out $/Mtok (or a Free pill) per model, a
Free tier/Pro badge on the Nous heading, and disables + grays unavailable
paid models for free users with a 'Pro models need a paid subscription' note.
- onboarding confirm card shows the chosen model's price + tier badge.
Tests: test_inventory_pricing covers price formatting, free-tier gating,
paid no-gating, providers without pricing, and swallowed failures.
* fix(desktop): GUI model picker shows curated Nous list in curated order
Two bugs made the GUI Nous model list diverge from the `hermes model` CLI picker:
1. Backend (model_switch.py): the Nous row in list_authenticated_providers
fell through to cached_provider_model_ids("nous"), dumping the full live
/v1/models catalog (~50 vendor-prefixed models, alphabetical). Now it uses
the curated list AND applies the Portal free/paid recommendation union —
exactly like _model_flow_nous in main.py — so newly-launched models such as
stepfun/step-3.7-flash:free surface in curated order. Best-effort: falls
back to the curated list alone if the Portal fetch fails.
2. Frontend (model-picker.tsx): cmdk's Command had shouldFilter on (default),
which re-sorts items by fuzzy-match score (≈alphabetical) and ignores array
order. Set shouldFilter={false} + own the search term and do an
order-preserving substring filter, so the backend's curated order is shown
verbatim.
* feat(desktop): add/switch providers from the model picker via onboarding reuse
The model picker could only select models from already-authenticated
providers. Switching to a new provider had no in-app path. Rather than
duplicate provider UI, reuse the existing onboarding provider selector
(featured Nous + other providers + API-key form + device-code/PKCE flow +
model-confirm with pricing/tier).
- onboarding store: add a 'manual' flag with startManualOnboarding() /
closeManualOnboarding(). Manual mode forces the onboarding overlay to show
even when configured===true and refreshOnboarding no longer auto-dismisses
on runtime-ready (the app is already working — the user is just adding or
switching a provider).
- onboarding overlay: render when manual even if configured; show a Close
button (the first-run flow has none since the app can't run yet).
- model picker: 'Add provider' footer button opens the onboarding selector;
ModelResults lists only configured (model-bearing) providers.
* feat(desktop): add PUT /api/tools/toolsets/{name} enable/disable endpoint
* feat(desktop): add toggleToolset RPC binding
* feat(desktop): toolset enable/disable switch in Tools settings
* feat(desktop): tool configuration parity in GUI Tools settings
Bring the desktop GUI Tools settings to parity with the CLI `hermes tools`
for provider selection and API-key configuration.
Backend (hermes_cli/web_server.py):
- GET /api/tools/toolsets/{name}/config - provider matrix + key status
- PUT /api/tools/toolsets/{name}/provider - persist provider selection
Shared core (hermes_cli/tools_config.py):
- Extract apply_provider_selection / _write_provider_config from the
interactive _configure_provider so the CLI and GUI write identical
config keys (web.backend, tts.provider, browser.cloud_provider, plugin
image/video providers, use_gateway flags) through one code path.
Desktop UI:
- ToolsetConfigPanel: provider list with select, per-provider API-key
entry (set/replace/clear/reveal via the shared env RPCs), Ready/Needs
keys state, guidance for Nous-auth and post-setup providers.
- Wire the Configured/Needs keys pill to expand the panel inline; refresh
the toolset list after key changes so the pill updates live.
- Add getToolsetConfig / selectToolsetProvider RPC bindings + types.
Post-setup (OAuth/install) flows still defer to the CLI; see
docs spike findings for the planned /api/tools/setup/* endpoint family.
Tests: backend round-trip + 400 cases for the new endpoints and
apply_provider_selection; desktop vitest coverage for the config panel
(provider render, select, key save). No change-detector tests.
Also removes three stale completed plan docs.
* fix(desktop): show real Hermes version + sync package.json on release
The desktop app version was disconnected from the Hermes version: the
release script bumped pyproject.toml + hermes_cli/__init__.py but never
touched apps/desktop/package.json, which sat stale at 0.0.2 (lockfile at
0.0.1).
- main.cjs: hermes:version IPC now resolves __version__ from
hermes_cli/__init__.py (the canonical source release.py bumps) via a new
resolveHermesVersion() helper, falling back to app.getVersion() when the
source tree isn't readable. The About panel now always shows the live
Hermes version and can't drift.
- release.py: update_version_files() also bumps apps/desktop/package.json
in lockstep with pyproject (top-level version only; dep specs untouched).
- One-time catch-up: package.json 0.0.2 -> 0.15.1 and the lockfile root
mirrors 0.0.1 -> 0.15.1.
* fix(desktop): stamp exe identity in afterPack hook so updates stay branded
The packed Hermes.exe reverted to the stock Electron icon + "Electron" name
after an in-app update. The icon/identity stamp (rcedit) lived only in
install.ps1, but the installer's --update path rebuilds the desktop via
`hermes desktop --build-only` -> `npm run pack`, which never ran install.ps1
and so never stamped the rebuilt exe.
Move the stamp into an electron-builder afterPack hook so it runs for EVERY
packed build regardless of caller (first install, hermes desktop, the update
rebuild, or a manual npm run pack):
- set-exe-identity.cjs: refactor to export stampExeIdentity(exe, desktopRoot);
still runnable as a standalone CLI.
- after-pack.cjs (new): afterPack hook calling stampExeIdentity. Windows-only
guard; best-effort (logs + resolves on failure, never fails the build).
- package.json: register build.afterPack.
- install.ps1: remove the now-redundant Set-DesktopExeIdentity function + call;
the hook handles it during npm run pack.
electron-builder's own rcedit step stays disabled (signAndEditExecutable=false)
to avoid the signtool -> winCodeSign -> 7-Zip macOS-symlink crash on non-admin
Windows; the hook runs rcedit directly (pure PE resource edit, no signing).
* fix(desktop): export afterPack hook as exports.default so electron-builder runs it
The afterPack hook used `module.exports = fn`, which electron-builder's hook
loader doesn't pick up — it expects the function as the module's default
export (the same shape afterSign/notarize.cjs uses). The hook silently never
ran, so even first install shipped the stock "Electron" exe.
Switch to `exports.default = async function afterPack(...)`. Verified with a
real `npm run pack`: electron-builder now invokes the hook and the produced
release/win-unpacked/Hermes.exe carries ProductName/FileDescription=Hermes.
* chore(desktop): drop auto-build release CI in favor of manual build + upload
Remove desktop-release.yml (nightly-on-main + stable publish). Installers
are now built locally per platform and uploaded to a GitHub Release by hand;
the website points at them via NEXT_PUBLIC_HERMES_DL_* env. Update README +
docs and drop the dead desktop-nightly channel links.
* fix(desktop): stable shortcut icon + bust icon cache so updates repaint
Symptom on a freshly-installed laptop: Hermes.exe itself shows the correct
Hermes icon (Explorer reads the live exe's stamped PE resource), but the
desktop shortcut still draws the stock Electron icon.
Cause: New-DesktopShortcuts set IconLocation to "<exe>,0", so Windows cached
the icon it extracted from the exe at shortcut-creation time. On an update the
exe gets re-stamped, but the shortcut keeps rendering the stale cached bitmap.
- package.json: ship assets/icon.ico beside the exe via extraResources
(-> resources/icon.ico). Verified with a real npm run pack.
- install.ps1 New-DesktopShortcuts: point IconLocation at resources/icon.ico
(fallback to <exe>,0 if absent) — a dedicated .ico is cache-stable and skips
the per-exe extraction that goes stale. Then run `ie4uinit.exe -show` to bust
the shell icon cache so the shortcut repaints immediately instead of showing
the old Electron icon until reboot.
Both best-effort; never fail an otherwise-good install.
* dummy update
* feat(desktop): self-heal update branch + backend contract guard
Two fixes for the bb/gui→main transition:
- Self-update self-heals: if the tracked branch (e.g. bb/gui) no longer
exists on origin (merged + deleted), the desktop updater falls back to
main and persists it. Read-only ls-remote probe that only flips on a
definitive "ref absent" (exit 2), never on a transient network error, so
already-installed clients migrate themselves with no manual flip.
- Backend contract guard: tui_gateway reports DESKTOP_BACKEND_CONTRACT in
session runtime info; the desktop warns with a one-click "Update Hermes"
when the backend predates the GUI's required contract (e.g. a bb/gui app
pointed at a main checkout) instead of failing cryptically downstream.
* docs(desktop): rewrite README to match current install/update/build flow
The old README contradicted itself (claimed a bundled Python payload while
also saying it no longer bundles source) and predated cross-platform support.
Rewrite for accuracy: Linux is a first-class build target, install.sh/install.ps1
both drive the staged bootstrap, the real self-update handoff (Windows
Hermes-Setup vs in-app macOS/Linux), and the bb/gui→main self-heal + backend
contract guard.
* docs(desktop): rewrite README as a real product readme
Lead with what the app is and how to get it (download an installer, or
`hermes desktop` for existing CLI users) plus a plain-language feature list,
then keep contributor/build/internals as a clearly separated secondary section.
* docs(desktop): fix install framing — releases no longer auto-build installers
Lead with the install-with-Hermes path (`--include-desktop` / `hermes desktop`),
which always works, and describe prebuilt installers as manually published when
a release ships them rather than implying CI attaches them to every release.
* docs(desktop): match base repo README style
Adopt the root README's conventions: centered title + badge row, bold
one-liner intro, a feature <table> grid, --- section dividers, and a
Community / License footer.
* feat(desktop): recover from gateway boot failures + validate API keys on entry (#35864)
Fresh installs that hit a gateway boot failure had no recovery path: the
shell rendered dead ("gateway offline"), logs were undiscoverable, and a
mistyped API key was accepted because onboarding only checked credential
presence, not validity.
- Add BootFailureOverlay: a top-level recovery surface (Retry, Repair
install, Use local gateway, Open logs + inline recent logs) that mounts
on any hard boot failure, including post-install. Trims the now-redundant
recovery button from the onboarding Preparing panel.
- Add hermes:logs:reveal / :recent IPC (reveal desktop.log) and a
hermes:bootstrap:repair IPC that drops the bootstrap marker to force a
clean reinstall. Surface "Open logs" in Gateway settings too.
- Add POST /api/providers/validate: a live per-provider probe
(OpenRouter/OpenAI/xAI/Gemini key check, local endpoint connectivity)
wired into saveOnboardingApiKey so a rejected key blocks before it's
persisted, while an unreachable probe falls through (offline-safe).
* test(model-catalog): fix stale nous picker test after curated-list change
ac2e48907 made the GUI/picker Nous row use the curated list (curated["nous"]
= get_curated_nous_model_ids()) + Portal union, matching the `hermes model`
CLI — but test_picker_nous_row_uses_manifest still asserted the old 2-model
manifest snapshot, breaking the test shard.
Rewrite it as an invariant: stub the Portal union to passthrough and assert the
row equals get_curated_nous_model_ids() computed under the same conditions, so
it tracks the real contract instead of a hardcoded model list that rots on every
catalog update.
---------
Co-authored-by: emozilla <emozilla@nousresearch.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ethernet <arilotter@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Native Windows is out of beta. Removes the early-beta warnings, headings,
and rough-edge framing across the README and docs (EN + zh-Hans), keeping
the WSL2-only dashboard PTY caveat. Historical RELEASE_v0.14.0.md notes are
left intact since they accurately describe the state at that release.
- README: Windows install + cross-platform notes
- index.mdx, installation.md: headings, warning admonitions, parity note
- windows-native.md: title/sidebar_label/warning, provider-hunting tip
- contributing.md, nous-portal.md: cross-platform / Portal parity prose
- Repoint cross-links to the renamed installation#windows-native-powershell
anchor (EN) and #windows原生powershell (zh, also fixes pre-existing drift)
For grouped provider families, the descriptive text now lives only on the
collapsed top-level group row. The member sub-picker rows show just the
short provider label (no parenthetical tui_desc), so the description is not
duplicated one layer down.
Ungrouped providers are unaffected — they have no group layer, so their own
row keeps its full tui_desc.
- main.py: member sub-picker uses provider_labels (label) instead of
canonical_descs (tui_desc).
- Telegram already showed labels + model count on member buttons; group
buttons keep Label ▸ (count) since inline keyboards can't fit a long blurb.
Member labels retain their short disambiguators (e.g. 'MiniMax (OAuth)') so
the sub-picker rows stay distinguishable.
The 7 consolidated provider families (OpenAI, xAI Grok, GitHub Copilot,
Google Gemini, Kimi / Moonshot, MiniMax, OpenCode) collapse to one
top-level picker row. Previously that row showed only the bare group
label (e.g. `OpenAI ▸`); now it carries a short blurb describing the
endpoints folded inside (e.g. `OpenAI ▸ (Codex CLI or direct OpenAI API)`).
- models.py: extend PROVIDER_GROUPS tuples to (label, description, members);
group_providers() emits the description on group rows.
- main.py: CLI picker renders `<label> ▸ (<description>)` for group rows.
- telegram.py: update the group tuple unpack (button text keeps the member
count, which fits inline keyboards better than a long blurb).
- tests: assert every group has a non-empty description and the fold emits it.
Member-specific detail still lives in each member's tui_desc and shows in
the drill-down sub-picker. Slug identity, --provider, /model paths unchanged.
Update the tui_desc text shown for each provider in the interactive
`hermes model` / setup wizard / `/model` pickers. Pure copy refresh —
slugs, labels, PROVIDER_GROUPS folding, and all typed paths are unchanged,
so the 7 grouped families (OpenAI, xAI Grok, GitHub Copilot, Google Gemini,
Kimi / Moonshot, MiniMax, OpenCode) still fold identically.
Also aligns the auto-injected alibaba-coding-plan provider description to
the same parenthetical style.
Follow-up to the synthetic-notification DM-topic routing fix. The new
_is_telegram_dm_topic_target probed the adapter's _get_dm_topic_info via
instance-level getattr, which a MagicMock auto-creates as a truthy callable —
so any test double with a non-dm chat_type and a thread_id would be
misclassified as a DM topic lane and have the fallback routing keys injected.
Resolve the method on type(adapter) and treat only dict-shaped returns as an
operator-declared topic, mirroring the existing guard in
_rename_telegram_topic_for_session_title. Update the home-channel startup test
to declare _get_dm_topic_info on a real adapter subclass instead of patching a
MagicMock onto the instance.
Background tasks on non-local backends (SSH/Docker/Modal/Daytona/Singularity)
go through `ProcessRegistry.spawn_via_env`, which builds a hand-crafted,
shell-safe wrapper:
mkdir -p T && ( nohup bash -lc CMD > LOG 2>&1; rc=$?; ... ) & echo $! > PID && cat PID
`BaseEnvironment.execute()` unconditionally ran `_rewrite_compound_background`
on every command, including this wrapper. The rewrite (meant to defuse the
`A && B &` subshell-wait trap for user commands) turns `( ... ) & echo $!` into
`{ ( ... ) & } echo $!` — note `} echo` with no separator, which is a bash
syntax error. The wrapper then never produces a PID, the redirected output file
is never created, and the agent sees an immediate exit code -1. This breaks
*every* background launch on a non-local backend (e.g. a simple
count-and-redirect script over SSH), not just edge cases.
Fix:
- Add `rewrite_compound_background: bool = True` to `BaseEnvironment.execute()`
(and the `BaseModalExecutionEnvironment` override, which accepts and ignores
it). Default preserves existing behavior; the user foreground terminal path
still rewrites.
- `spawn_via_env` passes `rewrite_compound_background=False` so its already
shell-safe wrapper is left intact.
- Treat a wrapper that produces no PID as a failed launch (mark the session
exited with a real exit code instead of exposing a fake running session), and
don't register/checkpoint a session that never started.
Verified empirically: with the rewrite skipped, the wrapper is valid bash,
launches the process, captures the PID, and writes the log/pid/exit files; the
old rewritten form fails `bash -n` with a syntax error.
Based on #33756 by @CharZhou (extracted from a multi-feature branch; the
unrelated image_gen / docker-media changes are not included here).
Co-authored-by: CharZhou <17255546+CharZhou@users.noreply.github.com>
terminal_tool re-sent the init-time/config cwd on every command, clobbering
session-local `cd` state: the environment tracked the new directory in
`env.cwd`, but foreground/background calls forced the old cwd back. A small
`_resolve_command_cwd` resolver now applies the precedence
`workdir > live env.cwd > config/override cwd` to:
- foreground `env.execute(...)`
- background `process_registry.spawn_local(...)`
- background `process_registry.spawn_via_env(...)`
Additionally, syncing the cwd onto the live cached env when a `cwd` override is
(re-)registered. Preferring live `env.cwd` would otherwise demote the ACP
`update_cwd` override (registered via `register_task_env_overrides` on
`session/load` / `session/resume`) below an already-set `env.cwd`, silently
ignoring an editor's mid-session project-root change once any command had run.
`register_task_env_overrides` now pushes a new cwd onto the cached env so an
explicit ACP cwd change wins, while ordinary in-session `cd` tracking is
preserved.
Regression coverage:
- foreground/background commands follow live `env.cwd`
- explicit `workdir` still overrides everything
- registering a cwd override updates the live env cwd (ACP authority)
- no-op when no live env exists; non-cwd overrides leave env.cwd untouched
Based on #35510 by @Dusk1e.
Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
In a per-user thread (thread_sessions_per_user=True), each participant
gets an isolated session key (...:{thread_id}:{user_id}). A run another
user started lives under a different key, so the caller's own /stop found
nothing and replied 'no active task to stop'.
When /stop finds no run under the caller's own key, fall back to
interrupting any running agent(s) sharing the caller's thread prefix
({chat_id}:{thread_id}), gated on _is_user_authorized. Thread-only — the
fallback returns [] for non-thread channels, and a prefix-collision guard
prevents thr1 from matching thr11.
* feat(setup): Quick Setup routes through Nous Portal (OAuth + model + messaging)
First-time quick setup now goes straight to the Nous Portal provider
instead of showing the full provider picker. Runs the device-code OAuth
login, selects a Nous model, configures the terminal backend, and offers
messaging setup — applying recommended defaults for everything else.
- Rename menu entry to 'Quick Setup (Nous Portal)'.
- _run_first_time_quick_setup now calls _model_flow_nous (handles both the
logged-out OAuth+model-select path and the logged-in curated picker),
then re-syncs config from disk to avoid the #4172 stale-overwrite.
- Terminal / defaults / messaging steps unchanged.
* feat(setup): thin out Full Setup with happy defaults
Full Setup no longer asks for every config knob — anything with an
obvious default is applied silently and stays tunable via the per-section
commands (hermes setup agent|terminal|tts, hermes auth add).
- Model section: drop the same-provider rotation pool, vision-backend
picker, and TTS provider sub-flows. Vision auto-detects from the main
provider; TTS defaults to Edge; rotation lives in hermes auth add.
- Terminal section: keep the backend picker (Local default) and any
required credentials (Modal token, SSH host/user/key, Daytona key),
but stop prompting for container image, CPU/mem/disk resources, gateway
cwd, and sudo password — all use defaults.
- Agent Settings: removed from the wizard. First installs get recommended
defaults silently; existing installs keep their tuned values.
- New defaults: max_turns 90 -> 150, session_reset both -> none.
- Tests: reconfigure tests assert agent settings are no longer prompted
on existing installs; drop 3 tests covering the deleted in-setup
rotation flow.
* fix(tui): persist gateway lifecycle breadcrumbs to crash log
A backend SIGTERM (`=== SIGTERM received ===` in tui_gateway_crash.log) is
always a parent action — `gw.kill()` (graceful-exit on a signal to Node, or an
explicit /quit) or `start()` replacing a live child. #31051 added parent-side
lifecycle breadcrumbs but left them in an in-memory CircularBuffer that dies
with the process, so SIGTERM crash reports arrive with no parent context and no
way to tell a signal-driven kill from a memory-critical `process.exit(137)`
(which closes the child's stdin → clean EOF, not SIGTERM).
Persist the death-explaining breadcrumbs (spawn / transport-exit / child-exit /
replace-live-child / kill-reason / startup-timeout) plus the graceful-exit
signal name and the memory-critical exit into the same crash log the Python
side writes, so they interleave by timestamp next to the child's panic entry —
making these recurring reports diagnosable.
Gated off under VITEST so unit tests stay hermetic.
* feat(tui): auto-recover the session when the gateway dies unexpectedly
When a still-owned gateway child dies while the TUI is alive (a crash, OOM
process.exit, or a SIGTERM/SIGHUP forwarded to it), the app currently nulls the
session and drops to an inert "gateway exited" state — the user loses a long
session and has to restart + re-run everything. That single behavior is most of
the "TUI doesn't survive heavy work" complaint, independent of what does the
killing.
The 'exit' event only reaches this handler on an *unexpected* death: a user
/quit calls process.exit before it fires, and a replaced child is identity-
skipped in GatewayClient. So on exit we now respawn the gateway and resume the
session that was live (history is persisted in SQLite) via a one-shot
recoverSidRef the next gateway.ready consults before forging a new session. The
in-flight reply is lost (it died with the process) but the session survives.
Bounded to GATEWAY_RECOVERY_LIMIT (3) attempts per GATEWAY_RECOVERY_WINDOW_MS
(60s) so a gateway that crash-loops on startup can't spawn-storm; past the
budget we fall back to the inert state.
* fix(tui): sanitize newlines + soften SIGTERM-cause claim in parentLog
Address PR review:
- recordParentLifecycle collapses embedded \r\n so a multi-line value (e.g. an
error message) stays a single breadcrumb and can't masquerade as a separate
entry or as the child's panic output sharing the crash log.
- Reword the header: a backend SIGTERM is *usually* a parent action but can come
straight from an external supervisor (s6, cgroup OOM, stray kill); the
presence/absence of a [tui-parent] line before the child's panic is precisely
what disambiguates the two.
* fix(tui): clear sid during recovery + extract/test the recovery budget
Address PR review:
- Null `sid` immediately in the gateway exit handler. While the gateway is down
(busy=false) the old sid would otherwise let sid-guarded effects (the 1.5s
session.active_list poll, queue drain) fire RPCs at a dead/respawning gateway.
recoverSidRef carries the session forward; resumeById restores sid on ready.
- Extract the respawn budget into a pure evalRecovery() (gatewayRecovery.ts) and
unit-test the bound: allows GATEWAY_RECOVERY_LIMIT within the window, blocks
past it, and prunes attempts older than the window so recovery re-arms.
* fix(tui): cap parent-log breadcrumb length (PR review)
Truncate a single persisted breadcrumb to 4096 chars (matching GatewayClient's
in-memory log-line cap) so a pathological value — e.g. a giant error string —
can't bloat the shared crash log or add noticeable blocking on the synchronous
append during a failure path. Covered by a test.
* fix(tui): keep "recovering session…" status visible during resume (PR review)
resumeById() synchronously sets status to 'resuming…' on entry, so the
recovery branch now applies its 'recovering session…' label *after* calling
resumeById — the distinct label sticks for the duration of the resume RPC
(which later flips to 'ready') instead of being immediately clobbered. Test
updated to assert the ordering.
* fix(tui): keep recovery budget alive across a startup crash-loop (PR review)
deadSid was read from getUiState().sid, which the first exit nulls — so if the
respawned gateway crash-looped before gateway.ready (resumeById never restored
sid), later exits saw null and abandoned the session after a single attempt,
defeating the bounded retry budget.
Lift the whole decision into a pure planGatewayRecovery() that falls back to the
pending recoverSidRef target when the live sid is already cleared, and unit-test
the crash-loop sequence (keeps retrying the same session up to the limit, then
falls back to inert). Supersedes evalRecovery.
* chore(tui): drop non-null assertion + clarify breadcrumb cap comment (PR review)
- Recovery branch guards on `recoverSidRef && recoverSid` so the ref write needs
no `!` assertion (avoids a future unsafe refactor).
- Reword the parentLog cap comment: it slices the value to 4096 chars and
appends a short truncation marker (so the written line is slightly longer),
rather than implying a strict 4096-byte limit.
* chore(tui): soften "absence ⇒ external signal" + "any in-flight reply" (PR review)
- parentLog header: a missing [tui-parent] line only *suggests* an external
signal (the logger is best-effort: VITEST-disabled, failed append swallowed),
not a definitive conclusion.
- Recovery notice says "any in-flight reply was lost" since the gateway can also
exit while idle.
Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.
When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:
messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
assistant message cannot be modified. These blocks must remain as they were
in the original response.
Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.
Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.
Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
PR #35718 added a per-slot "cumulative-resend" latch to the universal
streaming tool-call accumulator to fix DeepSeek / Baidu Qianfan (#35592).
The latch fires when a delta is a strict superset of the accumulated
buffer (len(_new) > len(_prev) and _new.startswith(_prev)) and then
REPLACES the buffer instead of appending.
That superset test is not an unambiguous cumulative signature. A normal
incremental stream can emit a single fragment that restates an already-
accumulated prefix — trivially common in large code-patch arguments with
repeated lines / indentation — which trips the latch and clobbers the
accumulated buffer, corrupting the tool call. Observed in the wild on
Anthropic Opus (the primary model) building a large patch: corrupted /
short arguments → finish_reason='length' dead-end → session killed.
A guessing heuristic that can silently clobber a tool-call buffer has no
place on the path every provider and model shares. Reverting restores the
known-good plain `+=` accumulator. The #35592 narrow provider bug should
be re-addressed provider-gated so it is structurally impossible to touch
Anthropic / OpenAI incremental streams, rather than via a heuristic on the
shared path.
Reverts ca03486b6.
The status bar read context_compressor.last_prompt_tokens directly with
an 'or 0' guard that only catches 0/None. Right after a compression the
compressor parks last_prompt_tokens at the -1 sentinel
(awaiting_real_usage_after_compression) until the next API call reports
real usage. -1 is truthy, so it sailed through and rendered as '-1/200K'
and '-1%' for that one transitional turn.
Clamp negative token/context-length values to 0 in the status-bar
snapshot so the gap reads as empty context until real usage arrives.
* feat(tools): always show Nous Tool Gateway backends, login on select
The Nous-managed Tool Gateway rows in `hermes tools` (Firecrawl, OpenAI
TTS, Browser Use, FAL image/video) were hidden unless the user was already
logged into Nous Portal with paid access. Now they are always listed.
Selecting one runs an inline Nous Portal device-code OAuth + entitlement
check — auth only, no inference-provider switch and no bulk 'enable all
tools' prompt (that stays in `hermes model`). The row only activates the
gateway once paid access is confirmed.
- _visible_providers: stop hiding managed_nous_feature rows (incl. those
also flagged requires_nous_auth); pure pre-auth UX rows still gate on login
- nous_subscription.ensure_nous_portal_access(): auth + entitlement gate
that preserves the user's active inference provider
- _configure_provider / _reconfigure_provider: run the inline gate for
managed backends; write config only when entitled
- picker marker: 'via Nous Portal (login on select)' for logged-out users
- _hidden_nous_gateway_message: now a no-op (rows are never hidden)
* docs: hermes tools is a first-class Tool Gateway entry point
The Tool Gateway docs framed `hermes setup --portal` / `hermes model` as
the activation path and only mentioned `hermes tools` for mixing in your
own keys. With the inline-login change, picking a Nous-managed backend in
`hermes tools` is a complete path on its own — it logs you into Nous
Portal on select if needed, without switching your inference provider or
prompting to enable every other tool.
- tool-gateway.md: Get started now lists three peer entry points; new
paragraph explaining login-on-select and the no-prompt fast path when
OAuth is already active
- nous-portal.md + run-hermes-with-nous-portal.md: note that managed rows
appear logged-out and trigger inline login on select
The three curses menus (curses_checklist / curses_radiolist /
curses_single_select) each hand-rolled an identical event loop: cursor
hide + color-pair init, the per-frame clear/getmaxyx/refresh cycle,
scroll-offset math, row iteration, the read_menu_key dispatch with
NAV_UP/NAV_DOWN cursor wrap, flush_stdin, and the
KeyboardInterrupt/curses-unavailable fallback. Terminal-behavior changes
(e.g. Ghostty raw-escape handling, scroll tweaks, a new key) had to be
made in three places.
Extract that boilerplate into one _run_curses_menu driver. Each public
menu now supplies small callbacks for the parts that genuinely differ:
draw_header (returns the item-list start row), draw_row (checkbox vs
radio vs bare prefix), an on_action reducer (toggle-set vs return-cursor
vs return-None + the single_select cancel-row guard), an optional
draw_footer (the checklist status bar), reserve_bottom, and the numbered
fallback. Behavior is passed as functions; the loop is the only stateful
piece — so future terminal/Ghostty work is a one-place edit.
Duplicated event-loop primitives drop 3 -> 1 (stdscr.clear, read_menu_key
dispatch, scroll math). Verified byte-identical: a render harness records
every addnstr(y, x, clamped-text, attr) call across frames plus the
return value for 6 cases (checklist, checklist+status, radiolist,
radiolist+description, single_select, single_select ESC-cancel); output
diffs clean against origin/main. Non-TTY returns the cancel value
directly (not the input()-based numbered fallback), matching the old
per-menu guard. 150 menu/setup/browse/plugins tests pass.
The setup provider->model sub-menu (and three sibling pickers) used
simple_term_menu.TerminalMenu, whose ESC and arrow-key handling was
unreliable across terminals — notably ESC failed to back out of the
model selection list on terminals that emit raw escape sequences (e.g.
Ghostty). The codebase already notes simple_term_menu 'conflicts with
/dev/tty' and causes 'ghost-duplication rendering', and a prior attempt
to migrate these (closed PR) confirmed the same root cause.
Route all four single-select pickers through the shared, already-hardened
curses_radiolist (which decodes raw CSI/SS3 escape sequences and handles
ESC consistently, fixed in #35776):
- auth.py _prompt_model_selection — model picker; the pricing column
header and the unavailable-models block are passed as the radiolist
description so they survive the curses screen clear. ESC now cancels.
- main.py _prompt_reasoning_effort_selection — reasoning-effort picker.
- main.py _model_flow_named_custom — named custom-provider model picker.
- main.py _remove_custom_provider — provider-removal picker.
simple_term_menu is no longer imported anywhere (only stale comments
referenced it; one in setup.py is corrected). The numbered-input
fallbacks are unchanged and still trigger on curses errors / non-TTY.
Tests: updated test_terminal_menu_fallbacks / test_reasoning_effort_menu
/ test_custom_provider_model_switch / test_model_provider_persistence to
drive the fallback via curses_radiolist errors instead of breaking
simple_term_menu. New test_setup_menu_curses_migration.py asserts each
picker routes through curses_radiolist, ESC cancels, and the pricing
header is preserved. Net -147/+183 (mostly the new test file; production
code shrinks by removing TerminalMenu boilerplate).
The setup wizard's provider/model pickers (curses_radiolist via
prompt_choice) bailed to the numbered "Select [1-N]" fallback the moment
a user pressed up or down. Root cause: even with keypad(True) — which
curses.wrapper sets — many terminals/terminfo entries deliver cursor keys
to getch() as raw CSI/SS3 byte sequences (e.g. 27, 91, 66 for arrow-down)
rather than the translated curses.KEY_DOWN. The menus matched only
curses.KEY_UP/KEY_DOWN and treated the leading 27 (ESC) as cancel, so
navigation dropped into the text fallback and the trailing bytes leaked
into the next input().
Add a shared read_menu_key() helper that decodes CSI/SS3 escape sequences
into normalized NAV_* actions (only a lone ESC, with no continuation byte
within a short timeout, still cancels) and consumes the tail of unhandled
sequences so stray bytes can't corrupt later input(). Route all three
curses menus (checklist, radiolist, single_select) through it.
Add regression tests covering raw CSI/SS3 arrows, translated KEY_*
constants, vim keys, lone-ESC cancel, and full consumption of unhandled
sequences (Delete/Home/End).
* feat(kanban): goal_mode cards run workers in a /goal loop
A goal_mode card wraps its dispatched worker in the Ralph-style goal
loop behind /goal: after each turn an auxiliary judge checks the
worker's response against the card title+body, and if not done the
worker keeps going in the SAME session until the judge agrees, the
worker terminates the task itself, or the turn budget runs out (which
blocks the card for human review — never a silent exit).
- kanban_db: goal_mode + goal_max_turns columns (additive migration),
Task fields, create_task params, INSERT wiring, created-event payload.
- kanban_tools: goal_mode/goal_max_turns on the kanban_create tool so
orchestrators can opt cards in when fanning out.
- kanban CLI: --goal / --goal-max-turns on 'kanban create'.
- dashboard API: goal_mode/goal_max_turns on the create endpoint
(auto-surfaced back via asdict).
- _default_spawn: sets HERMES_KANBAN_GOAL_MODE / _GOAL_MAX_TURNS only
when the card opts in.
- goals.run_kanban_goal_loop: standalone, callback-injected loop engine
(no SessionDB persistence; ephemeral worker). cli.py quiet path calls
it after the worker's first turn when the env vars are set.
- Docs: orchestrator skill + kanban feature page.
Tests: DB roundtrip + legacy migration, spawn env gating, and the loop's
continuation/completion/budget-block/finalize-nudge branches. E2E run
against a real kanban DB confirms a budget-exhausted goal worker lands
in a sticky blocked state.
* feat(kanban/dashboard): goal-mode toggle in the create form
Wires the goal_mode card setting into the dashboard UI (the plugin's
hand-written IIFE bundle, no build step):
- InlineCreate: 'goal mode' checkbox after the skills field; checking it
reveals an optional 'max turns' number input. Both reset on submit and
only post goal_mode/goal_max_turns when enabled.
- TaskDrawer: a 'Goal mode: on (max N turns)' MetaRow so a card's
goal-mode setting is visible after creation (auto-fed by asdict via the
existing _task_dict).
Live-tested through the running dashboard with a browser: created a
goal-mode card with max-turns=8, confirmed it persisted to the kanban DB
(goal_mode=1, goal_max_turns=8) and rendered back in the drawer as
'on (max 8 turns)'. No JS console errors.
Self-review follow-up on top of the salvaged perf fixes:
- gateway/run.py (both watcher-drain sites): the salvaged O(n^2) fix
(#32708) replaced `while pending_watchers: pop(0)` with iterate-then-
`watchers.clear()`, but `watchers` aliased the registry's live list.
A watcher appended by a concurrent session during the `await
asyncio.sleep(0)` yield would be cleared without ever being scheduled.
Detach the batch atomically (`pending_watchers = []`) before iterating.
- gateway/platforms/bluebubbles.py: normalize the salvaged _guid_cache
LRU (#30523) to match feishu/codebase precedent — module-level
`_GUID_CACHE_SIZE` constant, `while len > cap`, and drop the redundant
post-insert `move_to_end` (a fresh insert is already most-recent).
- gateway/platforms/feishu.py: drop the same redundant post-insert
`move_to_end` from the salvaged _message_text_cache LRU (#23706).
- scripts/release.py: add AUTHOR_MAP entries for the salvaged commits'
authors (amathxbt #22155, ErnestHysa #32636/#32708) so the contributor
audit passes when these commits land on main.
- tests/tools/test_tool_output_limits.py: autouse fixture resets the new
module-level limits cache between tests.
- tests/gateway/test_feishu.py: hand-built adapter fixture seeded
_message_text_cache as a plain dict; it's now an OrderedDict, so the
fixture type had to match.
N43 — Silent plugin/bundle errors:
- Plugin command dispatch: logger.debug() -> logger.warning()
- Bundle dispatch: logger.debug() -> logger.warning()
Plugin/auth failures are no longer invisible to operators.
N42 — O(n^2) pending_watchers recovery:
- Both recovery loops (startup + per-message) used while+pop(0) which is O(n) per pop
- Replaced with enumerate() over the list + periodic asyncio.sleep(0) yield points
- Clears the list after iteration instead of per-pop
- Batch size of 100 balances throughput vs event-loop responsiveness
PAIN BEFORE:
Inside _handle_auth_error_and_retry() (a sync function that runs on the MCP
event loop thread), there was a blocking polling loop:
while time.monotonic() < deadline:
if srv.session is not None and srv._ready.is_set():
break
time.sleep(0.25) # BLOCKS THE ENTIRE EVENT LOOP
Since _handle_auth_error_and_retry is invoked from tool handlers that run ON
the MCP event loop, time.sleep(0.25) blocked ALL concurrent MCP operations
(including other tools, keepalive heartbeats, OAuth refreshes) for 250ms per
iteration. With a 15-second deadline, worst case = 60 * 250ms = 15 seconds
of fully blocked concurrency.
WHAT WAS FIXED:
Extracted the blocking poll into an async helper _await_ready() that uses
asyncio.sleep(0.25) (non-blocking), and runs it via _run_on_mcp_loop().
_run_on_mcp_loop() properly awaits the coroutine on the event loop without
blocking the caller's thread. Added exception handling around the poll so
stuck reconnects still fall through to the error path.
The sync _handle_auth_error_and_retry now:
1. Fires reconnect signal (threadsafe)
2. Calls _run_on_mcp_loop(_await_ready(), timeout=15) — non-blocking
3. Returns; the event loop handles the polling
File: tools/mcp_tool.py
Lines: _handle_auth_error_and_retry() (~1886-1920)
Found by: exhaustive multi-pass audit (10 strategies, 1901 files, 913K lines)
The _guid_cache dict grows without bound as new contacts/groups are
resolved. In a long-running gateway instance with many unique targets
this becomes a slow memory leak.
Replace the plain dict with an OrderedDict capped at 500 entries.
When the cap is exceeded the oldest (least-recently-used) entries are
evicted.
_message_text_cache was a plain dict with no size limit. Every unique
message_id whose text was fetched (for reply-context lookups) stayed in
memory permanently, causing unbounded growth in long-running deployments
with active group chats.
Replace with an OrderedDict and evict the least-recently-used entry
whenever the cache exceeds _FEISHU_MESSAGE_TEXT_CACHE_SIZE (512). Cache
hits call move_to_end() to refresh LRU order. Mirrors the identical
pattern already used by _pending_processing_reactions in the same class.
Lower the model_catalog disk-cache TTL from 24h to 1h so freshly
published model-catalog.json deploys reach the picker within an hour
instead of up to a day. The picker now refetches on the next
`hermes model` / `/model` once the cache is older than 1h; younger
than 1h still serves the cache (no network hit), and network failures
still fall back to the stale copy.
- DEFAULT_TTL_HOURS 24 -> 1 (model_catalog.py)
- DEFAULT_CONFIG model_catalog.ttl_hours 24 -> 1, _config_version 24 -> 25
- migration v24->25 rewrites a stale ttl_hours:24 to 1, preserving any
custom value the user set
E2E: verified >1h refetches / <1h skips, and migration rewrites 24->1
while preserving a custom 6.
DeepSeek / Baidu Qianfan stream tool-call arguments in cumulative mode:
each chunk resends the full arguments-so-far instead of the new fragment.
The stream accumulator blindly concatenated arg deltas with +=, turning
that into '{...}{...}{...}', which failed json.loads and got nuked to '{}'
— a silently corrupted tool call (#35592). Worse on multi-param tools
(search_files, session_search, memory replace) because longer args take
more chunks, giving more resend opportunities.
- Per-slot cumulative latch in the stream accumulator: a delta that is a
strict superset of the accumulated buffer marks the slot cumulative and
replaces (not appends); exact duplicates are dropped only after latching.
Incremental fragments are untouched (default += path).
- Backstop _collapse_repeated_json_arguments() in the repair pipeline
collapses pure identical-resend buffers (K exact repeats of a valid-JSON
unit) for providers that resend the complete object from chunk 1. Only
reached after json.loads already failed, so compliant single objects are
never touched.
Not a gateway or DeepSeek-model bug — any OpenAI-wire provider in
cumulative streaming mode is affected.
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.
Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
SSH sessions hard-failed voice mode on the presence of SSH_* env vars
alone, even when a PulseAudio/PipeWire server is running on the host and
audio works (ffplay/aplay/pw-play -> pulseaudio). Probe the default
sound-server sockets (PULSE_SERVER unix path, PULSE_RUNTIME_PATH/native,
$XDG_RUNTIME_DIR/{pulse/native,pipewire-0}) and actually connect() so a
stale socket doesn't count; downgrade the SSH branch to a notice when
audio is reachable. Mirrors the existing Docker/WSL forwarding handling.
Fixes#35622
The all/* wildcard expands to every registered toolset, but a handful of
tools have an additional check_fn gate on top of toolset membership and
are intentionally NOT turned on by all/* alone:
- Capability-gated tools (browser, computer_use, code_execution, Feishu,
Home Assistant, cronjob) require their backend/credential prerequisite.
- The kanban toolset is workflow-gated and deliberately opt-in. Kanban
tools mutate shared board state, so they stay off by default even under
all/* — you must list 'kanban' by name (or be a dispatcher-spawned
worker with HERMES_KANBAN_TASK set).
This was the expectations gap behind #35581 — the docs previously said
all/* expands to 'every registered toolset' without noting the carve-out.
Closes#35581.
Follow-up to the salvaged #30728:
- Gateway already exports _HERMES_GATEWAY=1 at startup (gateway/run.py) and
cli.py already keys off it. Drop the redundant new HERMES_IN_GATEWAY var;
guard stop/restart on _HERMES_GATEWAY instead. One marker for one fact.
- Drop the greedy \bgateway.*restart alternation from the cron lifecycle
filter — it false-positived on legit prompts that merely mention an
unrelated gateway + a restart (API/payment gateway monitoring). The
specific 'hermes gateway (restart|stop|start)' pattern already covers the
real command.
- Rework the two negative guard tests to sentinel the first downstream call
so they don't drive real signal delivery (tripped the live-system guard).
- Add false-positive regression cases to test_safe_commands.
Three defenses against SIGTERM-respawn loops when agent schedules its
own gateway restart under launchd/systemd KeepAlive:
1. HERMES_IN_GATEWAY env var: gateway sets it at startup; stop/restart
subcommands refuse to run when set (exit 1 with clear message).
2. Cron create payload filter: regex pre-flight rejects prompts/scripts
containing hermes gateway restart/stop, launchctl kickstart/unload,
systemctl restart/stop, and pkill patterns.
3. 30 new tests: pattern matching (14), cron block (5), gateway guard (4),
safe command negatives (7).
The per-turn file-mutation verifier footer rendered failed-write paths as
bare absolute paths in the user-facing response. The gateway's
extract_local_files() scans response text for bare paths ending in a
deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and
auto-attaches matches as native uploads — so a denied write to
~/.hermes/config.yaml surfaced the path in the footer and got the
credential file silently uploaded to the messaging channel.
The gateway denylist (validate_media_delivery_path) already blocks the
config.yaml case after #35634. This is defense-in-depth at the source:
backtick-wrap every path the footer emits — both the bullet path and any
path echoed inside the tool's error preview (the protected-file denial
message embeds the path in single quotes, which do NOT block the
extractor regex). extract_local_files skips paths inside inline-code
spans, so wrapping defeats auto-attachment for ANY protected file while
keeping the path human-readable.
- run_agent.py: _format_file_mutation_failure_footer wraps bullet paths;
new _neutralize_footer_paths backticks any remaining bare path (covers
the preview echo). staticmethod -> classmethod (caller unaffected).
- tests: backtick-wrap assertion + end-to-end extract_local_files leak test.
When PTB's general httpx pool is exhausted, it converts httpx.PoolTimeout
into telegram.error.TimedOut whose message states the request was *not*
sent to Telegram. The send retry loop treated all non-connect TimedOut as
non-retryable, so a pool timeout raised immediately, skipped all 3 retry
attempts, and was returned as retryable=False -- silently dropping the
message (agent responses, cron reports, etc.).
A pool timeout means the request never left the process, making it the
safest case to retry. Add _looks_like_pool_timeout() and treat it like a
connect timeout in both the in-loop retry decision and the outer retryable
determination, so pool timeouts flow through the existing backoff loop and
stay retryable on exhaustion.
Reported-by: q3874758 (#35610)
* feat(models): add deepseek-v4-flash to OpenRouter + Nous curated lists
deepseek/deepseek-v4-flash was already in the native deepseek provider
catalog but missing from the curated OpenRouter and Nous Portal picker
lists. Added it to both and regenerated the model-catalog.json manifest
(drift guard requires same-PR regeneration).
* refactor(models): trim redundant variants, group curated lists by maker
Remove claude-opus-4.7/4.6, gpt-5.4-nano, gpt-5.3-codex,
gemini-3-pro-image-preview, gemini-3.1-flash-lite-preview, grok-4.20,
and the older gemini-3-pro-preview (Nous). Reorder both OPENROUTER_MODELS
and _PROVIDER_MODELS[nous] into contiguous per-maker blocks with comment
headers. Regenerated model-catalog.json (openrouter 27, nous 20).
* feat(models): add gemini-3-pro-preview to OpenRouter + Nous curated lists
Adds google/gemini-3-pro-preview to both curated pickers (new on
OpenRouter, restored on Nous). Regenerated model-catalog.json
(openrouter 28, nous 21).
* test(models): use claude-opus-4.8 in OpenRouter fetch fixtures
The two TestFetchOpenRouterModels tests mocked a live OpenRouter
response with claude-opus-4.6 and relied on it surviving the curated-list
filter. Since 4.6 was removed from OPENROUTER_MODELS, those models got
filtered out and the recommended tag shifted. Swap the fixture to
claude-opus-4.8 (still curated, still first in the Anthropic block).
Rewrite TestDiscordMentions as negative assertions (mentions survive the
redactor) and clean up the orphaned comment + dangling whitespace left by
removing _DISCORD_MENTION_RE. Follow-up to the salvaged #32259 fix for #35611.
The 'hermes update' config-migration prompt printed only counts ('1 new
config option available') then asked 'configure them now?' without ever
saying what the options were. Users said no because they couldn't tell what
they were agreeing to. For pure config-format version bumps (no new
env/config keys) it still asked the question, where saying yes just bumped
the version and looked like a no-op.
- List each new env var / config key by name + description before prompting
(cap at 8, then '… and N more'). The data was already available; we just
threw it away and printed a count.
- Pure version bump (no new options): apply the format migration
non-interactively and print what happened, instead of asking a misleading
yes/no.
Reported by ScottFive and Tt2021.
Some hosts (notably WSL) report a junk window size such as 131072 columns
by 1 row. Both the Ink fork and our components only guard against
0/null/undefined/NaN (stdout.columns || 80), so a positive-but-absurd
width sails through into createScreen(width*height), allocating tens to
hundreds of MB per frame and tripping the TUI memory monitor's hard exit.
Add clampStdoutDimensions(), installed in entry.tsx before ink.render: it
patches process.stdout.columns/rows with clamping getters (cols 1-2000,
rows 1-1000; out-of-range -> 80x24). One install point fixes the renderer,
its resize handler, and every component read. Live resizes still propagate
through the original descriptor, just clamped.
* fix(tui): swallow degraded mouse-burst noise so a stalled loop can't lock the composer
When the Node event loop blocks during a heavy render/tool-call burst, stdin
stops being drained. Mode-1003 any-motion mouse reports pile up in the kernel
buffer, get partially read, and arrive as text with the `\x1b[<` prefix AND
coordinate digits chewed off across many partial reads. The existing fragment
recovery (SGR_MOUSE_FRAGMENT_RE) only handles clean `button;col;row[Mm]`
triples, so the degraded shards leak into the composer as typed text — the user
can no longer type or exit until the stall clears.
Captured leak (Windows Terminal, during tool calls):
M6M35;220;56M6M35;218;56M169;48M;157;47M;44M20;43M79;40M78;40M0M7M35;49;41M
48;41M;47;40M9;15;32M[I;31M5;211;26M35;211;25M7M;220;1MM0M09;25M24M23M3;22M
M18M99;26M32MM38M63;44M47MM1;51M M4M54M
Add two recovery layers in parseTextWithSgrMouseFragments / the text-token path:
- MOUSE_BURST_NOISE_RE: whole-text fast path. If a text token is drawn only
from the mouse-leak alphabet (`[ ] < ; I M m`, digits, spaces) AND carries
the structural signature of mouse coordinates (>=3 M/m terminators, a digit,
and a `;`), swallow it wholesale.
- MOUSE_BURST_RESIDUE_RE: swallows pure-noise residue in the gaps between and
after recovered fragments, so a partially-recovered burst doesn't trail a
chewed-up tail into the prompt.
All three constraints together preserve real prose: `Mmm MMM mmm yummy` has no
digit/`;`, `see 1;2;3M for details` has disqualifying letters, and
`1234;56;78M9;10;11M` has only two terminators — none are swallowed.
This is defense-in-depth: it stops the leak/lockout regardless of what blocks
the loop. The underlying event-loop stall during streaming is a separate,
still-open issue that needs live-turn instrumentation to root-cause.
* fix(tui): check mouse-burst noise before fragment recovery; drop test cast
Copilot review on #35512:
- MOUSE_BURST_NOISE_RE was only evaluated when parseTextWithSgrMouseFragments
returned null. A noise blob that contains any intact `<b;c;r M` fragment makes
fragment recovery return non-null, so the whole-text swallow never fired and
the code emitted a pile of recovered mouse events instead of dropping the blob
wholesale (contradicting the comment, and doing extra work mid-stall). Move the
noise check ahead of fragment recovery so pure-noise tokens are dropped early.
Add a regression test for a noise blob carrying intact fragments.
- Drop the unnecessary `(e as { isPasted?: boolean })` cast in the test;
discriminated-union narrowing on `e.kind === 'key'` exposes isPasted directly.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The installer's ensure_fts5() handled a no-FTS5 Python by running
'uv python install --reinstall', but WHICH Python builds a uv can
install is baked into the uv binary's download manifest. A stale uv
(e.g. 'pip install uv==0.7.20', which predates python-build-standalone
#694) only knows about pre-FTS5 builds, so --reinstall just pulls the
same FTS5-less interpreter — a no-op for FTS5. Result: 'Could not obtain
an FTS5-capable Python' and a broken session search even on the
supported installer path.
ensure_fts5() now escalates uv itself: reinstall with current uv ->
'uv self update' + reinstall (stale standalone uv) -> install a fresh
standalone uv into a temp dir and reinstall with that (externally-managed
uv that can't self-update, the reported case). Pythons live in uv's
shared store, so the fresh uv's --reinstall overwrites the stale
interpreter in place and the installer's later 'uv python find' resolves
to the FTS5-capable build.
Verified against the reporter's exact repro (ubuntu:24.04 +
pip install uv==0.7.20): Python 3.11.13 (no FTS5) -> 3.11.15 (FTS5).
Defense-in-depth on top of the EphemeralReply gate: even if a config.yaml
path reaches response text via some other path, it can never be delivered
as a native attachment. Matches existing protection for .env, auth.json,
and credentials/.
Co-authored-by: JezzaHehn <jezzahehn@gmail.com>
The compact "<n>|content" gutter from #35368 is now the sole behavior.
Removes the HERMES_READ_GUTTER=padded escape hatch and its env lookup —
no legacy fixed-width path to maintain. Padding was pure token overhead
(~48% more tokens than bare content, ~16% more than compact) with no
measured accuracy gain in the original A/B.
- file_operations.py: drop env lookup + os import; gutter always f"{i}|{line}"
- tests: drop the padded env-override test; compact assertions retained
Over SSH the OSC 11 background-color query round-trip routinely exceeds
the 100ms read budget, so _query_osc11_background() gives up and the late
reply lands after prompt_toolkit has grabbed the tty. prompt_toolkit then
injects the OSC payload as typed text and reads its BEL terminator
(\x07 = Ctrl+G) as a keystroke — Ctrl+G is the open-external-editor
binding, dropping the user into vi with garbage and no obvious way out.
- Skip the OSC 11 probe on remote sessions (SSH_CONNECTION/CLIENT/TTY);
fall back to COLORFGBG / env hints / the dark default.
- Restore the tty with TCSAFLUSH instead of TCSANOW so any partial/late
reply is scrubbed from the input buffer before pt reads it.
Relative paths in write_file/patch could resolve against the agent PROCESS cwd
instead of the terminal's working directory. In a git-worktree session with a
stale TERMINAL_CWD='.' (a relative base), early edits silently landed in the
MAIN checkout, verified there, and reported success — while the agent inspected
the worktree and saw nothing, misreading it as the patch tool no-op'ing.
- _resolve_base_dir(): resolution base is now ALWAYS absolute. A relative
TERMINAL_CWD is anchored to the process cwd once, deterministically, instead
of being left to resolve()-time cwd. Live terminal cwd stays authoritative.
- write_file/patch pass the resolved absolute path to the shell FileOps layer
so the tool layer and shell layer can't disagree about which file is edited.
- Responses now report the absolute resolved_path and files_modified, so a
wrong-cwd mismatch is visible on the first call.
- _path_resolution_warning(): emits a _warning when a relative path resolves
OUTSIDE the live terminal cwd (e.g. a worktree session writing into main).
Validation: 11 new unit tests + 43 live E2E assertions (worktree routing,
mid-session cd, V4A patches, divergence warning, absolute paths, consecutive
patches); 466 existing file/path/terminal tests green.
Tasks can now carry file attachments (PDFs, images, source docs) that
workers read directly — closes the gap where source material had to be
pasted as a path into the task body.
- kanban_db: task_attachments table (additive), Attachment dataclass,
add/list/get/delete accessors, attachments_root/task_attachments_dir
path helpers (per-board, HERMES_KANBAN_ATTACHMENTS_ROOT override)
- build_worker_context: surfaces each attachment's absolute path so the
worker (full file/terminal tool access) reads it via read_file/pdftotext
- dashboard API: POST/GET/DELETE attachment routes (multipart upload,
25MB cap, traversal-safe filenames, root-containment check on download)
- dashboard UI: Attachments section in the task drawer — upload button,
list with download, per-row remove
- docs + tests (13 cases: DB accessors, REST round-trip, traversal
rejection, collision suffixing, worker-context surfacing)
Closes#35338
test_windows_path_not_matched asserted the pre-fix POSIX-only behavior.
The Windows drive-letter support now intentionally matches these paths,
so replace it with parametrized positive cases plus a relative-path
negative guard, mirroring tests/gateway/test_platform_base.py.
The MEDIA_TAG_CLEANUP_RE and extract_local_files path regex both used
(?:~/|/) to anchor paths, which only matches Unix-style absolute and
home-relative paths. Two additional _TOOL_MEDIA_RE patterns in run.py
had the same limitation. Windows absolute paths (C:\Users\..., D:/...)
were silently ignored, causing MEDIA directive delivery to fail.
Add [A-Za-z]:[/\\] as a third anchor alternative in all four regex
locations (base.py x2, run.py x2). Also update path separators in
extract_local_files from / to [/\\] so it can traverse Windows
directory trees.
Revert accidental + quantifier in MEDIA_TAG_CLEANUP_RE lookahead
that changed match-one to match-one-or-more (unrelated to fix).
Fixes: #34632
The per-platform reconnect watcher auto-paused a platform after 10
consecutive reconnect failures, setting next_retry=inf and requiring a
manual /platform resume to recover. But both pause sites only ever fire
on *retryable* failures — non-retryable errors (bad auth) already drop
out of the retry queue earlier. So a transient DNS outage that spanned
the watcher's backoff window would silently park the bot forever, even
after connectivity returned.
The watcher's own docstring already promised 'retryable failures keep
retrying at the backoff cap indefinitely' — the code contradicted it.
Remove the auto-pause from both reconnect-failure branches. Retryable
failures now retry at the 5-min backoff cap forever and self-heal once
the network recovers. The circuit breaker (_pause_failed_platform /
_resume_paused_platform) stays for manual /platform pause|resume.
Fixes#35284.
Convert the salvaged text-debounce delays from HERMES_* env vars to
config.yaml (gateway.platforms.<name>.extra.text_batch_delay_seconds /
text_batch_split_delay_seconds), per the '.env is for secrets only'
policy. Adds a finite/non-negative guard so bad YAML values fall back to
the defaults instead of crashing asyncio.sleep().
- whatsapp.py / weixin.py: read delays via _coerce_float_extra(config.extra)
- update Weixin content-dedup regression test for the deferred dispatch path
- add text-debounce coverage (whatsapp + weixin): defaults, config override,
bad-value fallback, env-var-ignored, burst-collapse, lone-message
- docs: WhatsApp + Weixin config keys
WhatsApp and WeChat (Weixin/iLink) both deliver messages individually
without any client-side batching, so rapid multi-message bursts (forwarded
batches, paste-splits, etc.) each trigger a separate agent invocation.
This wastes tokens (redundant system prompts / context for each fragment)
and degrades UX (the user receives reply fragments instead of a single
coherent response).
Both adapters now mirror the Telegram adapter's proven text-debounce
pattern:
- _text_batch_delay_seconds / _text_batch_split_delay_seconds
(configurable via env vars)
- _pending_text_batches dict for per-session aggregation
- _enqueue_text_event() concatenates successive TEXT messages and
resets the flush timer
- _flush_text_batch() dispatches after the quiet period expires
Configurable via env vars:
HERMES_WHATSAPP_TEXT_BATCH_DELAY_SECONDS (default 5.0)
HERMES_WHATSAPP_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 10.0)
HERMES_WEIXIN_TEXT_BATCH_DELAY_SECONDS (default 3.0)
HERMES_WEIXIN_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 5.0)
The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.
Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.
Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
Layer an exception guard on top of the empty-response fix so a crash
inside the agent (e.g. OSError from prompt_toolkit/Vt100 when stdout is a
non-TTY pipe, per #30623) is surfaced on the real stderr with rc=1 instead
of crashing past the redirect_stderr block. KeyboardInterrupt/SystemExit
are re-raised so Ctrl-C and explicit exits still propagate.
Also map briancl2 in scripts/release.py AUTHOR_MAP for the cherry-picked
empty-response commit.
Adapts the exception-guard approach from sweetcornna's PR #33818.
Co-authored-by: sweetcornna <96944678+ymylive@users.noreply.github.com>
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.
With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.
- browser_supervisor.evaluate_runtime: on that specific error, retry once
with returnByValue=false so Chrome returns the node's description string —
the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
retry, so convert the reference-chain error into actionable guidance
(extract a primitive / use JSON.stringify) instead of leaking it raw.
No expression rewriting — normal evals (1+41 -> 42) are untouched.
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.
- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
them in _strip_summary_prefix + _is_context_summary_content so resumed
stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
#26290 (reverse-signal cancellation) and #32787 (capture open questions /
decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
The Active Task field in compression summaries is the single most important
field for task continuity across context boundaries. The previous template
described it narrowly as a 'task assignment' or 'request', which caused the
summary LLM to write 'None' whenever the user's most recent input was a
question, a decision request, or a discussion turn rather than an
imperative command. The assistant on the other side of the compaction then
treated the conversation as resolved and gave a generic recap instead of
answering the still-open question.
Expand the template guidance to cover:
* explicit task assignments
* questions awaiting an answer
* decisions awaiting input (A vs B)
* ongoing discussions where the assistant owes the next substantive reply
Reserve 'None' for the rare case where the last exchange was fully
resolved (e.g. user said 'thanks, that's all').
Also tighten the trailing CRITICAL instruction in the summary prompt so the
LLM cannot fall back to the old 'no imperative command → None' heuristic.
No behavioural code changes — template strings only. All 83 existing
compressor tests pass.
SUMMARY_PREFIX previously contained two contradictory directives:
1. "treat it as background reference, NOT as active instructions"
"Do NOT answer questions or fulfill requests mentioned in this summary"
"Respond ONLY to the latest user message that appears AFTER this summary"
2. "Your current task is identified in the '## Active Task' section of the
summary — resume exactly from there."
When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.
This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
of truth; it WINS on conflict; cancelled Active Task / In Progress /
Pending User Asks / Remaining Work must be discarded entirely (no
'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
just verify, topic change) so the model recognizes them as cancellation
triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
mechanically copy a cancelled task into Active Task on the next
compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
the 'don't repeat work already reflected in session state' clause.
Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.
Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
tests/agent/test_context_compressor_summary_continuity.py,
tests/run_agent/test_413_compression.py,
tests/run_agent/test_compression_persistence.py,
tests/run_agent/test_compression_boundary_hook.py,
tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
Follow-up on the #35309 regression test: the trailing `with _lock: pass`
asserted nothing. Replace it with a concrete assertion that
_interrupted_threads is empty after the worker exits, directly verifying
the leak the fix prevents.
When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt),
the cleanup code at the end of _run_tool was bypassed because it sat outside
the except block (which only catches Exception). ThreadPoolExecutor recycles
thread IDs, so the leaked tid in _interrupted_threads poisons the next tool
scheduled on that thread — it instantly aborts with 'Interrupted'.
Move the discard + _set_interrupt(False) into a finally block so cleanup
runs regardless of how the worker exits.
Fixes#35309
Empty model could reach the API on a recovery turn after stream_interrupt_abort,
failing HTTP 400 "No models provided" with no recovery — the session went
silent until the user manually re-sent (#35314).
- gateway/run.py: cache last-successfully-resolved model per session (+ a
process-wide slot); when a fresh config read returns an empty model on a
recovery turn, reuse the last-known-good instead of building model="".
- run_agent.py + agent/conversation_loop.py: only emit "trying fallback..."
status when a fallback chain actually exists, so the UI stops announcing a
fallback that will never run (also #17446).
- tests: empty-model recovery + _has_pending_fallback gate.
* fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.
Now:
- read_file / read_file_raw strip a single leading BOM so the model
never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
first-line match works) and its post-write verification compares
BOM-stripped content.
- write_file restores the BOM when the original file had one and the
new content doesn't, mirroring the existing line-ending preservation
(detect on disk via a cheap `head -c 3` probe or reuse pre_content,
re-prepend across the edit). Guards against double-BOM.
Mid-content U+FEFF is left alone (it's data there, not a file marker).
Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
* fix(kanban): respect mobile safe areas in task detail drawer
The task detail drawer is a body-level z-60 fixed overlay using
height:100vh starting at the viewport top. On mobile this puts the
drawer header behind the dashboard's fixed top bar (min-h-14, z-40)
and lets the bottom comment input sit under the browser's collapsing
nav bar.
- drawer: 100vh -> 100dvh (+ max-height:100dvh), 100vh kept as fallback
- head: padding-top honors env(safe-area-inset-top); mobile (<1024px,
matching the lg breakpoint where the fixed bar shows) clears the
3.5rem header
- comment-row + body: bottom padding extended with
env(safe-area-inset-bottom) so the bottom-most element clears the
mobile browser chrome
Mirrors the host shell idiom (100dvh + env(safe-area-inset-bottom) in
web/), and web/index.html already sets viewport-fit=cover so the insets
resolve. max()/calc() fallbacks leave desktop unchanged.
Closes#35324
read_file's gutter used a fixed-width zero/space-padded prefix
(" 1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.
Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
- padded : 4/4 PASS both passes
- compact : 4/4 PASS both passes ← keeps line-referencing + patch
- none : 3/4 PASS both passes ← dropping numbers entirely made
the model hand-count lines and answer off-by-one (33 vs 34)
So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.
patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.
Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.
Now:
- read_file / read_file_raw strip a single leading BOM so the model
never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
first-line match works) and its post-write verification compares
BOM-stripped content.
- write_file restores the BOM when the original file had one and the
new content doesn't, mirroring the existing line-ending preservation
(detect on disk via a cheap `head -c 3` probe or reuse pre_content,
re-prepend across the edit). Guards against double-BOM.
Mid-content U+FEFF is left alone (it's data there, not a file marker).
Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
The plugin apply_yaml_config_fn dispatch loop only ran when a top-level
platform block (e.g. `discord:`) existed. Configs that defined a platform
only under `platforms.<name>` or `gateway.platforms.<name>` skipped the
hook, so `platforms.discord.extra.allow_from` never reached
DISCORD_ALLOWED_USERS. Fall back to those nested blocks when the top-level
one is absent.
Also map byquenox@gmail.com -> Que0x for the salvaged commits.
Adds a 'hermes prompt-size' command that reports the fixed prompt budget
for a fresh session: system prompt total, skills index, memory, user
profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy
credentials force the direct-construction path, no network call).
Lets users see which block dominates their per-call payload — the skills
index is often the largest single block when many skills are installed
(issue #34667). Zero model-tool footprint: it's a top-level CLI
subcommand, not an agent tool.
--platform <name> simulates a channel's platform hint; --json emits a
machine-readable breakdown.
Closes#34667
The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.
Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).
Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
The no-home-channel error for send_message derived the env var name
generically as <PLATFORM>_HOME_CHANNEL, producing EMAIL_HOME_CHANNEL for
the email platform. But gateway/config.py reads EMAIL_HOME_ADDRESS, so a
user following the error's guidance would set a variable that is never
consulted. Add a per-platform override map so the email hint names the
variable actually read; all other platforms keep the generic hint.
When using send_message with the email platform, valid email addresses
like user@example.com were not recognized as explicit targets by
_parse_target_ref(). This caused the function to return (None, None,
False), forcing the system into channel-name resolution which has no
way to resolve a raw email address, resulting in 'No home channel set
for email' errors.
Add _EMAIL_TARGET_RE pattern and email platform handler in
_parse_target_ref() so email addresses are treated as explicit targets
and routed directly without requiring a home target configuration.
Adds two real-client tests on top of the salvaged #34783 fix:
- config-less custom:<name> endpoint routes via the carried live base_url
(guards the #34777 symptom directly, not just the wiring)
- named custom:<name> WITH a config entry still resolves via the
named-custom branch (regression guard against collapsing to bare custom)
When a user configures a custom: provider (e.g. custom:openclaw-router),
set_runtime_main() only stored provider and model in process-local globals.
_resolve_auto() then had no base_url or api_key for the custom endpoint,
causing Step 1 to fail and auxiliary tasks (approval, compression, title
generation) to fall through to the aggregator chain and route to wrong
providers.
Fix: extend set_runtime_main() to accept base_url, api_key, and api_mode
keyword arguments; store them in new globals alongside the existing provider
and model; fall back to these globals in _resolve_auto() when the main_runtime
dict is empty. The call site in conversation_loop.py now passes all five
fields from the agent object.
Fixes#34777
hermes update on Windows still aborted with 'Another hermes.exe is running',
listing its own launcher shim(s) as concurrent instances (issues #29341,
#34795). The distlib Scripts\hermes.exe launcher spawns python.exe and waits;
detection runs in the python child, so the launcher shim shows up in
process_iter.
The prior fix walked the ancestor chain with per-hop current.parent() inside
'except: break' — the first psutil AccessDenied/NoSuchProcess (common on
Windows across session/elevation boundaries) bailed the walk early, leaving
the launcher in the candidate set and re-triggering the false positive.
- Switch to proc.parents() (whole ancestor list in one call), evaluate each
ancestor independently so one unreadable hop never strands the launcher.
- Only exclude ancestors whose exe is itself a shim, so a genuine second
hermes.exe under a non-Hermes parent (Desktop backend child) is still flagged.
- Message now prints a copy-pasteable 'taskkill /PID … /F' for the exact stale
PIDs so a user who already closed everything can self-remediate.
Conservative shim-only ancestor approach credited to the parallel attempts in
PRs #29358 (xxxigm) and #31808 (jquesnelle).
Normalize Gmail API message header names to lowercase before lookup so
gmail get/search/reply populate to/subject/from regardless of the casing
the message was stored with. Emit conventional MIME header casing
(To/Subject/Cc/From) on send and reply.
Fixes#34806
Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>
In the concurrent tool-execution path, checkpoint preflight (write_file,
patch, destructive terminal) fired BEFORE plugin guardrail block_result
was computed. A blocked write_file could still dirty checkpoint state
(doc_modified_this_turn, _last_write_file_call_id, turn_counter).
Move checkpoint preflight to AFTER block_result computation, gated on
`if block_result is None:` — matching the invariant the sequential path
already enforces.
Follow-up to LengR's #35181 salvage:
- gateway text-path uses getattr(self, '_session_db', None) to match the
picker callback path (defensive for object.__new__() gateway test pattern).
- add SessionDB.update_session_model test asserting it overwrites the
COALESCE-pinned model and survives subsequent token updates (#34850).
When a user switches models mid-session via /model, the gateway updates
the in-memory agent and session overrides, but the database was never
updated. The COALESCE(model, ?) in update_token_counts() only fills NULL
values, so the dashboard always showed the original model.
Fix: Add SessionDB.update_session_model() that unconditionally sets the
model column, and call it from both the interactive picker and direct
/model command paths in the gateway.
Fixes#34850
asyncio.create_subprocess_exec cannot run .cmd/.bat files on Windows
because CreateProcess expects a valid PE executable. npm-installed LSP
servers (intelephense, typescript-language-server, etc.) ship as .cmd
shims on Windows, causing WinError 193 on spawn.
Detect .cmd/.bat extensions and wrap with cmd.exe /c before spawning.
Gated behind sys.platform == 'win32' — no code path changes elsewhere.
Fixes#34864
The salvaged grandchild-reaping tests reference os.getpgid/os.killpg and
pytest.mark/skip/importorskip directly, but the file only imported asyncio,
signal, and unittest.mock. Add the missing imports so collection succeeds
on current main.
The orphan reaper for stdio MCP subprocesses only tracked the direct child
PID spawned by ``stdio_client`` (e.g. ``openclaw mcp serve``). When that
wrapper itself spawned a helper (``claude mcp serve``) and then exited, the
helper reparented to ``systemd --user`` and survived shutdown.
The MCP SDK already spawns stdio children with ``start_new_session=True``,
so the wrapper is its own pgroup leader and same-pgroup descendants are
reachable via ``killpg``. Capture the pgid at spawn time and reap via
``killpg(pgid, sig)`` so reparented grandchildren are reaped alongside the
direct child, even after the wrapper itself exits. Falls back to per-pid
``os.kill`` on Windows or when no pgid was recorded.
Fixes part 2 (orphan ``claude mcp serve``) of #23799. Part 1 (per-invocation
respawn) was confirmed by the reporter to be an environmental artifact, not
a code bug.
Extends the uv-tool detection (briandevans, #29703) to cover the
remaining no-venv install layouts that hit the same uv 'No virtual
environment found' error:
- pipx-managed installs (sys.prefix under .../pipx/...) -> 'pipx upgrade',
matching scripts/auto-update.sh (pipx-detection idea from
inchargeautomation-lab, #29852)
- bare pip outside any venv -> 'uv pip install --system --upgrade'
- venv (launcher shim) keeps the VIRTUAL_ENV overlay from #35224 and never
gets --system, so the install always targets the venv, not system Python
The four branches are mutually exclusive; VIRTUAL_ENV is exported only for
the uv-pip-in-venv path (uv tool / pipx upgrade ignore it).
Co-authored-by: Joshua Kimbrell <incharge.automation@gmail.com>
Copilot review on PR #29703 flagged two issues with the `uv tool list`
fallback in `is_uv_tool_install`:
1. False positive: `uv tool list` returns the *machine*'s installed
tools, not the active install. A regular pip/venv Hermes on a host
that also has `uv tool install hermes-agent` available would be
misclassified as a uv-tool install, and `hermes update` would
upgrade the wrong copy.
2. Overhead: the subprocess call (up to a 15s timeout) was triggered
even from `recommended_update_command_for_method`, which just
computes a display string.
Restrict detection to properties of the running interpreter
(`sys.prefix` and `sys.executable` — both can carry the uv-tool layout
marker depending on entry point). Drop the `uv tool list` fallback and
the `uv_path` parameter entirely. `_cmd_update_pip` now also surfaces a
clear hint when the runtime looks like a uv-tool install but `uv` is
missing from PATH, instead of silently falling back to `python -m pip`.
Hermes installed via `uv tool install hermes-agent` lives outside any
venv. `_cmd_update_pip` previously ran `uv pip install --upgrade`, which
errors with `No virtual environment found; run uv venv ...`. The user
hits this on the very first `hermes update` after a standard
non-`--system` install with `uv` on PATH.
Add `is_uv_tool_install()` in `hermes_cli/config.py`: fast path inspects
`sys.prefix` for the standard `uv/tools/hermes-agent/` layout, falls
back to `uv tool list` for non-standard prefixes. Both the
user-facing `recommended_update_command_for_method("pip")` string and
the actual subprocess invocation in `_cmd_update_pip` now switch to
`uv tool upgrade hermes-agent` when detected. Non-tool installs and the
no-`uv` fallback keep their existing commands unchanged.
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* fix(file-tools): make write_file/patch atomic (temp-file + rename)
write_file streamed content straight into the target via `cat > path`, so
a crash, SIGKILL, or truncated pipe mid-write left the file half-written
and corrupt. patch_replace routes through write_file, so it shared the flaw.
Now writes stream into a temp file in the SAME directory and `mv` it over
the target — a real same-filesystem rename, which is atomic on POSIX and on
every terminal backend (local/docker/ssh/modal). A failed write leaves the
original byte-intact and leaks no temp file. The existing file's mode is
preserved across the swap (stat + chmod, GNU/BSD), and content still rides
stdin so there's no ARG_MAX limit. A trap cleans the temp on any error path.
Tests: added TestAtomicWrite (real LocalEnvironment, no mocks) covering
inode-change-on-overwrite, mode preservation, failed-write-leaves-original,
no-temp-leak, special chars, and patch routing. Updated two mocks in
test_file_operations.py that keyed on the literal `cat >` write command to
key on the stdin_data behavioral signal instead. 200 file-tool tests green.
The hermetic CI env (slice 4/6) redirects HERMES_HOME, so a post-restore
_read_manifest() can resolve to an empty/redirected manifest path and return
{}. Assert on sync_skills's in-memory return value (synced["copied"]) instead,
which is the resilient signal that the skill was re-copied and is no longer in
limbo.
The cherry-picked fix's onerror handler chmod'd only the failing path, but
unlinking a child requires write permission on its PARENT directory. On a true
Nix-store copy (r-xr-xr-x dirs + files) rmtree still failed. Now chmod the
parent dir as well before retrying.
Also rewrites the regression test: the original asserted the helper FAILS on a
read-only dir (documenting the limitation), which is the wrong success criterion.
Split into two tests — restore succeeds on a full read-only tree (real Nix case),
and manifest is preserved when removal genuinely cannot proceed (monkeypatched).
Two related bugs in tools/skills_sync.py affecting Nix-store and
immutable-package installs:
**#34972 — reset_bundled_skill corrupts manifest on rmtree failure:**
The function deleted the manifest entry BEFORE attempting rmtree. If
rmtree failed (read-only files from Nix store), the function returned
early — leaving the skill in a manifest-less limbo state where future
syncs silently skip it forever.
Fix: reorder steps — attempt rmtree FIRST, only delete manifest entry
after rmtree succeeds. If rmtree fails, nothing is changed.
**#34860 — stale .bak directories after sync:**
sync_skills() called shutil.rmtree(backup, ignore_errors=True) which
silently failed on read-only files, leaving persistent .bak dirs.
Fix: add _rmtree_writable() helper that makes files writable via an
onerror callback before retrying removal. Used in both sync_skills()
backup cleanup and reset_bundled_skill().
Fixes#34972Fixes#34860
mcp_serve.py was missing from the setuptools py-modules list, causing
hermes mcp serve to crash with ModuleNotFoundError on standard pip installs.
Fixes#34871
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* feat(model-picker): group multi-endpoint providers under one row
The interactive provider pickers (hermes model, setup wizard, Telegram
/model) listed every provider slug flat, so vendors with several endpoints
(Kimi/Moonshot, MiniMax, xAI Grok, Google Gemini, OpenAI, OpenCode, GitHub
Copilot) each occupied multiple top-level rows. Now related slugs fold into
one top-level row that drills down to the specific endpoint.
- models.py: add PROVIDER_GROUPS table + group_providers() fold (display
only — CANONICAL_PROVIDERS, slugs, --provider, /model <provider:model>
all unchanged and individually addressable).
- hermes model (main.py): group rows drill into a member sub-picker, then
dispatch to the existing _model_flow_* unchanged. setup wizard inherits it.
- Telegram /model: new mpg:<group> callback expands to member mp:<slug>
buttons; single authenticated member degrades to a direct button.
- Grouping is the single shared fold across all three surfaces.
Validation: 163 targeted tests pass; E2E confirms group->member->model
resolves to the correct concrete slug for all families.
Follow-up to the budget-exhaustion recovery fix. recompute_ready's
new circuit-breaker guard resolved its effective limit from per-task
max_retries -> DEFAULT_FAILURE_LIMIT, skipping the dispatcher's
configured kanban.failure_limit. _record_task_failure resolves
max_retries -> failure_limit(config) -> DEFAULT, so the two disagreed
whenever an operator set kanban.failure_limit != 2:
- config > 2: a task could get stuck at DEFAULT(2) before reaching its
allowed retry count.
- config < 2: a task the breaker already blocked could be auto-recovered
back to ready, defeating the stricter limit.
Thread the dispatcher's failure_limit through dispatch_once into
recompute_ready so the guard and the breaker share one resolution order.
Updated test_circuit_breaker_block_still_auto_promotes (it asserted a
failures=5 block auto-recovers and resets the counter — that's the
pre-#35072 behavior the loop fix removes); it now exercises a below-limit
transient block, with the at-limit case covered in test_kanban_db.py.
Added two tests for the config-tier and per-task override resolution.
recompute_ready() previously reset consecutive_failures to 0 when
auto-recovering a blocked task. This defeated the circuit-breaker:
a task that repeatedly exhausted its iteration budget would cycle
forever (block → auto-recover with counter=0 → respawn → budget
exhausted → block → …) with no signal to the operator.
Fix: don't auto-recover tasks whose consecutive_failures has reached
the effective failure limit (per-task max_retries or
DEFAULT_FAILURE_LIMIT). The counter is also preserved across
recovery so the breaker can accumulate across cycles.
Fixes#35072
Legacy kanban boards (pre-AUTOINCREMENT schema) crashed the gateway
notifier on every tick — int(None) on a NULL id in unseen_events_for_sub
— silently losing all kanban notifications. CREATE TABLE IF NOT EXISTS
skips existing tables regardless of schema and _add_column_if_missing
only adds columns, so neither could fix a drifted primary-key type.
_rebuild_drifted_tables() detects the legacy shape via PRAGMA table_info
and rebuilds task_events/task_comments/task_runs (TEXT PK -> INTEGER
AUTOINCREMENT) and kanban_notify_subs.last_event_id (TEXT/NULL -> INTEGER
NOT NULL DEFAULT 0), preserving data. The whole pass is one transaction
so an interruption can't leave a table half-renamed, and recreates every
index DROP TABLE would otherwise take down (including idx_events_run).
Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
The per-entry psScript callback was identical for every PowerShell entry,
so the function-valued union member added structure without behavior. Collapse
WriteCmd to a plain stdin boolean and apply the one shared base64 script in the
write loop. Document the CP936 root cause inline.
Co-authored-by: BROCCOLO1D <279959838+BROCCOLO1D@users.noreply.github.com>
When writing text to the clipboard via PowerShell (WSL2 and native Windows),
the previous implementation piped text through stdin using `Set-Clipboard
-Value $input`. PowerShell reads stdin using the Windows system's default
ANSI code page (e.g. CP936 for Chinese Windows), causing all non-ASCII
characters (CJK, emoji, accented) to become garbled.
Fix: encode the text as base64 in Node.js and pass it as a command argument.
PowerShell decodes it from base64 using explicit UTF-8, bypassing the code
page issue entirely.
Fixes#35107
_download_image() wrapped every download attempt in a blanket
`except Exception` and retried 3x with 2s/4s/8s backoff regardless of
cause. A 404/403 image URL would never resolve on retry, so it just
burned up to 6s of wall-clock + extra GETs before failing — inflating
latency for a deterministic failure (issue #32296, umbrella #35114).
Add _is_retryable_download_error(): 4xx client errors (except 429),
website-policy PermissionError, and too-large/SSRF ValueError now raise
on the first attempt. 429, 5xx, and unclassified network errors stay
retryable. Removed the now-unreachable fall-through branch since the
loop always returns on success or re-raises on the final/terminal attempt.
The subagent spawn-observability overlay added a `(/agents)` hint, but
only on the standalone "Spawn tree" panel, gated behind `!inlineDelegateKey`
— it never showed for a single delegate_task call, and only appeared once
subagents had already registered. A nudge that arrives at the end (or only
after spawn) is useless for the actual goal: letting users open the live
monitor *while* delegation is running.
Surface it the moment delegation starts, on both surfaces:
TUI (ui-tui/src/components/thinking.tsx)
- Show `(/agents)` on any "Delegate Task" tool group as soon as it appears
(in-flight, before any subagent registers), not gated on subagents
already existing. Same `startsWith('Delegate Task')` predicate already
used for delegateGroups.
CLI (agent/tool_executor.py)
- Append `· /agents to monitor` to the delegate spinner label, which is
displayed for the full duration of the delegate_task call. The previous
attempt put the hint on the completion line (get_cute_tool_message),
which only renders after the call finishes — reverted.
TUI tsc clean (pre-existing execFileNoThrow type errors unrelated);
subagentTree 35/35; display.py reverted to upstream.
A git conflict resolution (reset --hard or merge) can revert
hermes_cli/__init__.py to a stale __version__ while pyproject.toml stays
current, so 'hermes --version' silently reports the wrong version. Nothing
cross-checked the two files.
Add a version-consistency check to the doctor 'Python Environment' section:
reads the [project] version from pyproject.toml and compares it to
hermes_cli.__version__. Reports OK when they match, fails with a re-sync
hint when they drift, and is a silent no-op for installed wheels where
pyproject.toml isn't present.
Closes#35070
Adds _set_process_title() in hermes_cli/main.py, called first thing in
main(). Tries setproctitle (optional) for a full ps-args rewrite, then
falls back to ctypes prctl(PR_SET_NAME) on Linux / pthread_setname_np on
macOS. No-op on Windows and on any failure. No new dependency: the
setproctitle path is best-effort via ImportError guard.
Fixes#35108
Allow non-loopback websocket peers when the dashboard is explicitly exposed with --host 0.0.0.0/:: and --insecure.
This fixes the failure mode where /chat rendered over LAN but /api/ws and /api/events were rejected with HTTP 403, leaving the embedded TUI chat disconnected.
Add regression coverage for the insecure public bind case in the dashboard websocket auth tests.
The max-iterations summary path (`handle_max_iterations`) hand-builds its
message list and calls `chat.completions.create()` directly, bypassing
`ChatCompletionsTransport.convert_messages()`. It only popped
("reasoning", "finish_reason", "_thinking_prefill"), so `tool_name` (SQLite
FTS bookkeeping), the `codex_*` reasoning carriers, and other internal
`_`-prefixed scaffolding leaked to the wire.
Strict OpenAI-compatible gateways (Fireworks-backed OpenCode Go, Mistral,
Moonshot/Kimi) reject these with HTTP 400 "Extra inputs are not permitted,
field: 'messages[N].tool_name'", so a long tool-using session that exhausts
the iteration budget fails to summarise instead of returning the result.
Mirror convert_messages() in this path: also drop tool_name,
codex_reasoning_items, codex_message_items, and every `_`-prefixed key.
Copy-on-write is already in place, so internal history keeps the fields for
FTS / Codex-fallback.
Adds a regression test to TestHandleMaxIterations asserting the summary
request carries none of the schema-foreign keys (fails on main, passes here).
detect_install_method() returned "docker" for any container (is_container()),
before the .git check. Both supported installs already self-identify via the
.install_method stamp read first: the curl installer (scripts/install.sh)
git-clones and stamps "git"; the published nousresearch/hermes-agent image
stamps "docker" at boot via docker/stage2-hook.sh. An unsupported manual
install dropped into a container has no stamp, so the bare container check
hijacked it to "docker" and 'hermes update' bailed with the docker-pull
guidance.
Drop the redundant is_container() -> docker fallback. Unstamped installs now
fall through to the .git/pip checks like any off-path install; both supported
paths are unaffected because the stamp wins first.
Fixes#34397.
The TUI already ships a rich /agents spawn-tree dashboard (live tree,
timeline, per-child tokens/cost/files/tools, kill/pause), but nothing
surfaced it — during delegation the transcript stayed quiet and users
had to already know to type /agents.
Drop a one-time transient activity hint ("subagents working · /agents
to watch live") the first time a turn starts delegating, matching the
existing "· /logs to inspect" house style. Guards keep it unobtrusive:
- fires at most once per turn (resets on message.start)
- silent when the /agents overlay is already open
- gated by display.tui_agents_nudge (default true)
Hooked on subagent.start, not subagent.spawn_requested: the delegate
progress callback in tools/delegate_tool.py only relays start/complete
to the gateway and drops spawn_requested, so start is the first
delegation event the TUI reliably receives. spawn_requested is wired
too for the future case, guarded once-per-turn.
Adds the display.tui_agents_nudge config default and gatewayTypes entry.
These tests asserted that hardcoded curated model lists/constants still
contained specific model strings (e.g. 'glm-5' in provider_model_ids('zai'),
exact context-length values per model key, PROVIDER_TO_MODELS_DEV entries).
They mirror a constant rather than exercise logic, so they only ever break
when models are added/retired and never catch a real bug.
Removed 22 such functions across 7 files (149 deletions, 0 additions).
Behavioral siblings are kept: live-catalog-wins, fallback ordering,
substring/longest-match resolution, normalization, credential discovery,
and probe-tier stepping all still tested.
Starlette < 1.0.1 is affected by CVE-2026-48710 ("BadHost", CWE-444).
The HTTP Host header was not validated before being used to rebuild
`request.url`, so a malformed Host could make `request.url.path` desync
from the raw ASGI path the router actually dispatched. Middleware and
endpoints that apply path-based authorization off `request.url` (rather
than `scope["path"]`) can therefore be bypassed.
Hermes pulls Starlette transitively, never directly:
- [web] -> fastapi==0.133.1 (starlette>=0.40.0, no upper bound)
- [mcp] -> mcp==1.26.0 + sse-starlette (starlette>=0.27 / >=0.49.1)
- [computer-use] -> mcp==1.26.0
- [dev] -> mcp==1.26.0
A fresh resolve landed starlette 0.52.1 — vulnerable. With no upper
bound on the transitive specs, pip/uv could resolve any pre-1.0.1
release on a fresh install.
Fix: pin starlette==1.0.1 directly in every extra that exposes a
Starlette-backed server surface, regenerate uv.lock (only starlette
moves: 0.52.1 -> 1.0.1, hash-verified), and mirror the pin in the
lazy-install map (tools/lazy_deps.py `tool.dashboard`) so `hermes`
on-demand dashboard installs can't re-resolve a vulnerable version.
1.0.1 is the advisory's named fix floor and the oldest patched release
(more bake time than 1.1.0/1.2.0, which are days old); it satisfies
every carrier constraint and our requires-python>=3.11.
Scope note: this is a dependency-level fix complementing the
application-layer Host-header validator added in #34162
(`hermes_cli/web_server.py` `_is_accepted_host`). Defense in depth at
both the framework and app layers.
Guards: two invariant tests in tests/test_packaging_metadata.py assert
every server-surface extra pins starlette and that pyproject + uv.lock
both resolve >= the 1.0.1 CVE floor — a dropped pin or stale lock fails
in CI instead of shipping the bypass.
Closes#35067
Self-hosted Honcho setup had four sharp edges:
- local/cloud URLs ending in /vN double-prefixed by the SDK (/v3/v3/... 404)
- authenticated local servers had no setup prompt for a JWT/bearer token
- profile-derived host keys could be dot-containing workspace IDs Honcho rejects
- memory-provider config files with API keys written world-readable per umask
This keeps existing behavior but makes those paths safer:
- strip a trailing /vN version segment from any configured baseUrl before SDK
init (the SDK's route builders always prepend their own version prefix);
auth-skipping stays loopback-only
- add an optional local JWT/bearer prompt in honcho setup, stored under
hosts.<host>.apiKey
- derive new profile host keys with underscores, still reading legacy
hermes.<profile> blocks
- write memory-provider config files atomically with 0600 via a shared
utils.atomic_json_write(mode=) arg (honcho/hindsight/mem0/supermemory)
- skip honcho.json parsing in gateway cache-busting unless Honcho is the active
memory provider; memoize by honcho.json mtime when active
- bust the gateway agent cache on memory.provider change
- add a hermes memory setup <provider> one-liner so fresh installs can configure
a named provider without the picker (the per-provider hermes <provider>
subcommand only registers once that provider is active)
Closes#20688, #29885, #26459, #30246, #33382, #32244.
Co-authored-by: BROCCOLO1D
apply_nous_managed_defaults() was adding image_gen and video_gen to the
'changed' return set without writing any config values. The caller
(tools_command first_install flow) uses 'changed' to skip manual
configuration, so these tools ended up in platform_toolsets but with no
video_gen.provider, video_gen.use_gateway, or image_gen.use_gateway in
config.yaml.
At runtime the FAL plugin's is_available() returned False because there
was no FAL_KEY and no use_gateway config — the tool never loaded despite
being 'enabled' in the toolset list.
For image_gen this was a latent bug masked by the gateway offer prompt
(prompt_enable_tool_gateway) running earlier in the setup flow and
writing image_gen.use_gateway=True via apply_gateway_defaults(). But if
the user skipped the gateway offer, image_gen would silently break the
same way.
For video_gen (added in PR #33259) the bug was always hit because the
gateway offer ran before the user checked video_gen in the toolset
checklist.
Fix: write provider/use_gateway config values before adding to 'changed',
matching the pattern used by web, tts, and browser.
When FTS5 is missing the warning now explains the likely cause (an
unsupported / pip-managed Python whose bundled SQLite lacks FTS5) and
links the supported install at hermes-agent.nousresearch.com, instead
of just logging the raw error.
uv's python-build-standalone distributions only gained FTS5 in mid-2025
(#694). A stale interpreter already in uv's store — which `uv python find`
reuses without checking — can lack it, leaving the supported install with
a SQLite that can't create the FTS5 virtual tables hermes_state.py needs
for full-text session search ("no such module: fts5").
check_python now probes the resolved interpreter for FTS5 and, if missing,
reinstalls the latest patch for $PYTHON_VERSION (which has FTS5) and
re-resolves. If an FTS5-capable Python still can't be obtained (offline,
pinned env), it warns and continues — Hermes degrades gracefully and only
disables session search. No bundled second SQLite, no user action.
The salvaged contributor commit guarded only messages_fts. Current main
also creates a second virtual table, messages_fts_trigram (CJK substring
search), whose CREATE VIRTUAL TABLE ... USING fts5 still raised
"no such module: fts5" on builds without FTS5 — re-crashing SessionDB
init. Wrap the trigram setup with the same guard, and broaden the test's
no-fts5 mock to fail BOTH tables so the regression test actually
exercises a faithful no-FTS5 build.
The compression threshold is threshold × context_length where context_length
is the MAIN agent model's window, not the auxiliary/summary model's. On a
262,144-token model at the default 0.50 the threshold is 131,072 — close to a
common 128K figure by coincidence of the percentage, which has led to confusion
that the auxiliary model's context limit is the trigger. Add a note preempting
that misreading and pointing to the separate summary-model-context constraint.
Follow-up to the salvaged #34452 turn-completion explainer:
- Register display.turn_completion_explainer: True in DEFAULT_CONFIG so the
setting is discoverable, matching the file_mutation_verifier precedent.
- Shorten the repeated footer prefix from 'Turn ended without a usable
reply: ' to 'No reply: ' so the 10 reason variants don't all open with
the same 8-word boilerplate.
- Update the 7 assertions that referenced the old prefix.
PR #34470 adds an explainer suffix to abnormal turn endings (e.g.
max_iterations_reached) so users see why the response is short instead
of receiving a bare/blank reply. test_tool_call_validation_accepts_dict_arguments
runs the agent at max_iterations=3 which hits the explainer path; the
existing strict-equality assertion (== "done") no longer matches once
the suffix is appended.
Switch the assertion to .startswith("done") so the test continues to
verify that the models actual text survives intact while leaving the
explainer suffix wording owned by conversation_loop (where it belongs).
Test now passes (1 passed in 0.88s).
When a turn ends abnormally after substantive tool calls (empty content
after retries, a partial/truncated stream, exhausted retries, or an
iteration/budget limit), the CLI/TUI response area was left blank or
showed only a fragment (e.g. "The") with no consolidated reason. The
internal turn_exit_reason values (empty_response_exhausted,
partial_stream_recovery, etc.) were never surfaced to the user.
Add a turn-completion explainer that mirrors the existing file-mutation
verifier footer: at turn end, map an abnormal turn_exit_reason to a
short, actionable message and either replace the bare "(empty)"
sentinel or append the reason after a partial fragment. Normal
text_response exits (e.g. a terse "Done.") stay quiet.
Gated by display.turn_completion_explainer (default on) with
HERMES_TURN_COMPLETION_EXPLAINER env override, matching the
file-mutation verifier seam.
Closes#34452
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
* fix: keep CLI context display in sync with preflight token estimate
The status bar reads compressor.last_prompt_tokens, which only updates
from a successful API response. When loaded history is oversized but
compression no-ops (e.g. the auxiliary summary model times out), no fresh
usage arrives and the bar stays frozen at the old, smaller value while the
preflight estimate reports a much larger number — looking permanently out
of sync (reported: 74.4K display vs ~144,669 preflight).
Seed last_prompt_tokens with the fresh preflight estimate (upward-only, so
a real usage figure is never clobbered and a successful compression's
downward correction still wins). Display-only; no behavioral change to
compression, caching, or the agent loop.
Sharpen the label from 'Session usage (cumulative)' to 'Cumulative API
tokens (re-sent each call)'. The number is real provider-reported usage
summed across every API call in the session — not context size. In an
agentic loop the same context is re-sent each iteration, so a one-hour
tool-heavy session legitimately reaches tens of millions of tokens. The
new label explains the magnitude so users stop reading it as a bug or as
a total across all sessions.
Hallucinated 'silence' tokens (*(silent)*, _silent_, the bare '.', '...',
'silent', no response/reply, the mute emoji) are emitted when a persona has
nothing actionable to say. In bot-to-bot channels the receiving bot mirrors
the token back, creating a tight loop that burns API tokens and can crash a
model with 'no content after all retries'. SOUL.md/prompt rules drift across
providers and have already failed in practice, so add a substrate-level guard.
_deliver_to_platform now drops a message whose finalized content is only a
silence-narration token, logs a WARNING with platform/chat_id/truncated
content, and returns {success: True, filtered: 'silence_narration',
delivered: False} instead of calling the adapter. Single chokepoint covers
every platform adapter; the regex is anchored start/end with a 64-char guard
so prose like 'Silence is golden — here is the plan...' or 'Silent install
completed' is never dropped. Local/file delivery is a separate path and is
left untouched. Opt out via gateway.filter_silence_narration: false or the
HERMES_FILTER_SILENCE_NARRATION env override (env wins when set).
Closes#34616
The rough-estimate mock supplied only 2 side_effect values but the
conversation loop calls estimate_request_tokens_rough a third time for
the post-response real-token estimate, exhausting the iterator. Use a
callable side_effect that returns 125k once (to fire preflight) then
sub-threshold values, independent of call count.
handle_enter dispatches /steer and /model inline on the UI thread while
the agent is running, calling buffer.reset() then returning. Unlike every
other early-return branch in the handler, these two skipped
event.app.invalidate(). process_command() prints through patch_stdout
(scrolls output above the prompt without redrawing the input line), so the
just-cleared input area could keep showing the submitted '/steer <text>'
until an unrelated redraw fired — looking unsent and inviting an accidental
re-submit.
Add event.app.invalidate() after reset in both inline branches to match
the sibling branches. AST regression test pins the invariant: every
reset-then-return branch in handle_enter must invalidate first.
Fixes#34569
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.
Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20
- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.
Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
The POSIX installer drops node/npm/npx symlinks in ~/.local/bin pointing
into $HERMES_HOME/node and prepends ~/.local/bin to PATH, shadowing an
existing nvm. Uninstall removed the hermes wrapper but left these behind,
so the user's default node/npm/npx stayed redirected after uninstall.
Add remove_node_symlinks() and call it from run_uninstall. It removes
~/.local/bin/{node,npm,npx} only when each is a symlink resolving into the
current Hermes home's node dir, so a link the user repointed at nvm or a
real binary is never touched. Handles dangling links too.
Closes#34536
* fix(auxiliary): stop capping output with max_tokens by default
Auxiliary LLM calls (compression, titles, vision, etc.) no longer send
max_tokens on the OpenAI-compatible chat-completions path. Most providers
treat an omitted max_tokens as "use the model max", which is what we want;
an explicit cap only risks truncation or a wire-format 400.
This was surfaced by GitHub Copilot / GPT-5 (#34530): those models reject
max_tokens and require max_completion_tokens, so compression 400'd and fell
back to a static context marker. Omitting the param sidesteps that quirk
(and ZAI vision's error 1210) entirely.
The Anthropic Messages wire (MiniMax + /anthropic endpoints) keeps
max_tokens because it is a mandatory field there.
* test(auxiliary): update temperature-retry assertions for omitted max_tokens
The temperature-retry tests asserted retry_kwargs["max_tokens"] == 500 on an
api.openai.com endpoint. Now that auxiliary calls omit max_tokens on
OpenAI-compatible endpoints (#34530), that key is absent. Assert it's absent
in both first and retry kwargs and use model as the survives-the-retry witness.
* fix(deps): declare setuptools in dev extra for packaging tests
tests/test_packaging_metadata.py imports `from setuptools import
find_packages` at module scope to validate package discovery against
the live tree. setuptools was being picked up ambiently from the CI
runner image, but recent ubuntu-latest images no longer ship it in the
test venv, so collection fails with ModuleNotFoundError on every PR.
Declare setuptools==82.0.1 in the dev optional-dependencies so `.[all,dev]`
installs it explicitly rather than relying on the runner environment.
* test(packaging): skip packaging-metadata tests when setuptools absent
Belt-and-suspenders alongside declaring setuptools in [dev]: guard the
module-level `from setuptools import find_packages` with
pytest.importorskip so a runner missing setuptools SKIPS these checks
instead of erroring out collection for the entire test shard.
* chore(deps): sync uv.lock for setuptools dev dependency
2026-05-29 17:24:23 -07:00
722 changed files with 138674 additions and 5697 deletions
> **Heads up:** Native Windows support is **early beta**. It installs and runs, but hasn't been road-tested as broadly as our Linux/macOS/WSL2 paths. Please [file issues](https://github.com/NousResearch/hermes-agent/issues) when you hit rough edges. For the most battle-tested Windows setup today, run the Linux/macOS one-liner above inside **WSL2**.
> **Heads up:** Native Windows runs Hermes without WSL — CLI, gateway, TUI, and tools all work natively. If you'd rather use WSL2, the Linux/macOS one-liner above works there too. Found a bug? Please [file issues](https://github.com/NousResearch/hermes-agent/issues).
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is supported as an **early beta** — the PowerShell one-liner above installs everything, but expect rough edges and please file issues when you hit them. If you'd rather use WSL2 (our most battle-tested Windows path), the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
After installation:
@@ -104,17 +104,17 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset`| `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo`| `/retry`, `/undo` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
@@ -124,23 +124,23 @@ For the full command lists, see the [CLI guide](https://hermes-agent.nousresearc
All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| Section | What's Covered |
|---------|---------------|
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
> The Foundation Release — Hermes Agent installs and runs anywhere now. Native Windows ships in early beta with a full PowerShell installer story, a `pip install hermes-agent` wheel lands on PyPI, lazy-deps reshape what `pip install hermes-agent` actually pulls down, the supply-chain checker scans every install/upgrade for unsafe versions, and a new OpenAI-compatible local proxy lets Codex / Aider / Cline talk to OAuth-only providers (Claude Pro, ChatGPT Pro, SuperGrok). The cold-start wave shaves ~19 seconds off `hermes` launch, browser-tool CDP calls run 180x faster, and `hermes tools` All-Platforms drops from 14s to under 1.5s. Two new messaging platforms (LINE and SimpleX Chat) and a Microsoft Graph foundation (Teams pipeline + webhook adapter) land alongside `/handoff` that finally transfers sessions live, `vision_analyze` passing pixels through to vision-capable models, `x_search` as a first-class tool, LSP semantic diagnostics on every `write_file` / `patch`, a unified pluggable `video_generate`, a `computer_use` cua-driver backend, cross-session 1-hour Claude prompt caching, a per-turn file-mutation verifier, plus 9 new optional skills. 50+ P1 closures, 12 P0 closures.
---
## ✨ Highlights
- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and an SSH-to-tunnel docs page for when you're SSH'd into a remote box and need to complete the OAuth flow. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
- **Native Windows support (early beta)** — full PowerShell installer, native subprocess/PTY paths, taskkill-based process management, MinGit auto-install, Microsoft Store python stub detection, foreground Ctrl+C preservation, taskkill+ps2 fallback, npm prefix handling, and ~40 follow-up Windows-only fixes across CLI / gateway / TUI / curator / tools. Hermes finally runs natively on `cmd.exe` and PowerShell, no WSL required. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561), [#22130](https://github.com/NousResearch/hermes-agent/pull/22130), [#22752](https://github.com/NousResearch/hermes-agent/pull/22752), [#26618](https://github.com/NousResearch/hermes-agent/pull/26618), and many more)
- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get a `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. One command, no clone, no git, no shell installer. Wheel includes the Ink TUI bundle and shell launcher. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Cold-start performance wave — ~19s off `hermes` launch** — skills cache, lazy Feishu import, no Nous HTTP at startup, plus PEP-562 lazy adapter imports (QQ, Yuanbao, Teams, Google Chat), deferred `fal_client` / `google-cloud` / `httpx` loads, models.dev disk-cache-first lookup, parallel doctor API checks, eager-skip plugin discovery on built-in subcommands, `hermes tools` All-Platforms drops from 14s to <1.5s, welcome banner skipped on `chat -q`. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **180x faster `browser_console` evaluations** — routed through the supervisor's persistent CDP WebSocket instead of spawning a fresh DevTools session per call. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Debloating wave — lighter installs, less you don't use** — A clean `pip install hermes-agent` used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
- **Supply-chain advisory checker + lazy-deps framework + tiered install fallback** — every `pip install` / `hermes update` scans dependencies against an advisory list, lazy-deps replace heavy import-time loads with first-use installs, and the installer falls back through extras tiers when a wheel rejects on the target platform. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
- **OpenAI-compatible local proxy** — `hermes proxy` exposes any OAuth-authed provider (Claude Pro, ChatGPT Pro, SuperGrok) as an OpenAI-compatible endpoint that Codex / Aider / Cline / VS Code Continue can hit. Your subscription, your tools. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
- **Cross-session 1-hour Claude prompt cache** — Anthropic / OpenRouter / Nous Portal now share a 1h prefix cache across sessions for Claude models. Fast resume, fast `/new`, lower cost on repeat work. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE Messaging API lands as a first-class platform, SimpleX Chat salvages #2558 onto the modern adapter spec. Hermes is now on 22 platforms. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Graph foundation — Teams pipeline + webhook adapter** — `msgraph` auth/client foundation, webhook listener platform, Teams pipeline plugin runtime, and Teams outbound delivery via the existing adapter — Hermes can now read and post to Teams. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **`/handoff` actually transfers the session live** — the agent's active session moves to a different model / persona / profile mid-conversation, with messages, tool history, and context preserved. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`x_search` — first-class X (Twitter) search tool** — gated tool with OAuth-or-API-key auth, no skill needed to query the timeline. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **`vision_analyze` returns pixels to vision-capable models** — when the active model can see, `vision_analyze` now hands the image straight through instead of falling back to a text description. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **LSP semantic diagnostics on every write** — `write_file` and `patch` now run real language-server diagnostics on the post-edit file (delta-only) and surface real errors before they ship downstream. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Per-turn file-mutation verifier footer** — after every turn that wrote files, the agent gets a verifier footer summarizing what actually changed on disk — catches silent overwrites and "wrote it but it didn't land" bugs. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Unified `video_generate` with pluggable provider backends** — single tool, any backend. Drop in a new video provider as a plugin, no core changes. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`computer_use` cua-driver backend** — proper focus-safe ops, non-Anthropic provider support, refresh on `hermes update`. Computer-use is no longer locked to a single SDK. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **xAI Grok OAuth provider — SuperGrok via subscription** — sign in with your xAI account, talk to Grok models from Hermes. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Clarify with buttons — native inline keyboards on Telegram + Discord** — the `clarify` tool renders multi-choice prompts as platform-native buttons instead of typed responses. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Discord channel history backfill (default on)** — Hermes reads recent channel history when joining a thread so it actually knows what's been said. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **Watchers skill — RSS / HTTP JSON / GitHub polling via cron `no_agent` mode** — skill recipes that wire change-detection sources directly into cron's script-only watchdog mode. ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Zed ACP Registry integration + uvx distribution** — Hermes is in the Zed registry, installable via `uvx` (no npm). Plus `hermes acp --setup-browser` bootstraps browser tools for registry installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **OpenRouter Pareto Code router** — wire a new OpenRouter router with `min_coding_score` knob. Pick the cheapest model that meets your quality bar. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification — the kind of plumbing that makes long agentic runs not fall over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Optional codex app-server runtime for OpenAI/Codex models** — drives the OpenAI Codex CLI under the hood for OpenAI/Codex paths, with session reuse, wedge retirement, and OAuth refresh classification. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **`hermes-skills/huggingface` as a trusted default tap** — community skills index from huggingface.co/skills is available by default in the Skills Hub. ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **API server exposes run approval events** — long-running runs surface approval requests over the API stream, no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **`/subgoal` — user-added criteria appended to active `/goal`** — layer extra success criteria onto a running goal loop. The judge sees them in the prompt, no behavior change when subgoals are empty. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Plugins can run any LLM call via `ctx.llm`** — plugins get a first-class hook to make their own LLM requests through the active provider/credentials, no manual wiring. Plus `tool_override` flag for replacing built-in tools. ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — two new free search backends alongside Tavily / SearXNG / Exa. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS classification** — closes the `sudo -S`brute-force avenue; approval gates classify stdin-fed and askpass-stripped sudo invocations as dangerous. (salvages of #22194 + #21128) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There's still rough edges (this is the "early beta" stamp) — ~40 follow-up Windows-only fixes already landed in the window — but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
- **Provider rename — Alibaba Cloud → Qwen Cloud, picker reorder** — matches what the world calls it. Existing config keys still work. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
"User chose option A; awaiting implementation of step 2"
If the user's most recent message was a reverse signal (stop, undo, roll
back, never mind, just verify, change of topic) that supersedes earlier
work, write the reverse signal verbatim and DO NOT carry forward the
cancelled task. Example: "User asked: 'Stop the i18n refactor and just
verify the current diff' — earlier i18n in-flight work is cancelled."
If no outstanding task exists, write "None."]
## Goal
@@ -1260,7 +1352,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled request — this is the most important field for task continuity.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled input — this includes any question, decision request, or discussion turn that the assistant has not yet answered. Only write "None" if the last exchange was fully resolved.
{_template_sections}"""
else:
@@ -1424,9 +1516,16 @@ The user has requested that this compaction PRIORITISE preserving all informatio
@staticmethod
def_strip_summary_prefix(summary:str)->str:
"""Return summary body without the current or legacy handoff prefix."""
"""Return summary body without the current, legacy, or any historical
handoff prefix.
Historical prefixes must be stripped too: a handoff persisted under an
older prefix can be inherited into a resumed lineage (#35344), and if we
only re-prepend the current prefix without removing the old one, the
stale directive it carried stays embedded in the body.
"description":"Capabilities required by Hermes Setup. Narrowly scoped: we don't write user files outside HERMES_HOME, we don't read arbitrary paths, and the only external network call goes through reqwest (Rust side, not exposed to the webview).",
**The native desktop app for [Hermes Agent](../../README.md) — the self-improving AI agent from [Nous Research](https://nousresearch.com).** Same agent, same skills, same memory as the CLI and gateway, in a polished native window — chat with streaming tool output, side-by-side previews, a file browser, voice, and settings, no terminal required. Available for **macOS, Windows, and Linux**.
<table>
<tr><td><b>Chat with the full agent</b></td><td>Streaming responses, live tool activity, structured tool summaries, and the same conversation history as every other Hermes surface.</td></tr>
<tr><td><b>Side-by-side previews</b></td><td>Render web pages, files, and tool outputs in a right-hand pane while you keep chatting.</td></tr>
<tr><td><b>File browser</b></td><td>Explore and preview the working directory without leaving the app.</td></tr>
<tr><td><b>Voice</b></td><td>Talk to Hermes and hear it back.</td></tr>
<tr><td><b>Settings & onboarding</b></td><td>Manage providers, models, tools, and credentials from a real UI. First-run setup gets you to your first message in seconds.</td></tr>
<tr><td><b>Stays current</b></td><td>Built-in updates pull the latest agent and rebuild the app in place.</td></tr>
</table>
---
## Install
### Install with Hermes (recommended)
Add `--include-desktop` to the [one-line installer](../../README.md#quick-install) and it sets up the agent and builds the desktop app in one go:
It builds and launches the GUI against your existing install — same config, keys, sessions, and skills. On first launch Hermes walks you through picking a provider and model; nothing else to configure.
### Prebuilt installers
When a release ships desktop installers they're attached to its [releases page](https://github.com/NousResearch/hermes-agent/releases) — `.dmg` (macOS), `.exe` / `.msi` (Windows), `.AppImage` / `.deb` / `.rpm` (Linux). These are published manually, so the install-with-Hermes path above is the most reliable way to get the latest.
---
## Updating
The app checks for updates in the background and offers a one-click update when one is ready. You can also update any time from the CLI:
```bash
hermes update
```
---
## Requirements
The installer handles everything for you (Python 3.11+, a portable Git, ripgrep). The only thing worth knowing:
- **Windows** — the installer bundles its own Git and Python; no admin rights or system changes required.
- **macOS / Linux** — uses your system Python 3.11+ (installed automatically if missing).
---
## Development
Want to hack on the app itself? Install workspace deps from the repo root once, then run the dev server from this directory:
npm run dev # Vite renderer + Electron, which boots the Python backend
```
Point the app at a specific source checkout, or sandbox it away from your real config:
```bash
HERMES_DESKTOP_HERMES_ROOT=/path/to/clone npm run dev
HERMES_HOME=/tmp/throwaway npm run dev
npm run dev:fake-boot # exercise the startup overlay with deterministic delays
```
### Building installers
```bash
npm run dist:mac # DMG + zip
npm run dist:win # NSIS + MSI
npm run dist:linux # AppImage + deb + rpm
npm run pack # unpacked app under release/ (no installer)
```
Installers are built and uploaded to GitHub Releases manually. macOS/Windows signing & notarization happen automatically when the relevant credentials are present in the environment (`CSC_LINK` / `CSC_KEY_PASSWORD` / `APPLE_*` for macOS, `WIN_CSC_*` for Windows).
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard --tui` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification
Run before opening a PR (lint may surface pre-existing warnings but must exit cleanly):
```bash
npm run fix
npm run type-check
npm run lint
npm run test:desktop:all
```
### Troubleshooting
Boot logs land in `HERMES_HOME/logs/desktop.log` (includes backend output and recent Python tracebacks) — check it first if the app reports a boot failure.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.