- web.citations: auto (default) | always | off
* auto: new CITATION_GUIDANCE_AUTO instructs the model to cite inline
only for research/report/fact-finding-shaped requests and answer
naturally for incidental lookups (Perplexity's leaked prompts handle
query-class exemptions the same way - instruction, not classifier)
* always: unconditional guidance (previous behavior)
* off: no source ids, no guidance - byte-identical to pre-feature output
- web.summary_stream: true (default) | false - gates the live CLI
summarization box
- Tool schema descriptions made mode-agnostic ('results MAY carry a
source field... follow guidance when present') so the static schema
stays byte-stable for a conversation regardless of config
- Docs: configuration.md 'Grounded citations' section
- 7 new tests covering mode resolution + off-mode output shape
Grounding (web_search + web_extract):
- Process-wide source registry assigns stable numbered citation ids
(url -> [n]) shared across search and extract calls, so the model only
ever emits small integers it received - it cannot hallucinate a URL
(the structural trick behind Perplexity's grounding)
- Every search/extract result carries its marker in a 'source' field;
responses with content include a compact citation_guidance block
modeled on Perplexity's leaked prompts (per-sentence inline [n] cites,
max 3 per sentence, cite-while-writing, never invent ids, Sources list)
- Summarizer prompts hardened per ALCE (arXiv:2305.14627) and WebGPT
(arXiv:2112.09332): preserve citable facts as verbatim quotes, keep
figures exact, never blend outside knowledge, flag gaps explicitly
Live summary streaming (CLI):
- New tools/summary_display.py: tiny callback registry + single display
slot; no-op everywhere a front-end doesn't register (gateway, cron,
subagents, tests)
- web_extract summarization now tries a streaming call first when a
display is attached, mirroring tokens into a dim 'Summarizing . <url>'
box in the CLI (same UX as reasoning blocks); any streaming failure
falls back to the existing non-streaming retry/fallback path
- Parallel page summaries race for the slot; only one streams, others
run silently - no interleaved boxes
macOS Desktop backend processes can still miss Apple Silicon Homebrew paths even after adding Hermes-managed Node and venv bins. That leaves `/codex-runtime on` unable to find a Homebrew-installed `codex` binary at `/opt/homebrew/bin/codex`.
Add a small testable backend env helper that builds the dashboard subprocess environment in one place. It prepends Hermes-managed Node and venv bins, appends missing POSIX sane PATH entries individually, preserves caller precedence without duplicates, and keeps Windows PATH casing/delimiters intact.
Wire both source-checkout and active-install backend descriptors through the helper, and add Node regression coverage to the desktop platform test suite.
Two halves of the same community report (dashboard Profile Builder):
1. A fresh dashboard/CLI-created profile got no .env file unless cloned,
so it silently inherited API keys and messaging tokens from the shell
environment / root install. create_profile() now seeds a placeholder
.env (0600) for non-clone profiles, matching the SOUL.md seeding.
2. The Channels endpoints (/api/messaging/platforms GET/PUT/test) were
not profile-scoped: they read/wrote the dashboard process's own .env
via load_env()/save_env_value() regardless of the global profile
switcher. They now accept the standard optional profile param (body
beats query on the PUT, matching other scoped writes) and run inside
_profile_scope(). When scoped, the payload no longer falls back to
os.environ or load_gateway_config()'s env-override layer — both carry
the ROOT install's credentials and would misreport them as the
profile's. /api/messaging/platforms added to PROFILE_SCOPED_PREFIXES
so the sidebar switcher scopes the Channels page automatically.
The Teams adapter only handled image/* attachments — documents (the
application/vnd.microsoft.teams.file.download.info consent-free download
payload and any direct-URL non-image attachment) never reached media_urls
at all, so run.py's document-context injection had nothing to surface.
Completes the class-wide sweep from PR #44695 (Signal/Email/SimpleX).
- download.info attachments: fetch the pre-authed SharePoint downloadUrl
(SSRF-guarded, same guard chain as base.py cache_*_from_url) and route
through cache_media_bytes
- direct-URL non-image attachments: same fetch + classify path
- skip Teams' text/html message-body mirror and adaptive-card attachments
- DOCUMENT > PHOTO > VIDEO > AUDIO precedence for mixed attachments,
matching the Email precedence rationale from #44695
* feat(billing): /usage → portal top-up browser handoff
Add the terminal side of the billing slice (phase 2a): start a top-up by
throwing the user to the portal billing page with the top-up modal open. The
terminal does not confirm, poll, or track payment — checkout completes in the
browser and the next /usage shows the new balance.
- nous_account.py: parse organisation.slug/name from /api/oauth/account into
NousPortalAccountInfo; add nous_portal_topup_url() building the org-pinned
{base}/orgs/{slug}/billing?topup=open with a null-slug fallback to the legacy
{base}/billing?topup=open (never /orgs/None/...).
- portal_cli.py: 'hermes portal topup' — fresh account fetch, identity line
(Topping up as <email> / org <name>), browser open with printed-URL fallback,
no-wait closing copy. No polling/confirmation (deferred to 2b).
- account_usage.py: the shared /usage credits block now links the org-pinned
top-up URL (auto-opens the modal) + points to the command.
Depends on NAS #409 (organisation.slug/name + ?topup=open). Do not merge until
that is live on the target env; until then /api/oauth/account returns
organisation: { id } only and the URL falls back to legacy.
* feat(billing): /credits command for balance + top-up handoff
Replace the standalone `hermes portal topup` subcommand with an in-session
/credits slash command — a focused money surface (balance in, top-up out) that
works in the CLI, TUI, and every messaging platform from one registry entry.
- commands.py: register /credits (Info category). Slack is at its 50-slash cap,
so /credits is routed via /hermes credits on Slack only (new
_SLACK_VIA_HERMES_ONLY set) to avoid clamping a canonical command off the
native list and breaking Telegram parity; native everywhere else.
- account_usage.py: build_credits_view() — one portal fetch → balance lines +
identity line + org-pinned top-up URL + depleted flag, consumed by all
surfaces. Reuses the same snapshot/URL builder as /usage so numbers match.
- cli.py: _show_credits() — balance block + identity line + 3-button panel
(Open top-up / Copy link / Cancel) via the existing prompt_toolkit modal.
ASK, never auto-launch; headless falls back to printing the URL.
- gateway/slash_commands.py: _handle_credits_command() — renders the block +
tappable top-up URL + no-wait copy; works on button and plain-text platforms.
- /usage credits line now points to /credits.
- Retire `hermes portal topup` (portal_cli.py back to baseline); the engine
(slug/name parse + nous_portal_topup_url) stays as the shared core.
No polling, no payment confirmation (billing phase 2a). Depends on NAS #409.
* fix(credits): /credits works in the TUI slash-worker (non-interactive)
In the TUI, /credits runs in the slash-worker subprocess where there is no
live prompt_toolkit app and stdin is the JSON-RPC pipe. _show_credits called
the 3-button modal unconditionally, which fell back to reading stdin →
exception → slash.exec rejected → the command produced no output (only the
pre-existing 'Credit access paused' banner showed).
- _show_credits: when self._app is None (TUI worker / piped / non-interactive),
render the text variant — balance block + tappable top-up URL + no-wait line,
same affordance as the messaging surfaces — and skip the modal entirely. The
3-button panel still renders in the interactive CLI.
- Depleted banner copy: 'run /usage for balance' → 'run /credits to top up'
now that /credits is the dedicated money surface (+ tests).
- Regression tests: _show_credits with self._app=None renders text and never
invokes the modal; logged-out path.
* feat(tui): credits.view RPC for the /credits tappable top-up button
Add a credits.view JSON-RPC method returning the structured CreditsView
(logged_in, balance_lines, identity_line, topup_url, depleted) so the TUI can
render a clickable <Link> top-up button instead of plain text. Account-
independent (portal fetch gated on a logged-in Nous account), fail-open to
{logged_in: false} on any hiccup. Mirrors session.usage's credits-block pattern.
Frontend (TUI-local /credits command + Ink component) lands separately.
* feat(tui): /credits command with keyboard-driven top-up confirm
TUI-local /credits: fetches the structured balance via the credits.view RPC,
prints the balance + identity + top-up URL, then arms the EXISTING confirm
overlay (Enter = open top-up in browser via openExternalUrl, Esc = cancel).
Reuses ConfirmReq — no new overlay component/state/input handler. Headless
(openExternalUrl returns false) falls back to printing the URL.
- gatewayTypes.ts: CreditsViewResponse.
- commands/credits.ts: the command (mirrors /status's rpc+guarded pattern).
- registry.ts: register creditsCommands.
- test: balance+overlay armed, headless fallback, no-url, logged-out (4 cases).
Matches the CLI /credits 'Enter to open' affordance. Phase 2a: no polling.
Modal prompt panels (dangerous-command approval, clarify questions)
live in the prompt_toolkit layout and vanish on the next repaint,
leaving no trace of the question or the decision in chat history.
Emit a dim one-line summary after each prompt resolves:
⚠ Approval: <command> → allowed for session
? Clarify: <question> → <answer>
Gated on display.persist_prompts (default true). Detail and outcome
are whitespace-collapsed and capped at 120 chars.
SimpleX tagged unknown files application/octet-stream in media_types
but classification only handled audio/image, leaving msg_type TEXT —
run.py never injected the document context. Same bug class as #12845.
Email cached document attachments and placed them in media_urls, but
msg_type only flipped on image attachments — documents stayed TEXT and
run.py's document-context injection (gated on MessageType.DOCUMENT)
silently dropped them. Same bug class as Signal #12845. DOCUMENT wins
over PHOTO for mixed attachments since image handling keys off per-path
mime types while document injection gates strictly on message_type.
Widen the salvaged #12851 fix to match the established classification
pattern (WhatsApp/Slack/BlueBubbles/Mattermost): video/* -> VIDEO, and
any remaining MIME type falls through to DOCUMENT instead of TEXT, so
exotic types still trigger run.py's document-context injection.
Follow-up for salvaged PR #44486: the adapter shipped remove_reaction but
the tool only exposed 'react'. Generalize _handle_react(remove=) and add
tool-level dispatch tests for react/unreact (missing from the original PR).
Inbound events key the tracker by the DM chat GUID (any;-;+1555...),
but home-channel react calls address the same space by bare E.164 —
normalize both to the phone so add_reaction's last-inbound default
resolves regardless of which form the caller uses (mirrors the
sidecar's phoneTargetFromSpaceId).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add `action='react'` to `send_message` tool and expose `add_reaction`/
`remove_reaction` on the Photon adapter.
- Track latest inbound message id per chat (`_last_inbound_by_chat`,
bounded to 200 entries) so the agent can react without threading
message ids through tool calls
- New `add_reaction`/`remove_reaction` public methods on PhotonAdapter;
unlike the lifecycle tapbacks, these are not gated by PHOTON_REACTIONS
- `send_message` gains `action='react'` with `emoji` and optional
`message_id` params; resolves target via existing channel-directory
and home-channel logic; requires a live gateway adapter
Prevents "Future attached to a different loop" errors when
_sidecar_call is invoked from a worker thread via _run_async in
send_message_tool. The persistent _http_client remains in use for
the inbound streaming loop, which always runs on the gateway's loop.
A hard gateway exit (crash, SIGKILL, supervisor restart) left the
detached Node sidecar running with a token the next gateway run doesn't
know, so it could never be told to /shutdown. Every replacement spawn
then died on EADDRINUSE, failing each 30→300s reconnect attempt while
the orphan kept consuming the inbound gRPC stream.
Two layers:
- Lifetime binding: the adapter now holds the sidecar's stdin as a
pipe, and the sidecar (PHOTON_SIDECAR_WATCH_STDIN=1) shuts down on
stdin EOF — fired by the OS on any parent death, including SIGKILL.
- Startup reaping: before spawning, the adapter probes the port and
terminates a stale listener, but only after verifying its command
line is a Photon sidecar; a foreign listener raises a clear error
instead of being signalled.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pin spectrum-ts to exactly 3.0.0 (was ^1.18.0 plus an `npm install
spectrum-ts@latest` on every setup) so breaking SDK majors can't take
down fresh installs silently; `hermes photon setup` now runs `npm ci`.
Upgrade procedure documented in the README.
Migrate resolveSpace to the v3 namespace API: `im.space.create(phone)`
for DMs and `im.space.get(id)` for everything else — group spaces are
now rehydratable from their persisted id after a sidecar restart, which
v1 could not do.
Markdown: replies go out via the v3 `markdown()` builder (iMessage
renders natively; other Spectrum platforms degrade to plain text).
`PHOTON_MARKDOWN=false` reverts to the stripped plain-text path.
Reactions, behind PHOTON_REACTIONS (default off): lifecycle tapbacks
(👀 while processing, 👍/👎 on completion) via new sidecar /react and
/unreact endpoints with per-target reaction-handle tracking, and user
tapbacks on bot-sent messages routed to the agent as synthetic
`reaction:added:<emoji>` events.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The subscription-cap usage gauge (50/75/90% bands) ignored purchased
(top-up) credits: a sub user with top-up funds got a sticky warn banner
at 90% of their cap — permanently at >=100%, alongside grant_spent —
despite being fully able to keep inferencing. The cap is the wrong
denominator for an account that can keep spending.
- evaluate_credits_notices: purchased_micros > 0 suppresses the usage
band (grant_spent already covers the cap-reached + top-up case with
the remaining balance). A top-up landing mid-session clears any
showing band; spending top-up down to 0 resumes the gauge.
- New display.credits_notices config (default true): false silences all
credits notices. State capture and /usage are unaffected. Read once
per agent (cached) in _emit_credits_notices, fail-open true.
- Docs: configuration.md display block.
- Add ELECTRON_SKIP_BINARY_DOWNLOAD=1 to nix/lib.nix to prevent offline download failures.
- Manually trigger native compilation of node-pty via npm rebuild --build-from-source in buildPhase.
- Run stage-native-deps.cjs to copy the natively compiled binary into build/native-deps.
- Flatten native-deps and install-stamp.json to the root of the output derivation in installPhase, matching electron-builder's extraResources behavior so main.cjs can find it at process.resourcesPath + '/native-deps/node-pty'.
- Add doCheck=true and a strict checkPhase to fail fast if the staged native binary is missing.
The original fix added agent/memory_manager.py:flatten_message_content, but
that helper was a near-exact duplicate of
agent/codex_responses_adapter.py:_summarize_user_message_for_log — same
None/str/list dispatch, same {text,input_text,output_text}/{image_url,input_image}
part sets, the identical [N image(s)] marker, and the same str() fallback. The
only difference was the join separator (newline for memory vs space for the
log/trajectory previews the existing helper already serves), and that helper is
already imported into agent/turn_finalizer.py — the same file whose call site the
memory fix touches.
Parameterize the existing helper with sep=' ' (default preserves every current
logging/trajectory caller byte-for-byte) and call it with sep='\n' at the memory
boundary; drop the forked flatten_message_content. Repoints the unit tests to the
consolidated helper and adds a case locking the default space-join.
Single source of truth for multimodal-content flattening; no behavior change for
the fix or for existing callers.
Multimodal turns carry message content as a list of typed parts
({type: "text"|"image_url", ...}). _sync_external_memory_for_turn
passed that list straight into MemoryManager.sync_all, and providers
feed it to regexes — Honcho's sync_turn calls sanitize_context, where
re.sub raised 'expected string or bytes-like object, got list'. Every
turn with an attached image silently never synced.
Flatten to plain text at the boundary: text parts joined, images noted
as an [N image(s)] marker so the attachment isn't erased from recall.
Fixing here covers all providers instead of patching each plugin.
(cherry picked from commit 705bdb6ffe)
Sibling site of the CLI approval-panel fix: the TUI ApprovalPrompt
rendered each command line with wrap="truncate-end", so a long
single-line command lost its tail at terminal width. Wrap to the
panel width via wrapAnsi before applying the 10-line preview cap.
Fireworks-hosted Kimi rejects tool requests when nullable MCP/Pydantic
schemas collapse to {"$ref": "...", "default": null}. Strip that sibling
during global schema sanitization so gateway and CLI calls succeed again.
list_plugins() attribution diffed registry names against all already-loaded
plugins, so when a plugin registered a hook / middleware / tool name an
earlier plugin had already used, the shared name was credited to the first
plugin only and later plugins under-reported (0 hooks) in hermes plugins
list. commands_registered right beside it already attributed correctly by
plugin ownership.
Snapshot per-registry counts before register() and attribute the entries
this plugin's register() actually added (per-registration delta). Add a
regression test: two plugins registering the same hook name are each
credited with 1 hook.
discover_and_load(force=True) cleared every per-plugin registry except
_plugin_platform_names, which register_platform() populates. A platform
plugin disabled between force-rediscovers left a stale name behind, so the
set diverged from the real platform_registry / _plugins state and never
shrank across repeated force passes.
Add the missing clear() and a regression test that seeds every per-plugin
registry, forces a rediscover, and asserts they all empty (so a future
registry addition can't silently leak across a force pass either).
Register no-op Slack event handlers for inbound reaction_added and reaction_removed events so Slack Bolt does not log unhandled-request warnings for events Hermes does not consume.
The previous implementation captured loop vars via default arguments::
async def _wrapped(ack, body, action, _cb=_cb, _plugin_name=_plugin_name):
slack_bolt's ``kwargs_injection`` introspects each listener's signature
via ``inspect.signature`` and passes ``None`` for any parameter name it
doesn't recognise (see ``slack_bolt/kwargs_injection/async_utils.py``
``build_async_required_kwargs``). That clobbered ``_cb`` to ``None`` at
dispatch time, so the wrapped plugin handler became ``NoneType`` —
``await _cb(...)`` then raised ``'NoneType' object is not callable`` and
no plugin action handler ever fired.
Replace the default-arg trick with a small closure factory so the
wrapper's public signature is exactly ``(ack, body, action)``. Add a
regression test that introspects the wrapped function's signature.
Found via real Slack click on a Block Kit button registered through
``ctx.register_slack_action_handler`` — gateway log showed
``[Slack] Plugin 'None' action handler raised: 'NoneType' object is
not callable`` despite the registration log line confirming the
handler was wired.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plugins that post Block Kit messages with interactive elements (buttons,
overflow menus, datepickers, etc.) had no documented way to receive the
resulting click events. The plugin API exposed register_tool, register_hook,
register_command, register_platform, and register_context_engine, but
nothing for slack_bolt action handlers. The only workaround was to
monkey-patch SlackAdapter.connect from inside register(), which is
fragile and breaks on every Hermes update.
This change adds:
* PluginContext.register_slack_action_handler(action_id, callback) —
validates inputs and queues the handler on the PluginManager.
action_id accepts whatever slack_bolt.App.action() accepts (literal
string, compiled re.Pattern, or constraint dict).
* PluginManager.get_slack_action_handlers() — accessor used by the
Slack adapter at connect time.
* SlackAdapter.connect — after wiring its built-in approval and
slash-confirm buttons, iterates the plugin-registered handlers
and registers each via self._app.action(matcher)(callback). Each
callback is wrapped defensively so a misbehaving plugin cannot
crash slack_bolt's dispatch loop, with a best-effort ack on
exception so Slack stops retrying the click.
* Defensive fallback when the plugin layer is unhealthy: a
RuntimeError from get_plugin_manager() is logged and swallowed
rather than blocking the gateway from starting.
* Test coverage in tests/gateway/test_slack_plugin_action_handlers.py
for input validation, multi-plugin registration, the connect-time
wiring, defensive exception handling, and the plugin-loader-
failure fallback path.
* Documentation in website/docs/guides/build-a-hermes-plugin.md
describing the new API alongside the existing register_command /
dispatch_tool documentation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
body sets user-select:none for native feel and opts text back in only via
[data-selectable-text='true']; the preview's source and rendered-markdown
panes never set it, so code couldn't be selected or copied. Tag the Shiki
code column and the markdown root. The attribute stays off the SourceView
grid root so the gutter keeps its select-none and line numbers don't bleed
into copied text.
The terminal listed JetBrains Mono only as a late fallback and shipped no
webfont, so on machines without SF Mono/Menlo xterm measured the grid on the
regular system face while styled SGR spans fell back to a font with different
advances — glyphs squeezed and overlapped.
Bundle the regular/bold/italic woff2 (Apache-2.0, the faces the dashboard
already ships), put the family first in the xterm stack, pin the weights, and
warm every face before mount (fonts.ready only settles already-requested
faces; bold/italic aren't asked for until styled output paints, past atlas
init). Vite emits them as hashed assets under dist/** with base './', so the
fonts ship in the asar and every install path inherits them.
The per-row copy control lived in the header's trailing slot as a 24px
button that depended on a `group-hover/tool-row` group that exists nowhere
in the tree. It therefore stayed `opacity-0` yet remained clickable — an
invisible hit-target straddling the disclosure caret and duration, making
the caret hard to click without firing a copy.
Move copy into the expanded body's top-right (matching the code-block
convention) where it can't fight the caret for the right edge, and make it
actually visible (subtle at rest, full on hover/focus). The header right
edge now belongs solely to the duration label + caret.
Tradeoff: copy is only reachable once a row is expanded; rows with no
expandable body no longer surface a copy control.
Arabic/Hebrew/Persian/Urdu chat text rendered left-to-right and
left-aligned, and mixed RTL/English technical messages (the common case)
read backwards. Resolve each chat block's base direction from its own
first strong character (UAX#9) with pure CSS, scoped to the chat
surfaces only:
- `unicode-bidi: plaintext` + `text-align: start` on assistant prose
blocks (p, h1-h6, li, blockquote), the user bubble's text lines, and
both composers (main + edit share the composer-rich-input slot). RTL
blocks read and right-align RTL; English stays LTR; mixed
conversations resolve per block. `text-align: start` is required
because the user bubble hardcodes `text-left`.
- Inline `code` and KaTeX are pinned `direction: ltr; unicode-bidi:
isolate`, so the bidi first-strong heuristic skips them: a sentence
that *starts* with a command (`./run.sh ...`) followed by Arabic
still resolves RTL, and the command's own neutrals keep their order.
- Fenced code surfaces (code-card, user fences) are pinned LTR so they
never mirror or right-align inside an RTL list item or blockquote.
`direction` is never forced, so app chrome, layout, and list indent
stay LTR per the issue's request not to flip the whole UI. English-only
content is byte-for-byte unchanged.
Salvaged and unified from #44065 and #44169; verified in Chromium that
isolate removes inline code from the paragraph direction vote (the
code-first case), making the JS dir-resolution in #44065 unnecessary.
Fixes#44150
Co-authored-by: Adolanium <Adolanium@users.noreply.github.com>
Co-authored-by: Adalsteinn Helgason <AIalliAI@users.noreply.github.com>
Tell coding agents to activate shell setup once per session instead of re-sourcing it before every command, and pin the existing LocalEnvironment env-snapshot behavior with regression tests.
Port from anomalyco/opencode#31271: only call tools/list when the server
advertises the 'tools' capability in InitializeResult.capabilities.
Previously, _discover_tools() unconditionally called session.list_tools()
right after initialize. Prompt-only / resource-only servers (which omit
the tools capability per the MCP spec) raise McpError(-32601 Method not
found), which aborted the connection — burning all 3 initial-connect
retries and permanently failing the server even though its prompts and
resources were perfectly usable. The 180s keepalive had the same problem:
it probed with list_tools(), so even a successfully connected prompt-only
server would be torn down on the first keepalive cycle.
Changes:
- MCPServerTask._advertises_tools(): capability check with a legacy
fallback (no captured InitializeResult -> behave as before)
- _discover_tools(): skip tools/list for non-tool servers
- keepalive: use the universal ping request for non-tool servers
- _refresh_tools(): guard against tools/list_changed from non-tool servers
E2E verified with a real stdio prompt-only FastMCP-style server: on main
it fails all 3 connection attempts with Method-not-found; with this fix
it connects, lists prompts, answers ping keepalives, and shuts down
cleanly.
The desktop "Local / custom endpoint" onboarding never collected an API
key and /api/model/set silently dropped one, so an auth-gated endpoint
(e.g. a hosted vLLM behind a key) could never enumerate models — and
Settings' "Set up custom endpoint" routed `custom` into a non-existent
OAuth flow, booting the user back to the first screen (the reported loop).
Backend (web_server.py):
- /api/providers/validate accepts an optional api_key and sends it as a
Bearer header when probing a custom endpoint's /v1/models.
- /api/model/set accepts api_key, persists it to model.api_key (same
switch/preserve lifecycle as base_url), and registers a named
custom_providers entry via _save_custom_provider — matching the
`hermes model` CLI flow so the endpoint shows up as a ready picker row.
Desktop:
- ApiKeyForm shows an optional API key field for the local/custom option;
the key is threaded through saveOnboardingLocalEndpoint → validate +
setModelAssignment.
- New onboarding `localEndpoint` intent + startManualLocalEndpoint(); the
Settings "Set up custom endpoint" button now opens the local-endpoint
form (URL + key) instead of the OAuth dead-end.
- Added localApiKeyPlaceholder i18n key (en + types + zh).
Tests: api_key lifecycle on _apply_main_model_assignment, key persistence
+ custom_providers registration on /api/model/set, Bearer-header probe;
onboarding store forwards + persists the key.
Cherry-picked from #39840 by @flyinhigh and rebased cleanly on main.
- Defer config fetch in createGatewayEventHandler until gateway.ready to
avoid render-phase RPC that can mutate transcript state and trigger
React error 301 in embedded dashboard PTYs.
- Use undici WebSocket fallback when globalThis.WebSocket is unavailable
(Node attach mode and sidecar mirror sockets).
- Add regression tests for both fixes.
Co-authored-by: flyinhigh <flyinhigh@users.noreply.github.com>
Collapse the verbose multi-line rationale comments across the TUI/desktop/
backend approval surfaces into single-line "why" notes, and derive
APPROVAL_OPTS_NO_ALWAYS from APPROVAL_OPTS instead of re-listing it.
No behavior change.
Node >=18 / Electron 40 ship fetch; the hand-rolled http/https.request
plumbing buys nothing. AbortSignal.timeout replaces the socket timeout,
protocol guard and >=400 rejection semantics preserved. 13/13 unit
tests and the live web_server.py repro both green over the new
transport.
Both spawn paths (startHermes, spawnPoolBackend) duplicated the same
resolve -> log-fallback -> foreign-check -> throw dance. Collapse it into
adoptServedDashboardToken(baseUrl, spawnToken, {childAlive, label}) in
dashboard-token.cjs; childAlive is a thunk so liveness is sampled after
the fetch. Drop the redundant backendPool.delete in the pool's throw
path (the child exit/error handlers already own pool eviction).
Validated end-to-end against a real web_server.py backend, not just
units: token-injection regex vs the actual served index.html, foreign
refusal (dead child + live squatter), benign drift adoption, and the
401-vs-200 token auth split on /api/sessions.
When a tirith content-security warning is present the approval backend
forces allow_permanent=False and silently downgrades an "always" choice to
session scope (the persistence loop in check_all_command_guards only honors
"always" → permanent when no tirith finding exists). But the gateway notify
payload that drives the TUI and the Electron desktop app never carried that
flag, so both surfaces always rendered "Always allow" — offering a permanent
allow the backend would quietly refuse to persist.
Plumb allow_permanent end-to-end:
- tools/approval.py: include `allow_permanent: not has_tirith` in the gateway
approval_data the notify callback emits as `approval.request`.
- ui-tui: thread `allowPermanent` through the event handler, gateway types,
and ApprovalReq; ApprovalPrompt drops the "always" option (and renumbers the
quick-pick keys) when it's false.
- apps/desktop: thread `allow_permanent` through the gateway payload type, the
per-session approval store, and the inline ApprovalBar, which now hides the
"Always allow…" dropdown item when permanent allow is disallowed — reusing
the existing DropdownMenu / confirm-Dialog UI.
The desktop/TUI render path for approvals already landed in #38578 (the root
cause of approvals not surfacing in the GUI); this completes the salvage of
#37856 by carrying allow_permanent across both surfaces. #37856's original
thread-local _block() approach is dropped: desktop/TUI approvals resolve via
approval.respond → resolve_gateway_approval (the per-session queue), not the
_block()/request_id correlation, so a worker-thread callback waiting on _block
would never be released by the real UI.
Tests: gateway notify payload carries allow_permanent (True without tirith,
False with a tirith warning); ui-tui approvalAction reduced option set +
event-handler allowPermanent propagation; desktop store round-trip + the
ApprovalBar showing/hiding "Always allow".
Supersedes #37856Closes#37812
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Two fixes to the Electron desktop launch path, with the port-reservation logic extracted into a unit-tested module:
1. hermes:bootstrap:reset ("Reload and retry") only cleared connectionPromise, leaving the live backend alive; the orphan kept binding PORT_FLOOR (9120) so the next startHermes() hit EADDRINUSE / "Object has been destroyed" and the window looped. Await teardownPrimaryBackendAndWait() so the reset stops the old backend before restarting.
2. pickPort() probes-then-closes a socket before the real bind happens in a separate Python child, so two concurrent spawns (primary + pool backend) could both be handed PORT_FLOOR and one died with EADDRINUSE. The reservation bookkeeping is extracted into electron/port-pool.cjs (PortPool): pickPort() reserves the chosen port until the child exits and releases it on every exit/error/throw-before-spawn path, closing the TOCTOU window.
PortPool is dependency-injected (probe passed in) and socket-free, unit-tested in electron/port-pool.test.cjs (8 cases) and wired into the test:desktop:platforms script.
(cherry picked from commit d4133945b9)
The served-token fallback adopts whatever token the dashboard HTML
injects. That is correct when our own child regenerated the token (env
pin lost across a shell-wrapped spawn), but wrong when the readiness
probe answered from a process we did not spawn: /api/status is public,
so an orphaned dashboard squatting the port passes waitForHermes while
our child dies on the bind conflict. Silently adopting that process's
token would authenticate the renderer against a foreign backend,
possibly on the wrong profile.
Discriminate on child liveness: the desktop pins
HERMES_DASHBOARD_SESSION_TOKEN on every spawn, so a live child always
serves our token. Served-token mismatch + dead child = foreign backend;
fail the boot loudly instead of connecting. Mismatch + live child keeps
the adopt-served-token salvage from #43720.
`@assistant-ui/store`'s index-keyed child-scope lookup (`tapClientLookup`)
throws — rather than returning undefined — when a subscriber reads an index
the message/parts list no longer has. During high-frequency store replacement
(switching sessions mid-stream, gateway reconnect replay) a subscriber from
the previous, longer list is still in React's notification queue and reads one
slot past the new, shorter array before it can unmount. The throw
(`Index N out of bounds (length: N)`, the classic index === length off-by-one)
unwinds all the way to the root error boundary and blanks the entire window,
even though the store self-heals on the very next consistent snapshot.
Wrap each virtualized message group in a tiny boundary that swallows ONLY this
transient lookup race and auto-recovers when the message signature changes
(the existing list-mutation key). Any other error re-throws to the root
boundary, so genuine bugs still surface.
Upstream-tracked and unresolved: assistant-ui/assistant-ui#4051, #3652.
Co-authored-by: mollusk <mollusk@users.noreply.github.com>
Desktop spawns its dashboard backend with `--profile <name>` and
`HERMES_DESKTOP=1`. cmd_dashboard's unified-launch routing treats any
named profile as a request for the shared machine dashboard: it re-execs
as the default profile (dropping HERMES_HOME) or, when one is already
listening, prints "Machine dashboard already running ... Managing profile
'<name>'" and exits 0. Either way the desktop-spawned child exits before
the app sees a ready backend, so Desktop retries forever — the Windows
named-profile boot loop in the post-mortem.
Skip the machine-dashboard reroute when HERMES_DESKTOP=1 so desktop pool
backends stay per-profile (which is what the pool expects). Carved out of
#44478.
Co-authored-by: AJ <yspdev@gmail.com>
The desktop chat surface talks to the dashboard's in-process /api/ws
gateway, which builds agents through tui_gateway.server._make_agent. That
path only snapshots the existing tool registry — MCP discovery is started
by tui_gateway/entry.py (the stdio TUI), which the dashboard process never
runs. So a profile's configured MCP servers never connect under the
desktop app and sessions show no MCP tools.
Start a shared background MCP discovery thread at dashboard startup (via
hermes_cli.mcp_startup, bounded so a slow/dead server can't block boot),
and have _make_agent briefly join that thread in addition to the existing
entry-owned TUI thread before snapshotting tools.
Carved out of #44478.
Co-authored-by: AJ <yspdev@gmail.com>
The deploy-site skills index crawl was capped at ~3k ClawHub entries
because CATALOG_WALK_BUDGET_SECONDS applied to max_items=0 walks too.
Only enforce the wall-clock budget for bounded browse requests and pass
limit=0 from build_skills_index so CI walks the full catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: finish Automation Blueprints terminology rebrand
Replace leftover "Automation Templates" wording from the Cron Recipes
rebrand, rename the copy-paste cookbook guide to Automation Recipes, and
point the marketing gallery link at the blueprints catalog.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: use Automation Blueprints instead of Recipes in guide
Rename the cookbook guide from automation-recipes to
automation-blueprints so sidebar and copy match the product term.
Co-authored-by: Cursor <cursoragent@cursor.com>
* docs: rename automation-blueprints-catalog to automation-blueprints
Drop the -catalog suffix from the reference page slug and title, and
move the copy-paste cookbook to automation-blueprint-examples so the
main Automation Blueprints doc is unambiguous.
Co-authored-by: Cursor <cursoragent@cursor.com>
* Revert "docs: rename automation-blueprints-catalog to automation-blueprints"
This reverts commit 605f1eeab5.
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
Legibility pass on the consolidated prefix: collapse the topic-overlap rule
from three overlapping sentences into one WINS sentence + one discard/no-wrap-up
sentence (same constraints, less dilution), fix the module docstring to
describe the headings that actually shipped, and correct the #10896 comment's
heading name (Historical Pending User Asks).
The prompt consolidation above retires the carveout-era prefix. Without a
frozen copy in _HISTORICAL_SUMMARY_PREFIXES, summaries persisted by
pre-upgrade builds would lose detection (_is_context_summary_content) and
renormalization (_strip_summary_prefix) — the exact regression class the
tuple exists to prevent. Adds contract tests covering every frozen prefix.
Refs #41607#38364#42812
A WSL2 user reported the top two left-sidebar items being unclickable
while the rest of the UI works. That symptom shape matches an
-webkit-app-region:drag hit-test band eating clicks, not GPU/compositing:
the shell's titlebar drag strips (app-shell.tsx) span the top 34px and
the nav group clears them by only 6px, and drag regions win hit-testing
over DOM regardless of pointer-events. Linux WCO (Electron >=32) is the
newest implementation and has known region quirks (electron#43030).
Apply the same no-drag carve-out the codebase already uses for sticky
user bubbles (USER_BUBBLE_BASE_CLASS in thread.tsx) to the sidebar nav
buttons. Harmless on every platform: the rows were never meant to be
draggable surface.
Root-cause hardening for the stranded-empty-registry failure behind
'No web search/extract provider configured': discover_and_load() set
_discovered=True before scanning, so a sweep that raised partway was
swallowed by callers as a warning and every later call early-returned
against an empty registry for the process lifetime. The flag now acts
only as a re-entrancy guard and is reset when the sweep raises, so the
next call retries discovery.
Pins the invariant that _ensure_web_plugins_loaded registers the keyless
Parallel default (and the wider bundled set) even when the general plugin
discovery raises, that the direct-registration fallback honors plugins.disabled,
and that it stays a no-op on the healthy path.
web_search/web_extract are documented to work with zero setup via the bundled
keyless Parallel free-MCP backend, but that only holds when the bundled
plugins/web/* providers are registered. The dispatch relied entirely on the
general plugin sweep to do that; when the sweep finishes without registering
them (its exception swallowed as a warning, a packaged layout where it ran
before the bundled tree was importable, or a stale empty-discovery cache), the
registry is empty and BOTH tools dead-end on "No web {search,extract} provider
configured" — despite needing no setup at all.
_ensure_web_plugins_loaded now verifies the keyless default landed after the
sweep and, if not, registers the bundled web providers directly against the
registry. Idempotent, a no-op on the healthy path (one dict lookup), and honors
an explicit plugins.disabled entry.
* fix(discord): recover from runtime gateway task exits
Salvaged from #39416 (AMEOBIUS) — cherry-picked only the task-exit
recovery; the original PR was 1081 commits behind with 28 unrelated
commits.
A post-ready discord.py WebSocket crash left the gateway split-brained:
producers stayed active while Discord stopped responding. After this fix
the adapter calls _set_fatal_error(retryable=True) + _notify_fatal_error()
so the existing GatewayRunner reconnect watcher replaces the dead adapter.
Also adds _wait_for_ready_or_bot_exit() so startup failures (SOCKS/proxy
errors, invalid tokens) surface fast instead of burning the full ready
timeout. Because connect() no longer waits via asyncio.wait_for on that
path, test_connect_releases_token_lock_on_timeout is updated to trigger
the timeout through the new helper (same lock-release contract).
3 tests pass (2 new runtime-failure tests + the updated timeout test);
test_discord_connect.py and test_discord_slash_commands.py green.
Co-Authored-By: ameobius <ameobius@local.host>
* fix(test): patch _wait_for_ready_or_bot_exit in timeout cancel test
connect() no longer uses asyncio.wait_for for the ready handshake, so
test_connect_timeout_cancels_bot_task was hanging for 30s in CI.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: ameobius <ameobius@local.host>
Co-authored-by: Cursor <cursoragent@cursor.com>
Follow-up to #44389: the generic 'except Exception' branch in connect()
had the same orphaned-task hazard as the timeout branch. Extract the
cancel-and-await logic into _cancel_bot_task() and call it from all
three sites (timeout branch, exception branch, disconnect()).
Also adds deaneeth to AUTHOR_MAP.
When connect() times out waiting for the Discord ready event, the background
asyncio.Task running client.start() was not cancelled. discord.py's internal
reconnect loop can ignore client.close() while a WebSocket handshake is in
flight, so the orphaned task eventually completes and fires on_ready.
A later successful reconnect then leaves two live Discord clients in the same
process — each with its own on_message handler and MessageDeduplicator instance
— so every @mention creates two threads because the per-adapter dedup caches
cannot catch cross-client duplicates.
Fix: explicitly cancel and await _bot_task in two places:
1. The asyncio.TimeoutError handler inside connect() — catches the case where
the adapter's own inner wait_for fires before the gateway's outer timeout.
2. The start of disconnect() — the load-bearing path, always reached via
_dispose_unused_adapter regardless of which timeout fired first.
Root cause confirmed from production logs: a Jun 8 network outage caused three
consecutive connect() timeouts. The first attempt's bot_task completed its
handshake 4 minutes later ("Connected as") with no preceding watcher line,
then the watcher's real reconnect also connected 90 seconds after that. The two
clients ran continuously for 41+ hours, confirmed by the same user message
appearing as two separate inbound events in two different thread IDs 357ms apart.
Regression tests added to tests/gateway/test_discord_connect.py:
- test_connect_timeout_cancels_bot_task: simulates a connect() timeout with a
NeverReadyBot and asserts _bot_task is None afterward
- test_disconnect_cancels_running_bot_task: injects a live zombie task, calls
disconnect(), and asserts the task is cancelled and the attribute cleared
Sibling site of the PDF/DOCX note fixed in PR #44175: the audio file
attachment context note led with "Ask the user what they'd like you to
do with it", steering the model into asking instead of transcribing.
Rewritten to instruct the agent to transcribe/process the file itself
when the request involves its content, only asking when intent is
genuinely unclear. Contract assertion added to the existing audio
attachment note test.
Pin the contract for _build_document_context_note: text documents confirm the
inlined content and record the path; binary documents (PDF/DOCX/XLSX/octet-
stream) tell the agent to extract the text itself and never instruct it to ask
the user to paste the contents.
When a user attached a binary document (PDF, DOCX, XLSX, …) in chat, the
context note prepended to the turn said "Ask the user what they'd like you to
do with it." That steered the model into asking the user to paste the
contents rather than extracting the text it is fully capable of reading — so
attached PDFs/DOCX appeared "unreadable" to the agent.
Rewrite the binary-document note to tell the agent the file is a non-text
format saved at the given path and to extract its text itself (e.g. via the
terminal tool or the ocr-and-documents skill) before answering. Text
documents (whose content is already inlined by the platform adapter) keep
their existing note. The note construction is pulled into a small
`_build_document_context_note` helper so it is unit-testable.
Set CI=1 in _run_npm_install_deterministic so the package's /dev/tty
postinstall demo is skipped during hermes dashboard web UI builds.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): stop file tree throwing "two HTML5 backends" on remount
The Agent Workspace file tree (react-arborist) shows a permanent "TREE ERROR"
with `[error-boundary:file-tree] Cannot have two HTML5 backends at the same
time.` react-arborist mounts its own react-dnd DndProvider + HTML5Backend per
<Tree>. react-dnd v14 keeps that manager on a global, ref-counted singleton
context and nulls it when the count reaches 0. The tree is keyed on
`${cwd}:${collapseNonce}`, so changing folder / collapsing forces a fresh
<Tree>; during the remount the singleton can be torn down and recreated while
the previous HTML5Backend still owns `window.__isReactDndHtml5Backend`, so the
new backend's setup() throws. The error boundary then sticks, because "Try
again" just remounts into the same race.
Pass arborist a stable, app-lifetime `dndManager` (new getFileTreeDndManager
singleton) so it reuses one backend for the life of the app and never
double-claims the window flag. Drag/drop is already disabled on this tree;
this only changes how the (unused) dnd backend is provisioned.
Promotes dnd-core and react-dnd-html5-backend to explicit deps (already present
transitively via react-arborist's react-dnd 14.x line, so they dedupe to one
instance).
* fix(nix): bump npmDepsHash for desktop dnd deps
Adding dnd-core / react-dnd-html5-backend changed the workspace
package-lock.json, so the single workspace-root npmDepsHash in
nix/lib.nix was stale and the nix build failed. Regenerate it
(hash from the failing nix CI job's 'got:' value).
* fix(nix): update npmDepsHash for merged lockfile
After merging main, the workspace lockfile combined main's dep
changes with the desktop dnd additions, so the npmDepsHash needed
recomputing again. Hash from the nix lockfile-check job.
* fix(nix): use fetchNpmDeps hash for desktop dnd lockfile
prefetch-npm-deps reported sha256-lVnybH9RE/... but fetchNpmDeps
wants sha256-mYgKXE/FL4hnkrEvpVv+ULM/oeyIfO2AM9Ol8OrfWm0= for the
merged workspace lockfile. Use the nix build 'got:' hash so CI passes.
---------
Co-authored-by: Brooklyn Nicholson <brooklyn.bb.nicholson@gmail.com>
When a user toggles off the last visible model for a provider group, the
effectiveVisibleKeys() function treated the missing provider prefix as
'never customized' and re-added the default models on the next render,
causing all models to snap back to enabled.
Fix: store a sentinel key (e.g. 'provider::') when the last model for a
provider is toggled off. The sentinel distinguishes 'user hid everything'
from 'user never customized', preventing the default-fallback path from
re-adding models the user explicitly chose to hide.
Fixes#43485
Turn off browser spellcheck, autocorrect, and autocomplete on the main chat composer and message-edit composer so code, paths, and slash commands are not flagged or altered.
The coding-posture brief told GPT/Codex models to use patch mode='patch'
(V4A) for structured/multi-file changes but mode='replace' "for a single
small swap". That second nudge points those models at a format their
first-party harness never taught them.
Verified against openai/codex (current main): apply_patch is the ONLY file
editor in codex-rs — zero occurrences of str_replace/old_string anywhere in
the repo; the grammar (core/src/tools/handlers/apply_patch.lark) is exactly
the V4A dialect our patch_parser implements; the shipped model prompts
(gpt_5_codex, gpt-5.2-codex, gpt-5.1-codex-max + instruction templates)
explicitly say to use apply_patch "for single file edits"; and the tool is
gated per model via ModelInfo.apply_patch_tool_type, i.e. OpenAI ships
V4A-for-everything as model metadata.
The GPT-family line now steers to mode='patch' for all edits, single-file
included. The replace-family line (Claude + open-weight) is unchanged —
Claude Code's FileEdit is old_string/new_string/replace_all exact string
replacement (confirmed from Anthropic's shipped sdk-tools.d.ts, the only
file editor in its tool union), matching our mode='replace'.
CI tests the PR merged with current main, where the new /memory canonical
command filled Slack's 50-slash cap: with btw/bg/reset all pinned ahead of
canonicals, the last canonical (/debug) got clamped and the Telegram-parity
test failed. Canonical commands must win slots over alias spellings — /new
keeps its native slot and 'reset' stays reachable via /hermes reset.
Also updates test_includes_aliases_as_first_class_slashes to assert the
pinned-alias contract (_SLACK_PRIORITY_ALIASES survive) instead of a
specific unpinned alias's survival, which was the same change-detector
pattern the docstring already warned about.
Review fixes for the Cron Recipes stack before release:
- hydration-move: */90 in the cron minute field silently wraps to hourly
(croniter-verified) — 90/120-minute options never fired at their stated
cadence. Replaced with an hour-field step (0 9-17/2 * * 1-5) and an
interval_hours slot whose options (1/2/3h) all fire as labeled.
- fill_recipe: reject unknown slot names. A typo'd 'tiem=07:15' used to
silently create the job at the 08:00 default; now it 422s on the dashboard
form and errors on the slash/deep-link paths with the valid slot list.
- deliver slot: non-strict enum (options are suggestions, scheduler
validates downstream) so slack/whatsapp/etc. users aren't locked out;
GET /api/cron/recipes rewrites its options from cron_delivery_targets()
so the dashboard form only offers configured platforms; help text no
longer claims dashboard-created jobs deliver to 'the chat you set this
up from' (the endpoint strips origin — they go to the home channel).
- gateway: success/accept messages no longer point at /cron (cli_only);
surface-aware hint instead. Conversational fill now sends the
'Setting up X — I'll ask you a couple of things…' ack before the agent
turn, matching the CLI experience.
- important-mail catalog entry: reference the urgency classifier by module
path (python3 -m cron.scripts.classify_items) instead of baking an
absolute host path into the job prompt — stale after relocation and
nonexistent on remote terminal backends. cron/scripts is now a real
package and ships in the wheel (pyproject packages.find).
- export_recipe: interval schedules round-trip again — parse_schedule
stores 'minutes' but the renderer only read 'seconds', so every interval
job exported as the silent '0 9 * * *' fallback.
- skills_hub install: say so when a recipe suggestion is dropped
(latched dedup or pending cap) instead of printing nothing.
Targeted tests: 58 cron/recipe + 261 web_server pass; E2E-validated all
14 recipes fill+parse, hydration cadences via croniter, typo rejection on
slash + endpoint paths, surface-aware hints, and interval export round-trip.
Reworks the chat-line UX: pick a recipe by name and the agent asks you for
what it needs, one question at a time, instead of forcing you to hand-type a
slot=val command line.
- /cron-recipe -> lists the catalog
- /cron-recipe <name> -> forgiving name match (exact/prefix/substring/
fuzzy; ambiguous lists candidates), then seeds
the agent with a natural-language fill request
built from the recipe's typed slots + schedule
and prompt templates. The agent asks for each
value one at a time and calls the EXISTING
cronjob tool. No new tool.
- /cron-recipe <name> slot=val -> unchanged deterministic path (fill_recipe ->
create_job) for the dashboard/docs/power user.
Mechanism (no new plumbing, invariant-safe — the seed enters as a normal user
turn, never a synthetic injection):
- shared handler returns RecipeCommandResult{text, agent_seed}; match_recipe()
and build_recipe_seed() are the new shared pieces.
- gateway: dispatch rewrites event.text to the seed and falls through to the
agent (the same pattern /steer uses).
- CLI: handler sets a one-shot self._pending_agent_seed; the interactive loop
consumes it right after process_command() and runs it as the next turn.
The typed-slot schema stays the single source of truth (still validates the
form/inline path via fill_recipe); the agent path just renders those slots into
the questions to ask. Docs updated to lead with the name-then-ask flow.
A 'recipe' is a one-place definition of an automation that every surface
renders natively. The slot schema (cron/recipe_catalog.py) is the single
source of truth; four renderers consume it, and all paths end at the same
cron.jobs.create_job — no second job engine.
Form where there's a screen, conversation where there's a chat line:
- Dashboard / GUI app: a Recipes sub-tab on the Cron page renders each
recipe's typed slots as a form (time-picker, enum dropdown, free-text);
submit POSTs /api/cron/recipes/instantiate which fills + creates the job.
- CLI / TUI / messengers: /cron-recipe lists the catalog, shows a recipe's
fields, or fills + creates from a pasted 'key slot=val' command. The shared
handler (hermes_cli/cron_recipe_cmd.py) names any missing/invalid slot so
the agent can ask a targeted follow-up.
- Docs: a generated Cron Recipes catalog page (website, .mdx + React cards)
shows each recipe with a copy-paste command and a 'Send to App' button.
- Desktop: a hermes:// URL scheme (Electron single-instance lock +
setAsDefaultProtocolClient + open-url/second-instance) routes
hermes://cron-recipe/<key>?slot=val into the chat composer pre-filled.
Typed slots (time/enum/text/weekdays) with defaults: users never type raw
cron — recipes parameterize time-of-day and weekday sets and translate to
cron expressions; a free-text 'schedule' slot is the full-flexibility escape
hatch. Consent-first throughout: nothing schedules without an explicit submit
or send.
Core:
- cron/recipe_catalog.py — CronRecipe + RecipeSlot, 5 curated recipes,
recipe_form_schema / recipe_slash_command / recipe_deeplink /
recipe_catalog_entry renderers, fill_recipe (validate + translate to
create_job kwargs).
- hermes_cli/cron_recipe_cmd.py — shared /cron-recipe handler (CLI + TUI +
gateway never drift). CommandDef + dispatch in commands.py / cli.py /
gateway/run.py.
Dashboard: GET /api/cron/recipes + POST /api/cron/recipes/instantiate
(web_server.py), CronRecipes.tsx gallery+form, Segmented sub-tab on CronPage,
api.ts methods + types.
Desktop: hermes:// scheme end to end (main.cjs deep-link router + ready-queue,
preload onDeepLink/signalDeepLinkReady, global.d.ts types, desktop-controller
composer prefill, electron-builder protocols key).
Docs: extract-cron-recipes.py generator wired into prebuild.mjs,
cron-recipes-catalog.mdx + CronRecipesCatalog React component, sidebar entry.
Generated index json gitignored like skills.json.
Tests: 23 core (catalog/slots/schedule-resolution/validation/renderers/command
handler/generator) + 5 web_server endpoint tests. E2E verified end to end:
slot fill -> create_job -> persisted job with correct schedule/deliver/origin.
Hermes can propose automations and let the user accept them with one tap
via /suggestions, instead of making them assemble cron jobs by hand. Every
proposal — wherever it originates — flows through one surface.
Sources (the 'where suggestions come from'):
- catalog: curated starter automations (daily briefing, important-mail
monitor, weekly review, workday-start reminder) via /suggestions catalog
- recipe: installing a skill that carries a metadata.hermes.recipe block
registers a suggestion instead of auto-scheduling
- usage / integration: reserved for the background-review detector and
account-connect triggers (sources defined; emitters land next)
Pieces:
- cron/suggestions.py — the store. add/list/accept/dismiss, dedup+latch by
key (dismissed proposals never re-offered), pending cap so it can't become
a nag wall. Accepting calls the existing cron.jobs.create_job — there is
NO second job engine. Mirrors jobs.py storage (atomic writes, lock, 0600).
- cron/suggestion_catalog.py — the curated set. The important-mail monitor
entry is where the old proactive-monitor poll->classify->surface engine
lives now (cron/scripts/classify_items.py + the 'monitor' aux task), as ONE
catalog automation rather than a standalone feature.
- tools/recipes.py — recipe<->job bridge; register_recipe_suggestion() makes
a recipe source 'recipe' of this surface. recipe_to_job_spec() is the single
translation both the direct and suggestion paths share.
- hermes_cli/suggestions_cmd.py — shared /suggestions handler (CLI + gateway
never drift); /suggestions [accept N|dismiss N|catalog|clear].
- Wired: CommandDef + CLI dispatch (cli.py) + gateway dispatch (gateway/run.py)
+ aux 'monitor' task (config.py) + recipe-install hook (skills_hub.py).
Consent-first throughout: nothing auto-schedules; acceptance is always
explicit; dismissals latch.
Supersedes #41122 (proactive-monitor) and #41127 (recipes): both fold in here
as a catalog entry and a suggestion source respectively.
Tests: store (dedup/cap/accept/dismiss/latch), catalog seeding+idempotency,
recipe->suggestion bridge, command handler, aux config. E2E: recipe SKILL.md
-> parsed -> suggested -> accepted -> real cron job persisted to jobs.json.
The coding posture's names-only demotion of non-coding skill categories
(#44342) applied under the default auto mode, silently changing the skill
index for every user in a git repo. Index changes must be opt-in: demotion
now only fires under agent.coding_context=focus, alongside the toolset
collapse. auto/on leave the skill index untouched; focus semantics are
unchanged (demoted, never hidden; deny-list keeps coding-adjacent and
custom categories at full entries).
The Config page read config_path from /api/status, which is machine-global
and always reports the profile the dashboard process was started under.
After switching profiles with the global switcher, the header kept showing
the old profile's path (e.g. /root/.hermes/profiles/worker_1/config.yaml)
even though reads/writes correctly targeted the new profile.
Fix: /api/config/raw now returns the resolved path alongside the YAML
(resolved inside _profile_scope, so it follows ?profile=). ConfigPage
prefers that scoped path and only falls back to /api/status for old
servers. ProfileKeyedRoutes already remounts the page on switch, so the
header refreshes immediately.
Real-world failure with the original index pruning: under the default auto
posture, an agent-created ops skill in a demoted category vanished from the
prompt's skill index mid-project, and the agent silently fell back to a
stale sibling skill instead. The "discovery-only" premise didn't hold —
models do not reach for skills_list to rediscover what the index stops
showing them, and agent-created skills are the model's accumulated project
memory (runbooks, pitfalls, operating rules).
Gating pruning behind the opt-in focus mode was the wrong fix too: users
opening a worktree don't know the config exists, so the index-noise win
would effectively never ship.
Instead, the coding posture now DEMOTES non-coding categories rather than
hiding them: each demoted category renders as a single names-only line
("gaming [names only]: allthemons10-ops, mc-backup") with a footer note
explaining the omitted descriptions. Every skill name stays in the prompt,
so memory-anchored recall ("load <name>") keeps working in every mode,
while the description noise is still cut. Applies in auto/on/focus alike;
the general posture demotes nothing. Deny-list semantics unchanged —
unknown/custom categories and coding-adjacent ones keep full entries.
API renamed to match the honest semantics: hidden_skill_categories →
compact_skill_categories, build_skills_system_prompt(hidden_categories=) →
compact_categories=.
- nous_subscription: gate the STT managed-default flip on openai-audio
entitlement and skip when a local backend (faster-whisper or custom
command) works; new _local_stt_backend_available() helper + tests
- whatsapp_cloud: WHATSAPP_CLOUD_{DM_POLICY,ALLOW_FROM,GROUP_POLICY,
GROUP_ALLOW_FROM} env overrides so both adapters can run in parallel;
normalize allowlist entries (JID/punctuation) to bare wa_id
- whatsapp_cloud: wrap per-message event build in try/except (dedup-marked
wamids would be silently dropped on Meta's batch retry otherwise)
- whatsapp_cloud: validate media_id before URL/filename interpolation,
delete transient .ogg after voice upload, FIFO-cap interactive-button
state dicts and per-chat wamid cache
- whatsapp_common: '# **Title**' headers no longer double-wrap asterisks
- setup wizard: read access token / app secret via getpass on TTYs
- docs: new WHATSAPP_CLOUD_* gating env vars
Two dashboard fixes:
1. The 'Anthropic API Key' OAuth catalog entry's status fn read
~/.claude/.credentials.json (which has its own dedicated claude-code
entry) and never checked ANTHROPIC_API_KEY at all. It now checks the
Hermes PKCE file, then the registry env-var order (ANTHROPIC_API_KEY
-> ANTHROPIC_TOKEN -> CLAUDE_CODE_OAUTH_TOKEN) via get_env_value, so
keys from .env, the shell, or Bitwarden (injected into the process
env by load_hermes_dotenv) are all reported, with a '(from Bitwarden)'
source suffix when applicable.
2. Deprecated HERMES_TOOL_PROGRESS / HERMES_TOOL_PROGRESS_MODE removed
from OPTIONAL_ENV_VARS so the keys page and setup checklists stop
offering them. Moved to _EXTRA_ENV_KEYS so .env sanitization and
reload_env still recognize them for existing users (gateway back-compat
fallback unchanged).
IAM policies scoped to bedrock:InvokeModel only (a common least-privilege
setup) reject converse_stream() with AccessDeniedException. The agent loop
hard-prefers streaming and the denial never matched the 'stream not
supported' auto-fallback, so InvokeModel-only users looped on AccessDenied
forever.
- agent/bedrock_adapter.py: new is_streaming_access_denied_error()
detector (ClientError code check + wrapped-SDK message match);
call_converse_stream() falls back to converse() on denial.
- agent/chat_completion_helpers.py: bedrock_converse streaming branch
retries inline via converse() and sets _disable_streaming so later
turns skip the doomed stream attempt; the chat-completions retry
block also recognizes the denial for the AnthropicBedrock SDK path
(message pre-check avoids importing bedrock_adapter — and its lazy
boto3 install — for unrelated providers).
Both paths print a one-line notice telling the user which IAM action
restores streaming.
* fix(desktop): Harden local file tree paths
Normalize Electron local path handling across file tree, preview, media, and git-root flows. Reject malformed and Windows device paths, recheck sensitive files after realpath resolution, and preserve external symlink traversal with stable renderer errors.
* fix(desktop): Address file tree review feedback
* fix(gateway): gate oversized Telegram voice/audio before download
Adds a pre-download size check to the Telegram voice and audio inbound
paths. Files that exceed _max_doc_bytes (default 20 MB) are rejected
before get_file() is called, preventing silent OOM-style stalls on large
uploads. A human-readable note is appended to the event text so the
model can explain the limit to the user.
Also extends 403 entitlement detection in recover_with_credential_pool
to cover two additional cases: 'oauth authentication is currently not
allowed for this organization' and Anthropic anthropic_messages-mode 403s,
both of which should be treated as entitlement failures rather than
transient errors.
Tests: 7 new cases in test_telegram_voice_v0_regressions.py covering
the size gate (accept, reject, note text) and the STT-failure notice path.
Salvaged from #40487 (cryptopafi) — cherry-picked the Telegram voice
policy and 403 entitlement fixes; LiveKit/Discord/uv.lock workstreams
left for separate PRs.
* test(gateway): drop orphaned voice tests not backed by this PR
The cherry-picked test file from #40487 included 3 tests for STT-failure
notice and voice-mode (_handle_voice_command 'on' -> voice_only) behavior
that this PR intentionally does NOT salvage (those belong to the LiveKit/
voice-policy workstreams left in #40487). They fail on both this branch
and clean main because the feature code isn't present.
Keep only the 2 tests backed by code actually in this PR:
- test_telegram_audio_size_gate_rejects_oversized_media_before_download
(covers the _telegram_media_size_allowed guard this PR adds)
- test_voice_tts_is_explicit_audio_reply_opt_in (matches current main)
Removed now-unused imports (MessageEvent, MessageType, AsyncMock).
Headless/VPS users (dashboard-over-Tailscale, no comfortable SSH) could
list/toggle/install skills and create/edit cron jobs, but not author a
custom skill or link one to a cron job — the UI set WHEN a job runs, but
not WHICH skill it uses.
- Skills page: 'New skill' button + per-row edit pencil open a SKILL.md
editor dialog (frontmatter + body, server-side validation via the same
_create_skill/_edit_skill path as the agent's skill_manage tool).
- New endpoints: GET /api/skills/content, POST /api/skills,
PUT /api/skills/content — all profile-scoped via _profile_scope(),
which now also retargets tools.skill_manager_tool's import-time
SKILLS_DIR binding.
- Cron page: skills multi-select in both create and edit modals (parity
with hermes cron --skill / edit --add-skill); CronJobCreate gains a
skills field; job cards show an attached-skills badge. update_job
already accepted skills in updates.
- Tests: 17 new endpoint tests (content read, create/edit validation +
profile scoping + auth gate, cron skills round-trip).
* fix(gateway): refuse to write service definitions with a temp-dir HERMES_HOME
A test/E2E harness that exports HERMES_HOME=/tmp/... and touches any
gateway service write path (install, start self-heal, restart's
refresh_systemd_unit_if_needed) bakes the throwaway home into the
production systemd unit / launchd plist. The gateway then restarts
'healthy' but pointed at an empty temp home — no platforms enabled,
deaf to every message (live incident 2026-06-11: /tmp/hermes-e2e-41264
poisoned the unit during a PR-review E2E probe; the post-update restart
produced a 7-hour zombie gateway).
The existing safety belt only sniffed pytest-shaped markers
(/pytest-of-, /hermes_test). Add a structural guard:
_temp_home_in_service_definition() extracts HERMES_HOME from the
generated systemd unit or launchd plist and refuses the write (with
actionable guidance) when it resolves under tempfile.gettempdir(),
/tmp, /var/tmp, or the macOS /private variants. Wired into all five
write sites: systemd refresh + install, launchd refresh + install +
start self-heal.
* test: patch unit generator in install tests tripped by temp-home guard
CI runs hermetic with HERMES_HOME under a tmp dir, so the real
generate_systemd_unit() output now (correctly) trips the new temp-home
write guard in three install tests. Patch the generator with synthetic
non-temp content — same pattern the existing pytest-marker guard tests
use.
Adds an idle clock to the context/status bar in both the prompt_toolkit CLI
and the Ink TUI: once a turn completes, a dim '✓ <elapsed>' segment shows how
long the session has been idle since the last final agent response. Hidden
while a turn is live (the per-prompt elapsed timer covers that) and before
the first turn completes.
- cli.py: track _last_turn_finished_at when the agent thread exits, surface
it via _format_idle_since() in the snapshot, render in both the wide
fragments path and the plain-text fallback.
- ui-tui: stamp lastTurnEndedAt when busy flips false after a live turn,
thread it through appStatus -> StatusRule, render via a ticking IdleSince
segment sharing the duration breakpoint/width budget.
Commit 550b72dd8 changed the concurrent-path tool-result rendering gate
from 'not agent.quiet_mode' to 'tool_progress_mode != off'. Subagents are
constructed with quiet_mode=True but inherit the default
tool_progress_mode='all', so every child tool call during delegate_task
started printing raw '✅ Tool N completed in Xs - {json...}' lines into
the parent's display, bypassing the curated tree-view relay in
_build_child_progress_callback.
Fix: require BOTH gates — quiet_mode must be off AND tool_progress_mode
must not be 'off' — restoring subagent silence while preserving the
#33860 fix (CLI verbose + tool-progress off stays suppressed). The same
combined gate is applied to the three sibling print sites in
tool_executor.py (concurrent header/args, sequential args, sequential
completion) so the whole class is consistent.
Two beta-reported dashboard bugs:
1. Models page: 'Use as -> Main model' on an analytics card sends
entry.provider, which falls back to the model's VENDOR prefix
(modelVendor('anthropic/claude-opus-4.6') == 'anthropic') when the
session row has no billing_provider. That persisted
provider: anthropic + default: anthropic/claude-opus-4.6 — a
vendor-prefixed OpenRouter slug on the NATIVE Anthropic provider.
New sessions then 400 against api.anthropic.com and the user reads
it as 'changing models does nothing'. Unknown vendors (moonshotai,
poolside, ...) were worse: a provider that can never resolve
credentials.
Fix: _normalize_main_model_assignment() at the single write
chokepoint — maps non-provider vendor names back to the user's
current aggregator (else openrouter), and runs the model through
normalize_model_for_provider() so the persisted name matches the
target provider's API format. Wired into both /api/model/set and
the profile-scoped _write_profile_model.
2. System page: 'Restore from backup' spawns hermes import with
stdin=DEVNULL, so the CLI's interactive 'Continue? [y/N]' overwrite
prompt hits EOF and auto-aborts whenever a config already exists
(always, when the dashboard is running). Fix: ConfirmDialog in the
dashboard owns the consent, then the endpoint passes --force so the
restore runs non-interactively.
Validated live: dashboard on a temp HERMES_HOME, repro'd both failure
modes pre-fix (vendor-slug write verified via config.yaml + tui
session.create; import 'Aborted.' in action-import.log), then verified
post-fix (normalized writes, modal -> --force -> restored marker file).
* fix(matrix): isolate room context and inbound dispatch
* test(matrix): cover room isolation and dispatch regressions
* docs(matrix): document room isolation and session scope
* fix(matrix): stabilize CI requirement checks
* test(matrix): isolate mautrix stubs in requirements tests
* fix(matrix): port room-scoped status and resume to slash commands mixin
Move Matrix /status scope output and /resume same-room guards from the
pre-refactor gateway/run.py into gateway/slash_commands.py so PR #18505
foundation behavior survives the upstream god-file decomposition.
Uses i18n keys for Matrix resume/status messages. Preserves upstream
session.py fixes (role_authorized, DM user_id isolation).
* docs(matrix): explain inbound dispatch via handle_sync loop
Document why Hermes uses an explicit sync loop with handle_sync() rather than
client.start(), aligning with upstream #7914 diagnostics while preserving
Hermes background maintenance tasks.
* fix(i18n): add Matrix resume/status keys to all locale catalogs
The Matrix /resume and /status slash-command keys added in the foundation
PR must exist in every supported locale file. tests/agent/test_i18n.py
asserts key and placeholder parity across catalogs.
Non-English locales use English strings as interim placeholders until
community translators can localize them.
* fix(matrix): restore gateway authz for allowed_users; honor config require_mention
Revert the early MATRIX_ALLOWED_USERS gate in _on_room_message so inbound
sender authorization stays in gateway authz like main. Parse require_mention
from config.extra (platforms.matrix / top-level matrix yaml) with env fallback,
matching thread_require_mention and fixing Forge when require_mention is set
only in profile config.yaml.
* fix(matrix): harden status scope and allowlisted DMs
* fix(matrix): use session store lookup for resume scope
* fix(mcp): propagate HERMES_HOME override onto the MCP event loop
Closes the known limit documented in #44007: tasks scheduled via
run_coroutine_threadsafe are created INSIDE the MCP loop thread, so they
copy that thread's context — a per-request profile scope (dashboard
?profile= endpoints, e.g. the MCP 'Test server' probe) silently vanished
for anything resolving get_hermes_home() inside the coroutine. Most
visible symptom: OAuth token-store paths (HERMES_HOME/mcp-tokens/)
resolved against the process home instead of the selected profile, so
testing an OAuth MCP cross-profile read the wrong tokens.
_run_on_mcp_loop now wraps scheduled coroutines with the caller's
context-local override (_wrap_with_home_override): set inside the task's
own context on the loop, reset on completion — task-local, so concurrent
calls carrying different scopes don't interfere, and the loop thread's
default context stays untouched. No-op (coroutine passes through
unwrapped) when no override is active, i.e. every non-dashboard caller.
web_server's probe comment updated from 'known limit' to 'covered'.
Tests: override propagation (direct + factory form), OAuth token-path
resolution on the loop, loop-context cleanliness after scoped calls,
no-op passthrough. 225 green across mcp_tool + unification suites.
* test(mcp): concurrent different-scope calls don't interfere
A long-lived Baileys bridge survives gateway restarts AND hermes update:
connect() adopted any bridge already listening with status connected, and
disconnect() only kills bridges the adapter spawned itself. Users who
updated to get inbound media support kept talking to a bridge process
serving months-old bridge.js — images and voice notes still arrived as
placeholders with no cached file path (refs #19105 follow-up reports).
Three fixes in the same stale-bridge class:
- Staleness handshake: bridge.js reports a sha256 self-hash in /health
(scriptHash); connect() compares it against bridge.js on disk and
restarts the bridge on mismatch. Pre-handshake bridges report no hash
and are treated as stale, so every existing stale bridge gets recycled
exactly once on the next gateway start.
- npm dep refresh: deps reinstall when package.json changes (stamp file
in node_modules), not only when node_modules is missing — a Baileys
pin bump now actually lands.
- Cache-dir passthrough: the gateway passes profile-aware
HERMES_{IMAGE,AUDIO,DOCUMENT}_CACHE_DIR to the bridge instead of the
bridge hardcoding ~/.hermes/image_cache etc., fixing media paths under
HERMES_HOME overrides, profiles, and the new cache/ layout.
* feat(dashboard): unify multi-profile management — one machine dashboard, global profile switcher
The dashboard becomes a machine-level management surface with one
write-target selector, replacing per-profile dashboard fragmentation.
Backend:
- profile param (query or body) on /api/config (get/put/raw), /api/env
(get/put/delete/reveal), /api/mcp/servers (list/add/remove/test/enabled),
/api/mcp/catalog (list/install), /api/model/info, /api/model/set —
all scoped through the existing _profile_scope() context manager
- model/set restructured: expensive-model warning (await) runs before the
scope; the config write runs sync inside the scope in a worker thread
- MCP catalog installs + git-bootstrap entries spawn 'hermes -p <profile>'
- chat PTY: ?profile= on /api/pty points the child's HERMES_HOME at the
profile dir (its own gateway subprocess, config/skills/memory/state.db
all profile-bound); in-process gateway attach skipped when scoped
CLI launch unification:
- '<profile> dashboard' routes to the machine dashboard: attach (open
browser at ?profile=) when one is listening, else re-exec pinned to the
default profile with --open-profile preselecting the launcher
- --isolated preserves the old dedicated per-profile server behavior
- start_server(initial_profile=...) appends ?profile= to the auto-open URL
Frontend:
- ProfileProvider + sidebar ProfileSwitcher: ONE global selector, URL-
persisted (?profile=), mirrored into fetchJSON which auto-appends the
param to the scoped endpoint families (explicit params win)
- app-wide amber banner names the managed profile
- SkillsPage's page-local selector (from the skills-scoping PR) folded
into the global context — single source of truth
- ChatPage threads the scope into the PTY WS URL; switching profiles
remounts the terminal into a fresh scoped session
Omitted profile keeps legacy behavior everywhere.
* docs(dashboard): document machine-level multi-profile management
- web-dashboard.md: 'Managing multiple profiles' section (switcher, URL
deep-links, unified launch, --isolated, scoped Chat, what stays
per-profile) + --isolated in the options table
- profiles.md: 'From the dashboard' subsection + set-as-active vs
switcher clarification
- cli-commands.md: --isolated flag + profile-alias launch example
* fix(dashboard): address profile-unification review findings
Review findings (dev review on PR #44007):
1. HIGH — stale page state on profile switch: pages load data on mount
and didn't consume the profile scope, so a page opened under profile A
kept showing A's state while writes silently targeted the newly
selected B. Fixed structurally: ProfileKeyedRoutes wraps the routed
page tree and keys it by the selected profile, remounting every page
(fresh state + refetch) on switch. ChatPage keeps its own remount
(channel keyed on scopedProfile).
2. HIGH — /api/model/auxiliary read was unscoped while /api/model/set
wrote scoped (Models page could show default's aux pins while editing
worker's). Endpoint now takes profile + _profile_scope, added to
PROFILE_SCOPED_PREFIXES, HTTPException re-raise so ghost profiles 404
instead of 500. Regression test asserts read/write symmetry with
differing worker/default aux config.
3. MEDIUM — tools post-setup spawned unscoped from the profile-aware
drawer. Now spawns 'hermes -p <profile> tools post-setup <key>'
(same mechanism as hub installs); drawer threads its profile prop.
Most hooks install machine-level artifacts where the scope is inert,
but hooks reading config/env now see the drawer's HERMES_HOME.
4. LOW — ty warnings: env Optional asserts before subscript/membership,
fastapi import replaced with web_server.HTTPException re-use.
298 tests green across the four affected suites; tsc -b + vite build
green; aux scoping E2E-verified with real imports.
* fix(dashboard): address second profile-unification review (gille)
1. BLOCKER — profile scope dropped on sidebar navigation: ProfileProvider
derived the selection from the current URL, and nav links are bare
paths, so clicking Config from /skills?profile=worker silently reset
the write target. State is now the source of truth; an effect
re-asserts ?profile= onto the new location after every navigation
(URL stays a synchronized projection for deep links/refresh), and an
incoming URL param (e.g. 'Manage skills & tools' links) still wins.
2. BLOCKER — /api/model/options unscoped while model/set wrote scoped:
the picker context (current model/provider, custom providers,
per-profile .env auth state) now loads inside _profile_scope; added
to PROFILE_SCOPED_PREFIXES. Test: a worker-only current-model pin
appears in the scoped payload and not the unscoped one.
3. BLOCKER — MCP test-server probe escaped the scope after the config
read: the probe now re-enters _profile_scope inside the worker thread
so env-placeholder expansion resolves against the selected profile's
.env. Known limit (documented): the probe's dedicated MCP event-loop
thread doesn't inherit the contextvar (OAuth token paths). Test
asserts get_hermes_home() inside the probe == the worker profile dir.
4. BLOCKER — broad excepts swallowed unknown-profile 404s: /api/model/info
degraded to 200-with-empty-model-info and /api/mcp/catalog to a
silently-empty catalog. Both re-raise HTTPException; 404 regression
tests added for info/options/catalog.
Polish: scope banner clears the fixed mobile header (mt-14 lg:mt-0);
--open-profile hidden via argparse.SUPPRESS (internal re-exec flag);
attach-path test now asserts the opened ?profile= URL.
(Stale-page-state + /api/model/auxiliary findings from this review were
already fixed in 92bcd1568 — the review ran against e600f6951.)
35 tests in the two new suites + 274 in the adjacent ones, all green;
tsc -b + vite build green; scoping E2E-verified with real imports.
* docs(dashboard)+fix: self-review pass — Profiles page section, REST profile-param tip, body-beats-query precedence
Docs:
- web-dashboard.md: add the missing 'Profiles' subsection to Pages
(cards, create/builder, manage-skills jump, set-as-active vs switcher
distinction, editors); REST API section gets a profile-scoped-endpoints
tip documenting ?profile= / body profile / 404 semantics / /api/pty
- (profiles.md + cli-commands.md were already updated in e600f6951)
Precedence fix: scoped endpoints taking BOTH a query param and a body
field now resolve body.profile first. The SPA's fetchJSON injects the
query param from the GLOBAL switcher; an explicit body.profile (e.g.
Profile Builder flows writing into a specific new profile) is the more
specific intent and must not be overridden by whatever the sidebar
happens to be set to. Matches the documented 'explicit beats global'
contract in api.ts.
Verified: 304 tests green across the four suites; tsc -b + vite build
green; docusaurus build green (only pre-existing broken-link warnings,
none from this PR's pages).
When auto-compression rotates the session tip (old #4 → new #5), the
incoming page carries the new tip but the previous list still holds the
old one. The old tip's id differs from the new tip's id, so the existing
id-only dedup in mergeSessionPage() preserves both as separate sidebar
rows.
Add lineage-level dedup: build a set of incoming lineage keys
(`_lineage_root_id ?? id`) and filter survivors whose lineage key
matches any incoming row. This mirrors the existing sessionPinId()
logic used for pin stability.
Fixes#43483
Extract the remote-detection helpers (canonicalGitHubRemote, isSshRemote,
isOfficialSshRemote) from main.cjs into a testable update-remote.cjs sibling
module and add a node:test suite, wired into test:desktop:platforms.
main.cjs requires('electron') at load, so its inline helpers weren't unit
testable. The Python side of #43754 shipped a regression test; this gives the
desktop side the same coverage for the security-critical detection that keeps
passive update checks off the SSH origin (avoiding FIDO2/passkey touch
prompts). Tests assert SSH/HTTPS forms canonicalize equal, official SSH is
detected case-insensitively, and forks / other hosts / the HTTPS remote are
NOT misclassified.
Follow-up to the winget stale-registration fix. Update-ProcessPathForPackages
rebuilt $env:Path wholesale from the persisted User+Machine hives (plus winget's
Links dir), discarding any process-only PATH entries added earlier in the
installer run. Since the helper now runs after every package manager, that
wholesale replace is more likely to clobber a process-local entry than the
original winget-branch-only version was.
Merge instead: seed from the current process PATH, then append hive and
winget-Links entries not already present, with a case-insensitive,
order-preserving dedupe. Behaviour on a clean box is unchanged (the hive entries
are simply appended); the difference is that pre-existing process-only entries
now survive the refresh.
When ripgrep/ffmpeg is missing, `winget install <id>` on a package winget
already has registered is treated as an upgrade: it finds no newer version and
exits 0x8A15002B (-1978335189, APPINSTALLER_CLI_ERROR_UPDATE_NOT_APPLICABLE)
without ensuring the binary is actually present. The installer only logged that
code and judged success by `Get-Command rg`, so a stale registration (files
removed outside winget, or a missing alias shim) became a permanent dead-end —
winget kept reporting "already installed" and the user could never reinstall.
Detect that exit code and retry once with `--force` to repair the registration
so the shim reappears.
Also refresh the process PATH after the choco and scoop fallbacks (not just
winget) via a shared helper, so a successful fallback install — or any install
on a box without winget — is no longer misreported as "not installed".
The pre-update / pre-migration backup path (_write_full_zip_backup) had the
same /tmp staging bug as run_backup: a small tmpfs at the default tempfile
location silently drops large *.db files from the archive. Route its SQLite
staging temp files to the output zip's directory as well, and add regression
tests (mutation-verified) for both staging paths.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Two bugs in the backup routine:
1. SQLite safe-copy used tempfile.NamedTemporaryFile() which defaults to
the system temp directory (/tmp). When /tmp is a small tmpfs and the
database is large, the copy silently fails and the resulting zip is
missing state.db, kanban.db, and response_store.db.
Fix: pass dir=out_path.parent so the temp file is staged alongside the
output zip on the same filesystem.
2. _EXCLUDED_DIRS contained "hermes-agent" which matched at ANY path
depth, accidentally excluding the Hermes Agent skill directory at
skills/autonomous-ai-agents/hermes-agent/.
Fix: special-case "hermes-agent" to only match when it is the first
path component (the root-level code checkout). All other excluded dir
names continue to match at any depth.
Regression tests added for both fixes.
Follow-up to the salvaged #41264 (Windows watcher): the setsid/bash detached
restart watcher on Linux/macOS inherits _HERMES_GATEWAY=1 the same way, so
the CLI's self-restart loop guard silently refuses 'hermes gateway restart'
and the gateway never comes back. Scrub the marker from the watcher env on
the POSIX branch as well, and extend the setsid test to assert it.
Desktop already recovers from a stale runtime session id when
`prompt.submit` returns `session not found` after a gateway restart or
sleep/wake. The stop path did not have the same recovery: `cancelRun`
called `session.interrupt` once with the stale runtime id, then surfaced
`Stop failed / session not found`.
This makes stop/cancel mirror the prompt recovery path. If
`session.interrupt` reports `session not found` and the selected stored
session id is available, Desktop resumes that durable session, updates
the active runtime ref with the recovered id, and retries
`session.interrupt` once against the recovered runtime id.
Salvaged from #43941 — rebased onto current main, dropping the unrelated
`package-lock.json` (@types/node 24.13.1->24.13.2) and `nix/lib.nix`
hash churn. That bump is a local npm 11 re-resolution artifact, not a CI
requirement: repo CI runs node 22 (npm 10) and main is green at
@types/node 24.13.1, so the lockfile and nix hash do not need to change.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
When the agent runs several terminal commands back-to-back, each
progress line repeated the '💻 terminal' header above its fenced code
block, cluttering the progress bubble. Now only the first terminal call
in a streak emits the header; subsequent consecutive terminal calls
render adjacent code blocks. Any other tool (or non-block preview)
resets the streak so the next terminal call gets a fresh header.
* fix(cli): omit --workspace when subpackage has its own package-lock.json
When ui-tui/ (or web/) contains its own package-lock.json, _workspace_root()
returns the subpackage directory itself. Passing --workspace ui-tui in that
case fails because npm cannot find a workspace named 'ui-tui' inside ui-tui/.
Fix: skip the --workspace flag when npm_cwd equals the target directory,
running a plain 'npm install' from the standalone project root instead.
Applies the same fix to both _make_tui_argv (TUI) and _build_web_ui (web).
Fixes#42973
* test(cli): fix web workspace-scope fixture + cover own-lockfile fallback (#42973)
The web half of the #42977 fix broke test_npm_install_uses_workspace_web_scope,
which built its fixture with no lockfile anywhere. Without a root lockfile,
_workspace_root(web_dir) already returns web_dir, so the new
"() if npm_cwd == web_dir" branch correctly drops --workspace and the
assertion failed. Model a real workspace checkout instead: the single
package-lock.json lives at the root, so --workspace web scopes the install.
Also add the symmetric web regression test (web/ carrying its own lockfile =>
--workspace must be dropped and the install runs plainly from web_dir via
npm ci), matching the TUI coverage already in test_tui_npm_install.py.
---------
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Keyed draft stash (Map + localStorage mirror) behind the live composer:
switching threads stashes the departing draft and restores the entering
one; empty threads show an empty box. Session lifecycle never clears
composer state — the scope swap is the only coupling.
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
- DRY the duplicated submit-restore blocks into dispatchSubmit()
- inline localStorage access (drop browserStorage indirection);
clearPersistedComposerDraft delegates to write('')
- drop stale per-scope-stash comment in use-session-actions
The composer is a single global surface that sits ABOVE the thread: its
contents follow the user across session switches and are never touched
by session lifecycle. Switching threads doesn't change the render.
Replaces the per-scope draft choreography (scoped storage keys, attachment
stash map, skip-sentinel, restore-on-scope-change effect) with:
- one global localStorage key so an unsent draft survives app reloads
- a one-shot restore on mount
- nothing else — session switches simply don't touch the composer
Verified E2E via CDP with real sidebar clicks + real keystrokes:
typed draft survives A->B->A switching and a full page reload.
* feat(agent): coding-context posture with per-model edit-format tuning
Hermes detects when it's running in a coding context — an interactive
surface (CLI, TUI, ACP, desktop) sitting in a code workspace (git repo or
recognised project root) — and shifts into a coding posture. Outside that
(chat platforms, non-workspaces) nothing changes.
The posture is modelled as a frozen RuntimeMode selected from a small
ContextProfile registry (coding/general). A profile is data: the toolset to
collapse to, the operating brief to inject, and seams for model routing and
memory. Every domain reads the same resolved object instead of re-probing
git/config on its own:
- System prompt — RuntimeMode.system_blocks(): an operating brief (gather
context before editing, edit through tools not chat, verify with terminal,
cap retry loops) plus a live git/workspace snapshot, built once and baked
into the stable prompt tier so per-conversation caching is preserved.
- Per-model edit-format tuning — the brief nudges each model family toward
the patch mode it handles best: OpenAI/Codex toward mode='patch' (V4A
multi-file diffs), Anthropic toward mode='replace' (string replacement).
The model id rides on RuntimeMode; unknown families keep neutral wording.
- Skill index — non-coding skill categories are pruned from the prompt's
skill index (discovery-only; skills_list/skill_view still reach the full
catalog, with a disclosure note).
- Toolset — only under the opt-in 'focus' mode does the posture collapse to
the coding toolset + enabled MCP servers; the default posture is
prompt-only and never overrides configured toolsets.
Activation via agent.coding_context: auto (default), focus, on, off.
Subagents inherit the posture for free via toolset inheritance + the shared
prompt builder. Detection is not memoized so a long-lived gateway/TUI
process can't pin a stale posture across working directories.
* feat(agent): cover new-file authoring in the coding edit-format nudge
The per-model edit-format guidance only addressed editing existing code
(patch mode='patch' vs 'replace'), but authoring a brand-new file —
write_file, not patch — is a large fraction of real coding work and the
nudge was silent on it. Surfaced when building a single-file artifact where
the dominant operation was write_file and the steering offered no guidance.
Both family lines now lead with "author new files with write_file; for
edits to existing code prefer ...". Tests assert write_file appears in each
family's brief; unknown families still get neutral wording.
* docs(agent): correct memoization docstring + clarify TUI config-load asymmetry
* feat(agent): sharpen the coding posture — verify-loop facts, wider edit steering, $HOME guard
Tuning pass on the coding posture from dogfooding it as a harness:
- Workspace snapshot now hands the model its verify loop up front:
detected manifests + package manager (lockfile sniff), the exact
verify commands (package.json scripts, Makefile targets,
scripts/run_tests.sh, pytest config), and which context files
(AGENTS.md / CLAUDE.md / .cursorrules) exist at the root. Marker-only
(non-git) projects get the snapshot too instead of nothing. The
"verify before claiming done" brief line was the highest-value piece
in evals — this turns it from advice into an executable loop instead
of making the model rediscover the test command every session. Still
stat-cheap, size-guarded reads, built once at prompt time.
- Edit-format steering covers the families Hermes actually serves:
Gemini and open-weight coding models (DeepSeek, Qwen, Kimi, GLM,
Grok, Hermes, Llama, Mistral, Devstral, MiniMax) steer to
mode='replace' — their RL scaffolds use str_replace-style editors.
Previously only GPT/Codex and Claude families got steering; the
models Hermes users disproportionately run all fell to neutral.
- Operating brief gains four behaviors elite harnesses encode: batch
independent reads/searches in one turn; fix root causes and the bug
class (sibling call paths), not the reported site; no drive-by
refactors/renames/reformatting; never read, print, or commit secrets.
Plus a patch-failure escalation ladder: after the same region fails
twice, rewrite the enclosing function/file with write_file instead of
a third patch attempt.
- $HOME dotfiles guard: a git repo rooted exactly at the home directory
(or a marker sitting in it, e.g. a global ~/AGENTS.md) is user config,
not a code workspace — without the guard, every session anywhere under
a dotfiles-managed home silently flipped to the coding posture. Real
projects under such a home still detect via their own markers/repos;
'on' mode bypasses the guard.
Manual testing of the salvaged draft persistence showed none of it worked
end-to-end. Three distinct bugs, all invisible to the store-level unit
tests:
1. New-chat drafts were never written. The skip-one-persist sentinel was
reset to null after consuming, but null IS a real scope (the unsaved
new-session draft) — so in a new chat every persist run matched the
"consumed" sentinel and bailed. This silently killed the headline
#38498 fix. Use undefined as the no-skip sentinel, which can never
collide with a scope.
2. Cmd+R inside the debounce window dropped the trailing text. React does
not run effect cleanups on a page reload, so the flush-on-unmount
never fired; with the 400ms debounce that meant type-then-reload lost
the draft every time. Flush pending writes on pagehide.
3. Session switch/new/resume/branch paths in use-session-actions cleared
the composer stores synchronously with the session-id updates. React
batches those, so by the time ChatBar's scope-change cleanup ran to
stash the departing session's attachments, the store was already
empty — the stash recorded [] and the chips were lost anyway. The
composer's per-scope restore now owns composer contents wholesale on
scope change, so drop the upstream clears (clearComposerDraft only
touched the vestigial $composerDraft atom nothing reads).
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Fully removes the cron per-job 'profile' arg added in #28124: the
cronjob tool schema field, CLI --profile flags on cron create/edit,
job-record storage/validation, the scheduler's _job_profile_context
wrapper, and the script-runner env override. Sequential-partition
logic reverts to workdir-only.
The context-local HERMES_HOME override in hermes_constants and the
subprocess bridging in tools/environments/local.py are kept — they
now have other consumers (dashboard multi-profile, TUI gateway).
Drop the hermes_state.py column + persistence plumbing from the salvaged
interleaved-thinking fix. The ordered-block channel covers the failure
window in-memory (turn replayed within the live conversation loop). A
session reloaded from disk after a crash falls back to reconstruction;
if that replay 400s, the thinking-signature recovery (#43667) strips
reasoning_details and retries — one degraded call in a rare resume path
instead of a schema column. Replaces the DB-roundtrip test with a
fallback-shape test.
Two additive hardening changes on the interleaved-thinking replay path
introduced by this PR's anthropic_content_blocks channel. Both are scoped
to that channel's blast radius; neither changes correct behavior.
1. Replay-time tool-input re-sourcing (credential safety).
The ordered-block channel captures each tool_use `input` from the RAW
API response in normalize_response, which is NOT credential-redacted.
The parallel tool_calls[].function.arguments IS redacted at storage
time (build_assistant_message, #19798). The verbatim-replay fast path
in _convert_assistant_message replayed the raw block input, so a secret
a model inlined into a tool call (e.g. an Authorization header value
passed inside a terminal command) would ride back onto the wire even
though it is redacted everywhere else in history. Re-source tool_use
input from the redacted tool_calls map by
sanitized id; interleave order (the reason this channel exists) is
unaffected. Adapted from #36071, which re-sources tool inputs the same
way on its replay path.
2. Broaden the thinking-replay 400 classifier (defense-in-depth).
error_classifier only matched "signature" + "thinking", so the
frozen-block variant — "thinking ... blocks in the latest assistant
message cannot be modified. These blocks must remain as they were in
the original response." — carried no "signature" token and fell through
to a non-retryable abort. The anthropic_content_blocks channel prevents
the reorder that triggers this 400 at the source, but if any future
mutator reintroduces it, the turn now self-heals via the existing
strip-reasoning-and-retry recovery instead of crash-looping. A negative
case ensures an unrelated "cannot be modified" 400 (no "thinking") is
not swept in. Mirrors the classifier broadening in #36087 and #36071.
Tests
- tests/agent/test_anthropic_thinking_block_order.py: a replay test
asserting an inlined secret is redacted on the wire while interleave
order is preserved.
- tests/agent/test_error_classifier.py: three cases — frozen-block 400
native and via OpenRouter route to thinking_signature/retryable; an
unrelated "cannot be modified" 400 does not.
Both grafts verified RED (tests fail with the change reverted) then GREEN.
Full adapter, transport, classifier and output-field-leak suites pass.
Co-authored-by: AlexanderBFoley <92330381+AlexanderBFoley@users.noreply.github.com>
HTTP 400 "messages.N.content.M.text.parsed_output: Extra inputs are not
permitted" on the native Anthropic transport. Anthropic SDK 0.87.0 response
blocks carry output-only attributes the Messages *input* schema forbids: text
blocks get `parsed_output` and `citations=None`, tool_use blocks get `caller`.
normalize_response captured blocks verbatim via _to_plain_data and replayed
them as request input on the next turn, so the forbidden fields leaked back ->
400. Like the earlier thinking-block bug, one poisoned turn wedges every
subsequent request in the session (even the diagnostic turn), recoverable only
by switching models or deleting the session.
This is a defect in the anthropic_content_blocks channel added for the
interleaved-thinking fix: it preserved block ORDER correctly but copied every
SDK attribute, including output-only ones.
Fix — whitelist input-permitted fields per block type at all three leak points:
- agent/transports/anthropic.py normalize_response: sanitize at CAPTURE so the
poison never persists to state.db (defence-in-depth).
- agent/anthropic_adapter.py _sanitize_replay_block (new): whitelist used on the
ordered-blocks replay path; also recovers already-poisoned stored sessions.
- agent/anthropic_adapter.py _convert_content_part_to_anthropic: a stored
`text` part is rebuilt from whitelisted fields instead of dict(part) verbatim
(this was the exact content.N.text.parsed_output failure locus).
Whitelist not blacklist, so future SDK output-only fields can't reintroduce it.
Block order and thinking-block signatures are preserved (the reason the channel
exists). Adds tests/agent/test_anthropic_output_field_leak.py; full adapter
suite green (163 tests). Existing poisoned state.db rows scrubbed out-of-band.
Interleaved-thinking turns (adaptive thinking, Claude 4.6+/Opus 4.8) emit
content blocks like:
thinking_1(signed) tool_use_1 thinking_2(signed) tool_use_2
Anthropic signs each thinking block against the turn content preceding it
at its position. normalize_response split the turn into two parallel lists
(reasoning_details + tool_calls), discarding cross-type order, and
_convert_assistant_message rebuilt it as [all thinking][text][all tool_use].
That moved thinking_2 ahead of tool_use_1, invalidating its signature, so
Anthropic rejected the latest assistant message with HTTP 400:
messages.N.content.M: `thinking` or `redacted_thinking` blocks in the
latest assistant message cannot be modified.
Observed repeatedly in agent.conversation_loop against api.anthropic.com /
claude-opus-4-8, recurring across sessions on multi-thinking-block turns.
Fix: carry a verbatim, order-preserving copy of the turn's content blocks
(anthropic_content_blocks) end-to-end - capture in normalize_response,
persist/restore through state.db, and replay unchanged for the latest
assistant message. Gated to turns that actually interleave signed thinking
with tool_use, so normal turns are unaffected.
Adds 3 regression tests including a SQLite round-trip covering the
crash-recovery reload path.
The salvaged draft persistence scoped text per session but reset the
composer's attachments to [] on every scope change, so a staged image or
file was silently dropped when you switched sessions and never restored on
return — inconsistent with the "drafts survive session switches" promise
and a real paper-cut given remote staging cost.
Retain attachments per scope in an in-memory map (keyed by the same scope
as the text draft) since blobs / object URLs / live upload state can't be
serialized to localStorage. Entering a scope restores its stashed chips;
leaving stashes the current ones; an accepted submit clears the scope.
This survives session switches (the case users hit) without pretending to
survive a full reload, which attachments fundamentally can't.
Also guard the debounced text write so browsing sent-message history or
editing a queued prompt (both swap the composer to recalled text via
loadIntoComposer) no longer clobbers the genuine in-progress draft in
storage.
Co-authored-by: mollusk <roger@roger.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
'Set as active' on the Profiles page only flips the sticky active_profile
file (future CLI/gateway runs) — it never retargets the running dashboard
process. The skills/toolsets endpoints called bare load_config()/
save_config(), so after 'activating' a profile in the web UI, deactivating
a skill silently wrote into the dashboard's own profile and the activated
profile was untouched.
Backend:
- _profile_scope() context manager on the skills/toolsets endpoints:
context-local HERMES_HOME override for call-time config resolution +
cron-style locked swap of tools.skills_tool's import-time SKILLS_DIR
- profile param on /api/skills, /api/skills/toggle, /api/tools/toolsets*
(list/toggle/config/provider/env), hub sources/search installed-state
- hub install/uninstall/update spawn 'hermes -p <profile> skills ...' so
the child rebinds skills_hub.SKILLS_DIR at import (the override cannot
reach import-time globals); profile validated -> 404/400 before spawn
Frontend:
- Skills page: profile selector (deep-linkable /skills?profile=<name>),
amber banner naming the managed profile, threaded through skill toggles,
toolset drawer, and hub browser
- Profiles page: 'Manage skills & tools' action per card; 'Set as active'
toast now says it applies to new CLI/gateway runs only
Omitted profile keeps legacy behavior (dashboard's own profile).
The salvaged draft-persistence effect wrote to localStorage on every
keystroke — the composer's per-keystroke path was deliberately slimmed
down previously, so debounce the write (400ms) and flush pending text on
scope change/unmount so a fast session switch can't drop trailing
keystrokes. Also add AUTHOR_MAP entry for the salvaged commit.
Save in-progress composer text to browser localStorage per chat session and restore it when the desktop composer remounts. Keep the draft when submit is rejected or throws, and clear it only after the prompt is accepted.
The memory/skill write-approval gate (#38199, #43354, #43452) was only
documented inside features/memory.md. Surface it everywhere users will
actually look:
- features/skills.md: new 'Gating agent skill writes' section under
skill_manage, with the staging semantics, review commands, and the
distinction from skills.guard_agent_created
- configuration.md: memory.write_approval added to the Memory
Configuration block; new 'Write approval for skill writes' subsection
next to the guard_agent_created scanner
- reference/slash-commands.md: /memory and /skills review subcommands in
both the CLI and messaging tables; Notes updated since /skills
pending/approve/reject/diff/approval now works on the gateway
- features/memory.md: cross-link to the new skills section
Replace the hermes-identifying clientInfo/User-Agent/session-id prefix on
the keyless Parallel Search MCP path with a neutral 'mcp-web-client'
identity. Project policy forbids third-party usage attribution without an
explicit user opt-in (see telemetry PR policy); MCP requires a clientInfo,
so a generic one satisfies the spec without attributing traffic.
Also adds the contributor AUTHOR_MAP entry and refreshes uv.lock against
current main (parallel-web 0.6.0).
Make Parallel the web search/extract backend with a zero-setup free tier:
- Keyless (no PARALLEL_API_KEY): web_search/web_extract work out of the box via
Parallel's free hosted Search MCP (search.parallel.ai/mcp), and parallel
becomes the default backend when no other web credentials are configured
(ahead of ddgs, which is search-only). A small hand-rolled Streamable-HTTP
JSON-RPC client speaks the MCP's web_search/web_fetch tools; the existing
web_search/web_extract tools are the only tools registered.
- Keyed (PARALLEL_API_KEY set): uses the Parallel v1 REST endpoints
(client.search / client.extract with advanced_settings.full_content) — no beta.
Bumps parallel-web 0.4.2 -> 0.6.0.
- Attribution: on the free path only, results carry provider/attribution and the
CLI tool line reads "Parallel search" / "Parallel fetch"; the paid path is
unbranded.
- Selection/registration: web tools register unconditionally (free MCP backstop)
while check_web_api_key remains a real usability probe; explicit per-capability
backends are honored (so misconfig surfaces) rather than masked by the fallback.
Tested: live web_search/web_extract against search.parallel.ai in keyless and
keyed modes; unit suites for the MCP client, backend selection, and display
labeling; full agent run shows the "Parallel search" label on the free path.
Follow-up to #42351. Slash command rows render the command label and
description with `truncate`, so skill commands and longer blurbs were
clipped with no way to read the full text. Rather than add a floating
tooltip (which overlaps the popover and only helps the mouse), the active
row — the one reached by keyboard arrows or hover, since onMouseEnter
already sets activeIndex — now drops truncation and wraps inline
(whitespace-normal break-words). Idle rows stay single-line/truncated so
the list reads compact.
* desktop: surface /tools, /save, /personality and fix /help skill count
Move /tools and /save out of TERMINAL_ONLY_COMMANDS and /personality out of
ADVANCED_COMMANDS so they appear in the desktop slash palette and execute via
the existing slash.exec → command.dispatch fallback. The backend gateway already
accepts these through slash.exec (none are in _PENDING_INPUT_COMMANDS or the
skill list), so no backend change is required.
Recompute skill_count in filterDesktopCommandsCatalog from the filtered pairs.
Previously the /help footer echoed the unfiltered backend total — e.g. "60
skill commands available" while only ~29 actually appeared in the rendered
list, because the desktop hides terminal-only, picker-owned, and advanced
commands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* desktop: keep slash popover live while typing args
The trigger regex `(?:^|[\s])([@/])([^\s@/]*)$` stopped matching the moment
the user typed a space after a slash command, so the popover never showed arg
completions for `/personality`, `/tools`, etc. — even though the backend's
`complete.slash` already returns them with a `replace_from` indicator.
Split the trigger detection so `/` allows args (`/cmd arg1 arg2`) while `@`
keeps the strict no-space behavior. Restrict the slash command name to
`[a-zA-Z][\w-]*` so file paths like `src/foo/bar` don't accidentally trigger
the popover.
Rewrite arg-completion items in useSlashCompletions to insert the full
`/personality alice` token instead of stranding `/alice`: when `replace_from`
is past the command base, prepend the existing prefix to each item's text so
the chip serializer produces a coherent replacement.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* cli: complete toolset names after /tools enable|disable
SlashCommandCompleter previously only auto-derived the first subcommand level
from args_hint, so `/tools enable <tab>` yielded nothing — the user had to
remember every toolset key (web, file, spotify, …) and every MCP server prefix.
Add `_tools_completions` that handles both stages: subcommand (list|disable|enable)
and tool name. Filter by current enable state so `/tools enable <tab>` only
offers disabled toolsets and `/tools disable <tab>` only offers enabled ones —
no point suggesting a no-op. MCP server prefixes (server:) come from the
saved mcp_servers config; per-tool completion under a server would require
runtime MCP introspection and is left as follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* desktop: registry-driven slash commands with first-class pickers
Collapse the if/else slash dispatch into one DESKTOP_COMMAND_SPECS table
that drives popover suggestions, per-type composer pills, and execution.
- /resume, /sessions, /switch: inline session completions (like /skin) plus
a "Browse all sessions…" entry that opens a dedicated session picker overlay
- /handoff: inline platform completion + handoff.request/handoff.state
gateway bridge so desktop reaches CLI parity
- colored per-type pills (command/skill/theme) in the composer
- strip ANSI and fix width/alignment of slash output in the chat panel
* desktop: fold repeated slash session/output boilerplate into one helper
runExec, /title, /help and the unavailable case each re-derived the same
ensure-session → bail-with-notify → build-renderSlashOutput dance.
withSlashOutput() returns {sessionId, render} or null, so each handler is
a two-line resolve instead of an eight-line preamble.
* desktop: keep backend meta on slash arg completions
Arg suggestions (/personality <name>, /tools enable <toolset>, /handoff
<platform>) were having their meta overwritten with the parent command's
registry description: desktopSlashDescription("/personality none") canonicalizes
back to /personality and returns its blurb. Skip the lookup for arg rows so the
backend's own display_meta ("clear personality overlay", etc.) survives.
* cli: list real personalities in /personality completion
_personality_completions resolved load_config().agent.personalities — but that
schema has no agent.personalities key, so completion always returned just
`none` even though the runtime (load_cli_config().agent.personalities) ships a
dozen built-ins (helpful, kawaii, pirate, …). Read from the same source the
command actually applies, so `/personality ` surfaces the real options.
* desktop: expand bare arg-commands to their options on pick
Picking a command like /personality from the slash popover committed it
immediately instead of advancing to its argument list. Mark arg-taking
commands (/skin, /resume, /handoff, /personality, /tools) in the registry
and, when one is picked bare, insert "/cmd " as plain text and re-open the
popover on its inline options — mirroring typing "/cmd " by hand. Arg picks
(serialized text already contains a space) still commit a single pill.
Also realign trigger-popover loading test with the redesigned popover (the
/help empty-state hint shows when resolved, not while the spinner is up);
the merge from main reintroduced the pre-redesign expectation.
* tui_gateway: fold session-db close into a context manager
Both handoff RPCs repeated the same `db, close_db = _session_db_handle()`
+ `finally: if close_db: db.close()` dance. Turn the helper into a
`_session_db` contextmanager that owns the close, so callers just
`with _session_db(session) as db:`.
* desktop: unblock handoff retries and exact resume ids
Clear timed-out desktop handoffs through the gateway so retries are not stuck behind a pending row, and let typed /resume session ids bypass the loaded sidebar cache.
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* fix(streaming): stop socket read timeout from preempting stale-stream detector
The stale-stream detector is deliberately scaled to 180-300s so reasoning
models (e.g. Opus) can pause mid-stream during extended thinking. But the
httpx socket read timeout stayed at a flat 120s for cloud providers and fired
first, tearing down healthy reasoning streams before the detector (which owns
retry + diagnostics) could act. Symptom: every Copilot/Opus turn dies with
ReadTimeout at a consistent ~125s and never completes.
Floor the cloud socket read timeout at the stale-stream timeout so it can no
longer fire before the detector. Local providers and explicit
HERMES_STREAM_READ_TIMEOUT / request_timeout_seconds overrides are unchanged.
* test(streaming): pin read-timeout >= stale-stream invariant for cloud reasoning streams
Cover the contract that the httpx socket read timeout is never shorter than
the stale-stream detector for cloud providers on the default: small contexts
floor to 180s, >=50K to 240s, >=100K to 300s; explicit overrides win; local
providers and the unresolved-value fallback are unaffected.
CI caught tests/cli/test_cli_new_session.py asserting that /new keeps
the old session row when conversation history exists in memory. The
live transcript is authoritative: a session whose messages haven't
flushed to the DB yet (or whose flush failed) must not be pruned.
Guard _discard_session_if_empty on self.conversation_history and pin
the behavior with a test.
Port from google-gemini/gemini-cli#27770: starting the CLI and
immediately quitting (or rotating with /new, /clear) left an empty
untitled session row behind. These ghost rows pile up in /resume,
`hermes sessions list`, and the in-chat recent-sessions browser.
- SessionDB.delete_session_if_empty(): transactional check-and-delete
that only removes rows with no messages, no title, and no child
sessions (delegate subagent parents are preserved). Also removes
on-disk transcript files via the existing _remove_session_files.
- HermesCLI._discard_session_if_empty(): thin wrapper, wired into the
cli_close shutdown path and the new_session() rotation path.
Skipped when /exit --delete already handles removal.
Unlike the one-shot prune_empty_ghost_sessions migration (TUI-only,
24h-old rows), this prevents new ghost rows from accumulating at the
moment they would be created.
Follow-up to the GitHub-skills fix: the same hardcoded ~/.hermes/.env
pattern existed across other bundled and optional skills. Under the
official Docker setup (HERMES_HOME=/opt/data, subprocess HOME=/opt/data/home)
those paths point at a nonexistent file.
- kanban-video-orchestrator setup.sh.tmpl + docs: resolve via
${HERMES_HOME:-$HOME/.hermes}/.env in check_key()
- telephony.py / canvas_api.py / hyperliquid_client.py: error and
save messages now report the real resolved env path instead of a
hardcoded literal (path resolution itself was already correct)
- godmode SKILL.md: load_dotenv snippet resolves via HERMES_HOME
- watch_github.py + ~20 SKILL.md prose mentions: document the env file
as ${HERMES_HOME:-~/.hermes}/.env so Docker users edit the right file
The GitHub skills' auth-detection fell back to reading GITHUB_TOKEN from a
hardcoded ~/.hermes/.env. In the official Docker layout HERMES_HOME=/opt/data
while tool subprocesses run with HOME=/opt/data/home, so `~/.hermes/.env`
expands to /opt/data/home/.hermes/.env — a path that does not exist — while the
real secrets file is /opt/data/.env. Result: the agent reports GITHUB_TOKEN as
"not set" even though it is present and the dashboard Keys page shows it.
Resolve the file as ${HERMES_HOME:-$HOME/.hermes}/.env (HERMES_HOME is bridged
into tool subprocess env, falling back to ~/.hermes when unset) across all six
auth-detection sites: github-auth (SKILL.md + scripts/gh-env.sh), github-issues,
github-repo-management, github-pr-workflow, github-code-review.
Follow-ups on top of the two salvaged GodsBoy commits, all live-validated
against the real Telegram Bot API:
- _edit_overflow_split finalize fallbacks degrade to _strip_mdv2() clean
text instead of putting raw **markdown** markers on screen (salvaged
from PR #43463 minus its format-first sizing — live probes show
Telegram's 4096 limit counts PARSED text, so MarkdownV2 escape
inflation cannot cause MESSAGE_TOO_LONG and sizing against formatted
wire length only causes premature splits and fragment messages).
- Skip the redundant requires-finalize edit after a got_done edit that
split-and-delivered (salvaged from PR #43463): re-finalizing re-splits
the full text into the adopted continuation and duplicates chunks.
- _send_fallback_final only deletes the stale partial message when the
fallback re-sent the COMPLETE final text. When the prefix dedup sent
only the missing tail, the partial IS the head of the answer; deleting
it left users with only the second half of long responses (live-
reproduced: flood-control during a long stream -> head deleted,
ratio 0.54 of content visible). This is the third bug behind the
'Telegram cut messages' reports and was present on main and both PRs.
The "Persistent Memory" callout said "when memory is full, the agent
consolidates or replaces entries to make room," which reads as if the
store self-compacts automatically. It does not: the `memory` tool
returns an overflow error and the agent does the consolidation in-turn
(the design from #41755). Also note that `replace` is bound by the same
limit — swapping in a longer entry can still overflow — which is the
exact case that confused a user (replace rejected near the cap even
though the math was correct).
Add execution-time coverage that bare `provider="custom"` resolves a literal
providers.custom endpoint (and still falls through when none exists), plus
creation-time coverage that `_resolve_model_override` keeps a resolvable
"custom" and only pins the main provider when it is unresolvable.
A cron job stored with `provider: "custom"` and a matching `providers.custom`
entry in config failed at execution with `auth_unavailable: providers=codex`.
Two layers conspired:
- `_get_named_custom_provider` returned None for bare "custom" *before*
scanning config, so a literal `providers.custom` entry was never matched and
resolution fell through to the global default (codex). Now it scans config
for an entry literally named "custom"; with none it still returns None,
preserving the legacy model.base_url trust path.
- `_resolve_model_override` blindly stripped bare "custom" at job creation and
pinned `model.provider` (e.g. codex). It now keeps "custom" when a configured
custom endpoint resolves, pinning the main provider only when it doesn't.
The previous fix for #23387 changed _launchd_domain() from gui/<uid> to
user/<uid> to support Background/SSH sessions on macOS 26+. However, this
broke Aqua sessions where gui/<uid> is the only working domain and
user/<uid> cannot bootstrap or manage the service.
Now _launchd_domain() probes which domain actually contains the loaded
service:
1. Try gui/<uid> first (Aqua sessions)
2. Fall back to user/<uid> (Background/SSH sessions)
3. Use launchctl managername as heuristic when neither has the service
4. Cache the result for the process lifetime
Regression tests cover all four paths plus caching behavior.
The thinking-signature recovery in agent/conversation_loop.py popped
reasoning_details from messages, then continued to retry. That had two
defects.
First, the strip never reached the wire payload. api_messages is built
once at the start of the turn by shallow-copying every entry in messages
(line 919 area). Each api_messages entry has its own reference to the
same reasoning_details list. When build_api_kwargs runs on every retry
iteration of the inner while-loop, it consumes api_messages, not
messages. Popping reasoning_details from messages left api_messages
untouched, so the retry's request still carried the same thinking
blocks Anthropic had just rejected. The classifier latched
thinking_sig_retry_attempted = True after the first attempt, and the
loop terminated with max_retries_exhausted on the same 400.
Second, the pop mutated the canonical message list. messages is the
same list _persist_session writes to state.db and the session
transcript, so a single recovery permanently wiped every signed
thinking block from the stored conversation. Subsequent turns reloaded
the stripped state, hit the same 400 ('invalid signature' or 'cannot
be modified', see #24107), and the agent stopped responding entirely.
Cascading compaction-ended sessions then chained off the corrupted
parent and the affected chat could not produce a response on any
future turn.
Move the strip onto api_messages, which is the API-call-time list
rebuilt into kwargs on every retry. messages is no longer touched, so
disk I/O stays clean and the recovery actually reaches the wire.
Observed against the native Anthropic Messages API on claude-opus-4-7
and claude-opus-4-8 with the interleaved-thinking-2025-05-14 beta on
hermes-agent 0.12.0 and 0.14.0. PR #24107 narrows the trigger; this
change makes the recovery do what it always claimed to do, and
prevents the destructive aftermath.
Tests cover the api_messages strip in isolation: pop on a shallow copy
does not affect the source, the canonical messages list survives the
strip, idempotency on a duplicate firing path, and a no-op when no
reasoning_details exist on the messages.
Related: #24107, #26959, #17861.
Anthropic returns a 400 when the thinking/redacted_thinking blocks in the
latest assistant message are mutated upstream: 'thinking or redacted_thinking
blocks in the latest assistant message cannot be modified. These blocks must
remain as they were in the original response.'
The classifier's thinking_signature branch only matched on the substring
'signature', so this variant fell through to a non-retryable client error
and hard-aborted the turn -- even though the existing strip-reasoning_details
-and-retry recovery would have healed it.
Broaden the 400 match to also catch 'cannot be modified' / 'must remain as
they were' (still gated on 'thinking'), routing it to the same recovery.
Adds a negative-case test so unrelated 'cannot be modified' 400s are not
swept in.
Defense-in-depth, orthogonal to the root-cause work in #35975 / #17861
(which prevent the block mutation in the first place). Only changes a
terminal-failure into a one-shot recovery.
Signed-off-by: Ian Culling <ian@culling.ca>
* fix(desktop): keep model runtime state per session
(cherry picked from commit f72ee87d99ee38cb7b5badeb9a8af869bb92073a)
* fix(desktop): keep footer model state scoped to active session
(cherry picked from commit d91942ebd4671ff857b5c8526dbf133f04782ecb)
* fix(desktop): restore stored runtime when resuming sessions
(cherry picked from commit 32b3793418257617b8da57e26151f079c2620d00)
* fix(desktop): persist live runtime changes for resume
(cherry picked from commit c58467779436dcef44a80ad55b52664752dc0837)
* fix(desktop): persist resumed endpoint runtime
* chore(attribution): map pinguarmy's commit email in AUTHOR_MAP
The salvaged commits on this branch preserve @pinguarmy's authorship
(郝鹏宇 / peterhao@Peters-MacBook-Air.local). Add the mapping so the
check-attribution CI gate resolves the email to the GitHub username.
---------
Co-authored-by: 郝鹏宇 <peterhao@Peters-MacBook-Air.local>
* fix(ci): append filesystem forensics when a per-file pytest run exhausts exit-4 retries
A PR-added test file (tests/test_iron_proxy.py, PR #30179) repeatedly
failed exactly one CI shard with 'ERROR: file or directory not found'
across 4 runs (including a fresh merge SHA on fresh runners), while the
identical slice passes locally against the same merge commit and a
tree-integrity watcher confirms no sibling test mutates the repo. Three
unrelated branches showed the same one-shard signature the same day.
We currently cannot attribute these because the log only carries
pytest's exit-4 line. This adds a forensics block to the captured
output when exit-4 survives the retry loop:
- does the file exist NOW (post-retries)
- parent dir entry count + similarly-named entries
- git status --porcelain dirty-entry count + first 10 entries
Zero behavior change: rc stays 4, retries unchanged, forensics wrapped
in a broad try/except so they can never mask the failure.
Two new tests cover the exhausted-retries and genuinely-missing paths.
* chore: drop the two forensics tests — ship the runner change only
* feat(profiles): extend create endpoint for full profile-builder (model + MCPs + skills)
Backend foundation for the dashboard profile builder. Extends POST /api/profiles
to accept, in one call, everything a profile needs beyond name/clone:
- mcp_servers[] -> written into the new profile's config.yaml
- keep_skills[] -> replace-semantics: disable every seeded skill not kept
- hub_skills[] -> async install via 'hermes -p <name> skills install <id>'
All applied best-effort AFTER the profile dir exists, so a hiccup in any one
never 500s the create. Model/MCP/keep-skills writes are profile-scoped via the
HERMES_HOME context override (same mechanism as the existing _write_profile_model).
Hub installs go through a subprocess scoped with -p because skills_hub.SKILLS_DIR
is import-time-bound and the runtime override can't redirect it.
Adds two helpers (_write_profile_mcp_servers, _disable_unselected_skills) and a
TestClient test asserting all four paths land in the NEW profile's config and
the hub spawn is scoped to it. Design doc at docs/design/profile-builder.md.
* feat(dashboard): full-featured profile builder page
Adds a dedicated /profiles/new builder that composes everything a profile
needs into one stepped create flow, reusing the existing Models/Skills/MCP
data paths instead of duplicating them:
- Identity name + description
- Model provider+model picker (api.getModelOptions)
- Skills keep-which-built-in/optional (replace semantics, default = full
bundle) + skills-hub search/add (api.getSkills, searchSkillsHub)
- MCPs add HTTP/stdio servers inline
- Review blueprint -> single POST /api/profiles create
Nothing writes until Create; the one call commits model+MCPs+skill selection
and spawns hub-skill installs (reported in the success toast). ProfilesPage
header gets a 'Build' button (full builder) alongside 'Create' (quick modal).
Route is page-only (not in the sidebar nav). Verified with vite build (2258
modules, green).
* fix(docker): optimize image size with .dockerignore, drop dev deps, split build layers
Three changes to reduce the Docker image size and speed up rebuilds:
1. .dockerignore — exclude ~69 MB of files that are never needed inside
the container: apps/ (desktop Tauri source), tests/, website/
(Docusaurus), docs/, infographic/, nix/, plans/, packaging/, and
various dotfiles (.envrc, .hadolint.yaml, .mailmap, etc.). The
existing .dockerignore already covered node_modules and .git; these
additions prevent the remaining non-runtime content from inflating
both the build context and the final image (COPY . .).
2. pyproject.toml — add a [docker] extra that mirrors [all] but omits
[dev] (debugpy, pytest, pytest-asyncio, pytest-timeout, ty, ruff,
setuptools). The published image doesn't need test/debug tooling.
Estimated savings: ~30-50 MB of Python packages.
3. Dockerfile — use --extra docker instead of --extra all in the
uv sync layer. Also split the COPY + npm run build so that the
web/ and ui-tui/ frontend builds are cached independently from
Python source changes (COPY . .). A Python-only commit no longer
invalidates the (slower) frontend build layer.
Note: the build-only apt packages (gcc, python3-dev, libffi-dev,
libolm-dev) are still installed in the final image. Removing them
requires a true multi-stage build (builder → runtime), which is a
larger refactor tracked separately.
* fix(docker): remove redundant [docker] extra, revert to --extra all
The [docker] extra was identical to [all] on main — the PR had added [dev]
to [all] then created [docker] as [all] minus [dev], a no-op round-trip.
Revert [all] to its original form and drop the [docker] extra.
Keep the .dockerignore additions and frontend build layer reordering.
- Add output_path suffix assertions (.ogg Telegram / .mp3 non-Telegram) to
_send_voice_reply tests, covering the OGG voice-note path that landed on
main in ae82eed2b (the PR's third commit was redundant with it).
- Convert test_gemini_default_is_32000 back to an invariant against
PROVIDER_MAX_TEXT_LENGTH instead of a hardcoded literal.
- Map barronlroth@gmail.com -> barronlroth in scripts/release.py.
The grandchild wrote its pid with open('w').write(...), so the polling
reader in the test could observe the file after creation but before the
write flushed, parsing '' -> ValueError: invalid literal for int().
Write to a temp file and os.replace() it into place so the pid file only
ever appears fully written.
Follow-ups to #38199/#43354 found in post-merge review:
- Inline CLI memory approval never worked: the per-thread approval callback
was not passed to prompt_dangerous_approval, so the prompt_toolkit
fail-closed guard (#15216) denied every gated foreground write without
showing a prompt. Now invokes the registered callback directly; a crashed
prompt falls back to staging instead of a silent deny.
- Gateway sessions claimed inline support but prompt_dangerous_approval has
no gateway round-trip (that lives in the pending-approval queue), so gated
gateway memory writes hit the input() fallback and denied. Gateway
contexts now stage for /memory pending review.
- /skills pending|approve|reject|diff|approval now works on the gateway
(gateway_config_gate on skills.write_approval), so skills staged from a
messaging session can be reviewed there. Diff output truncated for chat.
- memory_tool validates required params before the gate so invalid writes
are rejected immediately instead of staged and failing at approve time.
- Stale tri-state write_mode docstrings updated to the boolean gate; docs
table corrected (inline prompt is interactive-CLI-only).
- 6 new tests covering the interactive approve/deny/error paths, gateway
staging, skills never-prompt invariant, and pre-gate validation.
* fix(update): self-heal a venv left half-built by an interrupted install
An update killed mid dependency-install (Ctrl-C, terminal close, WSL OOM)
could leave the venv with pip wiped and core deps (e.g. Pillow) missing,
with no automatic recovery — the user had to manually run ensurepip +
reinstall.
Drop an install-scoped .update-incomplete breadcrumb right before the dep
install and clear it only after core-dependency verification passes. On the
next launch (any command except 'update' itself), if the marker is present,
unconditionally bootstrap pip via ensurepip then re-run the .[all] install +
verification, then clear the marker. Failure leaves the marker for retry and
prints the manual recovery command. Never raises — recovery cannot block
launch.
* fix(update): address review — stderr-only recovery output, single-flight lock, gitignore marker
- Route all recovery output (status lines + streamed pip/uv install via
fd-level dup2) to stderr so protocol-on-stdout launches (hermes acp)
never get install noise on the JSON-RPC stream.
- Single-flight O_EXCL lockfile (.update-incomplete.lock) so a gateway
start + CLI launch (or two profiles) can't run concurrent installs
into the shared venv; stale locks (>1h) are broken for the next launch.
- gitignore .update-incomplete + lock so source-tree installs keep a
clean git status and update's autostash skips them.
- Document why the loose 'update' argv substring match is intentional
(over-match defers one launch; under-match would race the real update).
- 4 new tests: lock held → skip, stale lock broken, lock released,
output lands on stderr only.
#33699 fixed save_env_value so an operator-set .env mode (e.g. 0640 on a
Docker bind-mount) survives a config write instead of being re-tightened
to 0600 by the unconditional _secure_file() call. The sibling
remove_env_value() had the identical bug: it restores original_mode and
then unconditionally called _secure_file(env_path), clobbering the mode
back to 0600 on every `hermes config remove KEY`.
Apply the same fix: move _secure_file() into the else branch so it only
runs when no original mode was captured (a freshly created .env still
gets 0600 hardening; existing operator-set modes survive).
Added test_remove_env_value_preserves_existing_file_mode_on_posix, which
fails on the unfixed remove path (expected 0o640, got 0o600) and passes
with the fix.
* fix(openrouter): route reasoning_effort to verbosity for adaptive Anthropic models
Reasoning-mandatory Anthropic models (Claude 4.6+/fable/mythos-class) over
OpenRouter ignore reasoning.effort and use adaptive thinking. #42991 correctly
stopped Hermes from sending a reasoning field to them (it 400s), but put nothing
in its place — leaving agent.reasoning_effort a silent no-op on the OpenRouter
path: the model always ran at its adaptive default (high) regardless of config.
OpenRouter honors the requested effort on the top-level verbosity field instead
(maps to Anthropic output_config.effort). Route the existing
reasoning_config[effort] there for these models while still never emitting a
reasoning field, preserving the #42991 fix. No new config arg — the value the
user already sets via agent.reasoning_effort now flows to verbosity.
- low/medium/high/xhigh/max pass through verbatim (OpenRouter accepts the
extended scale for Claude; verified live HTTP 200 + monotonic token spend).
- effort unset/none/disabled omits verbosity so the model keeps its default.
- native Anthropic transport already correct; unchanged.
Fixes#43432
* test(openrouter): cover real effort range (add minimal, frame max as passthrough)
Adversarial review noted the verbosity tests looped over 'max' — a value
parse_reasoning_effort can never produce — while omitting 'minimal', which it
can. Align the routing test with the real config range
(VALID_REASONING_EFFORTS = minimal/low/medium/high/xhigh) and keep a separate
value-agnostic passthrough test that documents why xhigh/max must survive
verbatim (TypedDict, no runtime literal validation; OpenRouter accepts the
extended scale for Claude).
* docs: explain reasoning_effort -> verbosity routing for adaptive Anthropic models
Document that reasoning_effort transparently maps to OpenRouter's verbosity
field for adaptive-thinking Anthropic models (Claude 4.6+/Fable/Mythos), where
reasoning.effort is ignored. Note xhigh is the configurable ceiling (max is wire-
only). Add verbosity as a top-level-kwarg example in the provider-plugin guide.
Two fixes for the reported Slack thread approval UX:
1. Slack Block Kit approval/confirm sends silently overflowed the
3000-char section-block cap (flat 2900-char truncation + header +
reason), so long execute_code approvals failed with invalid_blocks
and fell back to the plain-text prompt with no buttons. Budget the
command preview against the rendered fixed parts so blocks never
exceed the cap (send_exec_approval + send_slash_confirm).
2. The text fallbacks told users to reply /approve — which Slack blocks
inside threads and Matrix clients reserve client-side. Add a
typed_command_prefix capability flag on BasePlatformAdapter
(default "/"; Slack and Matrix set "!" to match their existing
bang-prefix rewrite) and use it in the shared fallback prompt
builders (exec approval, update prompt, destructive slash confirm,
expensive-model confirm) plus Matrix's reaction-prompt text.
The slash-confirm text-intercept now also accepts bang-prefixed
replies (!always, !cancel) since those keywords aren't registered
commands and the adapters' rewrite doesn't touch them.
The Matrix gateway requires mautrix[encryption] which pulls in
python-olm. While python-olm was removed from [all] due to missing
Windows/macOS wheels, it has binary manylinux wheels for Linux
amd64/arm64. The Docker image only runs on Linux, so adding --extra
matrix to the uv sync line is safe.
libolm-dev is already in the apt-get install line for runtime linking.
Fixes: #30399
`hermes setup` (and other banner-printing commands) crash with an unhandled
UnicodeEncodeError on Linux hosts whose locale selects a non-UTF-8 codec —
e.g. a fresh Raspberry Pi / minimal Debian with a latin-1 or C/POSIX locale.
The setup wizard prints box-drawing characters (┌│├└─) and the ⚕ glyph before
any stream repair runs, so the command dies before it can start.
The existing _ensure_utf8() shim already knew how to re-wrap the standard
streams as UTF-8, but it returned early on `sys.platform != "win32"`, so the
identical crash class on Linux was never covered.
- Drop the win32 gate: repair any stdout/stderr whose encoding is not UTF-8.
- Prefer TextIOWrapper.reconfigure() so the stream object is fixed in place
(cached sys.stdout references keep working); fall back to reopening the fd
with closefd=False (the CPython-recommended safe variant).
- Use errors="replace" — matching the sibling hermes_cli/stdio.py shim — so a
stray un-encodable byte degrades gracefully instead of crashing.
- Only set the PYTHONUTF8/PYTHONIOENCODING child-process hints when a repair
actually happened, so a healthy UTF-8 host sees zero footprint (no stream
swap, no env mutation).
This is intentionally the earliest, platform-agnostic guard, running at import
time before any banner prints. hermes_cli/stdio.py::configure_windows_stdio()
still runs later from the entry points for the Windows-only extras (console
code-page flip, EDITOR default, PATH augmentation); it early-returns on
non-Windows and its stream reconfigure is an idempotent no-op once we've
already repaired the streams here.
Add regression tests covering latin-1 and ascii/POSIX streams, the reconfigure
fallback, already-UTF-8 no-op (identity preserved + no env mutation), the
repair-sets-env and respects-explicit-env contracts, and hostile/None streams.
Sticky human bubbles park at --sticky-human-top (~4px), sliding under the
titlebar's -webkit-app-region:drag strips. Electron resolves drag regions at
the compositor level — z-index and pointer-events don't apply — so clicking a
stuck bubble dragged the window instead of opening the edit composer. Add
no-drag to the shared bubble base class (read-only bubble + edit composer).
Covers the runtime side with a test: clicking a user bubble opens the inline
edit composer through both the incremental external-store runtime and the
stock one.
(cherry picked from commit db4e1f4f3e)
Replace the earlier text-stroke approach (which only bolds outline
codicons — a font glyph has no fillable region) with dedicated solid
SVG glyphs for tool rows. Adds ToolIcon, keyed by the same names as
TOOL_META, with a codicon fallback for uncovered tools.
Follow-ups to the salvaged Telegram QR onboarding auto-restart:
- _spawn_gateway_restart() reuses a live in-flight 'hermes gateway restart'
child instead of spawning a second racing one (stale cached frontend +
new backend both requesting a restart, or restart-button double-click).
Both /api/gateway/restart and the onboarding apply path go through it.
- ChannelsPage polls /api/actions/gateway-restart/status after a
server-initiated restart and surfaces a non-zero exit (e.g. systemd
linger missing) via the manual-restart banner, since restart_started
only means the child spawned.
- Test for the reuse path + _ACTION_PROCS isolation in existing tests.
Outline codicons read too thin at conversation-tool scale; a scoped
filled modifier thickens tool-row and code-card icons without changing
icon semantics elsewhere in the shell.
do_browse waited on a frozen 'Fetching skills...' spinner while sources
resolved, so a slow source looked like a hang. parallel_search_sources
already exposes an on_source_done(sid, count) callback fired as each source
completes — wire it into the status line so it ticks off sources live
(official (12), + github (4), + clawhub (500)). The page is still rendered
once, after the full set is merged and trust-sorted, so browse's
official-first ordering and pagination contract are untouched.
Browse renders one page but the cold-cache fallback walked the entire
50k+ ClawHub catalog, then sliced off the first N — pure waste behind the
12s budget band-aid. _load_catalog_index now takes max_items: browse's
empty-query path bounds the walk to its limit and stops early; the offline
index builder still passes limit=0 (unbounded) and walks to exhaustion.
A bounded walk is partial, so it is not written to the shared full-catalog
cache (same poison-guard as the budget-truncated case).
Backend selection ordered firecrawl (including the Nous-managed-tool-gateway
probe) ahead of explicit-credential backends, so a user who had both a
Nous OAuth token AND a TAVILY_API_KEY (or EXA/PARALLEL key) got firecrawl
auto-selected — then the request failed at runtime because the free Nous
tier does not include web search, and there is no fallback to the next
available backend. Explicit user setup lost to a managed convenience.
Reorder so direct-credential backends (tavily > exa > parallel > firecrawl-
direct) are tried first, then the managed-gateway firecrawl probe, then
free-tier fallbacks. Behaviour for users with only Nous OAuth (no
explicit key) is unchanged — firecrawl-via-gateway is still selected.
Behaviour change to flag: a user with BOTH a Nous OAuth token AND a
TAVILY_API_KEY (or EXA/PARALLEL key) now gets the explicit backend
instead of the managed gateway. This matches the principle of least
surprise — a user does not set TAVILY_API_KEY without intent — and
sidesteps the silent runtime failure of the gateway path on free tiers.
Follow-ups on top of #26016's expensive-model guard:
- gateway/slash_commands.py: typed '/model <name>' now routes through the
expensive-model confirmation gate (slash-confirm buttons / text fallback)
instead of bypassing the guard the pickers enforce. Cancel leaves the
session override and --global config untouched.
- telegram/discord/web_server: run expensive_model_warning() via
asyncio.to_thread — it can hit models.dev or a /models endpoint on a
cache miss, which would otherwise block the event loop.
- telegram: picker callback no longer toasts 'Model switched!' when the
switch callback raised (both mm: and mc: paths).
- tests: new tests/gateway/test_model_command_expensive_confirm.py pins
the typed-path gate (prompt, confirm-once, cancel, cheap-model no-op).
Rebased onto current main and re-ported across the restructured
surfaces: model flows now thread confirm_provider/base_url/api_key
through hermes_cli/model_setup_flows.py, the Discord picker lives in
plugins/platforms/discord/adapter.py, and the web dashboard picker
applies chat-mode switches via config.set so the expensive-model
confirmation can ride the response.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Compare source.role_authorized with 'is True' so a MagicMock source
(test fixtures that build bare runners via object.__new__) doesn't
auto-truthy through the gate. The real SessionSource field is a bool,
so production behavior is unchanged. Fixes test_signal_in_allowlist_maps.
DISCORD_ALLOWED_ROLES was checked by the Discord adapter (_is_allowed_user)
but gateway._is_user_authorized only read DISCORD_ALLOWED_USERS, so
role-authorized users were rejected with "Unauthorized user" at the
gateway layer despite passing the adapter gate.
- Add role_authorized: bool = False to SessionSource
- Add role_authorized param to build_source (base.py)
- Compute _role_authorized in on_message when user passes via role not user ID
- Thread _role_authorized through _handle_message -> build_source
- Check source.role_authorized early in _is_user_authorized (run.py)
Fixes#33952
Assert the inactivity handler skips disconnect (and the channel spam) when the
voice-mode getter reports "off", and still disconnects on genuine inactivity
when the mode is active.
The voice inactivity timer (VOICE_TIMEOUT) only counted the bot's OWN audio
playback as activity. Under /voice off (text-only replies, but still in the
channel — leaving is /voice leave) nothing ever reset it, so every 300s the bot
disconnected and spammed "Left voice channel (inactivity timeout)."
The adapter now learns the live voice-reply mode via a getter wired from run.py
and skips the auto-disconnect while mode is off. It also resets the timer when a
user actually speaks to the bot, so an active listener (incl. voice-on
text-only sessions that never play audio) isn't dropped mid-conversation.
parallel_search_sources accepted an overall_timeout but never honoured it.
The ThreadPoolExecutor ran inside a `with ... as pool` block, whose __exit__
calls shutdown(wait=True); even after as_completed() raised TimeoutError on
schedule, leaving the block blocked the caller until every worker finished.
A single slow source (e.g. ClawHub) therefore stalled the entire browse for
minutes. Manage the executor manually and shut it down with
wait=False, cancel_futures=True in a finally, so the timeout actually returns
and not-yet-started work is dropped.
ClawHubSource._load_catalog_index walked up to 750 sequential pages with no
wall-clock bound (each request under its own timeout=30, so nothing errored),
and wrote the result to the index cache unconditionally — so an interrupted or
slow walk poisoned the cache with a partial catalog. Add a
CATALOG_WALK_BUDGET_SECONDS deadline that breaks the walk early, and only write
the cache when the walk reaches a natural stop (cursor exhausted or page cap),
never on a budget-truncated walk.
Adds regression tests covering both bugs (timeout honoured + slow source
flagged; budget abort does not poison cache) plus their happy-path invariants.
A GPT-5 model rejecting max_tokens returns a 400 whose message contains the
literal substring 'max_tokens' — one of the _CONTEXT_OVERFLOW_PATTERNS. The 400
path in _classify_400 checked overflow patterns before any request-validation
check (which only existed on the 5xx path), so the parameter error was routed
into the compression loop, re-sent with the same bad param, and ended in
'Cannot compress further' on a tiny context.
Hoist a request-validation guard (unsupported/unknown parameter) above the
context-overflow check in _classify_400. Deliberately excludes the generic
invalid_request_error code, which OpenAI also stamps on real overflow 400s, so
genuine overflows still compress. Pairs with the max_completion_tokens param
fix that stops the bad request at the source.
Also adds AUTHOR_MAP entry for the salvaged PR #13902 commit.
Third-party OpenAI-compatible endpoints (self-hosted gateways, OpenRouter,
Azure proxies) fronting gpt-4o / gpt-4.1 / gpt-5+ / o1-o4 models silently
received max_tokens and 400'd with unsupported_parameter, because the three
kwarg-selection sites only checked base_url_hostname(...) == "api.openai.com"
and fell through to max_tokens on every other host. The constraint is
enforced server-side by the model family, not by the URL, so name-based
detection is required as a fallback.
Changes:
- utils.py: new shared helper model_forces_max_completion_tokens(model) that
prefix-matches gpt-4o, gpt-4.1, gpt-5, o1, o3, o4 families on normalized
(lowercased, vendor-prefix-stripped) names.
- run_agent.py: _max_tokens_param ORs the helper into the URL check.
- agent/auxiliary_client.py:
- auxiliary_max_tokens_param gains an optional keyword-only model arg.
- _build_call_kwargs inline branch applies the same check for both
provider == "custom" and non-custom paths.
Tests:
- tests/test_model_forces_max_completion_tokens.py: 31 new cases covering
positive families, negatives (classic gpt-4, claude, llama, mistral, qwen,
deepseek), vendor prefixes, case-insensitivity, whitespace, None/empty,
and substring-not-prefix guards.
- tests/run_agent/test_run_agent.py::TestMaxTokensParam: 5 new model-based
cases (custom + gpt-5.4, openrouter + gpt-4o-mini, custom + o1-preview,
classic gpt-4-turbo keeps max_tokens, llama3 keeps max_tokens).
- tests/agent/test_auxiliary_client.py::TestAuxiliaryMaxTokensParam: new
class, 7 tests covering the URL x model matrix.
xAI's consent page renders the authorization code in-page instead of
redirecting to the loopback callback, so the listener just hangs and the
manual-paste flow demands a callback URL that never contains the token.
- auth.py: poll stdin non-blockingly while waiting for the xAI loopback
callback; accept a pasted bare Grok Build code and substitute the locally
generated state (PKCE code_verifier still binds the exchange). No need to
wait for timeout or re-run with --manual-paste.
- computer_use: parse PNG/JPEG dimensions from base64 and fall back to the
text/AX/SOM payload when the screenshot is below the provider minimum
(8x8), which xAI rejects with HTTP 400.
- model_setup_flows.py: xAI credential reuse prompt uses the standard radio
picker via a shared _prompt_auth_credentials_choice helper.
- main.py: thread a title through _prompt_provider_choice; re-home the helper
import (flows live in model_setup_flows.py post-decomposition).
Salvaged from #36781 onto current main (contributor's main.py edits re-homed
to model_setup_flows.py, where the flows were extracted since the PR opened).
The shipped tri-state write_mode (on|off|approve) conflated two concepts —
whether writes are enabled and whether they're gated — so 'on' (writes flow
freely, gate inactive) read like 'gating is on'. Replace it with a single
clear boolean gate that defaults off.
memory.write_approval / skills.write_approval:
false (default) — write freely; the approval gate is off (pre-gate behaviour)
true — require approval: memory foreground prompts inline, memory
background-review + all skill writes stage for review
The old 'off = block all writes' mode is dropped; memory_enabled: false already
disables memory entirely, so a third 'block' state was redundant.
- tools/write_approval.py: get_write_mode/MODE_* → write_approval_enabled() bool;
evaluate_gate() loses the config-driven 'blocked' path (blocked now only comes
from an interactive user denial).
- tools/memory_tool.py, tools/skill_manager_tool.py: comment + behaviour follow.
- hermes_cli/config.py: memory/skills write_mode → write_approval (False);
_config_version 28→29 with a 28→29 migration that renames any persisted
write_mode (approve→true, on/off/unset→false) and drops the old key.
- slash commands: '/memory|/skills mode <on|off|approve>' → 'approval <on|off>'
('mode' kept as a back-compat alias); set_mode_fn callback now takes a bool.
- write_approval_commands.py, cli_commands_mixin.py, gateway/slash_commands.py,
commands.py: handlers + registry args/subcommands updated.
- docs + tests rewritten for the boolean model; added migration tests.
## What does this PR do?
The voice-during-active-run feature (#41984) changed
`_enrich_message_with_transcription` so that it returns a
`(enriched_text, successful_transcripts)` tuple instead of a bare string,
which lets callers echo the raw transcript back to the user. The signature
and every other return path were updated to match, but one branch was
missed: when a successfully transcribed clip arrives with the Discord
"empty content" placeholder as its caption, the method still returned the
prefix string on its own. All four call sites unpack the result with
`text, transcripts = await self._enrich_message_with_transcription(...)`,
so that path raised `ValueError: too many values to unpack (expected 2)`
and the inbound voice message was dropped instead of reaching the agent.
This is a real user-facing path rather than a corner case: a Discord voice
note sent without a caption is delivered as exactly that placeholder, so a
captionless voice message that transcribed correctly would crash the
handler precisely when transcription had worked. The fix returns the
proper tuple from that branch so the placeholder is still stripped while
the transcripts continue to flow back to the caller for the echo.
## Related Issue
N/A
## Type of Change
- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
- `gateway/run.py`: in `_enrich_message_with_transcription`, return
`(prefix, successful_transcripts)` instead of a bare `prefix` from the
empty-content-placeholder branch, so the contract matches the signature
and the other return paths.
- `tests/gateway/test_stt_config.py`: add
`test_enrich_message_with_transcription_returns_tuple_for_empty_content_placeholder`,
which drives a successful transcription with the placeholder caption and
asserts the placeholder is stripped while the transcript is still returned.
## How to Test
1. Check out `main` and run the new test — it fails with
`ValueError: too many values to unpack (expected 2)`, reproducing the
crash a captionless Discord voice note would trigger.
2. Apply this change and re-run
`pytest tests/gateway/test_stt_config.py -q` — all tests pass.
3. `ruff check gateway/run.py tests/gateway/test_stt_config.py` and
`python scripts/check-windows-footguns.py gateway/run.py
tests/gateway/test_stt_config.py` both pass.
## Checklist
### Code
- [x] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [x] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [x] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [x] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [x] I've run `pytest tests/ -q` and all tests pass
- [x] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [x] I've tested on my platform: macOS 15 (Darwin 25.5)
### Documentation & Housekeeping
- [x] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [x] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [x] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [x] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [x] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
* fix(dashboard): let _require_token endpoints work behind the OAuth gate
In gated/OAuth mode (non-loopback bind without --insecure) the dashboard
authenticates the SPA via a session cookie and deliberately does NOT inject
the legacy ephemeral _SESSION_TOKEN into index.html. gated_auth_middleware
verifies the cookie and attaches request.state.session before any non-public
/api/ route runs; the legacy auth_middleware short-circuits in this mode too.
But several handlers call _require_token() directly, which only validated the
(absent) _SESSION_TOKEN header. So every cookie-authenticated request to those
endpoints 401'd — making plugin install/enable/disable, /api/dashboard/plugins/hub,
and the other _require_token routes permanently unreachable behind the gate.
In the UI this surfaced as a 401: {"detail":"Unauthorized"} popup on plugin
install for any publicly-bound (e.g. Fly-hosted NAS) dashboard.
Fix: _require_token now defers to the active gate. When auth_required is True it
accepts the request iff the gate attached a verified session (and 401s otherwise);
loopback/--insecure behavior is unchanged (still validates the session token).
Adds two regression tests driving the full in-process stub OAuth round trip:
the install endpoint must NOT 401 a logged-in request, and must still 401 with
no cookie. Verified the accept-test fails on the pre-fix code.
* test(dashboard): cover the whole _require_token route class under the gate
The install popup was one symptom of a class-wide bug: all 14 endpoints that
call _require_token directly (API-key reveal, provider validation, the
OAuth-provider connect/disconnect flow, and plugin enable/disable/update/
delete/visibility/providers) 401'd cookie-authenticated requests in gated mode.
Add a parametrized test hitting a representative spread (plugins/hub, env/reveal,
providers/validate, an oauth provider route, agent-plugin enable) asserting a
logged-in caller is never 401'd — proving the fix covers the class, not just
agent-plugins/install.
`save_env_value()` captures the original .env file mode (e.g. 0640 for Docker
volume mounts) and restores it via `os.chmod` — but then unconditionally calls
`_secure_file(env_path)` on the next line, which re-tightens the mode to 0600
and defeats the entire preservation logic. The intent (preserve when
`original_mode` is captured, secure otherwise) was already in the code but
got short-circuited.
Move `_secure_file()` into the `else` branch so it only runs when no original
mode was captured — fresh `.env` files written for the first time still get
the 0600 hardening treatment, but operator-set modes survive subsequent writes.
Salvages #31518 by @blut-agent (config.py portion only). Their PR also bundled
unrelated lowercase-lookup changes in `hermes_cli/commands.py`; this salvage
takes only the focused config fix. The commands.py changes are reasonable on
their own merits but belong in a separate PR.
Co-authored-by: blut-agent <278569635+blut-agent@users.noreply.github.com>
The webhook 'platform disabled' card told users to enable it 'in your
messaging settings' — no such page exists. The webhook platform is
enabled on the Channels page (nav label), matching how every other
dashboard page refers to it.
skills_list() surfaces each skill's frontmatter `name:`, but skill_view()
only matched on the on-disk directory name (Strategy 2). When a skill's
directory is a shorter category/alias that differs from its frontmatter
name, skill_view(name) failed to find it. Extend the recursive Strategy-2
walk to also match frontmatter `name:`, guarded by a try/except so an
unreadable/malformed SKILL.md can't break discovery.
Adds a regression test that creates a skill whose directory name differs
from its frontmatter name and asserts skill_view resolves it (fails on
current main, passes with this change).
Salvaged the skill_view fix from #39682 onto current main as a standalone,
single-concern change with the test the original PR lacked.
Co-authored-by: foras910521-lab <foras910521-lab@users.noreply.github.com>
The Langfuse SDK treats `data:*;base64,...` strings as media and tries to
decode them. `_truncate_text` was slicing those strings mid-payload, producing
invalid base64 and noisy "Error parsing base64 data URI" logs. Observability
only needs the metadata, not raw image/audio bytes, so redact the whole data
URI (type, media_type, length) before it reaches the SDK.
Salvaged the Langfuse fix from #39682 onto current main as a standalone,
single-concern change (the dashboard `dist/**` and plugin-discovery parts of
that PR already landed separately on main).
Co-authored-by: foras910521-lab <foras910521-lab@users.noreply.github.com>
Adds memory.write_mode and skills.write_mode (on|off|approve), applied to
both foreground turns and the background self-improvement review fork — the
source of the unprompted 'wrong assumption' saves users reported.
- on (default): write freely, unchanged behaviour
- off: never write; the tool returns a clean disabled result
- approve: don't commit. Memory foreground writes prompt inline (small,
reviewable in a chat bubble); background memory writes and ALL skill writes
stage to a pending store instead (a SKILL.md is too large to review inline,
and a daemon thread can't block on a prompt)
Review staged writes from CLI or any messaging platform:
/memory pending|approve|reject|mode
/skills pending|approve|reject|diff|mode
Skill review respects the size asymmetry: inline you see a one-line gist;
the full unified diff stays out-of-band (/skills diff, dashboard, or the
staged JSON file).
New: tools/write_approval.py (gate + pending store), hermes_cli/
write_approval_commands.py (shared CLI+gateway handlers). Gates wired at the
single entry points memory_tool() and skill_manage(), using the existing
write-origin ContextVar to distinguish foreground from background_review.
The 'Install theme…' page is the one palette page rendered as a bespoke
component rather than through the shared CommandItem loop, so it missed the
compact HUD sizing. Route it through HUD_ITEM/HUD_TEXT and top-align the row
icon + status with the title line.
* chore(skills): remove red-team skills (godmode, obliteratus) from bundled catalog
Anthropic's output classifier on claude-fable-5 (and likely other Claude
models served through it) intermittently returns empty content for sessions
whose system prompt advertises these skills. The bundled skills-catalog block
is injected into every session's system prompt, so the descriptions
- red-teaming/godmode 'Jailbreak LLMs: Parseltongue, GODMODE, ULTRAPLINIAN'
- mlops/inference/obliteratus 'OBLITERATUS: abliterate LLM refusals (diff-in-means)'
trip the classifier on EVERY session regardless of which skill is actually
loaded, killing unrelated legitimate work (PR review, codebase audits, etc.).
Measured impact (controlled, interleaved A/B, claude-fable-5 via OpenRouter,
prompts differing only by the ~204 chars of these catalog lines, N=20 each):
catalog lines present -> 19/20 (95%) blocked
catalog lines absent -> 5/20 (25%) blocked
Removing them ~quartered the block rate. Rewording the descriptions was not
enough; the skills must leave the bundled catalog.
- Delete skills/red-teaming/godmode and skills/mlops/inference/obliteratus
- Drop their generated doc pages + catalog/sidebar entries (EN + zh-Hans)
- Drop the godmode hand-written-page exception in generate-skill-docs.py
* chore(skills): relocate godmode + obliteratus to optional-skills
Rather than deleting outright, move both into optional-skills/ so they remain
installable via `hermes skills install` while leaving the always-injected
bundled catalog (which is what tripped Anthropic's classifier).
- optional-skills/security/godmode (was skills/red-teaming/godmode)
- optional-skills/mlops/obliteratus (was skills/mlops/inference/obliteratus)
- regenerate optional-skills catalog + sidebar entries
The per-file test runner re-runs a file once when pytest exits 4 ("file or
directory not found") while the file exists on disk — a transient seen on
loaded shared CI runners where the planner collects a file (--collect-only
counts its tests) but the per-file subprocess fails to stat it moments later.
A single immediate retry could land in the same brief high-load window and
fail again, and the retry was gated on one Path.exists() check that can itself
be a flaky stat under that load — so a freshly-added test file that LPT pins to
one shard would deterministically red that shard on every run (no actual test
failure; the file just never executes).
- Extract the subprocess spawn/communicate/process-tree-kill logic into a
shared _spawn_pytest_once() helper (removes ~90 lines of duplication between
the primary run and the retry).
- Replace the single-shot retry with a bounded backoff loop
(_EXIT4_RETRY_ATTEMPTS, escalating sleep) that re-runs while the file is
present on disk.
- Add _file_present() which re-checks existence across a few spaced stats, so a
single flaky negative stat doesn't wrongly conclude the file is missing. A
genuinely-missing file (typo/deleted) still fails fast — exit 4 is not
swallowed when the file truly does not exist.
- Tests: transient-then-pass recovery, genuinely-missing fails fast with no
retry, give-up after max attempts, and _file_present transient/missing cases.
Imported VS Code themes now carry their integrated-terminal ANSI palette
(`terminal.ansi*`), keyed to the painted variant (terminal / darkTerminal).
The terminal adopts it when the full base-8 set is present and keeps its VS
Code defaults otherwise; withSurface still owns the background, so the pane
stays translucent.
Pull the command palette and session switcher into a shared top-center HUD
(`floating-hud.ts`): no dim/blur backdrop, one compact text + item-padding
size, sidebar-label-style section headers (brand-tinted, uppercase), and the
themed portal scrollbar.
AGENTS.md was almost entirely how-to/mechanics with the want/don't-want
guidance implicit and scattered. Adds a single authoritative intent layer
near the top, calibrated against what actually merges and what actually
gets rejected.
- 'What Hermes Is': framing + the two properties that drive design
(prompt-cache integrity, narrow-waist core).
- 'Contribution Rubric': dual-purpose intent doc — (1) for humans/own work:
what gets merged vs rejected; (2) for the triage sweeper: when a PR is safe
to close on the three allowed reasons AND when NOT to close one. Taste-based
'won't implement / out of scope' closes stay human-only by design.
- 'What we want' calibrated against the last ~55 merges: fix real bugs well,
expand reach at the edges (platforms/channels/providers/models/desktop —
large features land routinely), refactor god-files into clean modules,
keep the CORE narrow. 'Expansive at the edges, conservative at the waist.'
- 'What we don't want': speculative hooks, .env-for-non-secrets, needless
core tools, lazy-read escape hatches, feature-destroying fixes, ungated
telemetry, change-detector tests, core-touching plugins.
- 'Before you call it a bug — verify the premise (and when NOT to close)':
distilled from real closes (#41741 intentional-design-not-a-gap, #41610
wrong-premise, #42327 fix-never-executes, #42393 deliberate-omission,
#41999 overreach). Doubles as sweeper guidance to avoid wrongly closing
legitimate PRs.
- 'The Footprint Ladder' (core-tool decision): extend > CLI+skill > gated tool
> plugin > MCP server in the catalog > new core tool (last resort).
Trim: 'Adding New Tools' intro points at the ladder. Detailed mechanics stay
where readers need them.
* fix(desktop): honor default project directory for new sessions
The Settings picker persisted project-dir.json but the renderer kept
seeding new chats from sticky localStorage home. Prefer the configured
default on boot and session.create, pin TERMINAL_CWD at backend spawn,
and reject packaged install-dir paths that regressed after #37536.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(desktop): address review on default project dir PR
Add workspace cwd precedence tests, extract isPackagedInstallPath for
platform test coverage, and stop rewriting live $currentCwd when a
session is already active (cache-only until the next new chat).
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(desktop): dock terminal under chat and simplify file rail
Keep the right rail focused on file browsing while moving the persistent terminal into the chat column bottom slot, and make terminal colors follow the active light/dark mode instead of a fixed Solarized palette.
* fix(desktop): make the terminal a resizable, themed side pane
- Move the terminal into a resizable pane (viewport-% widths) that shares
<main>'s stacking context, so its drag handle no longer sits under the
fixed terminal overlay; works on either rail side.
- Restore +x on node-pty's spawn-helper before the first spawn to fix
"posix_spawnp failed" on macOS prebuilds (real cause; drop the redundant
shell-candidate retry loop).
- Gate terminal open/fit/start on document.fonts.ready and strip leading
blank rows (re-armed before the resize Ctrl-L redraw) so the prompt sits
flush at the top with no starship add_newline gap.
- Inherit the app editor-surface color as the terminal background.
- Bind Ctrl+` (⌃` on macOS) to toggle the terminal; add a palette entry.
* feat(desktop): show platform hotkey hints in the command palette
- Render each palette item's live binding as a <KbdGroup> hint via a new
comboTokens() helper (mac shows ⌘/⌃/⌥/⇧, every other platform shows
Ctrl/Alt/Shift — never a ⌘ on PC).
- Default the terminal toggle to ⌘` / Ctrl+` (the ~ key) on both platforms.
- Drop the hardcoded (⌘⏎) baked into the composer steer tooltip; render it
platform-aware with formatCombo instead.
* fix(desktop): drop the active check on the command-palette terminal item
* fix(desktop): remove active/check states from the command palette
* fix(desktop): allow ⌥/Shift-drag selection over mouse-mode TUIs
Full-screen apps (hermes --tui, vim) enable mouse reporting, so a plain
drag can't select text and ⌘/Ctrl+L (add-selection-to-chat) had nothing
to send. Enable macOptionClickForcesSelection so ⌥-drag on macOS (Shift
elsewhere) forces a native selection over mouse-mode apps.
* feat(desktop): tell the in-pane agent it's embedded in the GUI
Set HERMES_DESKTOP_TERMINAL=1 on the terminal pane's shell env and surface
it in build_environment_hints, so a hermes/--tui launched inside the pane
knows it's next to the GUI chat and that ⌥/Shift-drag + ⌘/Ctrl+L sends a
selection to the composer. Distinct from HERMES_DESKTOP (agent backend).
* refactor(desktop): drop the redundant Ctrl+` terminal-toggle fallback
The toggle now ships as mod+` on both platforms, so the standard combo
index handles it — the bespoke fallback (and its stale 'old default'
comment) is dead weight.
* fix(desktop): read live terminal selection for ⌘/Ctrl+L
A redraw-heavy TUI (spinners/clocks) outruns onSelectionChange, leaving the
React selection state empty so the state-gated shortcut listener never
attached and ⌘L no-op'd. Always listen and read xterm's live selection (with
a native fallback) at press time; only swallow the key when there's text to
send. Drops the now-redundant custom key handler.
* feat(desktop): make any agent aware it's in the Hermes desktop GUI
Generalize the runtime-surface hint: fire for HERMES_DESKTOP (the backend
powering the GUI chat) as well as HERMES_DESKTOP_TERMINAL (a hermes in the
embedded terminal pane), so it's about being inside the desktop GUI, not
about being a TUI. The terminal-pane selection note stays pane-specific.
* feat(desktop): give the GUI agent a read_terminal tool
The in-app terminal buffer lives in the renderer (xterm), so expose it to the
chat agent over the same blocking bridge clarify uses: read_terminal emits
terminal.read.request, the renderer serializes the buffer (visible screen by
default, or a start_line/count range against total_lines) and answers
terminal.read.respond. Gated to the GUI via HERMES_DESKTOP.
Also restores the flipped-layout titlebar inset fix (app-shell +
desktop-controller) for terminal/preview rails at the window's left edge.
* chore(desktop): trim read_terminal comments
* feat(desktop): add a terminal toggle to the statusbar
The file rail lost its terminal icon, leaving ⌘` and the command palette
as the only ways in. Add a one-click toggle to the statusbar's left
cluster, mirroring the command-center item: it reads $terminalTakeover so
it lights up while the pane is open and stays in sync with the hotkey, and
is gated to chat view (the only place the pane can show).
* fix(desktop): relabel the terminal header button to what it does
The in-pane button claimed a focus/split fullscreen toggle ("Focus
terminal view" / "Return to split view", screen-full/normal icons), but
the terminal is just a resizable side pane — there's no fullscreen. The
button only mounts while the pane is open, so the focus branch was dead
and clicking it merely closed the terminal. Relabel to "Hide terminal"
with a close icon, drop the dead conditional and the now-unused takeover
read.
* fix(desktop): move the terminal toggle next to the version item
Relocate it from the left cluster to the right of the statusbar, just
left of the client version item.
* feat(desktop): default the terminal to PowerShell on Windows
Prefer pwsh (7+) then Windows PowerShell 5.1 over cmd.exe, falling back to
comspec only when neither is present. -NoLogo drops the startup banner so
the prompt sits flush like the POSIX shells.
* feat(desktop): show a persistent divider on the terminal pane
The resize sash only painted on hover, so the terminal/chat boundary was
invisible at rest. Add an opt-in `divider` prop to Pane that paints a thin
resting hairline on the resize edge (side-aware, so it tracks the rail when
the layout flips) and enable it on the terminal pane.
* refactor(desktop): resolve the terminal shell instead of hardcoding it
Make shell selection a real resolver: an explicit override wins
(HERMES_DESKTOP_SHELL on both platforms, $SHELL on POSIX), otherwise
auto-detect the best installed shell — pwsh > Windows PowerShell 5.1 > cmd
on Windows, zsh > bash > sh on POSIX. A shared shellSpecFor() picks the
interactive flags by family, so an overridden bash/pwsh/cmd all launch
correctly.
* fix(desktop): repaint the terminal on light/dark switch
Setting term.options.theme updated colors for the DOM renderer but not the
WebGL one, which caches glyph colors in a texture atlas — so already-drawn
cells kept their old palette after a mode switch. Hold the WebglAddon in a
ref and clear its atlas when the theme changes.
* fix(desktop): match the terminal palette to VS Code Light+/Dark+
Adopt VS Code's exact default ANSI palette (the terminalColorRegistry
defaults), enable minimumContrastRatio: 4.5 so foregrounds are clamped
against the background the way the integrated terminal does, and key the
light/dark choice off renderedMode (the painted surface) instead of
resolvedMode so it can't invert. The canvas + inset paint the live skin
surface (--ui-editor-surface-background) so the terminal blends with the
app and follows light/dark, while the contrast clamp keeps colors crisp.
* fix(desktop): tighten command palette search to substring matching
cmdk's default fuzzy scorer matched anything with the query letters
scattered across an item, so e.g. "color" never narrowed to color
entries. Add a substring filter: every typed word must literally appear
in an item's value/keywords, keeping results tight and predictable.
* fix(desktop): blend the terminal header into the skin surface
The persistent-terminal overlay painted the static palette background
(#1e1e1e/#ffffff), so the transparent header strip revealed a near-black
slab above the surface-colored body. Paint the overlay with the live
--ui-editor-surface-background so header and body read as one pane.
* fix(desktop): re-resolve the terminal surface on skin switch
The canvas surface only re-resolved on light/dark change, so switching
skins at the same mode left the WebGL canvas painted with the old tint
until reload. Key the resolve off themeName too. Also trim the palette
comments.
* chore(desktop): drop redundant terminal theming header comment
Browse + install color themes from the VS Code Marketplace straight from
Cmd-K and Settings → Appearance. The Electron main process resolves the
extension, unzips the .vsix with a hand-rolled zip reader (zlib only, no
new deps), and hands back the raw theme JSON; the renderer converts it to
a DesktopTheme with a small seed → color-mix mapping.
- Folds an extension's light + dark variants into one theme family, so the
light/dark toggle switches Solarized/GitHub variants and installing in
dark mode stays dark.
- Guarantees accent contrast (WCAG AA) so imported sidebar labels read
instead of vanishing into the surface.
- Filters icon/product-icon packs out of the Themes-category search.
- "Install theme…" lives atop the Cmd-K theme picker; imports fold into
the Light/Dark groups by the modes they support.
* fix(gateway): auto-start after container restart via planned-stop marker
On Docker (s6-overlay), the gateway runs as a dynamically-registered s6
service. When the container stops/restarts/upgrades, s6 sends the gateway
a plain SIGTERM. The shutdown path (_stop_impl) ended with an
unconditional _update_runtime_status("stopped"), persisting
gateway_state=stopped to the volume. container_boot.py reads that on the
next boot and only auto-starts gateways whose last state was "running"
(_AUTOSTART_STATES) — so after a routine `docker compose up
--force-recreate` the gateway stays down and messaging channels silently
go dark, with no error surfaced (issue #42675).
The codebase already distinguishes intentional stops from unexpected
signals via the planned-stop marker (write_planned_stop_marker /
consume_planned_stop_marker_for_self): `hermes gateway stop`,
systemd/launchd ExecStop, and Ctrl+C write a marker before signalling,
so the handler classifies them as planned. An unmarked SIGTERM
(container/s6 restart, OOM, bare kill) is signal-initiated.
This wires that existing classification through to the state persist,
rather than adding unreliable signal-source inference:
- run.py: GatewayRunner._signal_initiated_shutdown, set in
shutdown_signal_handler's unmarked-signal branch. In _stop_impl, a
signal-initiated (non-restart) teardown now persists "running" instead
of "stopped" — preserving the operator's run-intent and overwriting the
mid-shutdown "draining" marker so _AUTOSTART_STATES matches on reboot.
Operator stops and restarts persist "stopped" as before.
- service_manager.py: S6ServiceManager.stop() now writes the planned-stop
marker for the supervised PID (read from s6-svstat) before `s6-svc -d`,
so an in-container `hermes gateway stop` is correctly classified as
intentional (parity with the systemd/launchd/host stop paths, which
already mark). Best-effort: a marker-write failure falls back to the
safe signal-initiated path.
Tests: shutdown persist-decision table (signal→running, operator→stopped,
restart→stopped), s6 stop marker write + svstat PID parse + failure
tolerance. The signal→running and s6-marker tests fail without the
respective source change. Verified end-to-end against a container built
from this branch: an unmarked SIGTERM to the live gateway leaves
gateway_state=running (shutdown-context log confirms signal path);
existing real container-restart suite still green.
* docs(docker): clarify gateway autostart distinguishes operator-stop from container-kill
The per-profile-supervision section described the autostart-across-restart
contract as "running gateways come back, stopped stay stopped" without
spelling out what records 'stopped'. That contract was the source of
#42675 confusion: users expected a restart to bring the gateway back and
it didn't. With the write-side fix, only an explicit `hermes gateway stop`
records 'stopped'; container/s6 restart SIGTERMs (incl. image upgrades and
unexpected exits) leave the state 'running' so the gateway auto-starts.
Make that distinction explicit in both the multi-profile and
per-profile-supervision sections.
* test(docker): real-restart autostart E2E for #42675
Adds test_live_gateway_autostarts_after_real_restart_without_manual_state_stamp:
a live s6-supervised gateway is killed by an actual `docker restart`
SIGTERM (no manual gateway_state stamp, no planned-stop marker) and must
auto-start on the next boot. Exercises the WRITE side of the fix that the
existing stamp-based tests bypass.
Verified to FAIL against an origin/main image (reconciler logs
prior_state=stopped action=registered — the #42675 bug) and PASS against
the fixed image (prior_state=running action=started).
The runtime assembled-prompt scan (#3968 lineage) selected its pattern
tier on has_skills alone. A script-driven, no-skills job injects its
script's stdout into the prompt, and that blob was scanned with the
STRICT user-prompt pattern set — so any command-shape string in the
data feed (e.g. a triage bot ingesting a bug report that quotes
`rm -rf /`) hard-blocked the job on every tick.
Script output and context_from output are runtime DATA produced by
operator-authored code — the same trust class as install-vetted skill
markdown, not a user-authored directive prompt. Select the scan tier by
what the assembled prompt CONTAINS: when it includes skill content OR
injected data, use the looser _scan_cron_skill_assembled set (keeps
unambiguous injection directives, drops command-shape patterns,
sanitizes invisible unicode instead of blocking).
Defense-in-depth is preserved:
- The raw user prompt is still strict-scanned at create/update
(api_server paths untouched) AND re-scanned strict at runtime even
when the looser tier was selected for the data blob.
- Plain no-script/no-skills jobs keep the strict scan on the whole
assembled prompt.
- Injection directives arriving via script stdout still block.
Rejected alternative: removing destructive_root_rm from the strict set
or a per-job skip_injection_scan flag — both weaken the guard globally.
A non-empty HERMES_DASHBOARD_PUBLIC_URL / dashboard.public_url value that
fails URL validation (overwhelmingly: a missing http(s):// scheme, e.g.
"hermes.domain.com") was silently discarded by resolve_public_url(),
falling back to reconstructing the OAuth redirect_uri from request
headers. Behind a reverse proxy that doesn't forward X-Forwarded-Proto
reliably, that yields an http:// callback even though the operator
explicitly set the public URL — with no signal as to why (#42780).
Emit a deduplicated operator-facing WARNING (once per distinct value,
since resolve_public_url runs per request) naming the offending value
and the required scheme. Turns a silent footgun into a self-diagnosing
one; behaviour is otherwise unchanged.
Tests assert the warning fires for a scheme-less value, is deduplicated
across repeated calls, and stays silent for a valid value — all three
fail without the fix.
Pops a session into a standalone, focused window for side-by-side work.
A secondary window loads the renderer at the session route with a
?win=secondary flag (ahead of the HashRouter '#'); it drops the global
sidebar plus the install/onboarding overlays and renders a single chat,
sharing the one local gateway over WS (no backend duplication). The main
process keys windows by sessionId so re-opening focuses the existing one
and self-cleans on close.
Open it via:
- ⌘-click (mac) / ⌃-click (win/linux) a sidebar session — the universal
"open in new window" gesture. Archive moves to the ⋯ / right-click menus
only, off the easy-to-misfire modifier-click.
- "New window" in the session ⋯ and context menus (link-external icon,
i18n'd across en/ja/zh/zh-hant).
A standalone window has no left rail, so AppShell treats its edge as
uncovered and applies the titlebar inset — the chat title clears the
macOS traffic lights instead of hiding behind them.
Co-authored-by: tim404x <tim404x@users.noreply.github.com>
The terminal/console titlebar was composed from status marker + model +
cwd only; the session's (auto-)title never appeared, even though the TUI
already knows it.
Change the format to `<marker> <session name> · <model> · <cwd>`, with the
session name and cwd each omitted when absent so single-segment titles stay
clean. The current session's live title is pulled from the existing
session.active_list poll (which already carries each session's current flag
and title), so there's no extra round-trip; UiState gains a sessionTitle
field updated only when it actually changes, preserving the existing
idle-flicker guard.
Extract the join logic into a pure composeTabTitle() helper in domain/paths
and cover its edge cases (name omitted, cwd omitted, whitespace-only name,
marker-only fallback, truncation, boundary length) in paths.test.ts.
Bind session.next/prev to Control+Tab / Control+Shift+Tab with a distinct
`ctrl` modifier token (literal Control on macOS — not Cmd, which the OS
reserves). Add ^1…^9 positional jumps mirroring profile ⌘1…⌘9.
Mac-style interaction:
- Quick ^Tab tap jumps on keydown with no HUD (even if Ctrl stays down)
- Hold Tab ~220ms, or tap Tab again while Ctrl is held → compact HUD
- Ctrl↑ commits the highlight; Esc cancels; rows clickable (^+click safe)
- Recency-ordered list snapshotted on open; cycles by stored session id
Includes combo.test.ts + session-switcher.test.ts.
* fix(desktop): prevent sidebar section overlap
Use a shared sidebar section scroller only on short windows so sections do not overlap, while preserving per-section scrolling on taller layouts.
* fix(desktop): measure section stack for compact sidebar mode
Window-height media query kept big windows in compact mode whenever the OS chrome ate into 830px; observe the section stack element instead so compact only engages when the stack is actually short.
* refactor(desktop): drive sidebar compact mode with CSS, not JS
Replace the matchMedia hook with a `short` (max-height: 830px) Tailwind
variant so the per-section scrollers flatten into one shared scroll stack on
short windows purely in CSS. Taller windows keep their per-group scrollers and
recents virtualization unchanged.
* refactor(desktop): pure-CSS two-mode sidebar scroll + collapse/cap groups
Drop the JS-measured compaction in favour of a single `compact` height
variant (max-height: 768px):
- tall: every section is its own capped, independent scroller; Sessions
is the lone flex-1 scroller.
- short: sections flatten and the stack scrolls as one.
Every section is now `shrink-0`, so nothing is squeezed below its
content and bled onto a sibling — the root cause of the header overlap
(flexbox implied min-size). Sessions keeps its virtualized scroller in
short mode only when it's the long list.
Non-session groups (messaging, cron) collapse by default — expanded ids
persist per platform — and render 3 rows, revealing 10 more on demand.
Extract the shared SidebarLoadMoreRow. Stress harness seeds 50 recents
to mirror the real first page.
* chore(desktop): trim sidebar comments, unify "compact" naming
Self-review polish: condense the over-long mode comments, use "compact"
consistently (matching the variant) instead of mixing "short", and drop a
no-op useCallback around revealMoreMessaging.
* chore(desktop): drop dev sidebar stress harness from the PR
Remove stress-probe.ts and its main.tsx import — it was a throwaway
testing aid, not something to ship.
* fix(desktop): send on Enter from live editor text, not stale composer state
Pressing Enter often did nothing (~90% with IME / fast typing); adding a
trailing space "fixed" it. The composer's submit path read the draft from the
AUI composer state (`useAuiState(s => s.composer.text)`) and the derived
`hasComposerPayload`, both of which lag the contentEditable DOM by a render. On
fast typing or IME composition the final keystroke(s) weren't in state yet, so
`submitDraft()` saw an empty draft and dropped the message. A trailing space
only worked around it by forcing an extra input event that flushed the state.
submitDraft() now refreshes draftRef from the editor node and submits/queues
based on the live DOM text, and the Enter handler decides the queue-drain vs
submit branch from the DOM too. draftRef is already synced on every input
event, so this just closes the in-flight-keystroke gap.
Fixes#39630. Also addresses the "typing + Enter does nothing" reports in
#39623.
* test(desktop): cover Enter-submit from live editor text (#39630)
Pin the contract that the composer's Enter path reads the live DOM editor
text, not the render-lagged composer state: a just-typed message sends even
when state hasn't synced; while busy it queues (never drains the queue or
cancels); an empty Enter while busy is a no-op; and an empty idle Enter
drains the next queued prompt. Faithful DOM-event repro mirroring
handleEditorKeyDown + submitDraft.
* fix(tui_gateway): honor target profile's terminal.cwd on desktop profile switch
The desktop's app-global remote mode serves every profile from one
tui_gateway backend, so the process-global TERMINAL_CWD only reflects the
launch profile. After switching profiles, a new session resolved its
workspace from that stale env var and inherited the previous profile's
directory.
Add _profile_configured_cwd() to read a non-launch profile's own
terminal.cwd from its config.yaml (skipping placeholder/empty/missing and
non-existent paths so callers fall back cleanly), and wire it into
_completion_cwd() with precedence: explicit client cwd -> existing session
cwd -> bound profile's configured cwd -> TERMINAL_CWD -> os.getcwd().
Fixes#40334
* test(tui_gateway): cover per-profile cwd resolution (#40334)
Pin the new contract: _profile_configured_cwd reads a profile's own
terminal.cwd and rejects placeholders/missing paths, and _completion_cwd
prefers a bound profile's cwd over a stale launch-profile TERMINAL_CWD
while still letting an explicit client cwd win.
The "..." overflow that opens the profile manager (the only UI to edit a
profile's SOUL.md) was gated behind profiles.length > 1, so a user with
only the default profile couldn't edit its persona without first creating
a throwaway second profile. Render it unconditionally.
A binary @file: ref (PDF, docx, spreadsheet, …) expanded to a bare
"binary files are not supported" warning with no content. The model saw a
failure and gave up — e.g. a dropped PDF came back as a text note claiming the
type was unsupported, even though the file was staged on disk right next to it.
Inject an actionable content block instead: the path, mime type, size, and a
nudge to use its tools to read/convert/view the file (and explicitly not to tell
the user the type is unsupported). General across every binary type — not
PDF-specific. The file already resolves where the agent's tools run (local cwd
or the staged copy in a remote session workspace), so it can act on it directly.
* fix(state.db): recover from malformed sqlite_master so hidden sessions reappear
The corruption class behind "Desktop/Dashboard show no sessions while
hundreds of session files sit on disk" is a malformed sqlite_master — most
often a duplicate object row, e.g. two CREATE VIRTUAL TABLE messages_fts
entries — surfacing as:
sqlite3.DatabaseError: malformed database schema (messages_fts) -
table messages_fts already exists
SQLite parses the whole schema while preparing the FIRST statement on a
connection, so on this class every statement fails before it runs: PRAGMA
journal_mode (which is where SessionDB.__init__ actually trips, in
apply_wal_with_fallback, BEFORE _init_schema), PRAGMA integrity_check, and
even DROP TABLE. The only operations that still work are
PRAGMA writable_schema=ON plus direct sqlite_master surgery. A plain
FTS-index rebuild at the _init_schema layer therefore cannot reach or fix
this; the canonical sessions/messages rows are intact — only the derived
schema is broken.
Add a dedicated recovery that operates where the failure actually happens:
- hermes_state.repair_state_db_schema(): backs up the raw file first, then a
least-destructive ladder — (1) de-duplicate sqlite_master keeping the
lowest rowid per object (preserves the existing FTS index), escalating to
(2) drop every messages_fts* schema object + VACUUM and let the next open
rebuild the FTS index from messages. sessions/messages are never modified.
Plus is_malformed_db_error() to discriminate this class.
- SessionDB.__init__ auto-heals: on a malformed-schema open error it repairs
once (process-guarded against loops / concurrent web_server opens) and
reopens, so Desktop/Dashboard recover on their own instead of silently
showing "no sessions".
- hermes doctor --fix detects the malformed class and repairs it (reporting
the recovered session count + backup name).
- hermes sessions repair [--check-only] [--no-backup] runs on the raw file
path, since SessionDB() itself cannot open a malformed DB.
Supersedes #32589 and #33869: both targeted FTS corruption but gated their
repair behind statements (integrity_check / SELECT / DROP TABLE) that
themselves fail on this class, and neither addressed the apply_wal_with_fallback
open-time failure. Credit preserved via Co-authored-by.
Closes#33865.
Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com>
Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>
* test(state.db): cover strat-B escalation + unrepairable safe-fail paths
---------
Co-authored-by: João Vitor Cunha <145560011+plcunha@users.noreply.github.com>
Co-authored-by: Tuna Dev <273476039+tuancookiez-hub@users.noreply.github.com>
Remote attachments read their bytes through the readFileDataUrl IPC, which is
hard-capped at 16MB and rejects with a raw "file is too large (N bytes; limit M
bytes)" string straight into the failure toast (helix4u review note on #43109).
Translate that into "<file> is too large to upload to the remote gateway (max
16 MB)", parsing the limit out of the message so it tracks the real cap. Applies
to both the image and non-image remote read paths; non-cap errors pass through
unchanged. Adds unit coverage for both.
The message-edit composer staged dropped OS files asynchronously with no
visible state, so confirming the edit before the upload resolved could send
the message without the gateway-side ref (helix4u review note on #43109).
Add a staging flag: while uploadOsDropRefs is in flight, show a small spinner
pill in the bubble and block submit (disabled send button + submitEdit guard)
so the edit can't outrace the ref insertion. New `attachingFile` i18n string
across en/zh/zh-hant/ja.
Two races in the drop-time eager upload:
- Resurrected chip: the success path used addComposerAttachment, which
re-appends when the id is gone, so a file removed mid-upload reappeared once
the upload resolved. Add updateComposerAttachment (update-only; no-op when the
chip was removed) and use it on both the eager success path and submit-time
sync.
- Duplicate upload: submit-time sync didn't join an eager upload still in
flight, so drop-then-Enter could fire file.attach twice and leave a duplicate
under .hermes/desktop-attachments/. Track in-flight eager uploads by id and
await the pending one before deciding to re-upload, reusing its gateway ref.
Tests: composer-store no-resurrect unit tests + a join-on-submit integration
test asserting a single file.attach.
Addresses @helix4u review on #43109.
Both jobs in tests.yml (`test` matrix and `e2e`) start from a cold uv
cache on every run and install deps with `uv pip install -e ".[all,dev]"`,
which re-resolves pyproject.toml ranges and rebuilds the editable install
each time.
Two changes:
1. Enable uv's official CI caching via setup-uv's `enable-cache: true`,
keyed on pyproject.toml + uv.lock, plus `uv cache prune --ci` to keep
the persisted cache small. Warm runs install from cache instead of
re-downloading/building wheels.
2. Replace the manual `uv venv` + `uv pip install -e` with
`uv sync --locked --python 3.11 --extra all --extra dev`. sync installs
the exact pinned set from uv.lock (and fails if the lock is stale vs
pyproject.toml), creating .venv itself. This is reproducible and, with a
warm cache, measurably faster than the editable pip install (~3-4x on the
steady-state install step locally). Downstream steps keep using
`source .venv/bin/activate`; sync writes .venv to the same path.
Follows the Astral-recommended pattern for uv in GitHub Actions:
https://docs.astral.sh/uv/guides/integration/github/
Co-authored-by: Wesley Simplicio <wesleysimplicio@live.com>
The in-flight user bubble seeded image attachment refs as `@image:<localpath>`.
In remote-gateway mode that path lives on the desktop, not the gateway, so the
inline thumbnail fetch hit /api/media and 403'd ("Path outside media roots"),
flashing a fallback chip until submit uploaded the bytes.
Seed (and keep) image refs as the raw base64 preview data URL instead. It
renders inline via extractEmbeddedImages with zero network, and survives the
post-sync rewrite (the agent gets the bytes through the attached-image pipeline,
not this display ref) so the thumbnail no longer remounts/flashes. Non-image
refs are unchanged.
Adds optimisticAttachmentRef + unit coverage.
Finder/OS drops became `@file:/Users/...` refs that only resolve when the
gateway shares the local disk, so on a remote gateway non-image files
(PDF/CSV/Markdown/...) never reached the agent. Route OS drops through the
file.attach / image.attach_bytes upload pipeline — in-app project-tree and
gutter drags stay inline workspace-relative refs — across every drop surface:
the conversation area, the composer form, the contenteditable input, and the
message-edit composer (which still reproduced the bug).
Also:
- upload dropped files eagerly when a session exists, so the card shows a
spinner instead of stalling the send (images stay submit-time to avoid
racing their thumbnail write);
- round the attachment card and drop the monospace detail;
- render image previews from the bytes we already hold, so a pasted/dropped
screenshot shows its thumbnail and previews even when its only on-disk copy
is a transient path (the data URL is not persisted to localStorage).
Supersedes #38615, #41203.
Co-authored-by: LeonSGP <154585401+LeonSGP43@users.noreply.github.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
The Anthropic picker returned the live /v1/models dump verbatim whenever
credentials were configured. Anthropic's API lags newly-routed curated
aliases (e.g. claude-fable-5, reachable on Anthropic before the models
endpoint enumerates it), so the curated entry vanished from the picker.
Merge curated _PROVIDER_MODELS["anthropic"] with the live catalog —
curated first, live-only appended, deduped — mirroring the OpenAI
curated-merge path. Live failure / no creds falls back to curated verbatim.
* docs(windows): correct native data dir to %LOCALAPPDATA%\hermes
The Windows-native guide claimed a deliberate split where config, auth,
skills, and sessions live under %USERPROFILE%\.hermes. That is not what
the installer does: scripts/install.ps1 sets HERMES_HOME=%LOCALAPPDATA%\hermes,
so data actually lives in %LOCALAPPDATA%\hermes alongside the disposable
install (the hermes-agent\, git\, node\, bin\ subdirectories) — `hermes
config` confirms config.yaml/.env resolve there, not under %USERPROFILE%.
Update the data-layout table, the "split is deliberate" note, the env-var
and uninstall sections to describe the real layout: data and install share
the %LOCALAPPDATA%\hermes root, reinstall only replaces hermes-agent\, and
a full wipe targets %LOCALAPPDATA%\hermes (with %USERPROFILE%\.hermes kept
only as a legacy/WSL cleanup). Mention HERMES_HOME as the override knob.
* docs(windows): fix PATH + bin layout to match installer
The installer adds hermes-agent\venv\Scripts (where hermes.exe lives) to
User PATH and sets HERMES_HOME — not %LOCALAPPDATA%\hermes\bin. The \bin
dir holds Hermes's managed uv.exe, not a hermes.cmd shim. Correct the
install-step list and the data-layout table accordingly.
* fix(install): show real HERMES_HOME path in setup messages
The native Windows installer wrote config/env/skills under $HermesHome
(%LOCALAPPDATA%\hermes) but its success messages claimed ~/.hermes,
which doesn't exist on native Windows. Print the actual paths so a new
user can find their config, .env, and skills.
* fix(desktop): rebind sessions after websocket reconnect
* docs(desktop): explain the reconnect-resume guard in use-route-resume
The reconnect fix turns on two subtle conditions with no inline rationale:
`seenGatewayStateRef` suppresses a spurious "became open" on the first effect
run (so a session mounting with the gateway already open doesn't double-resume),
and the `gatewayBecameOpen ||` arm forces a re-resume even when the route looks
`alreadyActive` because the cached runtime id can be stale after the gateway
rebinds/reaps the session. Comment both so the next reader doesn't "simplify"
them back into the original bug. No behavior change.
---------
Co-authored-by: Josh Dow <josh.dow@prepad.io>
The previous fix (#42991) only omitted reasoning when it was being disabled.
But reasoning-mandatory Anthropic models (Claude 4.6+, fable) 400 with
thinking.type.disabled on EVERY tool-continuation turn even when reasoning is
enabled: chat_completions never replays signed thinking blocks, so the prior
assistant tool_call has no thinking, and OpenRouter resolves "reasoning
requested but history has none" by emitting thinking.type.disabled — which
these models reject. Result: first turn works, every turn after the first tool
call dies (HTTP 400, non-retryable).
OpenRouter ignores reasoning.effort for adaptive Anthropic models anyway (the
model self-decides), so the reasoning field is pointless for them on every turn
and harmful on tool-replay turns. Omit it entirely → adaptive default.
- openrouter profile: drop the reasoning field for reasoning-mandatory Anthropic
models regardless of enabled/disabled; legacy Anthropic + non-Anthropic models
unchanged.
- tests: assert omission across enabled/disabled/effort variants; parity tests
switched to a non-Anthropic reasoning model (deepseek) since Anthropic 4.6+ no
longer carries a reasoning field.
Verified live end-to-end: a tool-replay turn on anthropic/claude-fable-5 with
reasoning enabled now builds extra_body=None and returns HTTP 200 (was 400).
* fix(install): self-heal a stuck Electron download on the desktop build
The desktop build downloads Electron (~114MB) from GitHub. A corrupt cached
zip, or a blocked/throttled GitHub release host (the repeating "retrying" log),
hard-failed the install — and install.sh had no recovery at all while
install.ps1 / `hermes desktop` only purged the cache.
All three build paths now escalate on a failed `npm run pack`:
GitHub → purge corrupt electron-*.zip + stale *-unpacked and retry → one retry
via a public Electron mirror (npmmirror.com). @electron/get SHASUM-verifies the
download, and a user-pinned ELECTRON_MIRROR is always respected (never
overridden). Adds a bash clear_electron_build_cache()/_desktop_pack() to mirror
the existing PowerShell/Python helpers.
* test(install): cover the Electron mirror fallback
Verify `hermes desktop` falls back to a mirror when the cache purge finds
nothing, and that a user-pinned ELECTRON_MIRROR is respected (no extra attempt,
not overridden).
* docs(desktop): troubleshoot a stuck Electron download
Document the automatic cache-purge + mirror fallback, how to pin your own
ELECTRON_MIRROR, and how to clear a corrupt cached zip by hand.
* docs(install): correct the Electron mirror trust framing
The mirror-fallback comments and the desktop troubleshooting doc implied
`@electron/get`'s SHASUM check makes the npmmirror.com download safe against
tampering. It doesn't: the SHASUMS256.txt is fetched from the same mirror, so
the check guards against a corrupt/partial download, not a compromised mirror.
Reframe all four surfaces (install.sh, install.ps1, `hermes desktop`, and the
docs) to state the trust trade-off honestly — npmmirror.com is the de-facto
Electron community mirror, we only fall back to it after the canonical GitHub
download fails, and a user-pinned ELECTRON_MIRROR is never overridden. No
behavior change.
---------
Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
New Anthropic models without a recognized version substring (claude-fable-5
and future named/numbered releases) were classified as legacy and routed down
the manual-thinking path, which made OpenRouter emit thinking.type.disabled —
a form reasoning-mandatory Claude models reject with a non-retryable HTTP 400.
Invert the brittle version-substring allowlists to default-to-modern (mirroring
_get_anthropic_max_output): unknown Claude models get the adaptive/xhigh/
no-sampling contract, with an explicit legacy list for older families. Non-Claude
Anthropic-Messages models (minimax, qwen3, …) keep the manual path.
- anthropic_adapter: _supports_adaptive_thinking / _supports_xhigh_effort /
_forbids_sampling_params now default unknown Claude models to modern; legacy
families enumerated in _LEGACY_MANUAL_THINKING_CLAUDE_SUBSTRINGS.
- openrouter profile: omit reasoning entirely (→ adaptive default) instead of
forwarding {enabled:false} for reasoning-mandatory Anthropic models; legacy
Anthropic + all non-Anthropic models still pass the disable form through.
- model_metadata + output-limit table: register claude-fable-5 (1M ctx, 128K out).
Tests assert the invariant ("unknown Claude model -> modern contract; legacy
stays manual; non-Claude unaffected"), not specific model names.
The shipped MCP catalog (optional-mcps/) wasn't packaged, so `hermes mcp catalog` and the dashboard catalog screen come up empty on pip/Homebrew/Nix installs even though the manifests exist in the repo. The runtime expects a packaged catalog (get_optional_mcps_dir() -> _get_packaged_data_dir("optional-mcps"); list_catalog() returns [] when it's absent).
Ship it like locales: pyproject [tool.setuptools.data-files] for the wheel + a MANIFEST.in graft for the sdist. optional-mcps/ is nested (optional-mcps/<name>/manifest.yaml) and data-files flattens each glob into its target dir, so each catalog entry gets its own target to preserve the per-entry directory the catalog iterates over.
The TUI had no way to toggle plugins — `/plugins` only printed a static
list, and the classic `hermes plugins` picker is curses-based and can't
run inside the Ink UI. Users had to drop to a separate shell and run
`hermes plugins enable/disable`.
Add a PluginsHub overlay modeled on the existing SkillsHub:
- New gateway RPC `plugins.manage` (list + toggle) backed by the same
disk-discovery + dashboard_set_agent_plugin_enabled primitives the CLI
and dashboard already use, so all three surfaces agree on state. The
toggle path also wires the plugin's toolset into platform_toolsets.
- `/plugins` with no arg opens the hub; any subcommand still falls
through to the text slash worker for CLI parity.
- pluginsHub overlay state threaded through overlayStore / interfaces /
useInputHandlers (Esc closes) / appOverlays (renders the FloatBox);
preserved across turn teardown like other user-toggled overlays.
- Hub UI: arrow/number select, Enter/Space toggles live, Tab switches
user-only vs all (bundled) scope, shows ✓/✗/○ activation glyphs.
plugins.manage added to _LONG_HANDLERS (disk + config I/O).
The /plugins slash command read from the live PluginManager, which only
knows about *loaded* plugins. A freshly-installed plugin that hadn't been
enabled yet showed 'No plugins installed. Drop plugin directories into
~/.hermes/plugins/' — even though it was on disk and a valid plugin.
Switch to the same disk-discovery path as 'hermes plugins list'
(_discover_all_plugins + enabled/disabled sets + _plugin_status), so an
installed plugin now appears with its activation state ([not enabled],
enabled, or disabled) plus the exact enable command.
Default the quick /plugins view to user-installed plugins and summarize
bundled providers/platforms on one line (the full catalog stays behind
'hermes plugins list') so the output isn't drowned by 60+ bundled
provider plugins.
OpenRouter-routed slugs that are absent from models.dev (e.g. a freshly
shipped anthropic/claude-fable-5) fell through to the generic
DEFAULT_CONTEXT_LENGTHS["claude"]=200K entry and under-reported their real
1M window. The step-6 OpenRouter live-metadata fallback was gated on
`not effective_provider`, but an OpenRouter selection sets
effective_provider="openrouter" (inferred from the base URL), so that
branch was dead code for every OR model.
Add a dedicated step-5 OpenRouter branch that consults the live /models
catalog (authoritative, refreshes as new slugs ship) before models.dev and
the hardcoded family defaults — mirroring the existing Nous/Copilot/GMI
branches. Keeps the Kimi-family 32k underreport guard. Per-model values are
respected (claude-haiku-4.5 stays 200K), so it does not blanket-bump to 1M.
Regression tests cover the fable-5 case, the genuinely-200k case, and the
Kimi guard.
Support installing a plugin that lives in a subdirectory of a larger
repo (docs/tests at root, plugin in a subdir) without forcing a
dedicated single-plugin repo.
Identifier syntax:
owner/repo/path/to/plugin (shorthand + subpath)
<url>.git/path/to/plugin (.git boundary on GitHub-style URLs)
<url>#path/to/plugin (explicit fragment, any scheme)
_resolve_git_url now returns (git_url, subdir); _install_plugin_core
reads the manifest from and moves only the subdir, so root-level docs
and tests no longer leak into ~/.hermes/plugins. _resolve_subdir_within
guards against path traversal, missing dirs, and non-directories.
Both the CLI (hermes plugins install) and the dashboard install endpoint
inherit this for free since they share _install_plugin_core. Dashboard
install hint + placeholder updated to advertise the subdir syntax.
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Adds the model above claude-opus-4.8 in both the OpenROUTER_MODELS and
_PROVIDER_MODELS['nous'] curated picker lists used by /model and
`hermes model`. Regenerated website/static/api/model-catalog.json to match.
* fix(desktop): pad app icon to Apple grid so dock size matches peers
The icon body filled ~92% of the canvas; macOS adds no padding, so it
rendered larger than other dock icons. Normalize to Apple's grid (~824px
body on a 1024px canvas) and ship a reproducible generator.
- regenerate icon.png/.icns/.ico with ~80% body + transparent margins
- keep original art as icon-source.png (master)
- add scripts/gen-app-icon.cjs + `npm run icons` (idempotent)
* chore(desktop): drop one-shot icon generator, ship only the assets
The regenerated icon.png/.icns/.ico are the deliverable; the padding
rationale lives in the PR. No build infra needed for a one-off.
* fix(desktop): pad apple-touch-icon — the actual runtime dock icon
app.dock.setIcon() overrides the bundle .icns at runtime with
public/apple-touch-icon.png, so the dock icon users see while the app
runs came from that (1254px canvas, ~91% full-bleed body). Normalize it
to the same Apple grid (824px body on 1024px canvas). Also covers the
web favicon + onboarding logo that reference the same file.
* fix(deps): bump urllib3 and PyJWT to clear CVEs
urllib3 2.6.3 → 2.7.0: fixes GHSA-mf9v-mfxr-j63j (decompression-bomb
bypass in streaming API) and GHSA-qccp-gfcp-xxvc (sensitive headers
forwarded across origins in proxied redirects).
PyJWT 2.12.1 → 2.13.0: fixes PYSEC-2026-175/177/178/179.
Note: python-multipart and idna are already at patched versions in
uv.lock (0.0.27 and 3.15 respectively).
Fixes#40176
* fix(deps): add upper bound for urllib3 dependency spec
Add '<3' ceiling to urllib3 specifier to satisfy the PyPI dependency
upper bounds CI check. Per CONTRIBUTING.md policy, all PyPI deps must
use '>=floor,<next_major' pinning.
* fix(photon): override transitive CVEs in the sidecar deps
`npm audit` flagged 7 high-severity transitive CVEs (protobufjs code injection
GHSA-66ff-xgx4-vchm + outdated @opentelemetry OTLP exporters) pulled in via
spectrum-ts -> @photon-ai/otel. npm's suggested fix downgrades spectrum-ts to a
version that targets the decommissioned spectrum host, so instead pin patched
versions via `overrides` (protobufjs 8.6.1, @opentelemetry/* 0.218.0) without
touching spectrum-ts. `npm audit` -> 0; spectrum-ts + provider still import.
* fix(photon): harden the sidecar bridge + bound the dedup cache
- constant-time sidecar control-token comparison (was `!==`, timing-attackable).
- cap the control-channel request body (2 MiB) so a compromised local peer can't
OOM the sidecar.
- wrap the inbound gRPC stream consumer in a re-subscribe loop with capped
exponential backoff + jitter — if the async iterator throws/ends it would
otherwise stop inbound forever (the adapter dedupes any replay).
- add an unhandledRejection handler so a stray rejection logs instead of killing
the process.
- dedup cache (adapter) was a true bounded LRU only for expired entries; a burst
of unique ids within the window grew it without limit. Evict oldest at the cap.
* chore: add AUTHOR_MAP entry for PhilipAD
---------
Co-authored-by: PhilipAD <philipadsouza@gmail.com>
* fix(deps): declare packaging as a core dependency so it ships everywhere
packaging is imported directly on three production paths but was never
declared in [project.dependencies], so it only reached users transitively
(pip/uv pull it for other tools). The slim official Docker image ships
without it, where each try/except-ImportError fallback silently degrades:
- plugins/memory/hindsight/__init__.py (_meets_minimum_version) returns
False when packaging is absent, disabling update_mode='append' so every
session leaks separate Hindsight documents (the reported #40503 symptom).
- tools/lazy_deps.py (_is_satisfied) falls back to "installed counts as
satisfied", defeating every version-constraint check on lazy extras.
- hermes_cli/main.py drops to naive name==version requirement parsing.
Promote it to a declared core dep pinned to packaging==26.0 — the exact
version already resolved in uv.lock, so there is zero resolution churn (the
lock change is two edge annotations marking it transitive->direct). It is a
pure-Python py3-none-any wheel with no compiled extensions, safe to ship on
every platform. Declaring it also wires it into the
_verify_core_dependencies_installed() update-repair guard, which reinstalls
missing [project.dependencies] on hermes update.
Adds a hermetic tomllib-parse regression test that fails before the
declaration and passes after.
Fixes#40503
* test(deps): make packaging dep-name extraction PEP 508-robust
Address Copilot review on #40522: the inline name-extraction only handled
==, >=, [ and ; and could mis-parse valid requirement strings using <=, ~=,
!=, <, > or a direct reference (name @ url). Factor a _distribution_name
helper that drops markers, direct-reference URLs and extras, then strips any
version operator via regex, so a future dep declared with any PEP 508
specifier shape is matched correctly.
---------
Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
* fix(desktop): keep chat recents focused and reset hotkey target
Exclude messaging platform threads from chat recents pagination so Load More returns chat sessions, and clear stale quick-create profile state before Ctrl+N starts a new session.
* fix(desktop): surface new sessions in sidebar + unstick new-chat Thinking
Two renderer regressions in the desktop chat app:
- Sidebar ordering: orderByIds/reconcileOrderIds appended ids missing from
the persisted order to the BOTTOM. Callers pass recency-sorted lists
(newest first), so a brand-new Ctrl+N session sank below the saved order
and read as "my latest session never showed up". Prepend fresh ids so new
activity surfaces at the top.
- New-chat stuck on "Thinking": terminal/attention state transitions
(turn finished, error, or agent now waiting on user) were RAF-batched.
Electron throttles requestAnimationFrame to ~0 while the window is
backgrounded, occluded, or unfocused, stranding the deferred flush. Flush
critical transitions (!busy || needsInput) synchronously; keep the busy
heartbeat RAF-batched to avoid scroll churn.
Does not touch the messaging-source exclusion in chat recents queries.
* fix(desktop): stop excluding messaging platforms from chat recents
The "keep chat recents focused" change excluded every messaging-platform
source (telegram, discord, slack, …) from the recents query. That silently
undid the messaging-source-folder feature already on main (ede4f5a4a): the
sidebar builds those folders purely from the loaded recents page, so once the
sources were filtered out the folders never rendered — telegram and friends
vanished from the left sidebar.
Only cron stays excluded (it has its own dedicated section). Messaging
sessions belong in the sidebar and render with their platform folder/icon.
Removes the now-unused MESSAGING_SESSION_SOURCE_IDS export.
* fix(desktop): give each messaging platform its own self-managed sidebar section
Recents are local-only again: cron and every messaging platform are excluded
from the chat-recents query, so "Load more" pages through interactive local
chats instead of interleaving gateway threads that bury them.
Each messaging platform (telegram, discord, ...) is now fetched as its own
slice (refreshMessagingSessions) and rendered as a self-managed sidebar
section with its platform icon, count, and per-platform "load more" — no
source-grouping magic inside recents.
Handed-off sessions (live source becomes local after a handoff) keep their
origin-platform badge on the row via handoff_platform, so a Telegram thread
continued in the desktop still reads as Telegram.
* fix(desktop): self-heal a stranded routed session in route-resume
An intermittent create/stream race can leave selected/active session ids
null while the route stays on /:sid — the transcript then sticks empty
even though the turn completed and persisted (the "second Ctrl+N shows no
response" symptom). The pathname didn't change, so route-resume's normal
gate skipped and the view stayed stuck.
Resume whenever the routed session isn't the loaded one, gated on
freshDraftReady so the /:sid -> /new transition (which also momentarily
nulls selected/active a render before the pathname flips) is NOT treated
as stranded. selectedStoredSessionIdRef is set synchronously at resume
entry, so this can't loop, and the resume cached fast-path restores the
already-streamed messages without a refetch.
* fix(desktop): bypass smooth reveal on primary markdown stream
Render main assistant text through deferred markdown directly instead of the smooth-reveal wrapper. This isolates the wrapper to reasoning surfaces and avoids the intermittent blank-response regression after consecutive new-session flows.
Add TestParseCharBasedOutputCap for the LM Studio / llama.cpp phrasing
(context in tokens, prompt in characters): the reported error resolves to
the available output budget, the retried cap plus the estimated input
stays inside the window, and a prompt larger than the window falls through
to None so the prompt-too-long/compression path still owns that case.
LM Studio / llama.cpp-style servers report the context window in tokens
but the prompt size in characters, e.g. "maximum context length is 65536
tokens. However, you requested 65536 output tokens and your prompt
contains 77409 characters". When a provider profile's default_max_tokens
equals the model's context window, the very first request asks for the
whole window as output and the server returns a hard HTTP 400 — even on a
trivial "hi".
parse_available_output_tokens_from_error did not recognise this phrasing,
so the overflow was misrouted to the prompt-too-long/compression path
(which can't help when the input already fits) instead of the output-cap
reduction + retry path. Detect the "requested N output tokens" form,
estimate the input from the character count (~3 chars/token, conservative
so the retried cap stays inside the window), and return the available
output budget so the existing retry logic shrinks max_tokens and succeeds.
The salvaged refactor's new tests use unittest.mock.patch (25 call sites)
but the import line only brought in AsyncMock and MagicMock, so 10 of the
new tests failed with NameError. Add patch to the import.
Adds three vitest cases for the recovery path: resume+retry on
"session not found", no-resume passthrough on other errors, and
no-resume when there is no stored session id. Also maps the
contributor's commit email in release.py AUTHOR_MAP.
When the laptop sleeps and wakes, the WebSocket reconnects but the
gateway's in-memory session table is cleared. The desktop app still
holds the old activeSessionId, so the next prompt.submit call returns
error 4001 ('session not found'), surfaced to the user as:
'Prompt failed: session not found'
Fix: wrap prompt.submit in a try/catch. On 'session not found', call
session.resume with the durable SQLite session ID (selectedStoredSessionIdRef)
to re-register the session in the gateway, update activeSessionIdRef to
the fresh live session_id, then retry prompt.submit once.
If recovery fails or the error is unrelated, the original error is
re-thrown and surfaces normally.
Based on #42658 by @mnajafian-nv.
Preserves the real downstream provider/tool exception when NeMo Relay's
managed adaptive execution wraps a failing callback as an internal runtime
error. Without this, the original exception (and its retry-classification
signal, e.g. status_code) is lost behind Relay's wrapper.
Salvage changes on top of the original PR:
- Tolerant Relay-wrapper match: _is_relay_wrapped_callback_error now uses
str.startswith on the "internal error: <cls>: <msg>" prefix instead of
exact equality, so a future Relay version appending a traceback/suffix
doesn't silently defeat the unwrap. On a total format change it returns
False and falls back to the pre-fix behavior (surfacing Relay's error)
rather than masking it.
- Deduplicated the LLM and tool execute paths into a shared
_run_managed_with_downstream_preservation helper, removing ~20 lines of
copy-pasted nonlocal/try-except scaffolding that could drift out of sync.
- Added a real-middleware regression guard
(test_nemo_relay_downstream_unwrap_matches_real_middleware_wrapper_shape)
that drives hermes_cli.middleware._run_execution_chain and asserts the
plugin's _original_downstream_error unwraps the actual private
_DownstreamExecutionError wrapper. The original synthetic tests modeled the
wrapper with a local class, so a rename or shape change in core middleware
would not have been caught; this test fails loudly if that contract drifts.
Co-authored-by: mnajafian-nv <mnajafian@nvidia.com>
The markdown code-block change rendered args['command'] in full in both
verbose AND non-verbose (all/new) modes, so a long or multi-line terminal
command bypassed the tool_preview_length cap (default 40) and rendered as
a huge block. Non-verbose now collapses to a single line capped at the
preview length while keeping the fence; verbose keeps the full command.
* fix(terminal): complete sane PATH entries on POSIX
Fixes macOS gateway/launchd terminal sessions whose PATH already
includes /usr/bin while omitting Apple Silicon Homebrew paths.
LocalEnvironment._make_run_env() now appends each missing _SANE_PATH
entry individually on POSIX, preserving caller precedence and avoiding
duplicate sane entries.
Root cause: the previous logic used /usr/bin as the sentinel for sane
PATH injection. macOS launchd commonly provides /usr/bin while leaving
out /opt/homebrew/bin and /opt/homebrew/sbin, so Homebrew-installed
CLIs stayed unavailable in terminal tool calls.
Salvaged from #35614 by @y0shua1ee. Fixes#35613.
Co-authored-by: y0shua1ee <104712437+y0shua1ee@users.noreply.github.com>
* test(terminal): harden sane PATH completion against dup/empty entries
Follow-up to the #35613 fix. Strengthens _append_missing_sane_path_entries:
- De-duplicate the caller-supplied PATH (first occurrence wins) so a PATH
that already contains duplicate entries is collapsed rather than carried
through. Previously only newly-appended sane entries were guarded against
duplication; pre-existing caller duplicates were preserved verbatim.
- Drop empty PATH entries (leading/trailing/double ':'), which POSIX shells
interpret as the current working directory — a mild foot-gun in a
default terminal environment.
Behaviour for well-formed PATHs (no duplicates, no empty entries) is
byte-identical to before; only malformed/duplicated inputs change.
Adds regression tests for: the literal macOS launchd PATH
(/usr/bin:/bin:/usr/sbin:/sbin), caller-duplicate collapsing with
order preservation, and empty-entry stripping.
* docs(terminal): clarify PATH normalisation semantics; drop dead set add
Addresses review findings on the sane-PATH completion follow-up:
- Sharpen the _append_missing_sane_path_entries docstring to state
explicitly that on POSIX the caller PATH is rewritten (empty entries
stripped, duplicates collapsed) rather than merely appended to, and
that well-formed PATHs remain byte-identical bar the appended sane
entries. This makes the intentional semantic change visible rather
than buried under "hardening".
- Document why _path_env_key is a deliberate second Windows guard
distinct from the helper's early return (key-casing selection vs
standalone safety), so neither is mistaken for redundant and removed.
- Drop the dead `seen.add(entry)` in the sane-entry loop: _SANE_PATH is
a static duplicate-free constant, so the membership check against the
caller entries is sufficient and `seen` is never read afterwards.
No behaviour change: verified byte-identical output across the launchd,
minimal, empty, duplicate, empty-entry and already-full cases, and
re-confirmed gh/brew resolve through the real LocalEnvironment.execute()
path under a launchd-style PATH. 133 targeted tests pass.
Intentionally NOT consolidating with tools/browser_tool._merge_browser_path:
it prepends (vs append), filters on os.path.isdir, uses os.pathsep, and
draws from a dynamic candidate set — a shared helper is a separate
refactor, out of scope for this bugfix.
---------
Co-authored-by: y0shua1ee <104712437+y0shua1ee@users.noreply.github.com>
`test_terminal_config_env_sync.py::_save_config_env_sync_keys()`
AST-scanned `hermes_cli/config.py:set_config_value` for a
`_config_to_env_sync = {...}` literal. The terminal-config env bridging
was consolidated onto the canonical `TERMINAL_CONFIG_ENV_MAP` (now read
via `terminal_config_env_var_for_key()`), so that literal no longer
exists and the scanner raised:
AssertionError: Could not find `_config_to_env_sync = {...}` literal in source
failing 8 of 9 tests on main for every PR.
Read the live `TERMINAL_CONFIG_ENV_MAP` instead — the actual source of
truth `set_config_value` bridges through — mirroring its `terminal.cwd`
exclusion. Refresh the stale module docstring and the now-incorrect
error-message hints that still referenced `_config_to_env_sync`.
Verified: the suite goes green, and a mutation (dropping `docker_volumes`
from `TERMINAL_CONFIG_ENV_MAP`) still trips the pinned regression test,
so the drift guard retains its teeth.
@file: attachments now work when the desktop is connected to a remote
gateway. Previously a referenced file resolved to a client-disk path the
gateway couldn't see, so context_references rejected it with "path is
outside the allowed workspace" and the agent never saw the file.
Adds a file.attach RPC (sibling to the existing image.attach_bytes /
pdf.attach byte-upload pipeline): the desktop uploads the file bytes, the
gateway stages them into <workspace>/.hermes/desktop-attachments/ and
returns a workspace-relative @file: ref that resolves cleanly. Local mode
passes the path directly; a gateway-visible file outside the workspace is
copied in; an in-workspace file is referenced as-is with no copy.
Consolidates the file-sync design from #38615 (LeonSGP43) and the
host-file-staging idea from #33455 (Carry00), rebased onto the
image/PDF remote-media helpers already on main.
Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
The Portal's /api/nous/recommended-models endpoint is the source of truth for
which models are free/paid right now, but its result was cached in-process
only. When the live fetch failed (network, parse, non-2xx), the function
returned {} and the model picker silently dropped the free/paid
recommendations — free models would vanish with no indication anything went
wrong.
Add a per-base disk cache at $HERMES_HOME/cache/nous_recommended_cache.json:
a successful live fetch is persisted as last-known-good, and a failed fetch
with an empty in-process cache falls back to the disk copy instead of {}.
Self-heals on the next successful fetch. With no disk copy, still degrades to
{} (callers already handle that). Keyed by portal base URL so staging/prod
don't collide.
E2E: live fetch writes disk; simulated Portal failure returns the cached free
models from disk; no-disk + failure returns {}.
Two new free-tier slugs surfaced in /model and `hermes model`. owl-alpha
was already present. Regenerated website/static/api/model-catalog.json to
keep the manifest sync test green.
The autouse _suppress_concurrent_hermes_gate fixture did
monkeypatch.setattr(main, '_detect_concurrent_hermes_instances', ...) with
no raising=False. Its try/except guards the import but not the setattr, so
under pytest's per-test spawn isolation a transiently partial hermes_cli.main
module (one a concurrent worker is mid-importing) made setattr raise
AttributeError and errored unrelated tests in the slice.
Add raising=False so a transiently-absent attribute is a no-op default rather
than a hard error. The attribute always exists once main.py finishes
importing; the real-function opt-out (@pytest.mark.real_concurrent_gate) is
unaffected.
Follow-up to the salvaged fallback-chain fix:
- Replace the hand-rolled fallback loader with the shared
hermes_cli.fallback_config.get_fallback_chain() helper so the TUI path
matches HermesCLI and gateway/run.py exactly: fallback_providers stays
first and keeps order, with distinct legacy fallback_model entries
merged in after (deduped). Previously the TUI loader picked one key OR
the other, diverging from CLI/gateway when both were set.
- Update the test to assert the merged canonical semantics.
- Add psionic73 to scripts/release.py AUTHOR_MAP (CI gate).
Extend the sidecar and Python adapter to handle `voice` content
alongside `attachment`. Voice notes are inlined as base64 (same
size-cap logic), surfaced as `MessageType.VOICE`, and include an
optional `duration` field in fallback markers when bytes are
unavailable.
Switch `list_users`, `find_user_by_phone`, `create_user`,
`register_user_if_absent`, and `refresh_user_numbers` from the
Dashboard API (Bearer token) to the Spectrum API (Basic auth with
project credentials). Update response unwrapping to handle the nested
`data.users` envelope returned by Spectrum, add `_spectrum_host()`
resolver, `_basic()` header helper, and structured error helpers.
Update tests, docs, and plugin.yaml accordingly.
Store operator and assigned iMessage numbers in `auth.json` after
setup, and surface them in `hermes photon status`. When numbers are
missing, status auto-refreshes from the dashboard without provisioning
new lines.
image_generate returns its artifact as JSON ({"image": "/abs/path.png"})
with no MEDIA: tag, so the gateway auto-append path (which only recognized
text_to_speech MEDIA: tags) never delivered it — image delivery silently
depended on the model restating the path in its reply. Add image_generate to
the producer allowlist and extract the local path from its JSON result
(host_image > image > agent_visible_image), reusing the existing
extension-anchored matcher and history-dedupe so remote URLs, unknown
extensions, failures, and already-sent paths are rejected.
Closes the remaining unfixed path from #19105.
The blanket stdin=subprocess.DEVNULL pass added the kwarg to the docker
'version' preflight call; the test pinned the exact kwargs dict. Update
the expected dict to match.
Regression tests for the salvage follow-up: the interactive 'claude
setup-token' login must keep inherited stdin, and the guard's inline
'noqa: subprocess-stdin' marker must exempt a call.
The blanket DEVNULL pass muzzled run_oauth_setup_token()'s interactive
'claude setup-token' login, which needs inherited stdin to prompt the
user. Revert that one call and replace the guard's brittle file:line
whitelist with an inline 'noqa: subprocess-stdin' marker that travels
with the code.
scripts/check_subprocess_stdin.py scans agent/, tools/, plugins/, and
tui_gateway/ for subprocess.run() and subprocess.Popen() calls that
don't explicitly set stdin=. Missing stdin= means the child inherits the
parent's fd, which in TUI mode is the JSON-RPC pipe — causing gateway
crashes on stdin EOF.
Exits 0 (pass) or 1 (violations found). Can be run manually or added to
CI. Skips comments, docstring references, and calls that use input= (which
creates its own pipe).
Usage: python scripts/check_subprocess_stdin.py
When Hermes runs in TUI mode, the gateway child process communicates with
the Node.js parent over a JSON-RPC protocol on stdin. Subprocess calls that
inherit this stdin fd can trigger a race condition where the child's stdin
read returns EOF, causing the gateway to exit cleanly (exit code 0) mid-tool-
execution.
This is the same root cause as issue #14036 (byterover plugin) and PR #39257
(SSH environment backend). This commit applies the fix — stdin=subprocess.DEVNULL
— to all 85 subprocess.run() and subprocess.Popen() calls that execute inside
the TUI gateway child process.
Scope: TUI-context code only (agent/, tools/, plugins/, tui_gateway/server.py).
CLI code (cli.py, hermes_cli/), tests, scripts, and gateway process management
are excluded — they don't run inside the TUI child and inherit the terminal's
stdin, not the JSON-RPC pipe.
85 call sites across 28 files. All files pass syntax check.
hermes update pulls the latest repo, so the freshly-pulled
website/static/api/model-catalog.json is already the newest catalog. Copy
it straight over ~/.hermes/cache/model_catalog.json instead of relying on a
network fetch (which can be Vercel bot-gated or hit a Portal hiccup and
silently degrade the picker to a stale/short list).
Adds seed_cache_from_checkout() in model_catalog.py (read shipped manifest,
validate, atomic write via _write_disk_cache, reset in-process cache) and
calls it from both update paths in main.py: _cmd_update_impl (git pull) and
_update_via_zip (Docker/no-git). Non-fatal on missing/malformed/invalid
files — the normal network refresh still applies on next picker open.
The desktop code uses Array.prototype.findLast (chat/composer/index.tsx) and
findLastIndex (session/hooks/use-session-actions.ts), which are ES2023 APIs,
but tsconfig declared only the ES2022 lib. Some TypeScript builds tolerate this,
but a correct/stricter tsc fails the desktop build with:
TS2550: Property 'findLast' does not exist on type 'ChatMessage[]'.
Do you need to change your target library? Try changing 'lib' to 'es2023'.
Declare es2023 so the build is correct regardless of the resolved TypeScript
version (reported on Windows with Node 24).
Refs #38970
Terminal tool progress on markdown-capable gateways (Telegram, Slack,
Discord, WhatsApp, Matrix, Weixin, Feishu) renders the full command in a
fenced code block again, in all/new AND verbose modes — gated on the
adapter's supports_code_blocks capability. Plain-text platforms keep the
short truncated preview.
No language tag is emitted: Slack mrkdwn renders a '```bash' fence with
'bash' as a literal first code line, so a bare '```' fence is used, which
renders correctly on every platform that supports blocks.
This restores the #41215 feature (removed in #41950 due to the command
showing in group chats) as the default. For a personal assistant the
command display is desired; the group-chat concern is a preference, not a
vulnerability.
Drop `replyTo` from all outbound send paths and update the `/typing`
endpoint to use the documented `typing("start" | "stop")` content
builder. Adds a `stop_typing` method on the adapter to pair with
`send_typing`.
Replace raw `{ replyTo }` send options with the `spectrumReply` content
builder from spectrum-ts, which is the correct API for threading
replies.
Adds `maybeReplyContent` helper with graceful fallback to normal send
when
the reply target cannot be resolved.
Allow PHOTON_HOME_CHANNEL to accept a bare E.164 phone number or a
`any;-;+1...` DM chat GUID in addition to a Spectrum space id. Inbound
DM spaces are cached so replies resolve without a second SDK lookup,
and `photon` is added to _PHONE_PLATFORMS so send_message treats E.164
strings as explicit targets rather than falling through to channel-name
resolution.
During `hermes photon setup`, allowlist the operator's number and set
their DM as the cron home channel when those env vars are unset. Without
this, the gateway denies the operator's own messages and cron has no
default delivery target. Re-runs never overwrite hand-tuned values.
Also teaches the sidecar's `resolveSpace` to accept a bare E.164 number
as a space identifier, resolving it to the user's DM space so
`PHOTON_HOME_CHANNEL` can be set to a phone number instead of an opaque
space id.
On shared-number plans, `/lines` has no dedicated entry, so the
`assignedPhoneNumber` field on the user object is the source of truth
for which number to text the agent. Fall back to the line inventory
only when no per-user assignment exists.
Make Photon iMessage a first-class persistent-connection channel like
Discord/Slack, using the spectrum-ts gRPC stream for both directions.
- Inbound: the sidecar forwards the SDK's app.messages gRPC stream to the
adapter over a loopback GET /inbound (NDJSON) instead of webhooks. Drops
the aiohttp webhook server, HMAC signature verification, public URL, and
PHOTON_WEBHOOK_* config; adapter reconnects with backoff.
- Management plane: device login uses client_id=photon-cli against the
single dashboard host (Bearer), matching the official photon-hq/cli;
find-or-create "Hermes Agent" project, enable Spectrum, rotate secret,
register user (with phone dedup), surface the assigned iMessage line.
- SDK projectId is the project's spectrumProjectId, not the dashboard id;
runtime creds persist to ~/.hermes/.env like every other channel.
- CLI: 6-step setup, webhook subcommands removed.
- Tests/docs updated for the gRPC flow; sidecar pins spectrum-ts ^1.17.1.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Salvage of PR #27978 cherry-picked onto current main, resolving conflicts
with main's intervening SimpleX plugin fixes (resp-envelope normalization,
health-monitor reconnect-churn fix, bare-form DM addressing).
What's new:
- Group support via SIMPLEX_GROUP_ALLOWED (comma-separated IDs or '*');
inbound items surface chat_id=group:<id> + chat_type=group. Disabled by
default so a bot in a group doesn't process every member's traffic.
- Inbound files/voice via rcvFileDescrReady (immediate /freceive) deferred
through _pending_file_transfers, replayed on rcvFileComplete. Voice notes
-> MessageType.VOICE.
- Native outbound media: send_image (PNG/JPEG + inline thumbnail), send_voice
(msgContent.type=voice), send_video, send_document. All addressed by numeric
ID via /_send ... json [...].
- MEDIA:<path> tags in agent replies stripped and dispatched as voice/document.
- Text-burst batching (HERMES_SIMPLEX_TEXT_BATCH_DELAY, default 0.8s).
- Auto-accept contact requests (SIMPLEX_AUTO_ACCEPT, default true).
- Group send path uses structured /_send #<id> json form (the bracket
#[<id>] form is parsed as display-name lookup and silently drops).
plugin.yaml bumped to 1.1.0; docs updated. All inside plugins/platforms/simplex/
- no core edits.
Co-authored-by: Juraj Bednar <juraj@bednar.io>
* fix(cli): persist custom --portal-url to .env on dashboard register
`hermes dashboard register --portal-url <url>` resolved the custom portal
for the registration request but only persisted it to .env when the var was
absent AND non-default. So a user who re-registered against a different
portal (e.g. switching preview deploys) silently kept the stale
HERMES_DASHBOARD_PORTAL_URL, and an explicit request for the production
portal was never written at all.
Track whether a custom portal was *explicitly supplied* (--portal-url flag
or HERMES_DASHBOARD_PORTAL_URL env), separately from the resolved value:
- explicit custom URL -> always persist (update in place via
save_env_value, which overwrites the matching key rather than appending
a duplicate), even when it equals the production default; no-op when it
already matches.
- no custom URL supplied -> unchanged conservative behaviour: only write an
inferred portal when absent and non-default; never alter an existing
entry unexpectedly.
save_env_value already preserves other lines/comments and dedups in place;
this only changes the decision of *when* to call it.
Adds TestCustomPortalPersistence covering all four cases.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
* feat(cli): persist dashboard public URL from --redirect-uri on register
When the user registers a publicly-exposed dashboard with --redirect-uri
(the full OAuth callback, e.g. https://hermes.example.com/auth/callback),
derive its origin and persist it as HERMES_DASHBOARD_PUBLIC_URL — the env var
the dashboard auth layer actually consumes at serve time.
dashboard_auth/routes._redirect_uri reconstructs the callback as
HERMES_DASHBOARD_PUBLIC_URL + "/auth/callback" (verbatim), and
dashboard_auth/prefix.resolve_public_url reads that var (then config.yaml
dashboard.public_url) to decide the public origin. Previously --redirect-uri
was sent to the portal at registration but never persisted, so the operator
had to set HERMES_DASHBOARD_PUBLIC_URL by hand for the login gate to engage
and the callback to round-trip. We now wire it automatically.
Persist the ORIGIN (scheme://host[:port]), not the full callback path —
persisting the raw redirect would double the path when the runtime appends
/auth/callback. Mirrors the portal-url persistence semantics already in this
PR: always write an explicitly-derived value (updating in place, no
duplicate), no-op when it already matches, never written on a localhost-only
install (no --redirect-uri), and skipped for a non-http(s)/malformed redirect.
Verified end-to-end: cmd_dashboard_register writes the origin to .env, then
resolve_public_url() reads it back and public_url + /auth/callback
reconstructs exactly the originally-supplied --redirect-uri.
Adds TestPublicUrlPersistence (8 cases) incl. origin-derivation, port
preservation, update-in-place, no-op, no-flag, non-http skip, and
both-portal-and-public-url-persisted.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
---------
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Re-running `hermes dashboard register` now updates the existing dashboard
record in nous-account-service instead of creating a duplicate.
The stable key is the client_id this install already persisted in
HERMES_DASHBOARD_OAUTH_CLIENT_ID on a prior run:
- No stored client_id -> first registration -> create a fresh client with an
auto-generated name (unchanged behavior).
- Stored client_id present -> re-send it as `client_id` so the portal updates
that row in place. Without an explicit --name, the name is omitted so the
portal-stored name isn't churned to a new random value on every re-run.
- Prints "Updated dashboard" vs "Registered dashboard" based on whether the
portal echoed back the same client_id. A stale/deleted id safely falls
through to a fresh create server-side.
Requires the matching nous-account-service change (POST
/api/oauth/self-hosted-client accepting an optional client_id + optional name).
Tests: 7 new TestIdempotentRerun cases (key sent, name preserved/overridden,
Updated message, persisted id, stale-id fall-through, blank-id first-run);
existing create-path tests unchanged (23 pass).
The raw-mode teardown path (rawModeEnabledCount -> 0) disabled
modifyOtherKeys, kitty keyboard, focus reporting, and bracketed paste,
then dropped raw mode and detached the readable listener -- but left DEC
mouse tracking (1000/1002/1003/1006) asserted. With raw mode off and no
reader attached, the terminal falls back to cooked-mode echo, so every
mouse move emits a hover report (DEC 1003) that prints as literal text:
a flood of '35;col;row M' shards over the prompt in a long session.
handleSuspend() already guards against exactly this (it writes
DISABLE_MOUSE_TRACKING before SIGSTOP); the ordinary teardown path
missed the same guard. Add DISABLE_MOUSE_TRACKING to the teardown, and
re-assert tracking on raw-mode re-entry (via the Ink instance's
reassertTerminalModes, which is gated on altScreenActive and idempotent)
so a transient drop->re-add round-trips cleanly instead of silently
leaving the mouse dead.
Adds a regression test driving a real Ink mount: the last raw-mode
consumer detaching must emit DISABLE_MOUSE_TRACKING.
Reported via a community bug report.
System messages (slash-command output like /debug, plus the generic
system-message fallback) were rendered as plain text, so the uploaded
paste.rs URLs in a debug report were neither clickable nor easily
copyable.
Route both through LinkifiedText so URLs become real <a> links (open
externally via the desktop bridge, selectable/copyable text). Add an
opt-in explicitOnly mode that matches only explicit http(s):// / www.
URLs, used here so filename-shaped tokens in the report (agent.log,
errors.log, gateway.log) aren't mistaken for bare domains and linkified.
Bare-domain matching is preserved for all other LinkifiedText callers.
Adds regression tests covering explicitOnly (links only real URLs, keeps
.log filenames as text) and the default bare-domain behavior.
The existing #33961 tests mock _prompt_text_input away, so they only assert
modal-vs-stdin routing — they cannot observe the actual hang. Add a guard
class that drives the real helper chain with a blocking input() on a win32
daemon thread and asserts the worker never hangs. Fails on the pre-#33961
code (win32 -> _prompt_text_input -> off-main input() -> deadlock), passes
on the modal path. Also covers the scheduling-failure degraded branch
(must clean-cancel to None, never call input()).
The four win32 tests asserted the old deadlocking behavior (win32 -> raw
input()). Rewrite them to the corrected contract: native Windows uses the
modal via the app loop, and stdin is kept only for the safe no-app /
scheduling-failure cases. Consolidate three near-identical daemon-thread
tests into one parametrized (linux/win32) test behind a shared _run_on_daemon
harness, and drop dead code from the old main-thread test.
Refs #33961
Native Windows bypassed the destructive-slash modal and fell back to a raw
input() prompt. When the confirm was triggered from the process_loop daemon
thread (the normal case), that input() deadlocked against prompt_toolkit's
main-thread stdin ownership: bare /reset froze with Ctrl-C swallowed, while
/reset now worked only because it skips the prompt. Route native Windows
through the existing call_soon_threadsafe modal path (the same key-binding
channel that already handles normal typing on Windows); keep the stdin
fallback only for the safe no-app / scheduling-failure cases, and clean-cancel
(None) off the main thread on win32 so a degraded path never re-deadlocks.
Addresses #33961
Refs #30768
When edit_message(finalize=True) fails with a MarkdownV2 parse error,
the silent fallback previously sent raw content with escape sequences.
Now it logs the error and strips markdown formatting via _strip_mdv2()
for clean plain-text fallback.
Also fixes _strip_mdv2 to handle standard markdown bold (\*\*text\*\*)
before MarkdownV2 bold (\*text\*), preventing half-stripped asterisks.
Refs: #41955, #41732
When a platform adapter sets REQUIRES_EDIT_FINALIZE=True (e.g.
TelegramAdapter), tool progress edits now pass finalize=True so
format_message() is applied before sending to the platform.
Previously, the initial send() formatted the message correctly via
MarkdownV2, but subsequent edit_message() calls skipped formatting
(finalize=False), causing raw markdown (e.g. triple backticks for
bash code blocks) to render as plain text on Telegram.
Refs: #41955, #41732
Collapse the bare-"custom" allowlist entry and the custom:<name> guard into
a single provider_accepts_vendor_slug predicate so the slug-warning suppression
reads as one rule instead of two scattered conditions. No behavior change.
#41215 rendered a terminal tool call as a native ```bash fenced block on
markdown platforms (Telegram, WhatsApp, Slack, and others), showing the full
command with no truncation, in both all/new and verbose modes. That posted
complete shell commands (heredocs, internal paths, destructive commands) into
the chat before the final answer, visible to everyone in it.
This restores the prior behavior: terminal progress shows the short, truncated
preview line that every other tool already uses, capped at tool_preview_length.
The supports_code_blocks capability flag is left in place for future use.
CLI/TUI rendering is a separate path and was unaffected.
Adds a regression test asserting terminal progress renders as a truncated
preview, not a fenced bash block, even on a markdown-capable gateway.
Fixes#41955
Photon now allowlists registered device clients on the device-code
endpoint; the old client_id "hermes-agent" is rejected with
400 invalid_client, breaking the entire login flow. Switch to Photon's
published "photon-cli" device client and send the standard scope.
Also validate the device-flow token against /api/auth/get-session and
/api/projects/ before persisting it, and extract token candidates from
every response shape Photon has used (access_token, accessToken,
data.*, set-auth-token header) so a token that authenticates the
session lookup but is rejected by the project API fails loudly at
login instead of 404ing downstream.
Verified live: request_device_code() now returns 200 + a valid
user_code where "hermes-agent" returned 400 invalid_client.
Salvaged from #34467 by @yanxue06.
Add an official, production-grade WhatsApp integration via Meta's
Business Cloud API as a complement to the existing Baileys bridge.
No bridge subprocess, no QR codes, no account-ban risk — at the cost
of a Meta Business account and a public HTTPS webhook URL.
Setup is fully wizard-driven: 'hermes whatsapp-cloud' walks through
every credential with paste-time validation (catches the #1 trap of
pasting a phone number into the Phone Number ID field), generates a
verify token, and ends with copy-paste instructions for the
cloudflared / Meta-dashboard / Business Manager pieces that can't be
automated. The wizard also points users at Meta's Business Manager
for setting the bot's display name and profile picture.
Feature set:
- Inbound: text, images (with native-vision routing), voice notes
(STT), documents (small text inlined, larger cached), reply context.
- Outbound: text with WhatsApp-flavored markdown conversion, images,
videos, documents, opus voice notes via ffmpeg with MP3 fallback.
- Native interactive buttons for clarify, dangerous-command approval,
and slash-command confirmation flows — matches the Telegram /
Discord UX, graceful degrades to plain text.
- Read receipts (blue double-checkmarks) and typing indicator,
using Meta's combined endpoint so they fire in a single API call.
- Webhook security: X-Hub-Signature-256 HMAC verification (raw body,
constant-time), wamid deduplication, group-shaped-message refusal
(groups deferred to v2 — Baileys still covers them).
- Full integration with the gateway's session, cron, display-tier,
prompt-hint, and auth-allowlist systems. Cloud and Baileys can run
side-by-side against different phone numbers.
Also wires STT (speech-to-text) through Nous's managed audio gateway
for Nous subscribers — previously the default stt.provider=local
required a separate faster-whisper install. New subscribers now get
voice-note transcription out of the box.
Docs: 418-line user guide at website/docs/user-guide/messaging/
whatsapp-cloud.md, sidebar entry, environment-variables reference,
ADDING_A_PLATFORM.md updated with the optional interactive-UX
contract for future adapter authors.
Tests: 100 dedicated tests for the adapter, 32 for the setup wizard,
20 for the Nous subscription STT wiring, plus regression coverage
across display_config, prompt_builder, and the cron scheduler.
Known limitations (deferred until clear demand signal):
- Group chats — use the Baileys bridge if you need them.
- Message templates for 24-hour-window outside-conversation sends —
reactive chat is unaffected; cron / delegate_task with gaps > 24h
will fail with a clear error. The agent's system prompt warns the
model about this so it knows to mention it when scheduling delayed
messages.
2026-05-23 01:07:01 -04:00
716 changed files with 70555 additions and 9346 deletions
@@ -1172,7 +1214,7 @@ Recovered from a deterministic fallback because the LLM context summarizer was u
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
{active_task}
## Blocked
@@ -1184,13 +1226,13 @@ None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{HISTORICAL_PENDING_ASKS_HEADING}
{active_task}
## Relevant Files
{_bullets(relevant_files,limit=12)}
## Remaining Work
{HISTORICAL_REMAINING_WORK_HEADING}
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
@@ -1312,7 +1354,7 @@ Summary generation was unavailable, so this is a best-effort deterministic fallb
_temporal_anchoring_rule=""
# Shared structured template (used by both paths).
_template_sections=f"""## Active Task
_template_sections=f"""{HISTORICAL_TASK_HEADING}
[THE SINGLE MOST IMPORTANT FIELD. Capture the user's most recent unfulfilled
input verbatim — the exact words they used. This includes:
- Explicit task assignments ("refactor the auth module")
@@ -1359,7 +1401,7 @@ Be specific with file paths, commands, line numbers, and results.]
- Any running processes or servers
- Environment details that matter]
## In Progress
{HISTORICAL_IN_PROGRESS_HEADING}
[Work currently underway — what was being done when compaction fired]
## Blocked
@@ -1371,14 +1413,14 @@ Be specific with file paths, commands, line numbers, and results.]
## Resolved Questions
[Questions the user asked that were ALREADY answered — include the answer so it is not repeated]
## Pending User Asks
[Questions or requests from the user that have NOT yet been answered or fulfilled. If none, write "None."]
{HISTORICAL_PENDING_ASKS_HEADING}
[Questions or requests from the user that have NOT yet been answered or fulfilled. These are STALE — they were from the compacted turns. Write them here for reference only. The agent must NOT act on them unless the latest user message explicitly requests it. If none, write "None."]
## Relevant Files
[Files read, modified, or created — with brief note on each]
## Remaining Work
[What remains to be done — framed as context, not instructions]
{HISTORICAL_REMAINING_WORK_HEADING}
[What remains to be done — framed as STALE context for reference only. The agent must NOT resume this work unless the latest user message explicitly asks for it.]
## Critical Context
[Any specific values, error messages, configuration details, or data that would be lost without explicit preservation. NEVER include API keys, tokens, passwords, or credentials — write [REDACTED] instead.]
@@ -1421,7 +1463,7 @@ Use this exact structure:
prompt+=f"""
FOCUS TOPIC: "{focus_topic}"
The user has requested that this compaction PRIORITISE preserving all information related to the focus topic above. For content related to "{focus_topic}", include full detail — exact values, file paths, command outputs, error messages, and decisions. For content NOT related to the focus topic, summarise more aggressively (brief one-liners or omit if truly irrelevant). The focus topic sections should receive roughly 60-70% of the summary token budget. Even for the focus topic, NEVER preserve API keys, tokens, passwords, or credentials — use [REDACTED]."""
This compaction should PRIORITISE preserving all information related to the focus topic above. For content related to "{focus_topic}", include full detail — exact values, file paths, command outputs, error messages, and decisions. For content NOT related to the focus topic, summarise more aggressively (brief one-liners or omit if truly irrelevant). The focus topic sections should receive roughly 60-70% of the summary token budget. Even for the focus topic, NEVER preserve API keys, tokens, passwords, or credentials — use [REDACTED]."""
try:
call_kwargs={
@@ -1590,6 +1632,39 @@ The user has requested that this compaction PRIORITISE preserving all informatio
"Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, and browser automation (Browser Use) by default. Modal execution is optional.",
"Nous subscription includes managed web tools (Firecrawl), image generation (FAL), OpenAI TTS, OpenAI Whisper STT, and browser automation (Browser Use) by default. Modal execution is optional.",
"When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, or Browser-Use API keys.",
"When a Nous-managed feature is active, do not ask the user for Firecrawl, FAL, OpenAI TTS, OpenAI Whisper, or Browser-Use API keys.",
"If the user is not subscribed and asks for a capability that Nous subscription would unlock or simplify, suggest Nous subscription as one option alongside direct setup or local alternatives.",
"Do not mention subscription unless the user asks about it or it directly solves the current missing capability.",
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.